Generative AI Fundamentals and TaskMatrix (Visual ChatGPT)
About the speaker
Tsuyoshi Matsuzaki (松崎 剛)
Microsoft Japan Co., Ltd.
Partner Business Division
Cloud Solution Architect
Blog
https://tsmatz.wordpress.com/
GitHub
https://github.com/tsmatz
NLP (natural language processing) Tutorials
https://github.com/tsmatz/nlp-tutorials
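OpenAI basics
GPT: text generation and understanding
Embedding: vector representations of text, for search and similarity
Codex: code generation and understanding
DALL·E: image generation from text prompts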
Evolution of language models
Source: “A Survey of Large Language Models” (Zhao et al., 2023) https://arxiv.org/abs/2303.18223
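Emergent abilities of LLMs
Emergence = something that appears suddenly
Abilities that a language model suddenly and unpredictably begins to show once it exceeds a certain number of parameters
Source: “Emergent Abilities of Large Language Models” (Wei et al., 2022)
https://arxiv.org/abs/2206.07682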
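Left: language model (OpenAI GPT-3). Right: external tools.
Thought: The invoice amount for company C needs to be checked
Action: GetInvoice[C]
    Tool (invoice DB): look up the invoice amount for C: 20000
Thought: The invoice amount for company F needs to be checked
Action: GetInvoice[F]
    Tool (invoice DB): look up the invoice amount for F: 4100
Thought: The total of C and F needs to be calculated
Action: Total[20000, 4100]
    Tool (calculator): 20000 + 4100 = 24100
Thought: The invoice amount for company A needs to be checked
Action: GetInvoice[A]
    Tool (invoice DB): look up the invoice amount for A: 2000
Thought: The invoice amount for company E needs to be checked
Action: GetInvoice[E]
    Tool (invoice DB): look up the invoice amount for E: 1000
Thought: The total of A and E needs to be calculated
Action: Total[2000, 1000]
    Tool (calculator): 2000 + 1000 = 3000
Thought: The difference between the total for companies C and F and the total for companies A and E needs to be calculated
Action: Diff[24100, 3000]
    Tool (calculator): | 24100 - 3000 | = 21100
Thought: Therefore, the answer is 21100
Action: Finish[21100]
OpenAI GPT does the thinking (proposes what to do next) = Reasoning
External tools (the function part) do the actual work = Acting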
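The Reasoning/Acting split in the trace above can be sketched as a small loop. This is only an illustration under stated assumptions: `INVOICE_DB`, `PLAN`, `scripted_model`, and `run_react` are hypothetical names, and the "model" is a scripted stand-in that replays the thoughts from the slide rather than a call to GPT-3.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical invoice database holding the amounts shown in the trace.
INVOICE_DB: Dict[str, int] = {"C": 20000, "F": 4100, "A": 2000, "E": 1000}

# External tools (the "Acting" side); names follow the actions in the trace.
TOOLS: Dict[str, Callable[..., int]] = {
    "GetInvoice": lambda company: INVOICE_DB[company],
    "Total": lambda a, b: a + b,
    "Diff": lambda a, b: abs(a - b),
}

# Scripted stand-in for the language model (the "Reasoning" side).
# Each entry maps the observations so far to (thought, action, arguments).
PLAN: List[Callable[[List[int]], Tuple[str, str, list]]] = [
    lambda o: ("Check the invoice amount for company C", "GetInvoice", ["C"]),
    lambda o: ("Check the invoice amount for company F", "GetInvoice", ["F"]),
    lambda o: ("Calculate the total of C and F", "Total", [o[0], o[1]]),
    lambda o: ("Check the invoice amount for company A", "GetInvoice", ["A"]),
    lambda o: ("Check the invoice amount for company E", "GetInvoice", ["E"]),
    lambda o: ("Calculate the total of A and E", "Total", [o[3], o[4]]),
    lambda o: ("Calculate the difference of the two totals", "Diff", [o[2], o[5]]),
    lambda o: ("Therefore, the answer is known", "Finish", [o[6]]),
]


def scripted_model(observations: List[int]) -> Tuple[str, str, list]:
    """Return the next thought and action; a real system would query an LLM."""
    return PLAN[len(observations)](observations)


def run_react(max_steps: int = 10) -> int:
    """Alternate between reasoning (model) and acting (tools) until Finish."""
    observations: List[int] = []
    for _ in range(max_steps):
        thought, action, args = scripted_model(observations)
        print(f"Thought: {thought}")
        print(f"Action: {action}{args}")
        if action == "Finish":
            return args[0]
        result = TOOLS[action](*args)
        print(f"Observation: {result}")
        observations.append(result)
    raise RuntimeError("no answer within max_steps")


answer = run_react()
```

The model never computes anything itself; every number in the final answer comes back from a tool call, which is the point of the split shown on the slide.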
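ReAct / MRKL Systems
• Uses the following tools
  • Search
  • Lookup
• Large improvement in answer accuracy (benchmark results)
• More advanced approaches also combine this with reinforcement learning, imitation learning, and similar methods
Source: “ReAct: Synergizing Reasoning and Acting in Language Models” (Yao et al., 2022)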
Prior knowledge vs. label mapping
(an in-context learning (ICL) experiment by Google)

                unrelated labels    flipped labels
With IF                1                  2
Without IF             3                  4

IF = instruction fine-tuning
Source: “Larger language models do in-context learning differently” (Wei et al., 2023)
https://arxiv.org/abs/2303.03846
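To make the two label settings concrete, here is a minimal sketch of how such few-shot prompts could be assembled. The example sentences, the `foo`/`bar` placeholder labels, and the `build_prompt` helper are assumptions for illustration and are not taken from the slides. Under flipped labels, a model that follows the in-context mapping would label a positive query "negative", while a model that leans on prior knowledge would still say "positive".

```python
from typing import Dict, List, Tuple

# Hypothetical sentiment examples (illustrative only).
EXAMPLES: List[Tuple[str, str]] = [
    ("A wonderful, heartfelt film.", "positive"),
    ("Dull and far too long.", "negative"),
    ("I would happily watch it again.", "positive"),
]

# Label mappings for the two settings named on the slide.
FLIPPED: Dict[str, str] = {"positive": "negative", "negative": "positive"}
UNRELATED: Dict[str, str] = {"positive": "foo", "negative": "bar"}


def build_prompt(
    examples: List[Tuple[str, str]], mapping: Dict[str, str], query: str
) -> str:
    """Build a few-shot prompt whose labels are rewritten through `mapping`."""
    blocks = [f"Input: {text}\nLabel: {mapping[label]}" for text, label in examples]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


flipped_prompt = build_prompt(EXAMPLES, FLIPPED, "An instant classic.")
unrelated_prompt = build_prompt(EXAMPLES, UNRELATED, "An instant classic.")
print(flipped_prompt)
```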
Symbol Tuning
Source: “Symbol tuning improves in-context learning in language models” (Wei et al., 2023)
https://arxiv.org/abs/2305.08298
Expectations for multimodality
Source: “ImageBind: One Embedding Space To Bind Them All” (Girdhar et al., 2023)
https://arxiv.org/abs/2305.05665
TaskMatrix (Visual ChatGPT)
• Image processing is handled mainly by Visual Foundation Models hosted on Hugging Face
• A language model (ChatGPT) is used for reasoning
Execution example
User request: “replace the sofa in this image with a desk and then make it like a water-color painting”
Action 1: Replace Something From The Photo (“couch, desk”)
Action 2: Instruct Image Using Text (“make it like a water-color painting”)
• Thought: Do I need to use a tool? Yes
• Action: Replace Something From The Photo
• Action Input: image/9bb5e03b.png, couch, desk
• Observation: image/5737_replace-something_9bb5e03b_9bb5e03b.png
• Thought: Do I need to use a tool? Yes
• Action: Instruct Image Using Text
• Action Input: image/5737_replace-something_9bb5e03b_9bb5e03b.png, make it like a water-color painting
• Observation: image/770e_pix2pix_5737_9bb5e03b.png
• Thought: Do I need to use a tool? No
• AI: Here is the image you requested.
• ![image/770e_pix2pix_5737_9bb5e03b.png](image/770e_pix2pix_5737_9bb5e03b.png)
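One step of the trace above could be handled roughly as follows: read the tool name and the comma-separated input out of the model's text, then call the matching tool. This is a sketch, not TaskMatrix's implementation. `parse_step`, the two stub tools, and the file names they return are hypothetical; a real tool would run an image model and write a new file.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple


def parse_step(text: str) -> Optional[Tuple[str, List[str]]]:
    """Extract the tool name and its comma-separated inputs from one model turn.

    Returns None when the model signals that no tool is needed.
    Note: a plain split on "," would break instructions that contain commas.
    """
    if "Do I need to use a tool? No" in text:
        return None
    action = re.search(r"Action:\s*(.+)", text)
    action_input = re.search(r"Action Input:\s*(.+)", text)
    if action is None or action_input is None:
        raise ValueError("turn does not follow the Thought/Action format")
    args = [part.strip() for part in action_input.group(1).split(",")]
    return action.group(1).strip(), args


def replace_something(image_path: str, old: str, new: str) -> str:
    # Hypothetical stub: only derives an output file name.
    return image_path.replace(".png", "_replaced.png")


def instruct_image(image_path: str, instruction: str) -> str:
    # Hypothetical stub: only derives an output file name.
    return image_path.replace(".png", "_styled.png")


TOOLS: Dict[str, Callable[..., str]] = {
    "Replace Something From The Photo": replace_something,
    "Instruct Image Using Text": instruct_image,
}

turn = (
    "Thought: Do I need to use a tool? Yes\n"
    "Action: Replace Something From The Photo\n"
    "Action Input: image/9bb5e03b.png, couch, desk"
)
parsed = parse_step(turn)
assert parsed is not None
name, args = parsed
observation = TOOLS[name](*args)
print(f"Observation: {observation}")
```

The observation string is what gets appended to the transcript before the model is asked for its next thought.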
Visual ChatGPT is designed to be able to assist with a wide range of text and visual related tasks,
from answering simple questions to providing in-depth explanations and discussions on a wide range of
topics. Visual ChatGPT is able to generate human-like text based on the input it receives, allowing it
to engage in natural-sounding conversations and provide responses that are coherent and relevant to
the topic at hand.
Visual ChatGPT is able to process and understand large amounts of text and images. As a language
model, Visual ChatGPT can not directly read images, but it has a list of tools to finish different
visual tasks. Each image will have a file name formed as "image/xxx.png", and Visual ChatGPT can
invoke different tools to indirectly understand pictures. When talking about images, Visual ChatGPT is
very strict to the file name and will never fabricate nonexistent files. When using tools to generate
new image files, Visual ChatGPT is also known that the image may not be the same as the user's demand,
and will use other visual question answering tools or description tools to observe the real image.
Visual ChatGPT is able to use tools in a sequence, and is loyal to the tool observation outputs rather
than faking the image content and image file name. It will remember to provide the file name from the
last tool observation, if a new image is generated.
Human may provide new figures to Visual ChatGPT with a description. The description helps Visual
ChatGPT to understand this image, but Visual ChatGPT should use tools to finish following tasks,
rather than directly imagine from the description.
Overall, Visual ChatGPT is a powerful visual dialogue assistant tool that can help with a wide range
of tasks and provide valuable insights and information on a wide range of topics.
TOOLS:
------
Visual ChatGPT has access to the following tools:
> Get Photo Description: useful when you want to know what is inside the photo. receives image_path as
input. The input to this tool should be a string, representing the image_path.
> Remove Something From The Photo: useful when you want to remove and object or something from the
photo from its description or location. The input to this tool should be a comma seperated string of
two, representing the image_path and the object need to be removed.
> Replace Something From The Photo: useful when you want to replace an object from the object
description or location with another object from its description. The input to this tool should be a
comma seperated string of three, representing the image_path, the object to be replaced, the object to
be replaced with
> Instruct Image Using Text: useful when you want to the style of the image to be like the text. like:
make it look like a painting. or make it like a robot. The input to this tool should be a comma
seperated string of two, representing the image_path and the text.
. . . . .
To use a tool, please use the following format:
```
Thought: Do I need to use a tool? Yes
Action: the action to take, should be one of [Get Photo Description, Remove Something From The Photo,
Replace Something From The Photo, Instruct Image Using Text]
Action Input: the input to the action
Observation: the result of the action
```
When you have a response to say to the Human, or if you do not need to use a tool, you MUST use the
format:
```
Thought: Do I need to use a tool? No
AI: [your response here]
```
You are very strict to the filename correctness and will never fake a file name if it does not exist.
You will remember to provide the image file name loyally if it's provided in the last tool
observation.
Begin!
Previous conversation history:
Human: provide a figure named image/9bb5e03b.png. The description is: a living room with a couch and a
couch in the corner. This information helps you to understand this image, but you should use tools to
finish following tasks, rather than directly imagine from my description. If you understand, say
"Received".
AI: Received.
New input: replace the sofa in this image with a desk and then make it like a water-color painting
Since Visual ChatGPT is a text language model, Visual ChatGPT must use tools to observe images rather
than imagination.
The thoughts and observations are only visible for Visual ChatGPT, Visual ChatGPT should remember to
repeat important information in the final response for Human.
Thought: Do I need to use a tool? Yes
Action: Replace Something From The Photo
Action Input: image/9bb5e03b.png, couch, desk
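The prompt above follows a regular layout: a tool list, format instructions that name the allowed actions, the conversation history, and the new input. A rough sketch of assembling that layout from a tool registry is shown below. The `build_prompt` helper is hypothetical, the tool descriptions are abridged paraphrases, and the persona paragraphs and the "No tool" format are omitted for brevity.

```python
from typing import Dict

# Tool names from the prompt above; descriptions are abridged paraphrases.
TOOL_DESCRIPTIONS: Dict[str, str] = {
    "Get Photo Description": "describe what is inside the photo. Input: image_path.",
    "Remove Something From The Photo": "remove an object. Input: image_path, object.",
    "Replace Something From The Photo": "replace an object. Input: image_path, old object, new object.",
    "Instruct Image Using Text": "restyle the image. Input: image_path, text.",
}

FORMAT_INSTRUCTIONS = (
    "To use a tool, please use the following format:\n"
    "Thought: Do I need to use a tool? Yes\n"
    "Action: the action to take, should be one of [{tool_names}]\n"
    "Action Input: the input to the action\n"
    "Observation: the result of the action\n"
)


def build_prompt(tools: Dict[str, str], history: str, new_input: str) -> str:
    """Assemble the tool list, format instructions, history, and new input."""
    tool_lines = "\n".join(f"> {name}: {desc}" for name, desc in tools.items())
    tool_names = ", ".join(tools)
    return (
        "TOOLS:\n------\n"
        f"{tool_lines}\n\n"
        f"{FORMAT_INSTRUCTIONS.format(tool_names=tool_names)}\n"
        f"Previous conversation history:\n{history}\n\n"
        f"New input: {new_input}\n"
    )


prompt = build_prompt(
    TOOL_DESCRIPTIONS,
    history="Human: provide a figure named image/9bb5e03b.png.\nAI: Received.",
    new_input="replace the sofa in this image with a desk",
)
print(prompt)
```

Because the list of allowed actions is generated from the same registry as the tool list, adding a tool in one place keeps the two sections of the prompt consistent.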
“could you generate a cat for me?”
“could you replace a cat to a dog and then remove the book?”
“could you generate a canny edge of this image?”
“generate a yellow dog based on ......png”
Visual Foundation Models
Get Photo Description
Generate Image From User Input Text
Remove Something From The Photo
Replace Something From The Photo
Instruct Image Using Text
Answer Question About The Image
Edge Detection On Image
Generate Image Condition On Canny Image
Line Detection On Image
Generate Image Condition On Line Image
Hed Detection On Image
Generate Image Condition On Soft Hed Boundary Image
Segmentation On Image
Generate Image Condition On Segmentations
Predict Depth On Image
Generate Image Condition On Depth
Predict Normal Map On Image
Generate Image Condition On Normal Map
Sketch Detection On Image
Generate Image Condition On Sketch Image
Pose Detection On Image
Generate Image Condition On Pose Image
Diffusions
Source: “Denoising Diffusion Probabilistic Models” (Ho et al., 2020) https://arxiv.org/abs/2006.11239
Source: “Learning Transferable Visual Models From Natural Language Supervision” (Radford et al., 2021) https://arxiv.org/abs/2103.00020
Related Projects
• JARVIS (HuggingGPT) - Microsoft
https://github.com/microsoft/JARVIS
• LLM-Augmenter system - Microsoft
https://github.com/pengbaolin/LLM-Augmenter
• Transformers Agent - Hugging Face
https://huggingface.co/docs/transformers/transformers_agents
• GPT-4 + Stable-Diffusion - Berkeley AI Research
https://llm-grounded-diffusion.github.io/

Slides for the IoT ALGYAN ChatGPT Study Group, 9th session (2023/06/01)

  • 3. NLP (natural language processing) Tutorials https://github.com/tsmatz/nlp-tutorials
  • 5. 言語モデルの発展 出典 “A Survey of Large Language Models” (Wayne et. al., 2023) https://arxiv.org/abs/2303.18223
  • 6. LLM の Emergent Abilities Emergency = 突然出 現するもの 言語モデルが、ある パラメーター数を超 えると突然持ち始め る予測不能な現象 出典 : “Emergent Abilities of Large Language Models” (Jason, et. al., 2022) https://arxiv.org/abs/2206.07682
  • 7. 考え : 会社 C の請求金額を確認する必要がある 行動 : GetInvoice[C] 考え : 会社 F の請求金額を確認する必要がある 行動 : GetInvoice[F] 考え : C と F の合計を計算する必要がある 行動 : Total[20000, 4100] 考え : 会社 A の請求金額を確認する必要がある 行動 : GetInvoice[A] 考え : 会社 E の請求金額を確認する必要がある 行動 : GetInvoice[E] 請求 DB から C の請求金額を検索: 20000 請求 DB から F の請求金額を検索: 4100 計算機 20000 + 4100 = 24100 請求 DB から A の請求金額を検索: 2000 請求 DB から E の請求金額を検索: 1000 言語モデル (OpenAI GPT-3) 外部ツール 考え : A と E の合計を計算する必要がある 行動 : Total[2000, 1000] 計算機 2000 + 1000 = 3000 考え : 会社 C, F の請求金額の合計と会社 A, E の請求 金額の合計との間の差額を計算する必要がある 行動 : Diff[24100 3000] 計算機 | 24100 – 3000 | = 21100 考え : よって、答えは 21100 行動 : Finish[21100] OpenAI GPT が、思考を担当 (つぎにすべきことを提案) = Reasoning 外部ツール (関数部分) が 実作業を担当 = Acting
  • 8. • 下記ツールを使用 • Search • Lookup • 回答精度の大幅な向上 (ベンチマーク 結果) • より高度な方式では、強化学習・模倣 学習などとも連携 出典 “REACT: SYNERGIZING REASONING AND ACTING IN LANGUAGE MODELS” (Shunyu et. al., 2022) ReAct / MRKL Systems
  • 9. Prior knowledge vs Label mapping (Google がおこなった ICL の実験) unrelated labels flipped labels IF あり 1 2 IF なし 3 4
  • 10. unrelated labels flipped labels IF あり 1 2 IF なし 3 4 Prior knowledge vs Label mapping (Google がおこなった ICL の実験) 出典 : “Larger language models do in-context learning differently” (Jerry, et. al., 2023) https://arxiv.org/abs/2303.03846
  • 11. Prior knowledge vs Label mapping (Google がおこなった ICL の実験) unrelated labels flipped labels IF あり 1 2 IF なし 3 4 出典 : “Larger language models do in-context learning differently” (Jerry, et. al., 2023) https://arxiv.org/abs/2303.03846
  • 12. Symbol Tuning 出典 : “Symbol tuning improves in-context learning in language models” (Jerry, et. al., 2023) https://arxiv.org/abs/2305.08298
  • 13. LLM の Emergent Abilities Emergency = 突然出 現するもの 言語モデルが、ある パラメーター数を超 えると突然持ち始め る予測不能な現象 出典 : “Emergent Abilities of Large Language Models” (Jason, et. al., 2022) https://arxiv.org/abs/2206.07682
  • 14. Multi Modalities への期待 出典 : “ImageBind: One Embedding Space To Bind Them All” (Rohit, et. al., 2023) https://arxiv.org/abs/2305.05665
  • 15. TaskMatrix (Visual ChatGPT) • Image processing is handled mainly by Visual Foundation Models hosted on Hugging Face • A language model (ChatGPT) is used for reasoning
  • 16. Execution example
      “replace the sofa in this image with a desk and then make it like a water-color painting”
      Action 1: Replace Something From The Photo “couch, desk”
      Action 2: Instruct Image Using Text “make it like a water-color painting”
      • Thought: Do I need to use a tool? Yes
      • Action: Replace Something From The Photo
      • Action Input: image/9bb5e03b.png, couch, desk
      • Observation: image/5737_replace-something_9bb5e03b_9bb5e03b.png
      • Thought: Do I need to use a tool? Yes
      • Action: Instruct Image Using Text
      • Action Input: image/5737_replace-something_9bb5e03b_9bb5e03b.png, make it like a water-color painting
      • Observation: image/770e_pix2pix_5737_9bb5e03b.png
      • Thought: Do I need to use a tool? No
      • AI: Here is the image you requested.
      • ![image/770e_pix2pix_5737_9bb5e03b.png](image/770e_pix2pix_5737_9bb5e03b.png)
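Each model turn in the trace above is plain text, so the calling program has to parse it before it can run a tool. The following is a minimal sketch of such a parser, assuming the `Thought / Action / Action Input / AI` layout shown on the slide; `Step`, `parse_step`, and `split_action_input` are hypothetical names and not part of TaskMatrix.

```python
import re
from typing import List, NamedTuple, Optional

class Step(NamedTuple):
    use_tool: bool
    action: Optional[str]        # tool name, when a tool is requested
    action_input: Optional[str]  # raw comma-separated tool input
    reply: Optional[str]         # final answer, when no tool is needed

# "Action: <tool>" followed on the next line by "Action Input: <input>".
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)")
# "AI: <reply>" marks the final response to the human.
REPLY_RE = re.compile(r"AI:\s*(.*)", re.DOTALL)

def parse_step(text: str) -> Step:
    """Parse one model turn into either a tool call or a final reply."""
    match = ACTION_RE.search(text)
    if match:
        return Step(True, match.group(1).strip(), match.group(2).strip(), None)
    match = REPLY_RE.search(text)
    if match:
        return Step(False, None, None, match.group(1).strip())
    raise ValueError("model output matches neither expected format")

def split_action_input(action_input: str) -> List[str]:
    """Split a comma-separated tool input into trimmed fields."""
    return [part.strip() for part in action_input.split(",")]

tool_turn = (
    "Thought: Do I need to use a tool? Yes\n"
    "Action: Replace Something From The Photo\n"
    "Action Input: image/9bb5e03b.png, couch, desk"
)
step = parse_step(tool_turn)
print(step.action, split_action_input(step.action_input))
```

The tool's return value (here, a new image file name) would then be appended to the prompt as the `Observation:` line before the model is called again.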
  • 17.
  • 18. Visual ChatGPT is designed to be able to assist with a wide range of text and visual related tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. Visual ChatGPT is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand. Visual ChatGPT is able to process and understand large amounts of text and images. As a language model, Visual ChatGPT can not directly read images, but it has a list of tools to finish different visual tasks. Each image will have a file name formed as "image/xxx.png", and Visual ChatGPT can invoke different tools to indirectly understand pictures. When talking about images, Visual ChatGPT is very strict to the file name and will never fabricate nonexistent files. When using tools to generate new image files, Visual ChatGPT is also known that the image may not be the same as the user's demand, and will use other visual question answering tools or description tools to observe the real image. Visual ChatGPT is able to use tools in a sequence, and is loyal to the tool observation outputs rather than faking the image content and image file name. It will remember to provide the file name from the last tool observation, if a new image is generated. Human may provide new figures to Visual ChatGPT with a description. The description helps Visual ChatGPT to understand this image, but Visual ChatGPT should use tools to finish following tasks, rather than directly imagine from the description. Overall, Visual ChatGPT is a powerful visual dialogue assistant tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. TOOLS: ------
  • 19. Visual ChatGPT has access to the following tools:
      > Get Photo Description: useful when you want to know what is inside the photo. receives image_path as input. The input to this tool should be a string, representing the image_path.
      > Remove Something From The Photo: useful when you want to remove and object or something from the photo from its description or location. The input to this tool should be a comma seperated string of two, representing the image_path and the object need to be removed.
      > Replace Something From The Photo: useful when you want to replace an object from the object description or location with another object from its description. The input to this tool should be a comma seperated string of three, representing the image_path, the object to be replaced, the object to be replaced with
      > Instruct Image Using Text: useful when you want to the style of the image to be like the text. like: make it look like a painting. or make it like a robot. The input to this tool should be a comma seperated string of two, representing the image_path and the text.
      . . . . .
      To use a tool, please use the following format:
      ```
      Thought: Do I need to use a tool? Yes
      Action: the action to take, should be one of [Get Photo Description, Remove Something From The Photo, Replace Something From The Photo, Instruct Image Using Text]
      Action Input: the input to the action
      Observation: the result of the action
      ```
      When you have a response to say to the Human, or if you do not need to use a tool, you MUST use the format:
  • 20.
      ```
      Thought: Do I need to use a tool? No
      AI: [your response here]
      ```
      You are very strict to the filename correctness and will never fake a file name if it does not exist. You will remember to provide the image file name loyally if it's provided in the last tool observation.
      Begin!
      Previous conversation history:
      Human: provide a figure named image/9bb5e03b.png. The description is: a living room with a couch and a couch in the corner. This information helps you to understand this image, but you should use tools to finish following tasks, rather than directly imagine from my description. If you understand, say "Received".
      AI: Received.
      New input: replace the sofa in this image with a desk and then make it like a water-color painting
      Since Visual ChatGPT is a text language model, Visual ChatGPT must use tools to observe images rather than imagination. The thoughts and observations are only visible for Visual ChatGPT, Visual ChatGPT should remember to repeat important information in the final response for Human.
      Thought: Do I need to use a tool? Yes
      Action: Replace Something From The Photo
      Action Input: image/9bb5e03b.png, couch, desk
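The prompt quoted on slides 18 to 20 has a regular shape: a fixed prefix, a tool list, format instructions, the conversation history, the new input, and the steps taken so far. The sketch below shows one way such a prompt could be assembled from a tool registry. It is an illustration only and not the TaskMatrix or LangChain implementation; `Tool`, `render_tools`, `render_format_instructions`, and `build_prompt` are hypothetical names, and the tool descriptions are shortened.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Built at runtime so the three-backtick marker never appears literally here.
FENCE = "`" * 3

@dataclass
class Tool:
    name: str
    description: str

def render_tools(tools: List[Tool]) -> str:
    """One '> name: description' line per tool, as on slide 19."""
    return "\n".join(f"> {t.name}: {t.description}" for t in tools)

def render_format_instructions(tools: List[Tool]) -> str:
    """Format instructions listing the allowed tool names."""
    names = ", ".join(t.name for t in tools)
    return "\n".join([
        "To use a tool, please use the following format:",
        FENCE,
        "Thought: Do I need to use a tool? Yes",
        f"Action: the action to take, should be one of [{names}]",
        "Action Input: the input to the action",
        "Observation: the result of the action",
        FENCE,
    ])

def build_prompt(prefix: str, tools: List[Tool],
                 history: List[Tuple[str, str]],
                 new_input: str, scratchpad: str = "") -> str:
    """Concatenate the prompt sections in the order shown on the slides."""
    history_text = "\n".join(f"{who}: {text}" for who, text in history)
    return "\n".join([
        prefix,
        "TOOLS:",
        "------",
        "Visual ChatGPT has access to the following tools:",
        render_tools(tools),
        render_format_instructions(tools),
        "Begin!",
        "Previous conversation history:",
        history_text,
        f"New input: {new_input}",
        scratchpad,
    ])

tools = [
    Tool("Get Photo Description", "describes what is inside the photo."),
    Tool("Replace Something From The Photo", "replaces one object with another."),
]
history = [("Human", "provide a figure named image/9bb5e03b.png."), ("AI", "Received.")]
prompt = build_prompt("Visual ChatGPT is designed to ...", tools, history,
                      "replace the sofa in this image with a desk",
                      "Thought: Do I need to use a tool? Yes")
print(prompt)
```

After each tool call, the `Action`, `Action Input`, and `Observation` lines would be appended to `scratchpad` and the prompt rebuilt, which is how the model sees its earlier steps.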
  • 21. “could you generate a cat for me ?” • “could you replace a cat to a dog and then remove the book ?” • “could you generate a canny edge of this image ?” • “generate a yellow dog based on ......png”
  • 22. Visual Foundation Models • Get Photo Description • Generate Image From User Input Text • Remove Something From The Photo • Replace Something From The Photo • Instruct Image Using Text • Answer Question About The Image • Edge Detection On Image • Generate Image Condition On Canny Image • Line Detection On Image • Generate Image Condition On Line Image • Hed Detection On Image • Generate Image Condition On Soft Hed Boundary Image • Segmentation On Image • Generate Image Condition On Segmentations • Predict Depth On Image • Generate Image Condition On Depth • Predict Normal Map On Image • Generate Image Condition On Normal Map • Sketch Detection On Image • Generate Image Condition On Sketch Image • Pose Detection On Image • Generate Image Condition On Pose Image
  • 23. Diffusions. Source: “Denoising Diffusion Probabilistic Models” (Ho et al., 2020) https://arxiv.org/abs/2006.11239 Source: “Learning Transferable Visual Models From Natural Language Supervision” (Radford et al., 2021) https://arxiv.org/abs/2103.00020
  • 24. Related Projects • JARVIS (HuggingGPT) - Microsoft https://github.com/microsoft/JARVIS • LLM-Augmenter system - Microsoft https://github.com/pengbaolin/LLM-Augmenter • Transformers Agent - Hugging Face https://huggingface.co/docs/transformers/transformers_agents • GPT-4 + Stable-Diffusion - Berkeley AI Research https://llm-grounded-diffusion.github.io/