Generative AI Basics and TaskMatrix (Visual ChatGPT)
About the speaker
松崎 剛 (Tsuyoshi Matsuzaki)
Microsoft Japan Co., Ltd.
Partner Business Division
Cloud Solution Architect
Blog
https://tsmatz.wordpress.com/
GitHub
https://github.com/tsmatz
NLP (natural language processing) Tutorials
https://github.com/tsmatz/nlp-tutorials
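OpenAI basics
GPT: text generation and understanding
Embedding: numeric vector representations of text
Codex: code generation and understanding
DALL·E: image generation from text prompts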
Evolution of language models
Source: “A Survey of Large Language Models” (Zhao et al., 2023) https://arxiv.org/abs/2303.18223
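Emergent Abilities of LLMs
Emergence = something that appears suddenly
An unpredictable phenomenon in which a language model suddenly acquires abilities once its parameter count exceeds a certain threshold
Source: “Emergent Abilities of Large Language Models” (Wei et al., 2022)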
https://arxiv.org/abs/2206.07682
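Language model (OpenAI GPT-3) on the left, external tools on the right

Thought: I need to check the invoice amount for company C
Action: GetInvoice[C]
    External tool: look up C's invoice amount in the invoice DB: 20000
Thought: I need to check the invoice amount for company F
Action: GetInvoice[F]
    External tool: look up F's invoice amount in the invoice DB: 4100
Thought: I need to calculate the total of C and F
Action: Total[20000, 4100]
    External tool: calculator, 20000 + 4100 = 24100
Thought: I need to check the invoice amount for company A
Action: GetInvoice[A]
    External tool: look up A's invoice amount in the invoice DB: 2000
Thought: I need to check the invoice amount for company E
Action: GetInvoice[E]
    External tool: look up E's invoice amount in the invoice DB: 1000
Thought: I need to calculate the total of A and E
Action: Total[2000, 1000]
    External tool: calculator, 2000 + 1000 = 3000
Thought: I need to calculate the difference between the total invoice amount of companies C and F and the total invoice amount of companies A and E
Action: Diff[24100, 3000]
    External tool: calculator, |24100 - 3000| = 21100
Thought: Therefore, the answer is 21100
Action: Finish[21100]

OpenAI GPT handles the thinking (it proposes what to do next) = Reasoning
External tools (the function part) handle the actual work = Acting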
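The division of labor above can be sketched as a small loop. This is a minimal sketch, not the implementation from the ReAct paper: the language model is replaced by a scripted stand-in (`scripted_model`, a hypothetical name) that replays the actions from the example, and `INVOICE_DB` and the tool functions are illustrative names filled with the amounts shown in the trace.

```python
import re
from typing import Callable, Dict, List

# Hypothetical invoice data, copied from the amounts in the example trace.
INVOICE_DB: Dict[str, int] = {"C": 20000, "F": 4100, "A": 2000, "E": 1000}

# External tools: these do the actual work (Acting).
def get_invoice(company: str) -> int:
    return INVOICE_DB[company]

def total(a: str, b: str) -> int:
    return int(a) + int(b)

def diff(a: str, b: str) -> int:
    return abs(int(a) - int(b))

TOOLS: Dict[str, Callable[..., int]] = {
    "GetInvoice": get_invoice,
    "Total": total,
    "Diff": diff,
}

ACTION_RE = re.compile(r"^(\w+)\[(.*)\]$")

def scripted_model(observations: List[int]) -> str:
    """Stand-in for the LLM (Reasoning): picks the next action from what has been observed so far."""
    step = len(observations)
    if step == 0:
        return "GetInvoice[C]"
    if step == 1:
        return "GetInvoice[F]"
    if step == 2:
        return f"Total[{observations[0]}, {observations[1]}]"
    if step == 3:
        return "GetInvoice[A]"
    if step == 4:
        return "GetInvoice[E]"
    if step == 5:
        return f"Total[{observations[3]}, {observations[4]}]"
    if step == 6:
        return f"Diff[{observations[2]}, {observations[5]}]"
    return f"Finish[{observations[6]}]"

def run_react(model: Callable[[List[int]], str], max_steps: int = 20) -> int:
    """Alternate between the model's proposed action and the tool that executes it."""
    observations: List[int] = []
    for _ in range(max_steps):
        action = model(observations)
        match = ACTION_RE.match(action)
        if match is None:
            raise ValueError(f"cannot parse action: {action!r}")
        name, raw_args = match.groups()
        args = [part.strip() for part in raw_args.split(",")]
        if name == "Finish":
            return int(args[0])
        # The tool result is fed back to the model as the next observation.
        observations.append(TOOLS[name](*args))
    raise RuntimeError("no Finish action within max_steps")

answer = run_react(scripted_model)
print(answer)
```

In a real system the scripted function would be replaced by a call to a language model that receives the trace so far as part of its prompt; the loop and the tool dispatch stay the same.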
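ReAct / MRKL Systems
• Uses the following tools
  • Search
  • Lookup
• Substantial improvement in answer accuracy (benchmark results)
• More advanced approaches also combine this with reinforcement learning, imitation learning, and similar methods
Source: “ReAct: Synergizing Reasoning and Acting in Language Models” (Yao et al., 2022)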
Prior knowledge vs Label mapping
(in-context learning (ICL) experiments conducted by Google)

              unrelated labels   flipped labels
With IF              1                 2
Without IF           3                 4
Source: “Larger language models do in-context learning differently” (Wei et al., 2023)
https://arxiv.org/abs/2303.03846
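As a rough illustration of the two label settings in the table, the sketch below builds few-shot prompts whose exemplar labels are either flipped or replaced by semantically unrelated tokens. The exemplar sentences, the `foo`/`bar` tokens, and the `Input:`/`Label:` layout are assumptions made for illustration, not the exact prompts used in the paper.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sentiment exemplars (not taken from the paper).
EXEMPLARS: List[Tuple[str, str]] = [
    ("A wonderful, moving film.", "positive"),
    ("Dull and far too long.", "negative"),
]

# Flipped labels: the exemplar labels contradict the model's prior knowledge.
FLIPPED: Dict[str, str] = {"positive": "negative", "negative": "positive"}

# Unrelated labels: the label tokens carry no meaning related to the task.
UNRELATED: Dict[str, str] = {"positive": "foo", "negative": "bar"}

def build_prompt(
    exemplars: List[Tuple[str, str]],
    query: str,
    label_map: Optional[Dict[str, str]] = None,
) -> str:
    """Build a few-shot prompt, optionally remapping every exemplar label."""
    blocks = []
    for text, label in exemplars:
        shown = label_map[label] if label_map else label
        blocks.append(f"Input: {text}\nLabel: {shown}")
    # The query is left unlabeled for the model to complete.
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)

print(build_prompt(EXEMPLARS, "I would watch it again.", FLIPPED))
print()
print(build_prompt(EXEMPLARS, "I would watch it again.", UNRELATED))
```

A model that follows the mapping shown in the exemplars would answer "negative" for the first prompt and "foo" for the second, while a model that relies on prior knowledge would tend to answer "positive".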
Symbol Tuning
Source: “Symbol tuning improves in-context learning in language models” (Wei et al., 2023)
https://arxiv.org/abs/2305.08298
Expectations for multimodality
Source: “ImageBind: One Embedding Space To Bind Them All” (Girdhar et al., 2023)
https://arxiv.org/abs/2305.05665
TaskMatrix (Visual ChatGPT)
• Image processing is handled mainly by Visual Foundation Models hosted on Hugging Face
• A language model (ChatGPT) is used for reasoning
Execution example
• Thought: Do I need to use a tool? Yes
• Action: Replace Something From The Photo
• Action Input: image/9bb5e03b.png, couch, desk
• Observation: image/5737_replace-something_9bb5e03b_9bb5e03b.png
• Thought: Do I need to use a tool? Yes
• Action: Instruct Image Using Text
• Action Input: image/5737_replace-something_9bb5e03b_9bb5e03b.png, make it like a water-color painting
• Observation: image/770e_pix2pix_5737_9bb5e03b.png
• Thought: Do I need to use a tool? No
• AI: Here is the image you requested.
• ![image/770e_pix2pix_5737_9bb5e03b.png](image/770e_pix2pix_5737_9bb5e03b.png)
“replace the sofa in this image with a desk and then make it like a water-color painting”
Action 1: Replace Something From The Photo (“couch, desk”)
Action 2: Instruct Image Using Text (“make it like a water-color painting”)
Visual ChatGPT is designed to be able to assist with a wide range of text and visual related tasks,
from answering simple questions to providing in-depth explanations and discussions on a wide range of
topics. Visual ChatGPT is able to generate human-like text based on the input it receives, allowing it
to engage in natural-sounding conversations and provide responses that are coherent and relevant to
the topic at hand.
Visual ChatGPT is able to process and understand large amounts of text and images. As a language
model, Visual ChatGPT can not directly read images, but it has a list of tools to finish different
visual tasks. Each image will have a file name formed as "image/xxx.png", and Visual ChatGPT can
invoke different tools to indirectly understand pictures. When talking about images, Visual ChatGPT is
very strict to the file name and will never fabricate nonexistent files. When using tools to generate
new image files, Visual ChatGPT is also known that the image may not be the same as the user's demand,
and will use other visual question answering tools or description tools to observe the real image.
Visual ChatGPT is able to use tools in a sequence, and is loyal to the tool observation outputs rather
than faking the image content and image file name. It will remember to provide the file name from the
last tool observation, if a new image is generated.
Human may provide new figures to Visual ChatGPT with a description. The description helps Visual
ChatGPT to understand this image, but Visual ChatGPT should use tools to finish following tasks,
rather than directly imagine from the description.
Overall, Visual ChatGPT is a powerful visual dialogue assistant tool that can help with a wide range
of tasks and provide valuable insights and information on a wide range of topics.
TOOLS:
------
Visual ChatGPT has access to the following tools:
> Get Photo Description: useful when you want to know what is inside the photo. receives image_path as
input. The input to this tool should be a string, representing the image_path.
> Remove Something From The Photo: useful when you want to remove and object or something from the
photo from its description or location. The input to this tool should be a comma seperated string of
two, representing the image_path and the object need to be removed.
> Replace Something From The Photo: useful when you want to replace an object from the object
description or location with another object from its description. The input to this tool should be a
comma seperated string of three, representing the image_path, the object to be replaced, the object to
be replaced with
> Instruct Image Using Text: useful when you want to the style of the image to be like the text. like:
make it look like a painting. or make it like a robot. The input to this tool should be a comma
seperated string of two, representing the image_path and the text.
. . . . .
To use a tool, please use the following format:
```
Thought: Do I need to use a tool? Yes
Action: the action to take, should be one of [Get Photo Description, Remove Something From The Photo,
Replace Something From The Photo, Instruct Image Using Text]
Action Input: the input to the action
Observation: the result of the action
```
When you have a response to say to the Human, or if you do not need to use a tool, you MUST use the
format:
```
Thought: Do I need to use a tool? No
AI: [your response here]
```
You are very strict to the filename correctness and will never fake a file name if it does not exist.
You will remember to provide the image file name loyally if it's provided in the last tool
observation.
Begin!
Previous conversation history:
Human: provide a figure named image/9bb5e03b.png. The description is: a living room with a couch and a
couch in the corner. This information helps you to understand this image, but you should use tools to
finish following tasks, rather than directly imagine from my description. If you understand, say
"Received".
AI: Received.
New input: replace the sofa in this image with a desk and then make it like a water-color painting
Since Visual ChatGPT is a text language model, Visual ChatGPT must use tools to observe images rather
than imagination.
The thoughts and observations are only visible for Visual ChatGPT, Visual ChatGPT should remember to
repeat important information in the final response for Human.
Thought: Do I need to use a tool? Yes
Action: Replace Something From The Photo
Action Input: image/9bb5e03b.png, couch, desk
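The Thought / Action / Action Input format in the prompt above is what lets a host program decide which tool to call next. A minimal sketch of that parsing step follows; `AgentStep` and `parse_step` are hypothetical names introduced here, and this is not the TaskMatrix implementation.

```python
import re
from typing import List, NamedTuple, Optional

class AgentStep(NamedTuple):
    needs_tool: bool
    action: Optional[str]      # tool name, when a tool is requested
    action_input: List[str]    # comma-separated arguments, stripped
    reply: Optional[str]       # final answer, when no tool is needed

def parse_step(text: str) -> AgentStep:
    """Parse one model completion written in the prompt's output format."""
    thought = re.search(r"Thought: Do I need to use a tool\? (Yes|No)", text)
    if thought is None:
        raise ValueError("no Thought line found")
    if thought.group(1) == "No":
        reply = re.search(r"AI: (.*)", text, re.DOTALL)
        return AgentStep(False, None, [], reply.group(1).strip() if reply else None)
    action = re.search(r"Action: (.*)", text)
    action_input = re.search(r"Action Input: (.*)", text)
    if action is None or action_input is None:
        raise ValueError("tool requested but Action or Action Input is missing")
    # Naive split: an instruction text that itself contains commas would need
    # tool-specific handling (for example, splitting only on the first comma).
    args = [part.strip() for part in action_input.group(1).split(",")]
    return AgentStep(True, action.group(1).strip(), args, None)

step = parse_step(
    "Thought: Do I need to use a tool? Yes\n"
    "Action: Replace Something From The Photo\n"
    "Action Input: image/9bb5e03b.png, couch, desk"
)
print(step.action, step.action_input)
```

The host program would then look up `step.action` in its table of tools, run it with `step.action_input`, and append the result to the prompt as the `Observation:` line before asking the model for the next step.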
could you generate a cat for me ?
could you replace a cat to a dog and then remove the book ?
could you generate a canny edge of this image ?
generate a yellow dog based on ......png
Visual Foundation Models
Get Photo Description
Generate Image From User Input Text
Remove Something From The Photo
Replace Something From The Photo
Instruct Image Using Text
Answer Question About The Image
Edge Detection On Image
Generate Image Condition On Canny Image
Line Detection On Image
Generate Image Condition On Line Image
Hed Detection On Image
Generate Image Condition On Soft Hed Boundary Image
Segmentation On Image
Generate Image Condition On Segmentations
Predict Depth On Image
Generate Image Condition On Depth
Predict Normal Map On Image
Generate Image Condition On Normal Map
Sketch Detection On Image
Generate Image Condition On Sketch Image
Pose Detection On Image
Generate Image Condition On Pose Image
Diffusions
Source: “Denoising Diffusion Probabilistic Models” (Ho et al., 2020) https://arxiv.org/abs/2006.11239
Source: “Learning Transferable Visual Models From Natural Language Supervision” (Radford et al., 2021)
https://arxiv.org/abs/2103.00020
Related Projects
• JARVIS (HuggingGPT) - Microsoft
https://github.com/microsoft/JARVIS
• LLM-Augmenter system - Microsoft
https://github.com/pengbaolin/LLM-Augmenter
• Transformers Agent - Hugging Face
https://huggingface.co/docs/transformers/transformers_agents
• GPT-4 + Stable-Diffusion - Berkeley AI Research
https://llm-grounded-diffusion.github.io/

2023/06/01 IoT ALGYAN ChatGPT Study Group, 9th session: materials

  • 3. NLP (natural language processing) Tutorials https://github.com/tsmatz/nlp-tutorials
  • 5. 言語モデルの発展 出典 “A Survey of Large Language Models” (Wayne et. al., 2023) https://arxiv.org/abs/2303.18223
  • 6. LLM の Emergent Abilities Emergency = 突然出 現するもの 言語モデルが、ある パラメーター数を超 えると突然持ち始め る予測不能な現象 出典 : “Emergent Abilities of Large Language Models” (Jason, et. al., 2022) https://arxiv.org/abs/2206.07682
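  • 7. Language model (OpenAI GPT-3) | External tools
      Thought: I need to check the invoice amount for company C. Action: GetInvoice[C] | Invoice DB lookup for C: 20000
      Thought: I need to check the invoice amount for company F. Action: GetInvoice[F] | Invoice DB lookup for F: 4100
      Thought: I need to compute the total for C and F. Action: Total[20000, 4100] | Calculator: 20000 + 4100 = 24100
      Thought: I need to check the invoice amount for company A. Action: GetInvoice[A] | Invoice DB lookup for A: 2000
      Thought: I need to check the invoice amount for company E. Action: GetInvoice[E] | Invoice DB lookup for E: 1000
      Thought: I need to compute the total for A and E. Action: Total[2000, 1000] | Calculator: 2000 + 1000 = 3000
      Thought: I need to compute the difference between the total for companies C and F and the total for companies A and E. Action: Diff[24100, 3000] | Calculator: |24100 - 3000| = 21100
      Thought: So the answer is 21100. Action: Finish[21100]
      OpenAI GPT handles the thinking (it proposes what to do next) = Reasoning. The external tools (the function part) do the actual work = Acting.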
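The Reasoning/Acting split in the trace above can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the implementation behind the slide: the tool names (`GetInvoice`, `Total`, `Diff`, `Finish`) and the amounts come from the trace, while `INVOICE_DB`, `parse_action`, `run_react`, and `scripted_model` are hypothetical names. `scripted_model` stands in for the language model and simply replays the actions of the trace.

```python
import re

# Hypothetical invoice database; the amounts are the ones shown in the trace.
INVOICE_DB = {"A": 2000, "C": 20000, "E": 1000, "F": 4100}

# External tools ("Acting"). Tool names follow the trace; the bodies are assumptions.
TOOLS = {
    "GetInvoice": lambda company: INVOICE_DB[company],
    "Total": lambda x, y: int(x) + int(y),
    "Diff": lambda x, y: abs(int(x) - int(y)),
}

ACTION_RE = re.compile(r"^(\w+)\[(.*)\]$")


def parse_action(text):
    """Split 'Name[arg1, arg2]' into ('Name', ['arg1', 'arg2'])."""
    match = ACTION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not an action: {text!r}")
    name, raw_args = match.groups()
    # Accept both comma-separated and space-separated arguments.
    args = [a for a in re.split(r"[,\s]+", raw_args.strip()) if a]
    return name, args


def run_react(propose_action, max_steps=20):
    """Ask the model for an action, run the tool, feed the observation back."""
    observations = []
    for _ in range(max_steps):
        name, args = parse_action(propose_action(observations))
        if name == "Finish":
            return int(args[0]), observations
        observations.append(TOOLS[name](*args))
    raise RuntimeError("no Finish action within max_steps")


# Stand-in for the LLM ("Reasoning"): one action per step, built from
# the observations collected so far.
SCRIPT = [
    lambda o: "GetInvoice[C]",
    lambda o: "GetInvoice[F]",
    lambda o: f"Total[{o[0]}, {o[1]}]",
    lambda o: "GetInvoice[A]",
    lambda o: "GetInvoice[E]",
    lambda o: f"Total[{o[3]}, {o[4]}]",
    lambda o: f"Diff[{o[2]}, {o[5]}]",
    lambda o: f"Finish[{o[6]}]",
]


def scripted_model(observations):
    return SCRIPT[len(observations)](observations)


answer, observations = run_react(scripted_model)
print(answer, observations)
```

In a real system `scripted_model` would be a call to the language model with the thoughts, actions, and observations so far in the prompt; the loop itself stays the same.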
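  • 8. ReAct / MRKL Systems • Uses the following tools: • Search • Lookup • Large improvement in answer accuracy (benchmark results) • More advanced variants also combine this with reinforcement learning, imitation learning, and similar methods. Source: “ReAct: Synergizing Reasoning and Acting in Language Models” (Yao et al., 2022)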
  • 9. Prior knowledge vs Label mapping (ICL experiments conducted by Google)
                    unrelated labels   flipped labels
      With IF       1                  2
      Without IF    3                  4
  • 10. Prior knowledge vs Label mapping (ICL experiments conducted by Google)
                    unrelated labels   flipped labels
      With IF       1                  2
      Without IF    3                  4
      Source: “Larger language models do in-context learning differently” (Wei et al., 2023) https://arxiv.org/abs/2303.03846
  • 11. Prior knowledge vs Label mapping (ICL experiments conducted by Google)
                    unrelated labels   flipped labels
      With IF       1                  2
      Without IF    3                  4
      Source: “Larger language models do in-context learning differently” (Wei et al., 2023) https://arxiv.org/abs/2303.03846
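To make the two column headings concrete, here is a minimal sketch of how in-context prompts with flipped and unrelated labels could be built. It assumes a sentiment task. The example sentences, the `foo`/`bar` labels, the `Input:`/`Label:` template, and the name `build_icl_prompt` are all illustrative assumptions, not taken from the slides.

```python
def build_icl_prompt(examples, query, label_map):
    """Format labeled examples plus an unlabeled query as one ICL prompt."""
    blocks = [
        f"Input: {text}\nLabel: {label_map[label]}" for text, label in examples
    ]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


# Illustrative examples (assumption).
EXAMPLES = [
    ("A wonderful film.", "positive"),
    ("A dull, lifeless plot.", "negative"),
]

# Regular setting: the labels in the prompt agree with prior knowledge.
REGULAR = {"positive": "positive", "negative": "negative"}
# Flipped labels: the prompt contradicts prior knowledge.
FLIPPED = {"positive": "negative", "negative": "positive"}
# Unrelated labels: the label words carry no meaning of their own,
# so the mapping has to be learned from the examples in the prompt.
UNRELATED = {"positive": "foo", "negative": "bar"}

for name, mapping in [("regular", REGULAR), ("flipped", FLIPPED), ("unrelated", UNRELATED)]:
    print(f"--- {name} ---")
    print(build_icl_prompt(EXAMPLES, "I loved every minute.", mapping))
```

A model that follows the flipped prompt has to override what it already knows about sentiment, while the unrelated prompt can only be solved by picking up the input-to-label mapping from the examples.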
  • 12. Symbol Tuning. Source: “Symbol tuning improves in-context learning in language models” (Wei et al., 2023) https://arxiv.org/abs/2305.08298
  • 13. Emergent Abilities of LLMs. Emergence = something that appears suddenly: an unpredictable phenomenon in which a language model abruptly acquires an ability once it exceeds a certain number of parameters. Source: “Emergent Abilities of Large Language Models” (Wei et al., 2022) https://arxiv.org/abs/2206.07682
  • 14. Expectations for multimodality. Source: “ImageBind: One Embedding Space To Bind Them All” (Girdhar et al., 2023) https://arxiv.org/abs/2305.05665
  • 15. TaskMatrix (Visual ChatGPT) • Image processing is handled mainly by Visual Foundation Models on Hugging Face • Reasoning is done with a language model (ChatGPT)
  • 16. Example run • Thought: Do I need to use a tool? Yes • Action: Replace Something From The Photo • Action Input: image/9bb5e03b.png, couch, desk • Observation: image/5737_replace-something_9bb5e03b_9bb5e03b.png • Thought: Do I need to use a tool? Yes • Action: Instruct Image Using Text • Action Input: image/5737_replace-something_9bb5e03b_9bb5e03b.png, make it like a water-color painting • Observation: image/770e_pix2pix_5737_9bb5e03b.png • Thought: Do I need to use a tool? No • AI: Here is the image you requested. • ![image/770e_pix2pix_5737_9bb5e03b.png](image/770e_pix2pix_5737_9bb5e03b.png) “replace the sofa in this image with a desk and then make it like a water-color painting” Action 1: Replace Something From The Photo “couch, desk” Action 2: Instruct Image Using Text “make it like a water-color painting”
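The run above chains two tools by passing file names: the Observation of one step becomes part of the Action Input of the next. The sketch below illustrates only that hand-off. The tools are stubs that do no image processing, and the naming scheme `<new id>_<tool tag>_<previous id>_<original id>.png` is inferred from the two file names on the slide, not taken from the TaskMatrix source. All function names are hypothetical.

```python
import posixpath
import uuid


def new_image_name(prev_path, tool_tag, new_id=None):
    """Derive an output path from the previous path (naming scheme inferred)."""
    folder, filename = posixpath.split(prev_path)
    parts = posixpath.splitext(filename)[0].split("_")
    prev_id = parts[0]    # id of the image this step starts from
    orig_id = parts[-1]   # id of the image the whole chain started from
    if new_id is None:
        new_id = uuid.uuid4().hex[:4]
    return posixpath.join(folder, f"{new_id}_{tool_tag}_{prev_id}_{orig_id}.png")


def replace_something(inputs, new_id=None):
    """Stub for 'Replace Something From The Photo' (no image is edited)."""
    image_path, old_object, new_object = [s.strip() for s in inputs.split(",")]
    return new_image_name(image_path, "replace-something", new_id)


def instruct_image(inputs, new_id=None):
    """Stub for 'Instruct Image Using Text' (no image is edited)."""
    image_path, instruction = [s.strip() for s in inputs.split(",", 1)]
    return new_image_name(image_path, "pix2pix", new_id)


# Ids are fixed here only so the output matches the slide.
step1 = replace_something("image/9bb5e03b.png, couch, desk", new_id="5737")
step2 = instruct_image(f"{step1}, make it like a water-color painting", new_id="770e")
print(step1)
print(step2)
```

Because the model only ever sees file names, keeping the lineage in the name lets a later step refer back to both the previous and the original image.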
  • 17.
  • 18. Visual ChatGPT is designed to be able to assist with a wide range of text and visual related tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. Visual ChatGPT is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand. Visual ChatGPT is able to process and understand large amounts of text and images. As a language model, Visual ChatGPT can not directly read images, but it has a list of tools to finish different visual tasks. Each image will have a file name formed as "image/xxx.png", and Visual ChatGPT can invoke different tools to indirectly understand pictures. When talking about images, Visual ChatGPT is very strict to the file name and will never fabricate nonexistent files. When using tools to generate new image files, Visual ChatGPT is also known that the image may not be the same as the user's demand, and will use other visual question answering tools or description tools to observe the real image. Visual ChatGPT is able to use tools in a sequence, and is loyal to the tool observation outputs rather than faking the image content and image file name. It will remember to provide the file name from the last tool observation, if a new image is generated. Human may provide new figures to Visual ChatGPT with a description. The description helps Visual ChatGPT to understand this image, but Visual ChatGPT should use tools to finish following tasks, rather than directly imagine from the description. Overall, Visual ChatGPT is a powerful visual dialogue assistant tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. TOOLS: ------
  • 19. Visual ChatGPT has access to the following tools: > Get Photo Description: useful when you want to know what is inside the photo. receives image_path as input. The input to this tool should be a string, representing the image_path. > Remove Something From The Photo: useful when you want to remove and object or something from the photo from its description or location. The input to this tool should be a comma seperated string of two, representing the image_path and the object need to be removed. > Replace Something From The Photo: useful when you want to replace an object from the object description or location with another object from its description. The input to this tool should be a comma seperated string of three, representing the image_path, the object to be replaced, the object to be replaced with > Instruct Image Using Text: useful when you want to the style of the image to be like the text. like: make it look like a painting. or make it like a robot. The input to this tool should be a comma seperated string of two, representing the image_path and the text. . . . . . To use a tool, please use the following format: ``` Thought: Do I need to use a tool? Yes Action: the action to take, should be one of [Get Photo Description, Remove Something From The Photo, Replace Something From The Photo, Instruct Image Using Text] Action Input: the input to the action Observation: the result of the action ``` When you have a response to say to the Human, or if you do not need to use a tool, you MUST use the format:
  • 20. ``` Thought: Do I need to use a tool? No AI: [your response here] ``` You are very strict to the filename correctness and will never fake a file name if it does not exist. You will remember to provide the image file name loyally if it's provided in the last tool observation. Begin! Previous conversation history: Human: provide a figure named image/9bb5e03b.png. The description is: a living room with a couch and a couch in the corner. This information helps you to understand this image, but you should use tools to finish following tasks, rather than directly imagine from my description. If you understand, say "Received". AI: Received. New input: replace the sofa in this image with a desk and then make it like a water-color painting Since Visual ChatGPT is a text language model, Visual ChatGPT must use tools to observe images rather than imagination. The thoughts and observations are only visible for Visual ChatGPT, Visual ChatGPT should remember to repeat important information in the final response for Human. Thought: Do I need to use a tool? Yes Action: Replace Something From The Photo Action Input: image/9bb5e03b.png, couch, desk
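The prompt above fixes two reply formats: a tool call (`Action:` plus `Action Input:`) and a final answer (`AI:`). A minimal sketch of how a caller could tell them apart follows. The function name `parse_reply` and the returned tuples are assumptions, and this is not the parser that Visual ChatGPT or any library actually uses. The tool list is limited to the four names shown in the prompt.

```python
import re

# The four tool names listed in the prompt's format section.
TOOL_NAMES = [
    "Get Photo Description",
    "Remove Something From The Photo",
    "Replace Something From The Photo",
    "Instruct Image Using Text",
]


def parse_reply(text):
    """Return ('tool', name, tool_input) or ('final', message) for one reply."""
    action = re.search(r"^Action:\s*(.+)$", text, flags=re.MULTILINE)
    action_input = re.search(r"^Action Input:\s*(.+)$", text, flags=re.MULTILINE)
    if action and action_input:
        name = action.group(1).strip()
        if name not in TOOL_NAMES:
            raise ValueError(f"unknown tool: {name!r}")
        return ("tool", name, action_input.group(1).strip())
    # Everything after 'AI:' is the message for the human.
    final = re.search(r"^AI:\s*(.*)", text, flags=re.MULTILINE | re.DOTALL)
    if final:
        return ("final", final.group(1).strip())
    raise ValueError("reply matches neither format")


tool_reply = (
    "Thought: Do I need to use a tool? Yes\n"
    "Action: Replace Something From The Photo\n"
    "Action Input: image/9bb5e03b.png, couch, desk"
)
final_reply = (
    "Thought: Do I need to use a tool? No\n"
    "AI: Here is the image you requested."
)
print(parse_reply(tool_reply))
print(parse_reply(final_reply))
```

When the result is a tool call, the caller runs the tool, appends `Observation: <result>` to the prompt, and asks the model again. When it is a final answer, the loop ends.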
  • 21. could you generate a cat for me ? could you replace a cat to a dog and then remove the book ? could you generate a canny edge of this image ? generate a yellow dog based on ......png
  • 22. Visual Foundation Models: Get Photo Description; Generate Image From User Input Text; Remove Something From The Photo; Replace Something From The Photo; Instruct Image Using Text; Answer Question About The Image; Edge Detection On Image; Generate Image Condition On Canny Image; Line Detection On Image; Generate Image Condition On Line Image; Hed Detection On Image; Generate Image Condition On Soft Hed Boundary Image; Segmentation On Image; Generate Image Condition On Segmentations; Predict Depth On Image; Generate Image Condition On Depth; Predict Normal Map On Image; Generate Image Condition On Normal Map; Sketch Detection On Image; Generate Image Condition On Sketch Image; Pose Detection On Image; Generate Image Condition On Pose Image
  • 23. Diffusions. Source: “Denoising Diffusion Probabilistic Models” (Ho et al., 2020) https://arxiv.org/abs/2006.11239 Source: “Learning Transferable Visual Models From Natural Language Supervision” (Radford et al., 2021) https://arxiv.org/abs/2103.00020
  • 24. Related Projects • JARVIS (HuggingGPT) - Microsoft https://github.com/microsoft/JARVIS • LLM-Augmenter system - Microsoft https://github.com/pengbaolin/LLM-Augmenter • Transformers Agent - Hugging Face https://huggingface.co/docs/transformers/transformers_agents • GPT-4 + Stable-Diffusion - Berkeley AI Research https://llm-grounded-diffusion.github.io/