Trying Out Patent Search and Analysis with LLM + LangChain

2023/10/17
株式会社エンライトオン
Presentation material, Generative AI Study Group 9th

Contents
1. Self-introduction
2. IP × AI
3. Patent search and analysis
4. LangChain
5. Looking for things that could be useful in patent search and analysis (experiments)

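1. Self-introduction
Kei Nishio (西尾啓)
・Founder & Coder @ 株式会社エンライトオン (IP analysis)
・Patent attorney / patent searcher @ ミノル国際特許事務所
・ML engineer @ 株式会社Zuva (overseas startup database)
Career (timeline on the slide: 1984, 2008, 2017, 2018, 2023)
・Southern Nagano Prefecture
・Patent attorney / patent searcher @ 創成国際特許事務所
・"Looking through documents is tedious..." ⇒ moved into natural language processing (NLP)
・"Fields other than patents look interesting too" ⇒ joined Zuva
・Working hard on NLP in the IP field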

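2. IP × AI
The IP industry is lagging in AI (or rather, in DX more broadly)
・Patent management systems differ between IP departments (companies) and patent firms (N-to-N)
・The same information is typed by hand into each system and then checked
・Information is exchanged on paper
・Patent documents are reviewed by hand, one at a time ... and so on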

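IP × AI
On the other hand, there are areas where AI does well:
・Screening patent documents and assigning technology classifications
・Similar-image search for trademarks
・Assisting with the generation of patent text
... and so on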

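IP AI Service Map 2023
Recently, quite a few IP × AI services have appeared. Categories shown on the map:
・Research
・Analysis and utilization
・Filing management / patent management
・Patent specification drafting support
・Counterfeit countermeasures
・Trademark search
・Trademark filing
・Idea generation
・Patent / technology search support
・Design search
・Design filing support
・Copyright
・Brand protection
・Translation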

Even before generative AI, there were things like these
2019
https://www.cloudsign.jp/media/20190904-aivsbenrishi/
2022
https://www.jpo.go.jp/system/laws/sesaku/ai_action_plan/ai_action_plan-image.html
There has been no patent showdown yet.
2020
https://www.jpo.go.jp/resources/report/gidou-houkoku/tokkyo/document/index/gido_machine_learning.pdf
https://www.anlp.jp/proceedings/annual_meeting/2021/pdf_dir/C1-3.pdf

Even before generative AI, there were things like these (continued)
2020
https://www.sciencedirect.com/science/article/pii/S0172219019300742?via%3Dihub
2020
https://www.japio.or.jp/00yearbook/files/2020book/20_3_03.pdf
2018
https://cloud.google.com/blog/products/ai-machine-learning/measuring-patent-claim-breadth-using-google-patents-public-datasets

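3. Patent search and analysis

Patent search
Looks at patents one at a time.
ex: whether an idea is already publicly known
ex: clearance of other companies' patent rights
⇒ confirming that no third-party patent right stands in the way of the company's own business
Pain points:
・Finding similar patents and documents is hard (setting the search scope, screening, etc.)
・When the same kind of search is repeated every time, you want to automate it.

Patent analysis
Analyzes patents at a more macro (bird's-eye) level than patent search.
ex: trends across an entire technology field
ex: your own / other companies' patent portfolios
Pain points:
・Patent data can amount to so-called big data, so everything from acquisition and processing through visualization is a burden
・Extracting the information needed for analysis from patent documents (text) is a burden
・Creating and verifying hypotheses for the analysis is hard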

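Patent search example 1: comparing an in-house idea with prior art
[Figure: the in-house idea is compared against the content of the prior art]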

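Patent search example 2: comparing another company's patent with an in-house product
[Figure: the content of the other company's patent is compared against the content of the in-house product]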

Example of patent analysis

4. What is LangChain?
https://www.langchain.com/
・A library that acts like glue, making a bare large language model (LLM) easier to use
・Available in Python and JavaScript versions.

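4. What is LangChain?
Main features
・Model I/O: handle a variety of language models through a single interface
・Retrieval: convert text into embedding representations and make use of them
・Chains: link multiple models and processing steps together
・Memory: chat functionality (remembers past exchanges)
・Agents: hand tools to a bare LLM
This allows more complex and varied things than using a bare LLM alone.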
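To make the "Chains" idea a little more concrete, here is a minimal sketch in plain Python. It is not LangChain's API: a chain is simply modeled as a list of functions applied in order, and `fake_llm` is a hypothetical stand-in for a real model call.

```python
from typing import Callable, List

Step = Callable[[str], str]


def make_chain(steps: List[Step]) -> Step:
    """Return a function that feeds the output of each step into the next."""
    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text
    return run


def build_prompt(claim: str) -> str:
    # Prompt-template step: wrap the input in an instruction.
    return f"Summarize this patent claim in plain words: {claim}"


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a model call; it only reports the prompt length.
    return f"[model answer to a {len(prompt)}-character prompt]"


def postprocess(answer: str) -> str:
    # Output-parsing step: strip the surrounding brackets.
    return answer.strip("[]")


chain = make_chain([build_prompt, fake_llm, postprocess])
print(chain("A method for sterilizing waste by heating."))
```

Swapping `fake_llm` for a real model call is the only change such a sketch would need, which is roughly the convenience a chaining library offers.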

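5. Looking for things that could be useful in patent search and analysis (experiments)
No. | Function | Overview | Blog post
1 | Summarizing patent documents | Make hard-to-read patent documents (foreign documents, etc.) easier to understand | link
2 | Extracting information from patent documents | Ask about the content in natural language and get an answer with supporting sources | link
3 | Extracting what is (perhaps) written in a patent document | Extract specific substances, physical properties, etc. | link
4 | Creating a problem / solution map | Pull out the problem and the means of solution from what each patent document describes | link
6 | Automating patent search | Everything from understanding the search target through to the search itself, automatically | link
7 | Creating analysis hypotheses | Abduction (hypothetical inference) | link
8 | Discussion among LLMs | Give the LLM several roles and have them discuss | link
9 | Extracting data from securities reports | Take a look at the IP strategies of US companies | -
Others: drafting patent specifications / passing the patent attorney exam ... and so on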

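① Summarizing patent documents
Need: patent documents are hard to read, so I want them rewritten in an easy-to-understand way
LangChain features used: none (WindowsCopilot)
Sample: have it interpret a US software patent (screenshot on the slide)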

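② Extracting information from patent documents
Need: I do not want to read a patent document in full (from cover to cover)
Functions used: UnstructuredURLLoader, CharacterTextSplitter, Chroma
Sample:
    # Import path as of LangChain in 2023.
    from langchain.chains import RetrievalQAWithSourcesChain

    # model (the LLM) and docsearch (the Chroma vector store) are created beforehand.
    chain = RetrievalQAWithSourcesChain.from_chain_type(
        model, chain_type="map_reduce", retriever=docsearch.as_retriever()
    )
    # "Which substances can be processed by the method described in this invention?"
    q1 = "この発明に記載の方法で処理できる物質を教えてください。"
    chain({"question": q1}, return_only_outputs=True)
output:
    {'answer': 'Types of waste that can be processed include normal garbage, toilet solid waste, organic waste, and small amounts of PVC plastic containing chlorine. Glass and metal are not melted but are recovered as completely sterilized at the end of the process cycle for recycling.\n',
     'sources': 'https://patents.google.com/patent/WO2020118236A1/en, https://patents.google.com/patent/US7998226B2/en'}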
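The retrieval step behind a "question answering with sources" chain can be sketched without any external library. The following is a toy illustration, not what LangChain or Chroma do internally: word counts stand in for learned embeddings, and the two documents and their `doc-A` / `doc-B` keys are hypothetical placeholders for patent URLs.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def split_text(text: str, chunk_size: int = 80) -> List[str]:
    """Split text at sentence ends into chunks of roughly chunk_size characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > chunk_size:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts (a real system uses a learned model).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: Dict[str, str], k: int = 2) -> List[Tuple[str, str]]:
    """Return the k chunks most similar to the question, each with its source."""
    q = embed(question)
    scored = [
        (cosine(q, embed(chunk)), source, chunk)
        for source, text in docs.items()
        for chunk in split_text(text)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(source, chunk) for _, source, chunk in scored[:k]]


# Hypothetical mini-corpus; the keys stand in for patent URLs.
docs = {
    "doc-A": "The furnace processes organic waste and garbage. Glass is recovered for recycling.",
    "doc-B": "The electrode is coated with nickel. The anode shows low overvoltage.",
}
for source, chunk in retrieve("Which waste can be processed?", docs, k=1):
    print(source, "->", chunk)
```

Keeping the source next to every chunk is what lets the final answer cite where it came from, as in the 'sources' field of the sample output.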

③ Extracting what is (perhaps) written in a patent document
Need: I do not want to read a patent document in full (from cover to cover). On top of that, the information may not be written there at all.
Functions used: create_extraction_chain, RecursiveCharacterTextSplitter, AsyncHtmlLoader, BeautifulSoupTransformer
Question: does the document mention the initial overvoltage and the overvoltage?
Sample:
    urls = ["https://patents.google.com/patent/US20190078220A1/en"]
    # , "https://patents.google.com/patent/US20160237578A1/en",
    #   "https://patents.google.com/patent/US20160237578A1/en?"]
    extracted_content = scrape_with_playwright(urls, schema=schema)
output:
    Fetching pages: 100%|##########| 1/1 [00:00<00:00, 13.23it/s]
    Extracting content with LLM
    [{'Anode initial overvoltage': '1.7 to 1.9 V', 'Anode overvoltage': '0.3 to 0.4 Acm−2', 'durability': 'more than several tens of years'}]
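④ Creating a problem / solution map
Need: to cross-tabulate problems against means of solution (which requires extracting the "problem" and the "means of solution" from each document)
Functions used: create_tagging_chain, create_tagging_chain_pydantic
Sample:
    # Import path as of LangChain in 2023; llm is created beforehand.
    from langchain.chains import create_tagging_chain

    # Keys: 課題 = problem, 解決手段 = means of solution, 効果 = effect.
    # The enum values are the Japanese classification labels used for tagging.
    schema = {
        "properties": {
            "課題": {
                "type": "string",
                "description": "発明が解決しようとする課題を抽出",
                "enum": ["生産性", "滋養強壮", "ダイエット", "安全性", "抗酸化", "血中脂質改善、血圧降下", "美容"],
            },
            "解決手段": {
                "type": "string",
                "description": "課題を解決するための方法や手段を抽出",
                "enum": ["製造-酵素処理", "製造-造粒・成形", "製造-粉砕", "製造-発酵", "製造-混合・攪拌", "製造-洗浄",
                         "製造-化学的処理", "製造-加熱", "製造-分離抽出", "製造-冷却", "製造-乾燥", "製造-その他"],
            },
            "効果": {"type": "string", "description": "発明によるメリットや効果"},
        },
        "required": ["課題", "解決手段"],
    }
    chain = create_tagging_chain(schema=schema, llm=llm)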
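Once every document has been tagged, the cross-tabulation itself needs only the standard library. A minimal sketch follows, assuming the tagging results have been collected as one dict per document; the `tagged` list is made-up sample data, not output from the experiment.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical tagging results, one dict per patent document.
# 課題 = problem, 解決手段 = means of solution.
tagged = [
    {"課題": "生産性", "解決手段": "製造-発酵"},
    {"課題": "生産性", "解決手段": "製造-乾燥"},
    {"課題": "安全性", "解決手段": "製造-加熱"},
    {"課題": "生産性", "解決手段": "製造-発酵"},
]


def cross_tabulate(rows: List[Dict[str, str]], row_key: str, col_key: str) -> Counter:
    """Count documents for each (row_key value, col_key value) pair."""
    return Counter((r[row_key], r[col_key]) for r in rows if row_key in r and col_key in r)


def print_table(counts: Counter) -> None:
    """Print the counts as a tab-separated problem x solution table."""
    problems = sorted({p for p, _ in counts})
    solutions = sorted({s for _, s in counts})
    print("\t".join(["課題 \\ 解決手段"] + solutions))
    for p in problems:
        print("\t".join([p] + [str(counts[(p, s)]) for s in solutions]))


counts = cross_tabulate(tagged, "課題", "解決手段")
print_table(counts)
```

Restricting the tags to an `enum`, as the schema does, is what keeps the rows and columns of such a table from fragmenting into near-duplicate labels.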

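⑥ Automating patent search
Need: compensating for bias in the search viewpoints (and wanting to save effort)
Functions used: experimented with AutoGPT
Sample (key parts):
    git clone -b stable https://github.com/Significant-Gravitas/Auto-GPT.git
    cd Auto-GPT
    pip install -r requirements.txt
    cp .env.template .env
    # Edit the copied .env as described in the official instructions.
    ./run.sh  # run
What it says is sensible, but at the time of the experiment little patent information was available to it, so it could not carry out the search well.
⇒ I would like to make patent information easy to handle with LangChain + a custom API and automate the search.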

⑦ Creating analysis hypotheses (1/2)
Need: coming up with ideas for analysis viewpoints
Functions used: LLMChain, human, AgentExecutor, ZeroShotAgent
Question: form a hypothesis about why Company A and Company B are technically connected in the ** field
Sample:
    # search, llm, prompt and memory are defined beforehand (not shown on the slide).
    tools = [
        Tool(
            name="Search",
            func=search.run,
            description="useful for when you need to answer questions about current events",
        ),
        Tool(
            name="Human",
            func=human.HumanInputRun(
                description="useful for when you need to ask human yes or no about questions."
            ),
        ),
    ]
    llm_chain = LLMChain(llm=llm, prompt=prompt)
    agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)
    agent_chain = AgentExecutor.from_agent_and_tools(
        agent=agent,
        tools=tools,
        verbose=True,
        memory=memory,
    )
    # technology="内視鏡" means "endoscope".
    agent_chain.run(input="make an abduction.", compA="A株式会社", compB="B株式会社", technology="内視鏡")
A human can be used as a tool.
⑦ Creating analysis hypotheses (2/2)
[Figure: log of the agent run, with two callouts marking the human's answers]

⑧ Discussion among LLMs
Need: coming up with ideas for analysis viewpoints, and compensating for bias in my own knowledge and thinking
Functions used: Agents
Sample:
    # Keys are the roles; values are the tools each role may use.
    names = {
        "scientist": ["arxiv", "ddg-search", "wikipedia"],
        "marketer": ["arxiv", "ddg-search", "wikipedia"],
        "designer": ["arxiv", "ddg-search", "wikipedia"],
    }
    topic = "metamaterialsの可能性"  # "the potential of metamaterials"
    max_iters = 6
    n = 0
    # agents, select_next_speaker and specified_topic are defined elsewhere (not shown on the slide).
    simulator = DialogueSimulator(agents=agents, selection_function=select_next_speaker)
    simulator.reset()
    simulator.inject("Moderator", specified_topic)
    print(f"(Moderator): {specified_topic}")
    print("\n")
    while n < max_iters:
        name, message = simulator.step()
        print(f"({name}): {message}")
        print("\n")
        n += 1
There are various similar projects on GitHub as well.
https://github.com/microsoft/autogen
https://github.com/dinobby/ReConcile
https://github.com/geekan/MetaGPT
https://github.com/aiwaves-cn/agents
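The slide does not show how `DialogueSimulator` is defined, so the following is only a guess at its general shape, written in plain Python with stub agents instead of LLM-backed ones. `StubAgent` and `round_robin` are hypothetical; a real setup would replace `send` with a model call and could choose the next speaker in other ways.

```python
from typing import Callable, List, Tuple


class StubAgent:
    """Stand-in for an LLM-backed agent; it only counts what it has heard."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.heard: List[str] = []

    def receive(self, speaker: str, message: str) -> None:
        self.heard.append(f"{speaker}: {message}")

    def send(self) -> str:
        # A real agent would call a model with self.heard as context.
        return f"As the {self.name}, I have heard {len(self.heard)} message(s)."


class DialogueSimulator:
    def __init__(
        self,
        agents: List[StubAgent],
        selection_function: Callable[[int, List[StubAgent]], int],
    ) -> None:
        self.agents = agents
        self.select_next_speaker = selection_function
        self._step = 0

    def inject(self, name: str, message: str) -> None:
        """Broadcast a message (for example the moderator's topic) to every agent."""
        for agent in self.agents:
            agent.receive(name, message)

    def step(self) -> Tuple[str, str]:
        """Let the selected agent speak and broadcast its message to all agents."""
        speaker = self.agents[self.select_next_speaker(self._step, self.agents)]
        message = speaker.send()
        for agent in self.agents:
            agent.receive(speaker.name, message)
        self._step += 1
        return speaker.name, message


def round_robin(step: int, agents: List[StubAgent]) -> int:
    # Simplest possible speaker selection: take turns in list order.
    return step % len(agents)


agents = [StubAgent(n) for n in ("scientist", "marketer", "designer")]
simulator = DialogueSimulator(agents, selection_function=round_robin)
simulator.inject("Moderator", "The potential of metamaterials")
for _ in range(3):
    name, message = simulator.step()
    print(f"({name}): {message}")
```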

⑨ Extracting data from securities reports
Need: I want to consult related information other than patents
Functions used: ConversationalRetrievalChain, KayAiRetriever
Question: What is IBM's patent strategy?
Sample:
    # Import paths as of LangChain in 2023; model (the LLM) is created beforehand.
    from langchain.chains import ConversationalRetrievalChain
    from langchain.retrievers import KayAiRetriever

    retriever = KayAiRetriever.create(
        dataset_id="company", data_types=["10-K", "10-Q", "PressRelease"], num_contexts=6
    )
    qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)
    questions = [
        "What is IBM's patent strategy?",
    ]
    chat_history = []
    for question in questions:
        result = qa({"question": question, "chat_history": chat_history})
        chat_history.append((question, result["answer"]))
        docs = retriever.get_relevant_documents(question)
        print(f"-> **Question**: {question} \n")
        print(f"**Answer**: {result['answer']} \n")
output:
    -> **Question**: What is IBM's patent strategy?
    **Answer**: IBM's patent strategy focuses on seeking IP protection for its innovations while also emphasizing other initiatives designed to leverage its IP leadership. The company actively pursues intellectual property and invests approximately 8 percent of its total revenue in research and development (R&D). IBM Research works with clients and business units to deliver new technologies and address challenges in areas such as artificial intelligence, quantum computing, security, cloud, and systems. In 2019, IBM was awarded more U.S. patents than any other company, with a total of 9,262 patents, including patents related to artificial intelligence, cloud, cybersecurity, and quantum computing.
⇒ It collects and summarizes information from SEC filings (information similar to Japanese annual securities reports).
