MM text team
蔡捷恩
莊文立
溫鈺瑋
2015@Delta Research Center
Fully automatic F/T matrix
analysis from patent data
蔡捷恩
Function/Technology Matrix
Using keyword “ ”
“The Patent-Classification Technology/Function Matrix - A Systematic Method for Design Around”, Cheng et al. Mar-2013, CSIR
Problem reduction
• Detecting problem/solution pairs in a patent document
“Automatic Discovery of Technology Trends from Patent”, Y. Kim et al. 2009, ACMSAC
Problem term detection
• Step1. finding key frames
• Step2. feature extraction
– Unsupervised feature
– Supervised feature
• Step3. classifier training
Step1. key frame detection
• We define key frames to be “ ”
Step2 – unsupervised feature
(language model)
• The model:
Maximum likelihood estimation (MLE)
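The model formula on this slide did not survive extraction. As a minimal sketch, assuming a unigram language model whose probabilities are maximum likelihood estimates (relative frequencies), a key frame could be scored as follows. The function names, the toy sentences, and the probability floor for unseen words are illustrative assumptions, not the paper's exact formulation.

```python
import math
from collections import Counter

def train_unigram_mle(sentences):
    """MLE for a unigram model: P(w) = count(w) / total number of tokens."""
    counts = Counter(tok for sent in sentences for tok in sent)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def log_likelihood(model, tokens, floor=1e-6):
    """Sum of log P(w); unseen words get a small floor (an assumption)."""
    return sum(math.log(model.get(t, floor)) for t in tokens)

# Toy "problem" sentences, made up for illustration.
problem_sentences = [
    ["reduce", "noise"],
    ["reduce", "cost"],
]
model = train_unigram_mle(problem_sentences)
score = log_likelihood(model, ["reduce", "noise"])
```

A higher log-likelihood means the key frame looks more like the text the model was estimated from, which is what makes the score usable as an unsupervised feature.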
Step2 – supervised feature
(linguistic model)
• Based on part-of-speech (POS) statistics from labeled patents
Step2 – supervised feature
(linguistic model)
• The model:
The delta function equals 1 only when the current key frame
matches the given pattern, and 0 otherwise
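The formula itself is not preserved in the slide text. A minimal sketch of the indicator idea, assuming each pattern is a POS-tag sequence and one binary feature is produced per pattern; the patterns and function names below are hypothetical.

```python
def delta(pos_tags, pattern):
    """Return 1 only when the key frame's POS sequence equals the pattern."""
    return 1 if tuple(pos_tags) == tuple(pattern) else 0

def linguistic_features(pos_tags, patterns):
    """One indicator feature per pattern."""
    return [delta(pos_tags, p) for p in patterns]

# Hypothetical POS patterns; real ones would come from statistics
# gathered on labeled patents.
PATTERNS = [("VB", "NN"), ("VB", "JJ", "NN"), ("NN", "NN")]

features = linguistic_features(["VB", "NN"], PATTERNS)
```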
Step3. classifier training
• Simply concatenate the features mentioned
above and train a classifier with LIBSVM
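The concatenation step can be sketched as writing both feature groups into one line of LIBSVM's sparse input format. The feature values and the helper name `to_libsvm_line` are made up for illustration.

```python
def to_libsvm_line(label, *feature_groups):
    """Concatenate feature groups and format them as one LIBSVM input line.

    LIBSVM's sparse format is "<label> <index>:<value> ...", with indices
    starting at 1; zero-valued features can be omitted.
    """
    vector = [v for group in feature_groups for v in group]
    pairs = [f"{i}:{v:g}" for i, v in enumerate(vector, start=1) if v != 0]
    return " ".join([str(label)] + pairs)

language_model_features = [-2.08]  # e.g. a log-likelihood score (made up)
linguistic_features = [1, 0, 0]    # e.g. POS-pattern indicators (made up)
line = to_libsvm_line(1, language_model_features, linguistic_features)
```

One such line per key frame gives a training file that LIBSVM's command-line tools can read.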
Solution term detection
• Step1. key frame detection
• Step2. feature extraction
– Unsupervised feature
– Supervised feature: based on problem terms
• Step3. classifier training
Problems
• Lack of labeled data => the linguistic
model proposed in the paper seems general
enough => use it directly, together with Porter
stemming
Further improvement
• Coreference resolution
– “the method solves the problem of overfitting.”
• Semantic based clustering
– Okapi BM25: “The Probabilistic Relevance Framework: BM25 and Beyond”, Robertson et al., 2009
– Word vector: “Efficient Estimation of Word Representations in Vector Space”, T. Mikolov, ICLR, 2013
– Document vector: “Distributed Representations of Words and Phrases and their Compositionality”, NIPS, 2013
In my opinion: Okapi BM25 > word vector > document vector
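As a rough sketch of how Okapi BM25 could supply a similarity score between a query phrase and candidate phrases before clustering, the function below scores tokenized documents. The parameter defaults `k1=1.5` and `b=0.75`, the non-negative IDF variant, and the toy phrases are assumptions for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (an assumption; other variants exist).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturated by k1 and normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    ["reduce", "power", "consumption"],
    ["improve", "heat", "dissipation"],
    ["reduce", "noise"],
]
scores = bm25_scores(["reduce", "power"], docs)
```

Because BM25 relies only on term overlap and corpus statistics, it needs no training, which may be one reason it holds up well on a small patent collection.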
Thank you
Chinese domain-specific term extraction
溫鈺瑋
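Examples (× marks an incorrect segmentation)
× 目前 此車 铣 設備 由 绮 發 機械 提供
目前 此 車铣 設備 由 绮發機械 提供 (“This turn-mill equipment is currently provided by 绮發機械.”)
× L 固定 板會 有 擺動 過大 疑慮
L固定板 會 有 擺動 過大 疑慮 (“There is a concern that the L fixing plate swings too much.”)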
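Method
• Collocation
– Use mutual information (MI) to estimate how likely “character + character” and
“word + character” combinations are to form a word, i.e. the internal cohesion of a word
– Example: c = “自然語言處理” (natural language processing), a = “自然語言處”,
b = “然語言處理”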
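The scoring formula on this slide is not preserved. One association measure that fits the a/b/c setup (c is the candidate, a drops its last character, b drops its first) is f(c) / (f(a) + f(b) − f(c)), where f counts occurrences in the corpus. The sketch below uses that measure as an assumption; it is not mutual information proper and may differ from what the team computed.

```python
def count_occurrences(text, sub):
    """Count (possibly overlapping) occurrences of sub in text."""
    count, start = 0, text.find(sub)
    while start != -1:
        count += 1
        start = text.find(sub, start + 1)
    return count

def cohesion(text, c):
    """Internal cohesion of candidate c from the counts of c, a and b."""
    a, b = c[:-1], c[1:]  # a drops the last character, b drops the first
    f_a, f_b, f_c = (count_occurrences(text, s) for s in (a, b, c))
    denom = f_a + f_b - f_c
    return f_c / denom if denom > 0 else 0.0

# Toy corpus, made up for illustration.
corpus = "自然語言處理很有趣,自然語言處理很重要,自然語言處置"
score = cohesion(corpus, "自然語言處理")
```

A score close to 1 means a and b almost never occur outside c, so c behaves like a single term.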
Method
• Adaptation
– Raw sentence: 目前此車铣設備由绮發機械提供
– Segment with an existing tool (CKIP, Stanford, jieba…):
目前 此車 铣 設備 由 绮 發 機械 提供
b e b e s b e s s s b e b e
– Adjust manually:
目前 此 車铣 設備 由 绮發機械 提供
b e s b e b e s b m m e b e
– Train the CRF-based DELTA word segmentor on the adjusted labels
Input : L 固定 板會 有 擺動 過大 疑慮
Output : L固定板 會 有 擺動 過大 疑慮
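The b/m/e/s labels above mark each character as the beginning, middle, or end of a word, or as a single-character word. A minimal sketch of the conversion in both directions (the function names are illustrative):

```python
def words_to_tags(words):
    """Convert a segmented sentence into per-character b/m/e/s tags."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("s")  # single-character word
        else:
            tags.extend(["b"] + ["m"] * (len(w) - 2) + ["e"])
    return tags

def tags_to_words(chars, tags):
    """Rebuild the segmentation from characters and their tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("e", "s"):  # a word ends here
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends mid-word
        words.append(current)
    return words

adjusted = "目前 此 車铣 設備 由 绮發機械 提供".split()
tags = words_to_tags(adjusted)
restored = tags_to_words("".join(adjusted), tags)
```

Manually corrected sentences converted this way become character-level training data for a sequence labeler such as a CRF.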
Thank you
Knowledge extraction from Delta data
莊文立
Information Extraction
• Named Entity Recognition (NER)
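– Identify and classify proper nouns
• Companies, people, products, locations, etc.
• Relation Extraction (RE)
– Find the relations between named entities in text, for example
• Competition
• Cooperation
• Customer
• Upstream supplier
– Usually represented as (subject, relation, object) triples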
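Sales visit record:
Regarding the pricing of the BV3418 project, the response from Manager Yao (姚經理) of 欣特協寶
was that General Manager Zhou (周總) considers Delta's (台達) price higher than that of
Siemens' (西門子) 808 low-end NC controller.
• NER
• 西門子 (Siemens)/Organization
• 欣特協寶/Organization
• 台達 (Delta)/Organization
• 姚經理 (Manager Yao)/Person
• 周總 (General Manager Zhou)/Person
• RE
#   Subject    Relation       Object
1   台達       COMPETE_WITH   西門子
2   台達       IS_VENDOR      欣特協寶
3   西門子     IS_VENDOR      欣特協寶
4   欣特協寶   SUBORDINATE    姚經理
5   欣特協寶   SUBORDINATE    周總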
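The relation table above maps directly onto (subject, relation, object) triples. A small sketch of that data structure and two lookups; the `Triple` type and `relations_of` helper are illustrative names, not part of the team's system.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "relation", "object"])

# The five relations extracted from the sales visit record.
triples = [
    Triple("台達", "COMPETE_WITH", "西門子"),
    Triple("台達", "IS_VENDOR", "欣特協寶"),
    Triple("西門子", "IS_VENDOR", "欣特協寶"),
    Triple("欣特協寶", "SUBORDINATE", "姚經理"),
    Triple("欣特協寶", "SUBORDINATE", "周總"),
]

def relations_of(entity, triples):
    """Return every triple in which the entity appears as subject or object."""
    return [t for t in triples if entity in (t.subject, t.object)]

# Which organizations are vendors of 欣特協寶?
vendors = [t.subject for t in triples
           if t.relation == "IS_VENDOR" and t.object == "欣特協寶"]
```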
Named Entity Recognition
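• Data processing
– Chinese text needs good word segmentation first
– Manual annotation
• Model: Conditional Random Fields (CRF)
– Learn how proper nouns are used from the features of each token
• The word itself and its part of speech
• Context words and their parts of speech
• Syntactic parse tree
• Collocations
• Titles and surnames
• Proper-noun database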
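A sketch of how several of the features listed above could be collected per token before CRF training. The lexicons, POS tags, and feature names are hypothetical, and parse-tree and collocation features are left out for brevity.

```python
# Hypothetical lexicons; a real system would load much larger lists.
SURNAMES = {"姚", "周", "王"}
TITLES = {"經理", "總"}
GAZETTEER = {"台達", "西門子", "欣特協寶"}

def token_features(tokens, pos_tags, i):
    """Feature dict for token i: the word, its POS, its neighbours, lexicon hits."""
    word = tokens[i]
    feats = {
        "word": word,
        "pos": pos_tags[i],
        "starts_with_surname": word[0] in SURNAMES,
        "ends_with_title": any(word.endswith(t) for t in TITLES),
        "in_gazetteer": word in GAZETTEER,
    }
    if i > 0:
        feats["prev_word"] = tokens[i - 1]
        feats["prev_pos"] = pos_tags[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1]
        feats["next_pos"] = pos_tags[i + 1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats

tokens = ["欣特協寶", "姚經理", "給出", "回應"]
pos_tags = ["Nb", "Nb", "VC", "Na"]  # illustrative tags only
features = [token_features(tokens, pos_tags, i) for i in range(len(tokens))]
```

Each sentence then becomes a sequence of feature dictionaries paired with a sequence of entity labels, which is the shape of input a CRF learner works from.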
Relation Extraction
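• Manual annotation is still required
• Deep Learning!
– Let the machine discover the most suitable representation by itself
• Recursive Neural Network
– “Climb” upward along the syntactic parse tree
– Each word is represented by a matrix plus a vector
• The vector represents the word's own meaning
• The matrix represents contextual information
– The vector output where the two named entities meet is fed into a classifier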
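The slides do not say which recursive network variant was used. One published formulation that matches the "matrix plus vector" description is the matrix-vector recursive neural network (Socher et al., 2012), where two children (a, A) and (b, B) are merged into a parent with vector tanh(W [Ba; Ab]) and matrix W_M [A; B]. The pure-Python sketch below assumes that formulation, omits bias terms and training, and uses made-up weights.

```python
import math

def matvec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def matmul(a, b):
    """Matrix-matrix product for nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def compose(node_a, node_b, w, w_m):
    """Merge two children (vector, matrix) into a parent (vector, matrix)."""
    (a, mat_a), (b, mat_b) = node_a, node_b
    # Each child's matrix modifies the other child's vector.
    stacked_vec = matvec(mat_b, a) + matvec(mat_a, b)    # length 2n
    parent_vec = [math.tanh(x) for x in matvec(w, stacked_vec)]
    stacked_mat = mat_a + mat_b                          # 2n x n, rows stacked
    parent_mat = matmul(w_m, stacked_mat)                # n x n
    return parent_vec, parent_mat

identity = [[1.0, 0.0], [0.0, 1.0]]
averaging = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]  # made-up n x 2n weights

left = ([1.0, -3.0], identity)
right = ([4.0, 5.0], identity)
parent_vec, parent_mat = compose(left, right, averaging, averaging)
```

Applying `compose` bottom-up along the parse tree until the two entity subtrees meet yields the vector that would be passed to the relation classifier.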
[Figure: recursive neural network over the parse tree. Each word is drawn as a vector (e.g. 1, −3, 4, ⋮, 5) with a matrix; the merged output feeds the Classifier.]
Future work
• Cross-sentence relations
• Cross-document relations
• Cross-language relations
Thank you

Multimedia-text team report_2015-07-31