Simple and Effective Knowledge-Driven Query Expansion
for QA-Based Product Attribute Extraction
Keiji Shinzato1, Naoki Yoshinaga2, Yandi Xia1, Wei-Te Chen1
1) Rakuten Institute of Technology, Rakuten Group, Inc.
2) Institute of Industrial Science, the University of Tokyo
ACL 2022 short paper
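Self-introduction
• Keiji Shinzato
• Lead Scientist, Rakuten Institute of Technology Americas
• Career
• 2004 – 2006: Japan Advanced Institute of Science and Technology, doctoral program (Torisawa Lab)
• 2006 – 2011: Graduate School of Informatics, Kyoto University, program-specific assistant professor / researcher (Kurohashi Lab)
• 2011 – 2018: Rakuten Group, Inc., Rakuten Institute of Technology
• 2018 – present: Rakuten USA, Rakuten Institute of Technology Americas
• Hobbies and interests
• Cooking
• Craft beer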
Goal: Organizing an Enormous Number of Products in E-commerce
• Business contribution
• Sophisticated product search and recommendation.
• Better understanding of customers on the marketplace.
Attribute value extraction
Product title: Large Elegant Leather Bag - BLK
Product description: Crafted from sleek spazzolato leather (black). This is an elegant carryall that's perfect for your essentials. 10"H x 13"W x 6"D.
Attribute   Value
Color       Black
Material    Leather
Height      10 inch
Width       13 inch
Depth       6 inch
The bag image is designed by pch.vector / Freepik
From NER-Based to QA-Based Attribute Value Extraction
• Existing Named Entity Recognition (NER)-based approaches to attribute value extraction suffer from the data sparseness problem.
• The number of classes (attributes) in attribute value extraction can exceed one thousand.
• Question Answering (QA)-based approaches to attribute value extraction alleviate the data sparseness problem [Xu+, 2019; Wang+, 2020].
QA-based approach
Input (Context[SEP]Query): Adidas Running Shoes - 8.5 / White[SEP]Brand
Answer: a span of the context "Adidas Running Shoes - 8.5 / White" (here, the brand name "Adidas")
QA model: BERT-QA [Wang+, 2020], a BERT-based QA model that takes Context[SEP]Query as input and outputs the answer span in the context.
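The QA formulation above can be sketched in a few lines. This is a minimal illustration, not the BERT-QA implementation: `build_qa_input` and `decode_span` are hypothetical helpers, tokenization is plain whitespace splitting, and the begin/end indices that a trained model would predict are supplied by hand.

```python
from typing import List, Optional

SEP = "[SEP]"


def build_qa_input(title: str, attribute: str) -> str:
    """Join the context (product title) and the query (attribute name)."""
    return f"{title}{SEP}{attribute}"


def decode_span(title: str, begin: Optional[int], end: Optional[int]) -> Optional[str]:
    """Return tokens begin..end (inclusive) of the title as the answer.

    None stands for the case where the title holds no value for the attribute.
    """
    if begin is None or end is None:
        return None
    tokens: List[str] = title.split()
    return " ".join(tokens[begin:end + 1])


title = "Adidas Running Shoes - 8.5 / White"
qa_input = build_qa_input(title, "Brand")
# Span indices are assumed here; a trained QA model would predict them.
answer = decode_span(title, 0, 0)
```

Because the attribute is part of the input rather than an output class, a single model can serve any number of attributes, which is why the QA formulation eases the sparseness problem of NER-style tagging.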
Attribute Value Extraction is Still Difficult
[Figure: number of instances (y-axis, log scale from 1 to 100,000) for each of the 2,162 attributes (x-axis)]
Number of Instances per Attribute on AliExpress Dataset
Problems
• Rare attributes
• 85% of the attributes have fewer than 10 instances
• Ambiguous attributes
• Attribute names such as function 1, suitable, and sort
How can we obtain effective query representations for rare and ambiguous attributes?
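The rare-attribute statistic on this slide is a simple per-attribute count. A minimal sketch of how such a share could be computed from <product title, attribute, value> tuples; `rare_attribute_share` is a hypothetical helper and the toy tuples are made up for illustration, not taken from the AliExpress dataset.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

Example = Tuple[str, str, Optional[str]]  # <product title, attribute, value>


def rare_attribute_share(examples: Iterable[Example], threshold: int = 10) -> float:
    """Fraction of attributes that have fewer than `threshold` instances."""
    counts = Counter(attribute for _title, attribute, _value in examples)
    if not counts:
        return 0.0
    rare = sum(1 for n in counts.values() if n < threshold)
    return rare / len(counts)


# Toy data, made up for illustration only.
toy = (
    [("title", "brand", "adidas")] * 12
    + [("title", "function 1", "x")] * 2
    + [("title", "suitable", "men")]
)
share = rare_attribute_share(toy)  # 2 of the 3 toy attributes are rare
```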
Knowledge-Driven Query Expansion for QA-based AE (1/3)
• Exploit attribute values in training data as run-time knowledge to induce better query representation.
• Training data (e.g., the title "Zipp Battery 12V 14AH SLA…" with the attributes Nominal capacity and Brand) → Knowledge (attribute-value pairs)
• Input to BERT-QA: Title[SEP]Attribute[SEP]Values, where the title is the context and the attribute with its values is the query
• Example input: CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP]nominal capacity[SEP]14ah[SEP]40ah
• Output: the begin (B) and end (E) of the answer span in the title "CATL ... 100 Ah … battery"
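The query expansion on this slide can be sketched as a lookup followed by string concatenation. This is a minimal illustration under stated assumptions: `build_knowledge` and `expand_query` are hypothetical names, values are kept in order of first occurrence (the slides do not say how values are ordered or how many are used), and the second training title is made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

SEP = "[SEP]"
Example = Tuple[str, str, Optional[str]]  # <product title, attribute, value>


def build_knowledge(train: Iterable[Example]) -> Dict[str, List[str]]:
    """Collect the distinct values seen for each attribute in the training data."""
    knowledge: Dict[str, List[str]] = defaultdict(list)
    for _title, attribute, value in train:
        # Skip NULL values and values already recorded for this attribute.
        if value is not None and value not in knowledge[attribute]:
            knowledge[attribute].append(value)
    return dict(knowledge)


def expand_query(title: str, attribute: str, knowledge: Dict[str, List[str]]) -> str:
    """Build Title[SEP]Attribute[SEP]Values; no values are added for unseen attributes."""
    values = knowledge.get(attribute, [])
    return SEP.join([title, attribute, *values])


# Toy training tuples; the second title is hypothetical.
train = [
    ("Zipp Battery 12V 14AH SLA…", "nominal capacity", "14ah"),
    ("hypothetical 40ah battery pack", "nominal capacity", "40ah"),
]
knowledge = build_knowledge(train)
expanded = expand_query(
    "CATL 3.2V 100Ah battery LiFePo4 prismatic battery", "nominal capacity", knowledge
)
```

Note that the retrieved values (14ah, 40ah) do not contain the answer for the CATL title (100Ah); they only hint at what a nominal capacity looks like.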
• The knowledge is imperfect: the values retrieved from the training data (14ah, 40ah) do not always include the correct value for the title at hand (here, 100Ah).
Knowledge-Driven Query Expansion for QA-based AE (2/3)
• Train knowledge-based QA models while mimicking the imperfection of the knowledge at test time.
• Knowledge dropout: prevents models from naively matching the values in the query with one in the context.
• Knowledge token mixing: prevents models from relying more on values than on attributes.
• We regard the availability of value knowledge as a domain, and perform multi-domain learning for the QA-based model with and without our value-based query expansion.
[Figure: training data → knowledge (attribute-value pairs) → BERT-QA input of the form Title[SEP]Attribute[SEP]Values (context, then query), e.g., CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP]nominal capacity[SEP]14ah[SEP]40ah; B and E mark the begin and end of the answer span in "CATL ... 100 Ah … battery"]
Knowledge-Driven Query Expansion for QA-based AE (3/3)
• Knowledge token mixing adds a domain token to the attribute in the query: Title[SEP][Un/seen]Attribute[SEP]Values.
• With values: CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP][Seen]nominal capacity[SEP]14ah[SEP]40ah
• Without values (values deleted): CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP][Unseen]nominal capacity
• Knowledge dropout is used together with knowledge token mixing.
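The two training tricks can be sketched as input-construction steps. This is a rough illustration, not the authors' implementation: the slides do not spell out how knowledge dropout selects what to drop, so the sketch assumes each retrieved value is dropped independently with probability `rate`; `knowledge_dropout` and `token_mixing_inputs` are hypothetical names. Token mixing follows the slide: one input with `[Seen]` and values, one with `[Unseen]` and the values deleted.

```python
import random
from typing import List, Sequence

SEP = "[SEP]"
SEEN, UNSEEN = "[Seen]", "[Unseen]"


def knowledge_dropout(values: Sequence[str], rate: float, rng: random.Random) -> List[str]:
    """Drop each retrieved value independently with probability `rate` (assumed scheme)."""
    return [v for v in values if rng.random() >= rate]


def token_mixing_inputs(title: str, attribute: str, values: Sequence[str]) -> List[str]:
    """Build the [Seen] input (with values) and the [Unseen] input (values deleted)."""
    inputs = [SEP.join([title, f"{UNSEEN}{attribute}"])]
    if values:
        # Only claim the attribute is "seen" when some value knowledge is attached.
        inputs.insert(0, SEP.join([title, f"{SEEN}{attribute}", *values]))
    return inputs


title = "CATL 3.2V 100Ah battery LiFePo4 prismatic battery"
rng = random.Random(0)
kept = knowledge_dropout(["14ah", "40ah"], rate=0.5, rng=rng)
inputs = token_mixing_inputs(title, "nominal capacity", kept)
```

Training on both forms lets the same model handle attributes that have value knowledge at test time and attributes that do not.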
Experimental Settings
• We perform experiments on the cleaned AE-pub dataset.
• We construct the cleaned AE-pub dataset from the public AliExpress dataset [Xu+, 2019] by removing 736 near-duplicate tuples.
• Each entry is a tuple of <product title, attribute, value>.
• We split the cleaned AE-pub dataset into train/dev/test sets with a ratio of 7:1:2.
Statistics of the cleaned AE-pub dataset
                                   Train   Dev.    Test
# of tuples                        76,823  10,975  21,950
# of tuples with NULL              15,097  2,201   4,259
# of unique attribute-value pairs  11,819  2,680   4,431
# of unique attributes             1,801   635     872
# of unique values                 9,317   2,258   3,671
Baselines
• Dictionary matching
• SUOpenTag [Xu+, 2019]
• AVEQA [Wang+, 2020]
• BERT-QA [Wang+, 2020]
Performance on Cleaned AE-pub Dataset
                             Macro                                        Micro
Models                       P (%)          R (%)          F1             P (%)          R (%)          F1
Dictionary                   33.20 (±0.00)  30.37 (±0.00)  31.72 (±0.00)  73.39 (±0.00)  73.77 (±0.00)  73.58 (±0.00)
SUOpenTag [Xu+, 2019]        30.92 (±1.44)  28.04 (±1.48)  29.41 (±1.44)  86.53 (±0.78)  79.11 (±0.35)  82.65 (±0.20)
AVEQA [Wang+, 2020]          41.93 (±1.05)  39.65 (±0.96)  40.76 (±0.98)  86.95 (±0.27)  81.99 (±0.13)  84.40 (±0.09)
BERT-QA [Wang+, 2020]        42.77 (±0.36)  40.85 (±0.22)  41.79 (±0.28)  87.14 (±0.54)  82.16 (±0.21)  84.58 (±0.24)
BERT-QA +vals                39.48 (±0.37)  35.60 (±0.44)  37.44 (±0.38)  88.82 (±0.22)  81.77 (±0.14)  85.15 (±0.14)
BERT-QA +vals +drop          41.61 (±0.83)  38.22 (±0.80)  39.84 (±0.81)  88.46 (±0.26)  82.02 (±0.37)  85.12 (±0.14)
BERT-QA +vals +mixing        46.67 (±0.33)  43.32 (±0.50)  44.93 (±0.39)  88.30 (±0.69)  82.46 (±0.30)  85.28 (±0.26)
BERT-QA +vals +drop +mixing  47.74 (±0.54)  44.82 (±0.75)  46.23 (±0.64)  87.84 (±0.39)  82.61 (±0.07)  85.14 (±0.19)
BERT-QA +vals +drop +mixing outperformed the baseline methods.
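The gap between the macro and micro columns comes from how the averages are taken. A minimal sketch, assuming macro scores average per-attribute precision and recall (so a rare attribute weighs as much as a frequent one) and micro scores pool the counts over all instances; the macro F1 values in the table are consistent with the harmonic mean of the macro P and R columns. The function names and toy counts are illustrative assumptions, not the authors' evaluation script.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)
PRF = Tuple[float, float, float]


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def prf(tp: int, fp: int, fn: int) -> PRF:
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r, f1(p, r)


def micro_prf(counts: Dict[str, Counts]) -> PRF:
    """Pool counts over all attributes, then compute P/R/F1 once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)


def macro_prf(counts: Dict[str, Counts]) -> PRF:
    """Average per-attribute P and R, then take their harmonic mean."""
    scores = [prf(*c) for c in counts.values()]
    p = sum(s[0] for s in scores) / len(scores)
    r = sum(s[1] for s in scores) / len(scores)
    return p, r, f1(p, r)


# Toy counts: one frequent, well-handled attribute and one rare, hard attribute.
toy = {"brand": (90, 10, 10), "function 1": (1, 3, 3)}
micro = micro_prf(toy)
macro = macro_prf(toy)
```

On the toy counts the micro scores stay high because the frequent attribute dominates, while the macro scores expose the weak rare attribute, the same pattern as in the table.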
BERT-QA +vals lowers macro F1 relative to BERT-QA (37.44 vs. 41.79): it learns to find strings that are similar to the values retrieved from the training data.
With knowledge dropout and knowledge token mixing, both macro and micro F1 improve over BERT-QA (46.23 vs. 41.79 macro, 85.14 vs. 84.58 micro).
Impact on Rare and Ambiguous Attributes
• Categorize the attributes to which query expansion was applied according to the number of training examples and the appropriateness of the attribute names.
• Use the embeddings of the CLS token obtained from BERT to measure the appropriateness.
• Compute the cosine similarity between the attribute embedding and the averaged value embeddings.
• Regard the attribute name as ambiguous if the similarity is low.
• Divide the attributes into four groups according to the median frequency and the median similarity to values.
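The categorization steps above can be sketched as follows. This is a minimal illustration with toy vectors standing in for the BERT CLS embeddings (no model is loaded here); `mean_vector`, `cosine`, and `quadrant` are hypothetical helpers, and the default medians (8 training examples, similarity 0.929) are the ones reported in the results table for this analysis.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Average the value embeddings dimension by dimension."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def quadrant(
    n_train: int, similarity: float, freq_median: int = 8, sim_median: float = 0.929
) -> Tuple[str, str]:
    """Assign an attribute to one of the four groups by the two medians."""
    frequency = "rare" if n_train < freq_median else "frequent"
    ambiguity = "ambiguous" if similarity < sim_median else "unambiguous"
    return frequency, ambiguity


# Toy embeddings, made up for illustration only.
attribute_embedding = [1.0, 0.0]
value_embeddings = [[1.0, 1.0], [1.0, -1.0]]
similarity = cosine(attribute_embedding, mean_vector(value_embeddings))
group = quadrant(n_train=3, similarity=similarity)
```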
Macro F1 Gains over BERT-QA Model
Model: BERT-QA +vals +drop +mixing
Cosine similarity    Number of training examples (median: 8)
(median: 0.929)      [1, 8)         [8, ∞)         All
[0.411, 0.929)       49.15 (+7.54)  57.89 (+6.15)  53.51 (+6.86)
[0.929, 1.0]         50.94 (+8.14)  71.04 (+3.02)  62.10 (+5.29)
All                  49.99 (+7.82)  64.84 (+4.50)  57.81 (+6.08)
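For ambiguous attributes (similarity below the median), query expansion yields more informative queries than the attribute name alone (+6.86 macro F1, vs. +5.29 for unambiguous attributes).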
Query expansion is more effective for rare attributes than for frequent attributes (+7.82 vs. +4.50 macro F1).
By taking the knowledge induced from the training data as run-time input, the model could devote more of its parameters to solving the task itself.
Example Outputs
Example 1
• Context: aeronova bicycle carbon mtb handlebar mountain bikes flat handlebar mtb integrated handlebars with stem bike accessories
• Query (attribute): function 1
• Query (values): skiing goggles, carbon road bicycle handlebar, cycling glasses, bicycle mask, gas mask, …
• Gold: carbon mtb handlebar
• Prediction, BERT-QA: bicycle carbon mtb handlebar (incorrect)
• Prediction, BERT-QA w/ query expansion: carbon mtb handlebar (correct)
Example 2
• Context: lfp 3.2v 100ah lifepo4 prismatic cell deep cycle diy lithium ion battery 72v 60v 48v 24v 100ah 200ah ev solar storage battery
• Query (attribute): nominal capacity
• Query (values): 14ah, 40ah, 17.4ah
• Gold: 100ah
• Prediction, BERT-QA: 3.2v 100ah (incorrect)
• Prediction, BERT-QA w/ query expansion: 100ah (correct)
Example 3
• Context: camel outdoor softshell men's hiking jacket windproof thermal jacket for camping ski thick warm coats
• Query (attribute): suitable
• Query (values): men, camping, kids, saltwater/fresh water, women, 4-15y, mtb cycling shoes, …
• Gold: men
• Prediction, BERT-QA: men (correct)
• Prediction, BERT-QA w/ query expansion: camping (incorrect)
Conclusions
• Knowledge-driven query expansion for QA-based product attribute
extraction.
• We construct the knowledge from training data, and use it to induce better query
representation.
• Two tricks to mimic the imperfection of the knowledge.
• Knowledge dropout and knowledge token mixing.
• Our query expansion is effective, especially for rare and ambiguous attributes.
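Topics not covered in the paper
• Gap between the evaluation setup and real-world use
• Evaluation: the gold attribute is given (in prior work as well)
• Real-world use: the gold attribute is unknown
• Practicality of QA-based models
• The model has to be run multiple times, once per attribute
• We need to know in advance which attributes we want to extract values for
• On some e-commerce sites, the candidate attributes can be narrowed down by consulting master data
The future of attribute value extraction
• Attribute value extraction → tends to be treated as NER
• Problems with NER-based methods
• Extracted values need to be normalized (D&G → Dolce & Gabbana)
• When annotating attribute values, it is hard to define the correct answer
• Automatically generating training data from existing product data introduces incorrect annotations
• Product title: ジャーナルスタンダード ジーンズ スタンダードフィット (Journal Standard jeans, standard fit)
• Attributes: <Brand, ジャーナルスタンダード (Journal Standard)>, <Pants leg width, スタンダード (Standard)>
• For some attributes, the set of values rarely grows (e.g., color, country of origin)
• Approaches other than NER
• Solve it as classification
• Extreme Multi-Label Classification with Label Masking for Product Attribute Value Extraction
• Solve it as generation
• Under investigation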
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product Attribute Extraction

More Related Content

What's hot

Data-Centric AIの紹介
Data-Centric AIの紹介Data-Centric AIの紹介
Data-Centric AIの紹介
Kazuyuki Miyazawa
 
MLOpsはバズワード
MLOpsはバズワードMLOpsはバズワード
MLOpsはバズワード
Tetsutaro Watanabe
 
楽天のインフラ事情 2022
楽天のインフラ事情 2022楽天のインフラ事情 2022
楽天のインフラ事情 2022
Rakuten Group, Inc.
 
先駆者に学ぶ MLOpsの実際
先駆者に学ぶ MLOpsの実際先駆者に学ぶ MLOpsの実際
先駆者に学ぶ MLOpsの実際
Tetsutaro Watanabe
 
20180729 Preferred Networksの機械学習クラスタを支える技術
20180729 Preferred Networksの機械学習クラスタを支える技術20180729 Preferred Networksの機械学習クラスタを支える技術
20180729 Preferred Networksの機械学習クラスタを支える技術
Preferred Networks
 
Machine learning CI/CD with OSS
Machine learning CI/CD with OSSMachine learning CI/CD with OSS
Machine learning CI/CD with OSS
yusuke shibui
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech info
Rakuten Group, Inc.
 
大規模データに基づく自然言語処理
大規模データに基づく自然言語処理大規模データに基づく自然言語処理
大規模データに基づく自然言語処理
JunSuzuki21
 
画像キャプションの自動生成
画像キャプションの自動生成画像キャプションの自動生成
画像キャプションの自動生成
Yoshitaka Ushiku
 
The Data Platform Administration Handling the 100 PB.pdf
The Data Platform Administration Handling the 100 PB.pdfThe Data Platform Administration Handling the 100 PB.pdf
The Data Platform Administration Handling the 100 PB.pdf
Rakuten Group, Inc.
 
DXとかDevOpsとかのなんかいい感じのやつ 富士通TechLive
DXとかDevOpsとかのなんかいい感じのやつ 富士通TechLiveDXとかDevOpsとかのなんかいい感じのやつ 富士通TechLive
DXとかDevOpsとかのなんかいい感じのやつ 富士通TechLive
Tokoroten Nakayama
 
【DL輪読会】Scaling Laws for Neural Language Models
【DL輪読会】Scaling Laws for Neural Language Models【DL輪読会】Scaling Laws for Neural Language Models
【DL輪読会】Scaling Laws for Neural Language Models
Deep Learning JP
 
pgvectorを使ってChatGPTとPostgreSQLを連携してみよう!(PostgreSQL Conference Japan 2023 発表資料)
pgvectorを使ってChatGPTとPostgreSQLを連携してみよう!(PostgreSQL Conference Japan 2023 発表資料)pgvectorを使ってChatGPTとPostgreSQLを連携してみよう!(PostgreSQL Conference Japan 2023 発表資料)
pgvectorを使ってChatGPTとPostgreSQLを連携してみよう!(PostgreSQL Conference Japan 2023 発表資料)
NTT DATA Technology & Innovation
 
Teslaにおけるコンピュータビジョン技術の調査
Teslaにおけるコンピュータビジョン技術の調査Teslaにおけるコンピュータビジョン技術の調査
Teslaにおけるコンピュータビジョン技術の調査
Kazuyuki Miyazawa
 
MLOps Yearning ~ 実運用システムを構築する前にデータサイエンティストが考えておきたいこと
MLOps Yearning ~ 実運用システムを構築する前にデータサイエンティストが考えておきたいことMLOps Yearning ~ 実運用システムを構築する前にデータサイエンティストが考えておきたいこと
MLOps Yearning ~ 実運用システムを構築する前にデータサイエンティストが考えておきたいこと
Rakuten Group, Inc.
 
楽天ネットワークエンジニアたちが目指す、次世代データセンターとは
楽天ネットワークエンジニアたちが目指す、次世代データセンターとは楽天ネットワークエンジニアたちが目指す、次世代データセンターとは
楽天ネットワークエンジニアたちが目指す、次世代データセンターとは
Rakuten Group, Inc.
 
楽天の規模とクラウドプラットフォーム統括部の役割
楽天の規模とクラウドプラットフォーム統括部の役割楽天の規模とクラウドプラットフォーム統括部の役割
楽天の規模とクラウドプラットフォーム統括部の役割
Rakuten Group, Inc.
 
Azure OpenAI ServiceのChatGPT API と OpenAIのChatGPT APIの比較
Azure OpenAI ServiceのChatGPT API と OpenAIのChatGPT APIの比較Azure OpenAI ServiceのChatGPT API と OpenAIのChatGPT APIの比較
Azure OpenAI ServiceのChatGPT API と OpenAIのChatGPT APIの比較
KazuoSuzuki6
 
楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のり楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のり
Rakuten Group, Inc.
 
MLOps入門
MLOps入門MLOps入門
MLOps入門
Hiro Mura
 

What's hot (20)

Data-Centric AIの紹介
Data-Centric AIの紹介Data-Centric AIの紹介
Data-Centric AIの紹介
 
MLOpsはバズワード
MLOpsはバズワードMLOpsはバズワード
MLOpsはバズワード
 
楽天のインフラ事情 2022
楽天のインフラ事情 2022楽天のインフラ事情 2022
楽天のインフラ事情 2022
 
先駆者に学ぶ MLOpsの実際
先駆者に学ぶ MLOpsの実際先駆者に学ぶ MLOpsの実際
先駆者に学ぶ MLOpsの実際
 
20180729 Preferred Networksの機械学習クラスタを支える技術
20180729 Preferred Networksの機械学習クラスタを支える技術20180729 Preferred Networksの機械学習クラスタを支える技術
20180729 Preferred Networksの機械学習クラスタを支える技術
 
Machine learning CI/CD with OSS
Machine learning CI/CD with OSSMachine learning CI/CD with OSS
Machine learning CI/CD with OSS
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech info
 
大規模データに基づく自然言語処理
大規模データに基づく自然言語処理大規模データに基づく自然言語処理
大規模データに基づく自然言語処理
 
画像キャプションの自動生成
画像キャプションの自動生成画像キャプションの自動生成
画像キャプションの自動生成
 
The Data Platform Administration Handling the 100 PB.pdf
The Data Platform Administration Handling the 100 PB.pdfThe Data Platform Administration Handling the 100 PB.pdf
The Data Platform Administration Handling the 100 PB.pdf
 
DXとかDevOpsとかのなんかいい感じのやつ 富士通TechLive
DXとかDevOpsとかのなんかいい感じのやつ 富士通TechLiveDXとかDevOpsとかのなんかいい感じのやつ 富士通TechLive
DXとかDevOpsとかのなんかいい感じのやつ 富士通TechLive
 
【DL輪読会】Scaling Laws for Neural Language Models
【DL輪読会】Scaling Laws for Neural Language Models【DL輪読会】Scaling Laws for Neural Language Models
【DL輪読会】Scaling Laws for Neural Language Models
 
pgvectorを使ってChatGPTとPostgreSQLを連携してみよう!(PostgreSQL Conference Japan 2023 発表資料)
pgvectorを使ってChatGPTとPostgreSQLを連携してみよう!(PostgreSQL Conference Japan 2023 発表資料)pgvectorを使ってChatGPTとPostgreSQLを連携してみよう!(PostgreSQL Conference Japan 2023 発表資料)
pgvectorを使ってChatGPTとPostgreSQLを連携してみよう!(PostgreSQL Conference Japan 2023 発表資料)
 
Teslaにおけるコンピュータビジョン技術の調査
Teslaにおけるコンピュータビジョン技術の調査Teslaにおけるコンピュータビジョン技術の調査
Teslaにおけるコンピュータビジョン技術の調査
 
MLOps Yearning ~ 実運用システムを構築する前にデータサイエンティストが考えておきたいこと
MLOps Yearning ~ 実運用システムを構築する前にデータサイエンティストが考えておきたいことMLOps Yearning ~ 実運用システムを構築する前にデータサイエンティストが考えておきたいこと
MLOps Yearning ~ 実運用システムを構築する前にデータサイエンティストが考えておきたいこと
 
楽天ネットワークエンジニアたちが目指す、次世代データセンターとは
楽天ネットワークエンジニアたちが目指す、次世代データセンターとは楽天ネットワークエンジニアたちが目指す、次世代データセンターとは
楽天ネットワークエンジニアたちが目指す、次世代データセンターとは
 
楽天の規模とクラウドプラットフォーム統括部の役割
楽天の規模とクラウドプラットフォーム統括部の役割楽天の規模とクラウドプラットフォーム統括部の役割
楽天の規模とクラウドプラットフォーム統括部の役割
 
Azure OpenAI ServiceのChatGPT API と OpenAIのChatGPT APIの比較
Azure OpenAI ServiceのChatGPT API と OpenAIのChatGPT APIの比較Azure OpenAI ServiceのChatGPT API と OpenAIのChatGPT APIの比較
Azure OpenAI ServiceのChatGPT API と OpenAIのChatGPT APIの比較
 
楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のり楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のり
 
MLOps入門
MLOps入門MLOps入門
MLOps入門
 

Similar to Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product Attribute Extraction

Object Detection with Transformers
Object Detection with TransformersObject Detection with Transformers
Object Detection with Transformers
Databricks
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
Aly Abdelkareem
 
Intel Atom Processor Pre-Silicon Verification Experience
Intel Atom Processor Pre-Silicon Verification ExperienceIntel Atom Processor Pre-Silicon Verification Experience
Intel Atom Processor Pre-Silicon Verification ExperienceDVClub
 
Static analysis of java enterprise applications
Static analysis of java enterprise applicationsStatic analysis of java enterprise applications
Static analysis of java enterprise applications
Anastasiοs Antoniadis
 
Spark Meetup July 2015
Spark Meetup July 2015Spark Meetup July 2015
Spark Meetup July 2015
Debasish Das
 
Perf onjs final
Perf onjs finalPerf onjs final
Perf onjs finalqi yang
 
Hybrid system architecture overview
Hybrid system architecture overviewHybrid system architecture overview
Hybrid system architecture overview
Jesse Wang
 
SQL on Hadoop benchmarks using TPC-DS query set
SQL on Hadoop benchmarks using TPC-DS query setSQL on Hadoop benchmarks using TPC-DS query set
SQL on Hadoop benchmarks using TPC-DS query set
Kognitio
 
EMF-IncQuery 0.7 Presentation for Itemis
EMF-IncQuery 0.7 Presentation for ItemisEMF-IncQuery 0.7 Presentation for Itemis
EMF-IncQuery 0.7 Presentation for Itemis
Istvan Rath
 
A framework and approaches to develop an in-house CAT with freeware and open ...
A framework and approaches to develop an in-house CAT with freeware and open ...A framework and approaches to develop an in-house CAT with freeware and open ...
A framework and approaches to develop an in-house CAT with freeware and open ...
Tetsuo Kimura
 
Deep Learning for Semantic Search in E-commerce​
Deep Learning for Semantic Search in E-commerce​Deep Learning for Semantic Search in E-commerce​
Deep Learning for Semantic Search in E-commerce​
Somnath Banerjee
 
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Rodney Joyce
 
AKUDA Labs: Pulsar
AKUDA Labs: PulsarAKUDA Labs: Pulsar
AKUDA Labs: PulsarAKUDA Labs
 
Counterfeit Detection Services
Counterfeit Detection ServicesCounterfeit Detection Services
Counterfeit Detection Services
USBid Inc.
 
Leaving the Ivory Tower: Research in the Real World
Leaving the Ivory Tower: Research in the Real WorldLeaving the Ivory Tower: Research in the Real World
Leaving the Ivory Tower: Research in the Real World
ArmonDadgar
 
Mining single dimensional boolean association rules from transactional
Mining single dimensional boolean association rules from transactionalMining single dimensional boolean association rules from transactional
Mining single dimensional boolean association rules from transactional
ramya marichamy
 
O'Reilly Software Architecture Conf: Cloud Economics
O'Reilly Software Architecture Conf: Cloud EconomicsO'Reilly Software Architecture Conf: Cloud Economics
O'Reilly Software Architecture Conf: Cloud Economics
Chris Bailey
 
Searching Encrypted Cloud Data: Academia and Industry Done Right
Searching Encrypted Cloud Data: Academia and Industry Done RightSearching Encrypted Cloud Data: Academia and Industry Done Right
Searching Encrypted Cloud Data: Academia and Industry Done Right
Skyhigh Networks
 
Insights and Lessons Learned Verifying the QoS Engine of a Network Processor
Insights and Lessons Learned Verifying the QoS Engine of a Network ProcessorInsights and Lessons Learned Verifying the QoS Engine of a Network Processor
Insights and Lessons Learned Verifying the QoS Engine of a Network ProcessorDVClub
 
Boston 2009 q1_kappler_chris
Boston 2009 q1_kappler_chrisBoston 2009 q1_kappler_chris
Boston 2009 q1_kappler_chrisObsidian Software
 

Similar to Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product Attribute Extraction (20)

Object Detection with Transformers
Object Detection with TransformersObject Detection with Transformers
Object Detection with Transformers
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
 
Intel Atom Processor Pre-Silicon Verification Experience
Intel Atom Processor Pre-Silicon Verification ExperienceIntel Atom Processor Pre-Silicon Verification Experience
Intel Atom Processor Pre-Silicon Verification Experience
 
Static analysis of java enterprise applications
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product Attribute Extraction

  • 1. Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product Attribute Extraction
      • Keiji Shinzato (1), Naoki Yoshinaga (2), Yandi Xia (1), Wei-Te Chen (1)
      • (1) Rakuten Institute of Technology, Rakuten Group, Inc.
      • (2) Institute of Industrial Science, the University of Tokyo
      • ACL 2022 short paper
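  • 2. Self-introduction
      • Keiji Shinzato
      • Lead Scientist, Rakuten Institute of Technology Americas
      • Career
          • 2004 – 2006: Japan Advanced Institute of Science and Technology, doctoral program (Torisawa Lab)
          • 2006 – 2011: Graduate School of Informatics, Kyoto University, Program-Specific Assistant Professor / Researcher (Kurohashi Lab)
          • 2011 – 2018: Rakuten Group, Inc., Rakuten Institute of Technology
          • 2018 – present: Rakuten USA, Rakuten Institute of Technology Americas
      • Hobbies and interests
          • Cooking
          • Craft beer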
  • 3. Goal: Organizing Enormous Products in E-commerce
      • Business contribution
          • Sophisticated product search and recommendation.
          • Better understanding of customers on the marketplace.
      • Attribute value extraction (example)
          • Title: Large Elegant Leather Bag - BLK
          • Description: Crafted from sleek spazzolato leather (black). This is an elegant carryall that's perfect for your essentials. 10"H x 13"W x 6"D.
          • Extracted attribute-value pairs:
              Attribute   Value
              Color       Black
              Material    Leather
              Height      10 inch
              Width       13 inch
              Depth       6 inch
      • The bag image is designed by pch.vector / Freepik
  • 4-5. From NER-Based to QA-Based Attribute Value Extraction
      • The existing Named Entity Recognition (NER)-based approach to attribute value extraction suffers from the data sparseness problem.
          • The number of classes (attributes) in attribute value extraction can exceed one thousand.
      • The Question Answering (QA)-based approach to attribute value extraction alleviates the data sparseness problem [Xu+, 2019; Wang+, 2020].
      • QA-based approach with a BERT QA model (BERT-QA [Wang+, 2020])
          • Input (context[SEP]query): Adidas Running Shoes - 8.5 / White[SEP]Brand
          • Answer: a span of the context "Adidas Running Shoes - 8.5 / White" (here, the brand "Adidas")
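The QA formulation on these slides can be illustrated with a minimal sketch. It only builds the `context[SEP]query` string and looks up the gold span; it does not run BERT-QA. The function names `build_qa_input` and `find_answer_span` are hypothetical, and character offsets are used here for simplicity, whereas a real model would predict token positions.

```python
from typing import Optional, Tuple

SEP = "[SEP]"  # separator token, written as on the slide


def build_qa_input(title: str, attribute: str) -> str:
    """Concatenate the context (product title) and the query (attribute name)."""
    return f"{title}{SEP}{attribute}"


def find_answer_span(title: str, value: str) -> Optional[Tuple[int, int]]:
    """Return character offsets [start, end) of `value` in `title`, or None.

    A trained QA model would predict this span; here the gold value is
    simply looked up to show what the training target looks like.
    """
    start = title.find(value)
    if start == -1:
        return None  # corresponds to a NULL answer
    return start, start + len(value)


title = "Adidas Running Shoes - 8.5 / White"
print(build_qa_input(title, "Brand"))
# Adidas Running Shoes - 8.5 / White[SEP]Brand
print(find_answer_span(title, "Adidas"))
# (0, 6)
```

Because the attribute is part of the input rather than a separate output class, a single model can be queried for any attribute name, which is why this formulation helps when there are more than a thousand attributes.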
  • 6-9. Attribute Value Extraction is Still Difficult
      • [Figure: Number of Instances per Attribute on AliExpress Dataset. x-axis: attributes (2,162); y-axis: number of instances, log scale from 1 to 100,000.]
      • Problems
          • Rare attributes
              • The number of instances is less than 10 for 85% of attributes.
          • Ambiguous attributes
              • "function 1", "suitable", "sort", etc.
      • How can we obtain an effective query representation for rare and ambiguous attributes?
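The "rare attributes" statistic above can be computed from <product title, attribute, value> tuples as in the following sketch. The function name `rare_attribute_share` and the toy tuples are illustrative assumptions; the 85% figure on the slide comes from the AliExpress dataset, not from this data.

```python
from collections import Counter
from typing import Iterable, Tuple

# (product title, attribute, value)
Instance = Tuple[str, str, str]


def rare_attribute_share(instances: Iterable[Instance], threshold: int = 10) -> float:
    """Fraction of distinct attributes that have fewer than `threshold` instances."""
    counts = Counter(attribute for _title, attribute, _value in instances)
    if not counts:
        return 0.0
    rare = sum(1 for n in counts.values() if n < threshold)
    return rare / len(counts)


# Toy data: one frequent attribute and two rare ones.
toy = (
    [("title a", "brand", "adidas")] * 12
    + [("title b", "sort", "x")] * 2
    + [("title c", "function 1", "y")]
)
print(rare_attribute_share(toy))  # 2 of the 3 attributes are rare
```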
  • 10-11. Knowledge-Driven Query Expansion for QA-based AE (1/3)
      • Exploit attribute values in the training data as run-time knowledge to induce a better query representation.
      • Knowledge (attribute-value pairs) is collected from the training data, e.g., from the title "Zipp Battery 12V 14AH SLA…" annotated with the attributes "Nominal capacity" and "Brand".
      • BERT-QA input format: Title[SEP]Attribute[SEP]Values, where the title is the context and the attribute followed by its values is the query.
          • Example: CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP]nominal capacity[SEP]14ah[SEP]40ah
      • Output: B and E labels marking the answer span in the title (CATL ... 100 Ah … battery).
      • The knowledge is imperfect: in the example, the values in the query (14ah, 40ah) do not include the value that appears in the title (100Ah).
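A minimal sketch of the query expansion described above, under stated assumptions: values are lowercased, ordered by training frequency, and capped at `max_values`. The slides do not say how values are selected or ordered, so these choices, the function names, and the toy training titles other than the "Zipp Battery" one are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

SEP = "[SEP]"

# (product title, attribute, value or None for NULL)
TrainTuple = Tuple[str, str, Optional[str]]


def build_value_knowledge(train: Iterable[TrainTuple]) -> Dict[str, Counter]:
    """Collect the values seen for each attribute in the training data."""
    knowledge: Dict[str, Counter] = defaultdict(Counter)
    for _title, attribute, value in train:
        if value is not None:  # NULL tuples carry no value knowledge
            knowledge[attribute][value.lower()] += 1
    return knowledge


def expand_query(attribute: str, knowledge: Dict[str, Counter], max_values: int = 2) -> str:
    """Append up to `max_values` known values to the attribute name."""
    seen = knowledge.get(attribute, Counter())
    values = [value for value, _count in seen.most_common(max_values)]
    return SEP.join([attribute] + values)


def build_input(title: str, attribute: str, knowledge: Dict[str, Counter]) -> str:
    """Format the model input as Title[SEP]Attribute[SEP]Values."""
    return title + SEP + expand_query(attribute, knowledge)


train = [
    ("Zipp Battery 12V 14AH SLA", "nominal capacity", "14AH"),
    ("toy title with 14Ah", "nominal capacity", "14Ah"),
    ("toy title with 40Ah", "nominal capacity", "40Ah"),
    ("Zipp Battery 12V 14AH SLA", "brand", "Zipp"),
]
knowledge = build_value_knowledge(train)
title = "CATL 3.2V 100Ah battery LiFePo4 prismatic battery"
print(build_input(title, "nominal capacity", knowledge))
# CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP]nominal capacity[SEP]14ah[SEP]40ah
```

An attribute that never appears in the training data gets no values, so its query falls back to the bare attribute name.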
  • 12-14. Knowledge-Driven Query Expansion for QA-based AE (2/3)
      • Train knowledge-based QA models while mimicking the imperfection of knowledge at test time.
          • Knowledge dropout: prevent models from naively matching the values in the query with the one in the context.
          • Knowledge token mixing: prevent models from relying more on values than on attributes.
              • We treat the availability of value knowledge as a domain, and perform multi-domain learning for the QA-based model with and without our value-based query expansion.
      • BERT-QA input format: Title[SEP]Attribute[SEP]Values (context, then query); knowledge dropout is applied to the values.
          • Example: CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP]nominal capacity[SEP]14ah[SEP]40ah
      • Output: B and E labels marking the answer span in the title (CATL ... 100 Ah … battery).
  • 15-17. Knowledge-Driven Query Expansion for QA-based AE (3/3)
      • Train knowledge-based QA models while mimicking the imperfection of knowledge at test time.
          • Knowledge dropout: prevent models from naively matching the values in the query with the one in the context.
          • Knowledge token mixing: prevent models from relying more on values than on attributes.
              • We treat the availability of value knowledge as a domain, and perform multi-domain learning for the QA-based model with and without our value-based query expansion.
      • BERT-QA input format: Title[SEP][Un/seen]Attribute[SEP]Values (context, then query); knowledge dropout is applied to the values.
          • With values: CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP][Seen]nominal capacity[SEP]14ah[SEP]40ah
          • With values deleted: CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP][Unseen] nominal capacity
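The two training tricks can be sketched as follows. This is one possible reading of the slides, not the authors' implementation: the slides do not state the dropout rate or granularity, so dropping each value independently with probability `p_drop` is an assumption, as is emitting one `[Seen]` and one `[Unseen]` input per training example. All function names are hypothetical.

```python
import random
from typing import List

SEP = "[SEP]"
SEEN, UNSEEN = "[Seen]", "[Unseen]"  # domain tokens shown on the slide


def drop_values(values: List[str], p_drop: float, rng: random.Random) -> List[str]:
    """Knowledge dropout (assumed form): remove each value with probability p_drop."""
    return [value for value in values if rng.random() >= p_drop]


def make_query(attribute: str, values: List[str]) -> str:
    """Prefix the attribute with [Seen] if values are attached, else [Unseen]."""
    if values:
        return SEP.join([SEEN + attribute] + values)
    return UNSEEN + attribute


def training_inputs(title: str, attribute: str, values: List[str],
                    p_drop: float = 0.2, seed: int = 0) -> List[str]:
    """Token mixing (assumed form): one input with values, one without."""
    rng = random.Random(seed)
    kept = drop_values(values, p_drop, rng)
    with_values = title + SEP + make_query(attribute, kept)
    without_values = title + SEP + make_query(attribute, [])
    return [with_values, without_values]


title = "CATL 3.2V 100Ah battery LiFePo4 prismatic battery"
for text in training_inputs(title, "nominal capacity", ["14ah", "40ah"], p_drop=0.0):
    print(text)
# CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP][Seen]nominal capacity[SEP]14ah[SEP]40ah
# CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP][Unseen]nominal capacity
```

Training on the `[Unseen]` variant forces the model to use the attribute name itself, which is the situation it faces at test time for attributes without usable value knowledge.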
  • 18. Experimental Settings
      • Perform experiments using the cleaned AE-pub dataset.
          • We construct the cleaned AE-pub dataset from the public AliExpress dataset [Xu+, 2019] by removing 736 near-duplicate tuples.
          • Each entry is a tuple of <product title, attribute, value>.
          • Split the cleaned AE-pub dataset into train/dev/test sets with a ratio of 7:1:2.
      • Statistics of the cleaned AE-pub dataset
                                               Train     Dev.     Test
          # of tuples                         76,823   10,975   21,950
          # of tuples with NULL               15,097    2,201    4,259
          # of unique attribute-value pairs   11,819    2,680    4,431
          # of unique attributes               1,801      635      872
          # of unique values                   9,317    2,258    3,671
  • 19-20. Baselines
      • Dictionary matching
      • SUOpenTag [Xu+, 2019]
      • AVEQA [Wang+, 2020]
      • BERT-QA [Wang+, 2020]
      • [Figures: SUOpenTag, AVEQA, BERT-QA]
  • 21-23. Performance on Cleaned AE-pub Dataset
                                        Macro                                             Micro
          Models                        P (%)           R (%)           F1              P (%)           R (%)           F1
          Dictionary                    33.20 (±0.00)   30.37 (±0.00)   31.72 (±0.00)   73.39 (±0.00)   73.77 (±0.00)   73.58 (±0.00)
          SUOpenTag [Xu+, 2019]         30.92 (±1.44)   28.04 (±1.48)   29.41 (±1.44)   86.53 (±0.78)   79.11 (±0.35)   82.65 (±0.20)
          AVEQA [Wang+, 2020]           41.93 (±1.05)   39.65 (±0.96)   40.76 (±0.98)   86.95 (±0.27)   81.99 (±0.13)   84.40 (±0.09)
          BERT-QA [Wang+, 2020]         42.77 (±0.36)   40.85 (±0.22)   41.79 (±0.28)   87.14 (±0.54)   82.16 (±0.21)   84.58 (±0.24)
          BERT-QA +vals                 39.48 (±0.37)   35.60 (±0.44)   37.44 (±0.38)   88.82 (±0.22)   81.77 (±0.14)   85.15 (±0.14)
          BERT-QA +vals +drop           41.61 (±0.83)   38.22 (±0.80)   39.84 (±0.81)   88.46 (±0.26)   82.02 (±0.37)   85.12 (±0.14)
          BERT-QA +vals +mixing         46.67 (±0.33)   43.32 (±0.50)   44.93 (±0.39)   88.30 (±0.69)   82.46 (±0.30)   85.28 (±0.26)
          BERT-QA +vals +drop +mixing   47.74 (±0.54)   44.82 (±0.75)   46.23 (±0.64)   87.84 (±0.39)   82.61 (±0.07)   85.14 (±0.19)
      • BERT-QA +vals +drop +mixing outperformed the baseline methods.
      • BERT-QA +vals learns to find strings that are similar to ones retrieved from the training data.
      • Knowledge dropout and knowledge token mixing improve both macro and micro F1 performance.
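The gap between the macro and micro columns comes from how per-attribute scores are aggregated, which the following sketch illustrates. It assumes exact-match scoring, with `None` standing for a NULL gold value or no prediction, and it takes macro F1 as the harmonic mean of macro precision and macro recall, which is consistent with the numbers in the table. The evaluation script used for the slides may differ in these details, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (attribute, gold value or None, predicted value or None)
Example = Tuple[str, Optional[str], Optional[str]]


def precision_recall(tp: int, n_pred: int, n_gold: int) -> Tuple[float, float]:
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def micro_macro(examples: Iterable[Example]) -> Dict[str, Tuple[float, float, float]]:
    """Return (P, R, F1) under micro and macro averaging over attributes."""
    stats: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0])  # tp, n_pred, n_gold
    for attribute, gold, pred in examples:
        counts = stats[attribute]
        if pred is not None:
            counts[1] += 1
            if pred == gold:
                counts[0] += 1
        if gold is not None:
            counts[2] += 1
    if not stats:
        return {"micro": (0.0, 0.0, 0.0), "macro": (0.0, 0.0, 0.0)}
    # Micro: pool the counts of all attributes, so frequent attributes dominate.
    totals = [sum(c[i] for c in stats.values()) for i in range(3)]
    micro_p, micro_r = precision_recall(*totals)
    # Macro: score each attribute separately, then average with equal weight.
    per_attribute = [precision_recall(*c) for c in stats.values()]
    macro_p = sum(p for p, _r in per_attribute) / len(per_attribute)
    macro_r = sum(r for _p, r in per_attribute) / len(per_attribute)
    return {
        "micro": (micro_p, micro_r, f1(micro_p, micro_r)),
        "macro": (macro_p, macro_r, f1(macro_p, macro_r)),
    }


# Toy data: a frequent attribute answered correctly, a rare one answered wrongly.
toy = [("brand", "adidas", "adidas")] * 8 + [("sort", "a", "b")] * 2
scores = micro_macro(toy)
print(scores["micro"])  # precision and recall are both 8/10
print(scores["macro"])  # precision and recall are both (1 + 0) / 2
```

In the toy data the rare attribute barely moves the micro score but halves the macro score, which mirrors why the macro columns are the ones that expose performance on rare attributes.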
  • 24. 23 Impact on Rare and Ambiguous Attributes
      • Categorize the attributes to which query expansion was applied by their number of training examples and by how well the attribute name describes its values.
      • Use the BERT embedding of the CLS token to measure how appropriate an attribute name is.
      • Compute the cosine similarity between the attribute embedding and the average of its value embeddings.
      • Regard the attribute name as ambiguous if the similarity is low.
      • Divide the attributes into four groups, splitting at the median frequency and the median similarity to values.
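The grouping procedure can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the embeddings have already been computed (the slide obtains them from the BERT CLS token; here they are made-up 2-dimensional vectors), and the function names, group labels, and toy data are all hypothetical. Attributes strictly below a median fall into the "rare" or "ambiguous" group, matching the half-open intervals reported on the slides.

```python
import math
from statistics import median
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of the same length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def group_attributes(
    attr_emb: Dict[str, Sequence[float]],
    value_embs: Dict[str, Sequence[Sequence[float]]],
    train_counts: Dict[str, int],
) -> Dict[str, Tuple[str, str]]:
    """Assign each attribute to one of four groups by two median splits."""
    sims = {a: cosine(attr_emb[a], mean_vector(value_embs[a])) for a in attr_emb}
    sim_median = median(sims.values())
    freq_median = median(train_counts[a] for a in attr_emb)
    groups = {}
    for a in attr_emb:
        rarity = "rare" if train_counts[a] < freq_median else "frequent"
        clarity = "ambiguous" if sims[a] < sim_median else "clear"
        groups[a] = (rarity, clarity)
    return groups


# Toy inputs (made up): stand-ins for CLS embeddings and training frequencies.
attr_emb = {"function 1": [1, 0], "suitable": [1, 1], "color": [1, 0], "brand": [0, 1]}
value_embs = {
    "function 1": [[0, 1], [0.2, 1]],
    "suitable": [[1, 0], [1, 0.2]],
    "color": [[1, 0.1], [1, -0.1]],
    "brand": [[0.1, 1], [0, 1]],
}
train_counts = {"function 1": 2, "suitable": 50, "color": 3, "brand": 100}

if __name__ == "__main__":
    for name, group in group_attributes(attr_emb, value_embs, train_counts).items():
        print(name, group)
```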
  • 25. 24 Impact on Rare and Ambiguous Attributes (same bullets as the previous slide)
    Macro F1 of BERT-QA +vals +drop +mixing, with gains over the BERT-QA model in parentheses. Rows: cosine similarity between attribute name and values (median: 0.929). Columns: number of training examples (median: 8).

    | Cosine similarity | [1, 8) | [8, ∞) | All |
    |---|---|---|---|
    | [0.411, 0.929) | 49.15 (+7.54) | 57.89 (+6.15) | 53.51 (+6.86) |
    | [0.929, 1.0] | 50.94 (+8.14) | 71.04 (+3.02) | 62.10 (+5.29) |
    | All | 49.99 (+7.82) | 64.84 (+4.50) | 57.81 (+6.08) |
  • 26. 25 Impact on Rare and Ambiguous Attributes (same bullets and table as slide 25) Query expansion can generate more informative queries than ambiguous attribute names alone: the macro F1 gain is +6.86 for low-similarity attribute names versus +5.29 for high-similarity ones.
  • 27. 26 Impact on Rare and Ambiguous Attributes (same bullets and table as slide 25) Query expansion is more effective for rare attributes than for frequent ones: the macro F1 gain is +7.82 for attributes with fewer than 8 training examples versus +4.50 for the rest.
  • 28. 27 Impact on Rare and Ambiguous Attributes (same bullets and table as slide 25) By taking the knowledge induced from the training data as runtime input, the model could devote more of its parameters to solving the task itself.
  • 29. 28 Example Outputs
      • Example 1
          Context: aeronova bicycle carbon mtb handlebar mountain bikes flat handlebar mtb integrated handlebars with stem bike accessories
          Query (attribute): function 1
          Query (values): skiing goggles, carbon road bicycle handlebar, cycling glasses, bicycle mask, gas mask, …
          Gold: carbon mtb handlebar
          Prediction, BERT-QA: bicycle carbon mtb handlebar (incorrect)
          Prediction, BERT-QA w/ query expansion: carbon mtb handlebar (correct)
      • Example 2
          Context: lfp 3.2v 100ah lifepo4 prismatic cell deep cycle diy lithium ion battery 72v 60v 48v 24v 100ah 200ah ev solar storage battery
          Query (attribute): nominal capacity
          Query (values): 14ah, 40ah, 17.4ah
          Gold: 100ah
          Prediction, BERT-QA: 3.2v 100ah (incorrect)
          Prediction, BERT-QA w/ query expansion: 100ah (correct)
      • Example 3
          Context: camel outdoor softshell men’s hiking jacket windproof thermal jacket for camping ski thick warm coats
          Query (attribute): suitable
          Query (values): men, camping, kids, saltwater/fresh water, women, 4-15y, mtb cycling shoes, …
          Gold: men
          Prediction, BERT-QA: men (correct)
          Prediction, BERT-QA w/ query expansion: camping (incorrect)
  • 30. 29 Conclusions
      • Knowledge-driven query expansion for QA-based product attribute extraction.
          • We construct the knowledge from the training data and use it to induce better query representations.
      • Two tricks to mimic the imperfection of the knowledge: knowledge dropout and knowledge token mixing.
      • Our query expansion is effective, especially for rare and ambiguous attributes.
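  • 31. 30 Topics Not Covered in the Paper
      • Gap between the evaluation setup and real-world use
          • Evaluation: the gold attributes are given, in prior work as well as ours.
          • Real-world use: the gold attributes are unknown.
      • Practicality of QA-based models
          • The model has to be run multiple times, once per attribute.
          • We need to know in advance which attributes we want to extract values for.
          • On some e-commerce sites, the candidate attributes can be narrowed down by consulting the master data.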
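The practical cost described above (one model run per attribute, with master data used to narrow the candidates) can be sketched as follows. Everything here is hypothetical: the `MASTER_DATA` mapping, the category names, and `toy_extractor`, which is a string-matching stand-in for a real QA model and only makes the loop runnable.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical master data: product category -> attributes defined for it.
MASTER_DATA: Dict[str, List[str]] = {
    "shoes": ["brand", "color", "material"],
    "battery": ["nominal capacity", "voltage"],
}

Extractor = Callable[[str, str], Optional[str]]


def extract_all(
    context: str,
    category: str,
    extract_value: Extractor,
    master_data: Dict[str, List[str]],
) -> Dict[str, str]:
    """Run the QA-style extractor once for each candidate attribute."""
    results: Dict[str, str] = {}
    # Only attributes listed for the category are tried; without master data
    # the loop would have to cover every known attribute.
    for attribute in master_data.get(category, []):
        value = extract_value(context, attribute)  # one model run per attribute
        if value is not None:
            results[attribute] = value
    return results


def toy_extractor(context: str, attribute: str) -> Optional[str]:
    """Stand-in for a QA model: looks up known values in the context."""
    known_values = {"brand": ["adidas"], "color": ["white", "black"]}
    lowered = context.lower()
    for value in known_values.get(attribute, []):
        if value in lowered:
            return value
    return None


if __name__ == "__main__":
    title = "Adidas Running Shoes - 8.5 / White"
    print(extract_all(title, "shoes", toy_extractor, MASTER_DATA))
```

The number of model runs grows linearly with the number of candidate attributes. Since the talk notes that attribute sets can exceed one thousand, narrowing by category matters in practice.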
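  • 32. 31 The Future of Attribute Value Extraction
      • Attribute value extraction → tends to be treated as NER
      • Problems with NER-based methods
          • Extracted values need to be normalized (D&G → Dolce & Gabbana).
          • When annotating attribute values, it is hard to define what counts as correct.
          • Automatically generating training data from existing product data introduces incorrect annotations.
              • Product title: Journal Standard Jeans Standard Fit
              • Attributes: <Brand, Journal Standard>, <Pant leg width, Standard>
          • For some attributes the set of values rarely grows (e.g., color, country of origin).
      • Approaches other than NER
          • Solve it as classification
              • Extreme Multi-Label Classification with Label Masking for Product Attribute Value Extraction
          • Solve it as generation
              • Work in progress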