Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product Attribute Extraction

Keiji Shinzato1, Naoki Yoshinaga2, Yandi Xia1, Wei-Te Chen1
1) Rakuten Institute of Technology, Rakuten Group, Inc.
2) Institute of Industrial Science, the University of Tokyo
ACL 2022 short paper

1
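Self-Introduction
• Keiji Shinzato
• Lead Scientist, Rakuten Institute of Technology Americas
• Career
• 2004 – 2006: Doctoral program, Japan Advanced Institute of Science and Technology (Torisawa Lab)
• 2006 – 2011: Program-specific assistant professor and researcher, Graduate School of Informatics, Kyoto University (Kurohashi Lab)
• 2011 – 2018: Rakuten Institute of Technology, Rakuten Group, Inc.
• 2018 – present: Rakuten USA, Rakuten Institute of Technology Americas
• Hobbies and interests
• Cooking
• Craft beer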

2
Goal: Organizing an Enormous Number of Products in E-commerce
• Business contribution
• Sophisticated product search and recommendation.
• Better understanding of customers on the marketplace.
Attribute value extraction (example)
Product title: Large Elegant Leather Bag - BLK
Product description: Crafted from sleek spazzolato leather (black). This is an elegant carryall that's perfect for your essentials. 10"H x 13"W x 6"D.
Attribute   Value
Color       Black
Material    Leather
Height      10 inch
Width       13 inch
Depth       6 inch
The bag image is designed by pch.vector / Freepik

3
From NER-Based to QA-Based Attribute Value Extraction
• The existing Named Entity Recognition (NER)-based approach to
attribute value extraction suffers from the data sparseness problem.
• The number of classes (attributes) in attribute value extraction can exceed one thousand.
• The Question Answering (QA)-based approach to attribute value extraction
alleviates the data sparseness problem [Xu+, 2019; Wang+, 2020].
QA-based approach
• Input: Adidas Running Shoes - 8.5 / White[SEP]Brand (context[SEP]query)
• Answer: a span of the context "Adidas Running Shoes - 8.5 / White"

4
From NER-Based to QA-Based Attribute Value Extraction
• QA model: BERT-QA [Wang+, 2020], a BERT-based model that maps the input Context[SEP]Query (e.g., Adidas Running Shoes - 8.5 / White[SEP]Brand) to an answer span in the context.

5
Attribute Value Extraction is Still Difficult
[Figure: Number of Instances per Attribute on AliExpress Dataset. x-axis: attributes (2,162); y-axis: number of instances, log scale from 1 to 100,000.]

6
Problems
• Rare attributes
• 85% of attributes have fewer than 10 instances
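A statistic of this kind can be computed directly from <product title, attribute, value> tuples. The sketch below is illustrative only: `rare_attribute_share` is a hypothetical helper, and the toy tuples are made up rather than taken from the AliExpress data.

```python
from collections import Counter


def rare_attribute_share(tuples, threshold=10):
    """Fraction of attributes that occur in fewer than `threshold` tuples."""
    counts = Counter(attribute for _title, attribute, _value in tuples)
    rare = sum(1 for n in counts.values() if n < threshold)
    return rare / len(counts)


# Toy data (hypothetical): one frequent attribute and two rare ones.
toy = (
    [("some title", "brand", "x")] * 12
    + [("some title", "sort", "y")] * 2
    + [("some title", "function 1", "z")]
)
print(f"share of rare attributes: {rare_attribute_share(toy):.2f}")
```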

7
Problems
• Rare attributes
• Ambiguous attributes
• function 1, suitable, sort, etc.

8
[Figure: Number of Labels per Attribute on AliExpress Dataset. x-axis: attributes (2,162); y-axis: number of instances, log scale from 1 to 100,000.]
Problems
• Rare attributes
• Ambiguous attributes
• function 1, suitable, sort, etc.
How can we obtain an effective query representation
for rare and ambiguous attributes?

9
Knowledge-Driven Query Expansion for QA-based AE (1/3)
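• Exploit attribute values in training data as run-time knowledge to induce better query representation.
• Knowledge: attribute-value pairs collected from training data (e.g., the title "Zipp Battery 12V 14AH SLA…" labeled for Nominal capacity and Brand).
• BERT-QA input: Title[SEP]Attribute[SEP]Values, where the title is the context and the attribute plus values form the query.
• Example: CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP]nominal capacity[SEP]14ah[SEP]40ah
• Output: B (begin) and E (end) tags marking the answer span in the context (CATL ... 100 Ah … battery).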
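The query construction on this slide can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `build_knowledge` and `expand_query`, the `max_values` cap, and joining with a literal "[SEP]" string are all assumptions (a real BERT tokenizer would insert its own separator tokens). The second training title and the brand value are hypothetical.

```python
from collections import defaultdict

SEP = "[SEP]"  # written literally here; a real tokenizer would add its own separator


def build_knowledge(train_tuples):
    """Collect the values seen with each attribute in the training data.

    train_tuples: iterable of (title, attribute, value); value is None for NULL.
    Returns {attribute: [values in first-seen order, without duplicates]}.
    """
    knowledge = defaultdict(list)
    for _title, attribute, value in train_tuples:
        if value is not None and value not in knowledge[attribute]:
            knowledge[attribute].append(value)
    return dict(knowledge)


def expand_query(title, attribute, knowledge, max_values=None):
    """Build 'Title[SEP]Attribute[SEP]Value1[SEP]Value2...' for one example."""
    values = knowledge.get(attribute, [])
    if max_values is not None:
        values = values[:max_values]
    return SEP.join([title, attribute, *values])


train = [
    ("Zipp Battery 12V 14AH SLA…", "nominal capacity", "14ah"),
    ("hypothetical 40ah battery title", "nominal capacity", "40ah"),  # made up
    ("Zipp Battery 12V 14AH SLA…", "brand", "zipp"),  # value assumed
]
knowledge = build_knowledge(train)
print(expand_query(
    "CATL 3.2V 100Ah battery LiFePo4 prismatic battery",
    "nominal capacity",
    knowledge,
))
```

An attribute that never appears in the training data simply yields a query with no values, which is the situation the later slides address.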

10
Exploit attribute values in training data as run-time
knowledge to induce better query representation
• Caveat: the knowledge built from training data (e.g., "Zipp Battery 12V 14AH SLA…" labeled for Nominal capacity and Brand) is imperfect.

11
• Train knowledge-based QA models while mimicking the imperfection of
knowledge at test time.
• Knowledge dropout: prevents models from naively matching values in the query with those in the context.
• Knowledge token mixing: prevents models from relying more on values than on attributes.
• We treat the availability of value knowledge (seen or unseen) as a domain, and perform multi-domain learning for the QA-based
model with and without our value-based query expansion.
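One possible reading of the two tricks above, as a sketch: during training, values are randomly dropped from the query (knowledge dropout), and each example is presented both with values under a [Seen] token and without values under an [Unseen] token (knowledge token mixing). The per-value drop probability `drop_rate`, the helper names, and the choice to emit exactly one input of each kind per example are assumptions made for illustration; the slides do not specify these details.

```python
import random

SEP = "[SEP]"


def knowledge_dropout(values, drop_rate, rng):
    """Drop each knowledge value independently with probability drop_rate."""
    return [v for v in values if rng.random() >= drop_rate]


def make_training_inputs(title, attribute, values, drop_rate=0.5, rng=None):
    """Return two model inputs for one training example.

    - a [Seen] input whose query carries the (dropout-thinned) values
    - an [Unseen] input whose query carries the attribute only
    """
    rng = rng or random.Random()
    kept = knowledge_dropout(values, drop_rate, rng)
    seen = SEP.join([title, "[Seen]" + attribute, *kept])
    unseen = SEP.join([title, "[Unseen]" + attribute])
    return seen, unseen


seen, unseen = make_training_inputs(
    "CATL 3.2V 100Ah battery LiFePo4 prismatic battery",
    "nominal capacity",
    ["14ah", "40ah"],
    drop_rate=0.5,
    rng=random.Random(0),
)
print(seen)
print(unseen)
```

At test time, under the same assumptions, an attribute with retrieved values would use the [Seen] form and an attribute without any would use the [Unseen] form.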

15
Knowledge dropout
• BERT-QA input format: Title[SEP][Un/seen]Attribute[SEP]Values
• Example: CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP][Seen]nominal capacity[SEP]14ah[SEP]40ah

16
Knowledge dropout
• BERT-QA input format: Title[SEP][Un/seen]Attribute[SEP]Values
• With values: CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP][Seen]nominal capacity[SEP]14ah[SEP]40ah
• With values deleted: CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP][Unseen] nominal capacity

17
Experimental Settings
• Perform experiments using the cleaned AE-pub dataset.
• We construct the cleaned AE-pub dataset from the public AliExpress dataset [Xu+, 2019] by
removing 736 near-duplicate tuples.
• Each entry is a tuple of <product title, attribute, value>.
• Split the cleaned AE-pub dataset into train/dev/test sets at a ratio of 7:1:2.
                                    Train     Dev.      Test
# of tuples                         76,823    10,975    21,950
# of tuples with NULL               15,097    2,201     4,259
# of unique attribute-value pairs   11,819    2,680     4,431
# of unique attributes              1,801     635       872
# of unique values                  9,317     2,258     3,671
Statistics of the cleaned AE-pub dataset

18
Baselines
• Dictionary matching
• SUOpenTag [Xu+, 2019]
• AVEQA [Wang+, 2020]
• BERT-QA [Wang+, 2020]
[Figures: SUOpenTag and AVEQA]

20
Performance on Cleaned AE-pub Dataset
                              Macro                                           Micro
Models                        P (%)           R (%)           F1              P (%)           R (%)           F1
Dictionary                    33.20 (±0.00)   30.37 (±0.00)   31.72 (±0.00)   73.39 (±0.00)   73.77 (±0.00)   73.58 (±0.00)
SUOpenTag [Xu+, 2019]         30.92 (±1.44)   28.04 (±1.48)   29.41 (±1.44)   86.53 (±0.78)   79.11 (±0.35)   82.65 (±0.20)
AVEQA [Wang+, 2020]           41.93 (±1.05)   39.65 (±0.96)   40.76 (±0.98)   86.95 (±0.27)   81.99 (±0.13)   84.40 (±0.09)
BERT-QA [Wang+, 2020]         42.77 (±0.36)   40.85 (±0.22)   41.79 (±0.28)   87.14 (±0.54)   82.16 (±0.21)   84.58 (±0.24)
BERT-QA +vals                 39.48 (±0.37)   35.60 (±0.44)   37.44 (±0.38)   88.82 (±0.22)   81.77 (±0.14)   85.15 (±0.14)
BERT-QA +vals +drop           41.61 (±0.83)   38.22 (±0.80)   39.84 (±0.81)   88.46 (±0.26)   82.02 (±0.37)   85.12 (±0.14)
BERT-QA +vals +mixing         46.67 (±0.33)   43.32 (±0.50)   44.93 (±0.39)   88.30 (±0.69)   82.46 (±0.30)   85.28 (±0.26)
BERT-QA +vals +drop +mixing   47.74 (±0.54)   44.82 (±0.75)   46.23 (±0.64)   87.84 (±0.39)   82.61 (±0.07)   85.14 (±0.19)
BERT-QA +vals +drop +mixing outperformed the baseline methods.
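The table reports both macro and micro averages. As a reminder of how the two can diverge, here is a small sketch. It assumes that macro precision and recall are averaged over attributes, that micro scores pool counts over all attributes, and that F1 is the harmonic mean of the averaged precision and recall (the Dictionary row is consistent with the last assumption). The per-attribute counts are made up for illustration and are not from the dataset.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def harmonic_mean(p, r):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def macro_micro(counts):
    """counts: {attribute: (tp, fp, fn)} -> {"macro": (P, R, F1), "micro": (P, R, F1)}."""
    per_attr = [precision_recall(*c) for c in counts.values()]
    macro_p = sum(p for p, _ in per_attr) / len(per_attr)
    macro_r = sum(r for _, r in per_attr) / len(per_attr)
    totals = [sum(c[i] for c in counts.values()) for i in range(3)]
    micro_p, micro_r = precision_recall(*totals)
    return {
        "macro": (macro_p, macro_r, harmonic_mean(macro_p, macro_r)),
        "micro": (micro_p, micro_r, harmonic_mean(micro_p, micro_r)),
    }


# Hypothetical counts: one frequent attribute and two rare ones.
counts = {
    "brand": (900, 50, 50),
    "function 1": (1, 2, 3),
    "suitable": (0, 1, 2),
}
scores = macro_micro(counts)
print("macro F1: %.3f" % scores["macro"][2])
print("micro F1: %.3f" % scores["micro"][2])

# Consistency check against the Dictionary row (P and R given in percent).
print(round(harmonic_mean(33.20, 30.37), 2), round(harmonic_mean(73.39, 73.77), 2))
```

With these toy counts the micro score is dominated by the frequent attribute, while the macro score is pulled down by the two rare ones, which is why macro F1 is the more telling number for rare attributes.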

21
BERT-QA +vals learns to find strings that are similar to ones retrieved from the training data.

22
Knowledge dropout and knowledge token mixing improve both macro and micro F1 performance.

23
Impact on Rare and Ambiguous Attributes
• Categorize the attributes to which query expansion was applied according to the number of
training examples and the appropriateness of the attribute names.
• Use embeddings of the CLS token obtained from BERT to measure the appropriateness.
• Compute the cosine similarity between the attribute embedding and the averaged value embeddings.
• Regard the attribute name as ambiguous if the similarity is low.
• Divide the attributes into four groups according to the median frequency and the median similarity to values.
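The grouping described above can be sketched as follows, assuming the CLS embeddings have already been computed. The short vectors below are placeholders rather than real BERT outputs, and the function names, the group labels ("rare", "frequent", "ambiguous", "clear"), and the handling of values that fall exactly on a median are assumptions for illustration.

```python
import math
from statistics import median


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Element-wise average of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def categorize(attrs):
    """attrs: {name: {"emb": [...], "value_embs": [[...], ...], "freq": int}}

    Returns {name: (frequency group, similarity group)}, split at the medians.
    """
    sims = {
        name: cosine(a["emb"], mean_vector(a["value_embs"]))
        for name, a in attrs.items()
    }
    freq_med = median(a["freq"] for a in attrs.values())
    sim_med = median(sims.values())
    return {
        name: (
            "rare" if attrs[name]["freq"] < freq_med else "frequent",
            "ambiguous" if sims[name] < sim_med else "clear",
        )
        for name in attrs
    }


# Placeholder 2-d "embeddings" (hypothetical, for illustration only).
attrs = {
    "brand": {"emb": [1, 0], "value_embs": [[1, 0], [0.9, 0.1]], "freq": 100},
    "color": {"emb": [0, 1], "value_embs": [[0, 1]], "freq": 2},
    "function 1": {"emb": [1, 0], "value_embs": [[0, 1], [1, 1]], "freq": 3},
    "sort": {"emb": [1, 1], "value_embs": [[1, 0]], "freq": 50},
}
for name, group in categorize(attrs).items():
    print(name, group)
```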

24
Macro F1 Gains over BERT-QA Model
Model: BERT-QA +vals +drop +mixing
Cosine similarity    Number of training examples (median: 8)
(median: 0.929)      [1, 8)           [8, ∞)           All
[0.411, 0.929)       49.15 (+7.54)    57.89 (+6.15)    53.51 (+6.86)
[0.929, 1.0]         50.94 (+8.14)    71.04 (+3.02)    62.10 (+5.29)
All                  49.99 (+7.82)    64.84 (+4.50)    57.81 (+6.08)

25
Query expansion can generate more informative queries than ambiguous attributes alone.

26
Query expansion is more effective for rare attributes than for frequent attributes.

27
The model can devote more of its parameters to solving the task itself, because the knowledge
induced from the training data is supplied as run-time input.

28
Example Outputs
Example 1
  Context: aeronova bicycle carbon mtb handlebar mountain bikes flat handlebar mtb integrated handlebars with stem bike accessories
  Query (attribute): function 1
  Query (values): skiing goggles, carbon road bicycle handlebar, cycling glasses, bicycle mask, gas mask, …
  Gold: carbon mtb handlebar
  Prediction (BERT-QA): bicycle carbon mtb handlebar (incorrect)
  Prediction (BERT-QA w/ query expansion): carbon mtb handlebar (correct)
Example 2
  Context: lfp 3.2v 100ah lifepo4 prismatic cell deep cycle diy lithium ion battery 72v 60v 48v 24v 100ah 200ah ev solar storage battery
  Query (attribute): nominal capacity
  Query (values): 14ah, 40ah, 17.4ah
  Gold: 100ah
  Prediction (BERT-QA): 3.2v 100ah (incorrect)
  Prediction (BERT-QA w/ query expansion): 100ah (correct)
Example 3
  Context: camel outdoor softshell men’s hiking jacket wind-proof thermal jacket for camping ski thick warm coats
  Query (attribute): suitable
  Query (values): men, camping, kids, saltwater/fresh water, women, 4-15y, mtb cycling shoes, …
  Gold: men
  Prediction (BERT-QA): men (correct)
  Prediction (BERT-QA w/ query expansion): camping (incorrect)

29
Conclusions
• Knowledge-driven query expansion for QA-based product attribute
extraction.
• We construct the knowledge from training data, and use it to induce better query
representation.
• Two tricks to mimic the imperfection of the knowledge.
• Knowledge dropout and knowledge token mixing.
• Our query expansion is effective, especially for rare and ambiguous attributes.

30
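Topics Not Covered in the Paper
• Gap between the evaluation experiments and real-world usage
• Evaluation experiments: the gold attributes are given, as in prior work
• Real-world usage: the gold attributes are unknown
• Practicality of QA-based models
• The model must be run multiple times, once per attribute
• We need to know in advance which attributes we want to extract values for
• On some e-commerce sites, the candidate attributes can be narrowed down by consulting master data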

31
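The Future of Attribute Value Extraction
• Attribute value extraction → tends to be cast as NER
• Problems with NER-based methods
• Extracted values need to be normalized (D&G → Dolce & Gabbana)
• When annotating attribute values, it is difficult to define what the correct answer is
• Automatically generating training data from existing product data introduces erroneous annotations
• Product title: Journal Standard Jeans Standard Fit
• Attributes: <Brand, Journal Standard>, <Pants leg width, Standard>
• For some attributes, new kinds of values rarely appear (e.g., color, country of origin)
• Approaches other than NER
• Solve it as classification
• Extreme Multi-Label Classification with Label Masking for Product Attribute Value Extraction
• Solve it as generation
• Research in progress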
