Meltwater Meetup Budapest - 7 Sep. 2016
Omer Gunes and Tim Furche
Structured Aspect Extraction
Giorgio Orsi
University of Birmingham University of Oxford
Aspect Extraction (AE)
Identifying relevant features of an explicit or implicit entity of interest
The Sony Xperia XZ is the new headliner with top-of-the-line hardware, a bigger
display, a new and improved camera, squared design, and, of course, water-proofing.
Sony Xperia XZ
Entity (explicit) Aspects
new headliner
top-of-the-line hardware
bigger display
new and improved camera
squared design
water-proofing
[Zhang and Liu, 2014]
Sentiment Analysis
Aspect (entity) based
new headliner
top-of-the-line hardware
bigger display
new and improved camera
squared design
water-proofing 0.218
course 0.341
Sony Xperia XZ
The Sony Xperia XZ is the new headliner with top-of-the-line hardware, a bigger
display, a new and improved camera, squared design, and, of course, water-proofing.
[Further sentiment scores shown on the slide; their alignment with the aspects was lost in extraction: 0.476, 0.476, 0.476, 0.476, 0.641, 0.350]
⟨ headliner, yes ⟩
⟨ hardware, top-of-the-line ⟩
⟨ display, { yes, bigger } ⟩
⟨ camera, { yes, new, improved } ⟩
⟨ design, squared ⟩
⟨ water-proofing, yes ⟩
The Sony Xperia XZ is the new headliner with top-of-the-line hardware, a bigger
display, a new and improved camera, squared design, and, of course, water-proofing.
Aspect extraction vs attribute extraction
Knowledge Base Construction
Basically, you want the attribute (i.e., aspect term) names and factual values
⟨ OEM, Sony ⟩
⟨ model, Xperia XZ ⟩
[Shin et al., 2015]
Structured Aspect Extraction (SAE)
Victorian two bedroom mid terrace property
Extends AE with fine-grained extraction and typing of complex (i.e., hierarchical) aspects
Aspect term extraction (ATE): Victorian two bedroom mid terrace property
Segmentation: ⟨ { Victorian, ⟨ { two }, bedroom ⟩, mid terrace }, property ⟩
Typing and Generalisation: ⟨ { JJ, ⟨ { CD }, bedroom ⟩, mid terrace }, property ⟩
modifiers = {qualifiers, quantifiers}
SAE: Why it is hard
Victorian two bedroom mid terrace property located in Cambridge and comprising of
living room with ORIGINAL!!! cupboards, and ORIGINAL!!! picture rail.Stairway off living
room leads to two bedrooms.
Noisy unstructured text (NUT)
[Figure: terms highlighted in the text above, including "bedroom mid terrace", "picture rail.Stairway", "rail.Stairway", "ORIGINAL", "Victorian", "Cambridge", "cupboards", "property", "room" and "bedrooms"]
SAE: Why it is hard
Noisy unstructured text (NUT)
By the time we get to the dependency parser we have lost the battle already
The problems start with the tokenizer
picture rail.Stairway
Victorian two bedroom mid terrace property located in Cambridge and comprising of
living room with ORIGINAL !!! cupboards, and ORIGINAL !!! picture rail.Stairway off living
room leads to two bedrooms.
and continue with the POS tagger
[Figure: POS tags overlaid on the text above; the tag-to-token alignment was lost in extraction. Tags shown: NN, NNP, JJ, CD, CC, VBN, VBG, VBZ]
Unsupervised SAE
Large corpus of homogeneous documents (50k ~ 250k)
same domain (use a classifier), preferably no bundles
Normalisation and tagging
tokenisation (NUT specific)
orthography normalisation (most common orthography)
POS tagging (Hepple’s on TreeBank)
NP chunking (Ramshaw and Marcus)
NP Clustering
head noun lemmatization (approx. last noun in NP)
frequent head nouns -> aspect terms
Segmentation
cPMI optimal parsing of an NP -> modifiers / multi-words
Generalisation and typing
structured aspect patterns (SAP)
entity, aspect term, qualifier, quantifier
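The slides do not spell out the NUT-specific tokenisation rules. The sketch below assumes two hypothetical rules that would handle the noise shown earlier ("rail.Stairway", "ORIGINAL!!!"): a full stop glued between a lowercase and an uppercase letter is split off, and every punctuation mark becomes its own token. The function name and both regular expressions are illustrative, and the patterns only cover ASCII letters.

```python
import re

# Hypothetical rule 1: a full stop glued between a lowercase and an uppercase
# letter is treated as a missing sentence boundary ("rail.Stairway").
_GLUED_STOP = re.compile(r"(?<=[a-z])\.(?=[A-Z])")

# Hypothetical rule 2: a token is a word (inner hyphens or apostrophes allowed),
# a number, or a single punctuation mark.
_TOKEN = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*|\d+(?:\.\d+)?|[^\sA-Za-z\d]")


def tokenize_nut(text):
    """Split noisy text into tokens using the two assumed rules above."""
    spaced = _GLUED_STOP.sub(" . ", text)
    return _TOKEN.findall(spaced)


if __name__ == "__main__":
    print(tokenize_nut("ORIGINAL!!! picture rail.Stairway off living room"))
```

Abbreviations such as "e.g." are left alone by the first rule because the letter after the full stop is lowercase.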
NP Clustering
Two further double bedrooms
Three further double bedrooms
A further double bedroom
Two first floor bedrooms
…
Input: A large number of (normalized) NPs
Abstraction of numerical expressions + removal of non-content word prefixes
CD further double bedrooms
CD further double bedrooms
DT further double bedroom
CD first floor bedrooms
{ CC, DT, EX, IN, PRP, PUNC }
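A minimal sketch of this normalisation step, assuming each NP arrives as a list of (token, POS tag) pairs from the tagger. The function name and the lowercasing are assumptions; the tag set is the one listed above.

```python
# Tags treated as non-content words, as listed on the slide.
NON_CONTENT = {"CC", "DT", "EX", "IN", "PRP", "PUNC"}


def normalise_np(tagged_np):
    """tagged_np: list of (token, POS tag) pairs for one noun phrase."""
    # Abstract numerical expressions to their tag; lowercase everything else
    # (lowercasing is an assumption, not stated on the slide).
    abstracted = [("CD", tag) if tag == "CD" else (tok.lower(), tag)
                  for tok, tag in tagged_np]
    # Drop non-content words only while they form a prefix of the NP.
    start = 0
    while start < len(abstracted) and abstracted[start][1] in NON_CONTENT:
        start += 1
    return " ".join(tok for tok, _ in abstracted[start:])


if __name__ == "__main__":
    print(normalise_np([("Two", "CD"), ("further", "JJ"),
                        ("double", "JJ"), ("bedrooms", "NNS")]))
    print(normalise_np([("A", "DT"), ("further", "JJ"),
                        ("double", "JJ"), ("bedroom", "NN")]))
```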
Filter head nouns (cut-off set experimentally, but covering 70-75% of the corpus) and cluster them
Damerau-Levenshtein distance to compensate for misspellings
{ CD further double bedrooms
further double bedroom
CD first floor bedrooms }
[ bedroom ]
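The clustering step can be sketched as follows. Several parts are assumptions rather than the authors' implementation: the head is taken to be the last token of the NP, `crude_lemma` is a stand-in for a real lemmatiser, the distance is the optimal string alignment variant of Damerau-Levenshtein, and the 0.2 ratio is borrowed from the pattern-scoring formula rather than stated for clustering.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def crude_lemma(noun):
    # Stand-in for a real lemmatiser: strips a plural "s" only.
    return noun[:-1] if noun.endswith("s") and len(noun) > 3 else noun


def cluster_by_head(nps, max_ratio=0.2):
    """Greedily group normalised NPs by (approximately) equal head noun."""
    clusters = {}
    for phrase in nps:
        head = crude_lemma(phrase.split()[-1])
        # Reuse an existing cluster whose head is within the edit-distance ratio.
        key = next((k for k in clusters
                    if osa_distance(k, head) / len(k) < max_ratio), head)
        clusters.setdefault(key, []).append(phrase)
    return clusters


if __name__ == "__main__":
    print(cluster_by_head(["CD further double bedrooms", "further double bedroom",
                           "CD first floor bedrooms", "CD double bedrom"]))
```

The greedy assignment depends on input order; a production version would more likely cluster the head nouns first and attach NPs afterwards.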
Segmentation
Victorian two bedroom mid terrace property
Basically, we have to assign the elements of the NP modifiers to:
a multi-word expression
an aspect term
find sub-patterns
⟨ Victorian ⟨ two bedroom mid ⟩ ⟨ terrace ⟩ property ⟩
⟨ Victorian ⟨ two bedroom ⟩ ⟨ mid terrace ⟩ property ⟩
⟨ Victorian ⟨ two bedroom ⟩ mid terrace property ⟩
Valid parenthesizations
balanced parenthesization (algorithms and data structures – DP)
for each level k of the parenthesization
we have at least two elements
it either terminates with a head of cluster OR it contains no head of cluster
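The rules above can be read as a recursive check over a nested structure. The sketch below is one such reading, not the authors' code: a parenthesization is a nested Python list (so balance holds by construction), `heads` is the set of cluster heads, and "contains no head of cluster" is assumed to refer to the tokens directly inside a group rather than inside its sub-groups.

```python
def is_valid(group, heads):
    """group: nested list of tokens (str) and sub-groups (list)."""
    # Every level needs at least two elements.
    if len(group) < 2:
        return False
    direct_tokens = [x for x in group if isinstance(x, str)]
    last = group[-1]
    ends_with_head = isinstance(last, str) and last in heads
    contains_head = any(tok in heads for tok in direct_tokens)
    # Either the group terminates with a cluster head, or it contains none.
    if contains_head and not ends_with_head:
        return False
    # The same conditions must hold at every nested level.
    return all(is_valid(x, heads) for x in group if isinstance(x, list))


if __name__ == "__main__":
    heads = {"property", "bedroom"}
    candidates = [
        ["Victorian", ["two", "bedroom", "mid"], ["terrace"], "property"],
        ["Victorian", ["two", "bedroom"], ["mid", "terrace"], "property"],
        ["Victorian", ["two", "bedroom"], "mid", "terrace", "property"],
    ]
    for candidate in candidates:
        print(candidate, is_valid(candidate, heads))
```

Under this reading the first candidate is rejected twice over: ⟨ terrace ⟩ has a single element, and ⟨ two bedroom mid ⟩ contains a head without ending on it.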
Segmentation
cPMI-optimal parenthesizations
Adaptation of corpus-wide Point-wise Mutual Information (cPMI)
[…] segmentation is corpus-level significant point-wise mutual information (cPMI) (Damani and Ghonge, 2013). Our definition of cPMI uses the corpus of NPs instead of arbitrary descriptions. Let C be the set of clusters produced as described above. We denote by f_C(t) the frequency of the string t in all clusters in C, i.e., obtained by summing up all of the occurrences of t in all clusters. Let 0 < δ < 1 be the normalization factor defined as in (Damani and Ghonge, 2013), and t‖w the concatenation of two strings t and w. We then define cPMI_C(t, w) as follows:
\[
cPMI_C(t, w) = \log \frac{f_C(t \| w)}{\dfrac{f_C(t) \cdot f_C(w)}{|C|} + \sqrt{f_C(t)} \cdot \sqrt{\dfrac{\ln(\delta)}{(-2)}}}
\]
The cPMI value is used to determine whether a token should be associated with (i) the head noun, (ii) a nested token representing the head of a different cluster, thus possibly inducing a nested structure, or (iii) an adjacent token, thus forming a multi-word expression.
⟨ Victorian ⟨ two bedroom ⟩ ⟨ mid terrace ⟩ property ⟩
Parenthesization that maximises cPMInp becomes a (ground) structured aspect pattern (SAP)
⟨ { Victorian, ⟨ { two }, bedroom ⟩, mid terrace }, property ⟩
cPMInp = cPMIC (Victorian, property) + cPMIC (two bedroom, property) +
cPMIC (mid terrace, property) + cPMIC (two, bedroom) + cPMIC (mid, terrace)
[Damani and Ghonge 2013]
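To make the scoring concrete, here is a small sketch of the cPMI formula and of the sum that scores one parenthesization. The function names, the default δ of 0.9, the way concatenations are keyed (strings joined by a space) and all counts are made up for illustration; how f_C(t‖w) is actually counted is not shown on the slides.

```python
import math


def cpmi(f_t, f_w, f_tw, size, delta=0.9):
    """cPMI for one pair.

    f_t, f_w: frequencies of t and w; f_tw: frequency of the concatenation;
    size: the normaliser written |C| in the formula; delta: 0 < delta < 1.
    """
    if f_tw == 0:
        return float("-inf")  # the pair was never observed together
    expected = f_t * f_w / size
    slack = math.sqrt(f_t) * math.sqrt(math.log(delta) / -2.0)
    return math.log(f_tw / (expected + slack))


def cpmi_np(pairs, freq, size, delta=0.9):
    """Sum cPMI over the (modifier, head) pairs of one parenthesization."""
    return sum(cpmi(freq[t], freq[w], freq.get(t + " " + w, 0), size, delta)
               for t, w in pairs)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    freq = {"Victorian": 120, "property": 900, "two": 400, "bedroom": 700,
            "mid": 150, "terrace": 160, "two bedroom": 300, "mid terrace": 140,
            "Victorian property": 60, "two bedroom property": 110,
            "mid terrace property": 90}
    pairs = [("Victorian", "property"), ("two bedroom", "property"),
             ("mid terrace", "property"), ("two", "bedroom"), ("mid", "terrace")]
    print(cpmi_np(pairs, freq, size=5000))
```

Choosing the segmentation then amounts to computing `cpmi_np` for every valid parenthesization and keeping the maximum.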
Typing and Generalisation
⟨ { Victorian, ⟨ { two }, bedroom ⟩, mid terrace }, property ⟩
Given a (ground) SAP…
Victorian → property-qualifier
two bedroom → property-qualifier
mid terrace → property-qualifier
property → property
two → bedroom-quantifier
bedroom → property
Typing and Generalisation
Ground SAPs have good precision but pretty bad recall
POS-based pattern generalization
non-content words are always generalized
aspect terms generalized only if a nested pattern with a ground head exists
qualifiers are generalized one-at-a-time
⟨ { Victorian, ⟨ { two }, bedroom ⟩, mid terrace }, property ⟩
⟨ { JJ, ⟨ { CD }, bedroom ⟩, mid terrace }, property ⟩
⟨ { Victorian, ⟨ { CD }, bedroom ⟩, JJ terrace }, property ⟩
⟨ { Victorian, ⟨ { two }, bedroom ⟩, mid JJ }, property ⟩
⟨ { Victorian, ⟨ { two }, bedroom ⟩, JJ }, property ⟩
⟨ { Victorian, ⟨ { two }, bedroom ⟩, mid terrace }, NN ⟩
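The "one-at-a-time" rule for qualifiers can be sketched as below. This is an illustration under assumptions, not the authors' procedure: a pattern is a list of (text, POS tag) modifiers plus a head, and a tag of `None` marks an element that this step leaves untouched (here, the nested bedroom pattern, kept as an opaque string).

```python
def generalise_qualifiers(modifiers, head):
    """Return one variant per qualifier, each with that qualifier replaced by its POS tag.

    modifiers: list of (text, tag); tag is None for elements that must stay as they are.
    """
    variants = []
    for i, (_, tag) in enumerate(modifiers):
        if tag is None:
            continue  # nested pattern or other element not generalised here
        generalised = [text for text, _ in modifiers]
        generalised[i] = tag  # exactly one qualifier is abstracted per variant
        variants.append((tuple(generalised), head))
    return variants


if __name__ == "__main__":
    ground = [("Victorian", "JJ"), ("⟨ { CD }, bedroom ⟩", None), ("mid terrace", "JJ")]
    for mods, head in generalise_qualifiers(ground, "property"):
        print("⟨ { " + ", ".join(mods) + " }, " + head + " ⟩")
```

Generalising one qualifier per variant keeps the rest of the pattern ground, which limits how much precision each new pattern can give away.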
Typing and Generalisation
Pattern scoring [Gupta and Manning, 2014]
Score patterns on their ability to discriminate between correct and incorrect extractions
No labelled dataset available → use cluster heads as surrogate labels
[…] no labelled dataset is available. We take the heads of the noun-phrase clusters as a surrogate of the set of valid aspects. The analysis is limited to aspect terms. Let T be the set of valid aspect terms as defined above, and E be the set of aspect terms produced by an SAP P. The score of P is computed as:
\[
\nu(P) = \frac{|T|}{\sum_{e \in E} \left( 1 - \max_{t \in T} \left( \frac{dist(t, e)}{len(t)} < 0.2 \right) \right)} \cdot \log |T|, \qquad \nu(P) \in [0, 1]
\]
where dist(t, e) denotes the Damerau-Levenshtein edit distance between two strings t and e and len(·) denotes the length of the string. Patterns scoring less than an experimentally set threshold are eliminated.
3 Evaluation
Our method (SysName) is implemented in Java. All experiments are run on a Dell OptiPlex 9020 with two quad-core i7-4770 Intel CPUs at 3.40GHz and 32GB RAM, running Linux Mint 17 Qiana. All resources used in the evaluation are made available for replicability.[2]
Datasets and metrics. We use three groups of datasets in our evaluation (Table 1): The first two consist of the SemEval14[3] and SemEval15[4] datasets used for the aspect term extraction (ATE) and opinion target expression (OTE) subtasks of the aspect-based sentiment analysis (ABSA) task. The datasets […]
where:
T is the set of reference aspect terms (cluster heads)
dist is the Damerau–Levenshtein distance
len is the length of the string
Patterns scoring less than an experimentally set threshold are eliminated
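The sketch below illustrates only the surrogate-label test inside the score, namely whether an extracted term e lies within a normalised edit distance of 0.2 from some cluster head t. It does not compute ν(P). The function names and the `matched_share` summary are assumptions added for illustration, and the distance is the optimal string alignment variant of Damerau-Levenshtein.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def matches_reference(extraction, references, max_ratio=0.2):
    """True if some reference term t satisfies dist(t, e) / len(t) < max_ratio."""
    return any(osa_distance(t, extraction) / len(t) < max_ratio for t in references)


def matched_share(extractions, references):
    """Hypothetical summary: share of extractions that hit a reference term."""
    if not extractions:
        return 0.0
    hits = sum(matches_reference(e, references) for e in extractions)
    return hits / len(extractions)


if __name__ == "__main__":
    heads = {"bedroom", "property", "kitchen"}
    print(matched_share(["bedrom", "property", "garden"], heads))
```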
Pattern Matching
Pattern references
nested patterns are not repeated; they reference each other
enables parallel SAP generalisation and matching
SAPproperty
⟨ { JJ, #SAPbedroom , mid terrace }, property ⟩
⟨ { Victorian, #SAPbedroom , JJ terrace }, property ⟩
⟨ { Victorian, #SAPbedroom , mid JJ }, property ⟩
⟨ { Victorian, #SAPbedroom , JJ }, property ⟩
SAPNN
⟨ { Victorian, #SAPbedroom , mid terrace }, NN ⟩
SAPbedroom
⟨ { two }, bedroom ⟩
⟨ { CD }, bedroom ⟩
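A toy matcher shows how pattern references avoid repeating nested patterns. Everything here is a simplified assumption: patterns are matched as ordered sequences (the slides write modifiers as sets), the `PATTERNS` table holds only a few of the patterns listed above, and matching is sequential rather than parallel.

```python
POS_TAGS = {"JJ", "CD", "NN"}

# Each named pattern has alternatives of the form (modifiers, head).
# An element starting with "#" refers to another named pattern.
PATTERNS = {
    "SAPbedroom": [(("two",), "bedroom"), (("CD",), "bedroom")],
    "SAPproperty": [
        (("JJ", "#SAPbedroom", "mid", "terrace"), "property"),
        (("Victorian", "#SAPbedroom", "JJ"), "property"),
    ],
}


def _match_seq(elements, tagged, pos):
    """Match a list of pattern elements against (token, tag) pairs from pos."""
    if not elements:
        return pos
    first, rest = elements[0], elements[1:]
    if first.startswith("#"):
        # Expand the reference and try each alternative, backtracking on failure.
        for modifiers, head in PATTERNS[first[1:]]:
            end = _match_seq(list(modifiers) + [head] + rest, tagged, pos)
            if end is not None:
                return end
        return None
    if pos >= len(tagged):
        return None
    token, tag = tagged[pos]
    ok = (first == tag) if first in POS_TAGS else (first.lower() == token.lower())
    return _match_seq(rest, tagged, pos + 1) if ok else None


def match(name, tagged, pos=0):
    """Return the end index if some alternative of `name` matches at pos, else None."""
    for modifiers, head in PATTERNS[name]:
        end = _match_seq(list(modifiers) + [head], tagged, pos)
        if end is not None:
            return end
    return None


if __name__ == "__main__":
    text = [("Victorian", "JJ"), ("three", "CD"), ("bedroom", "NN"),
            ("mid", "JJ"), ("terrace", "NN"), ("property", "NN")]
    print(match("SAPproperty", text))
```

Because SAPbedroom is stored once, adding a generalised bedroom pattern makes it available to every property pattern that references it.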
How fast?
Induction: 10-14 msec / sentence
Matching: 2-3 msec / text
bottlenecks: morphological analysis and cPMI-optimal segmentation
Evaluation
Datasets
SemEval OTE/ATE only useful for aspect terms
We provide SAED (Structured Aspect Extraction Dataset, http://bit.ly/2caeXf3)
[…] consists of both NUT and (semi-) formal English texts. We provide GS annotations for 150 texts equally distributed across the six domains. The GS provides an average of 355 aspect terms, 30 quantifiers, 430 qualifiers, and 45 nested aspects per domain. Annotations were produced by 6 independent annotators ( =87%). We use standard recall, precision, and F1 score metrics. However, due to the different granularity of the output produced by the systems and of the GS annotations, the definition of a correct extraction varies slightly with each evaluation task.
Table 1: Datasets
DATASET    DOMAIN       SIZE (#texts)    SOURCES          CATEGORY  FORMALITY    TYPE
SemEval14  restaurants  3k + 800 GS (*)  Citysearch       service   NUT          evaluative
SemEval14  laptops      3k + 800 GS (*)  N/A              product   NUT          evaluative
SemEval15  restaurants  254 + 96 GS      Citysearch       service   NUT          evaluative
SemEval15  hotels       N/A + 30 GS      Citysearch       service   NUT          evaluative
SAED       chairs       94k + 25 GS      Amazon, GumTree  product   NUT          descriptive
SAED       hotels       20k + 25 GS      TripAdvisor      service   formal       descriptive
SAED       real estate  87k + 25 GS      RightMove        product   semi-formal  descriptive
SAED       restaurants  115k + 25 GS     TripAdvisor      service   formal       descriptive
SAED       shoes        46k + 25 GS      Amazon, GumTree  product   NUT          descriptive
SAED       watches      10k + 25 GS      Amazon, GumTree  product   NUT          descriptive
Comparative evaluation – Simplified SAE. The method by (Kim et al., 2012), henceforth ATL, is currently the closest to SAE we are aware of. We have obtained from the authors the dataset used in their evaluation […]
[2] All resources are available at http://bit.ly/29YtM3K and include: the SAED dataset and GS, our reimplementations of IIITH and ATL, a compiled version of SysName, and all output files generated by all systems.
[3] […]
[4] http://alt.qcri.org/semeval2015/task12/
Systems
The SemEval 14/15 systems
IIITH [Raju et al., 2009]
ATL [Kim et al., 2012]
ATEX [Zhang and Liu, 2014]
Evaluation
ATE setting (SemEval Dataset)
[Chart (a), SemEval14 Dataset (SemEval 2014): Restaurants and Laptops, scale 0-100. Systems: HIS_RD, DLIREC (U), NRC-Can, UNITOR (U), XRCE, SAP_RI, IITP, SeemGo, ATEX (U), IIITH (U), ATL (U), SysName (U). Groups: Supervised, Unsupervised.]
[Chart (b), SemEval15 Dataset (SemEval 2015): Restaurants and Hotels, scale 0-100. Systems: ISISLif, LT3 (U), Elixa (U), Sentiue, UFGRS, Wnlp, V3, IIITH (U), ATL (U), ATEX (U), SysName (U). Groups: Supervised, Unsupervised.]
[Chart: R, P, F1 (scale 0-100) for IIITH, ATL, ATEX and SysName.]
Evaluation
Simplified SAE setting (SAE Dataset)
[…] but not an implementation of the system. We have reimplemented the method and successfully reproduced the experimental results described in the original paper. Figure 1 shows a comparison between ATL and SysName on the SAED dataset. An extraction is correct if modifiers and aspect terms match exactly the GS annotations, and if modifiers are correctly typed as qualifiers or quantifiers. This is a simplified SAE setting where we do not require correct linking of modifiers to aspect terms. SysName performs 33% better than ATL on average, outperforming it in all domains. Besides being unable to extract hierarchical structures, a visible issue in ATL is the inability to establish and leverage the semantic connection between […]
[Chart: R, P, F1 (scale 0-100) per domain (Chairs, Hotels, Real Estate, Restaurants, Shoes, Watches) for ATL and SysName.]
Figure 1: SysName vs. ATL on simplified SAE (SAED dataset)
Correct extraction: correct aspect term +
correct modifier +
correct typing for the modifier (i.e., qualifier / quantifier)
Evaluation
Full SAE setting (SAE Dataset)
Correct extraction: correct aspect term +
correct modifier +
correct typing for the modifier (i.e., X–quantifier, Y–qualifier) +
correct linking (modifier-entity, sub-patterns)
(c) SAED Dataset
Figure 2: SysName vs. others in ATE
[…] is indeed a much more challenging task than simply identifying them. Another […] the impact of the generalization on the performance. Generalized SAPs produce 444 […] against the 386 of the ground ones (+15%).
[Chart: R, P, F1 (scale 0-100) per domain (Chairs, Hotels, Real Estate, Restaurants, Shoes, Watches) for ATE, Simpl. SAE and Full SAE.]
Figure 3: SysName on full SAE
SAE is substantially harder than ATE/OTE and simplified SAE
Evaluation
Effect of corpus size (SAE Dataset)
The larger the corpus… the better?
[Charts: AVG R, AVG P and AVG F1 (scale 0-100) against corpus size (1%, 5%, 10%, 25%, 50%, 100%). Panels: (a) SAE Task (SAE setting), (b) ATE Task (ATE setting).]
Figure 4: Performance vs. corpus size (average – SAED dataset)
[…] of this experiment by domain for the ATE and SAE tasks […] draw further conclusions on the relationship between the size […] the SAPs. There is a relationship between the variety of features […] to induce good quality SAPs. For domains such as, e.g., chairs, […] from 25% of the size of the corpus we do not notice substantial […] be explained by the nature of the features in these domains that […] models of the products, types of real estate properties, etc. In the […] are much more variegated in features, e.g., restaurant and hotel […]
Evaluation
Effect of corpus size (SAE Dataset)
Not necessarily… you often reach a point where more data is not going to help
[Charts: R, P and F1 (scale 0-100) against corpus size (1%, 5%, 10%, 25%, 50%, 100%), one panel per domain: (a) chairs, (b) hotels, (c) real estate, (d) restaurants, (e) shoes, (f) watches.]
What’s next
Injecting supervision
Several places… clustering, pattern scoring, and typing are probably the most important ones
Dynamic cut-off thresholds
Use test sets to adjust corpus size and thresholds
Aspects not in NPs
Named entities, relations, other grammatical forms
e.g., living room with sash windows
Automatically determine the domain
Map the NP cluster heads to an existing KB (e.g., BabelNet) and use their graph for scoping
That’s all Folks!
References
[Shin et al., 2015] Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, and Christopher Ré. 2015. Incremental knowledge base construction using DeepDive. PVLDB, 8(11):1310–1321.
[Raju et al., 2009] S. Raju, P. Pingali, and V. Varma. 2009. An unsupervised approach to product attribute extraction. In Proc. of ECIR, pages 796–800.
[Ramshaw and Marcus, 1999] L. A. Ramshaw and M. P. Marcus. 1999. Text chunking using transformation-based learning. In S. Armstrong et al., editors, Natural Language Processing Using Very Large Corpora, volume 11 of Text, Speech and Language Technology, pages 157–176.
[Kim et al., 2012] D. S. Kim, K. Verma, and P. Z. Yeh. 2012. Building a lightweight semantic model for unsupervised information extraction on short listings. In Proc. of EMNLP, pages 1081–1092.
[Zhang and Liu, 2014] Lei Zhang and Bing Liu. 2014. Aspect and Entity Extraction for Opinion Mining, pages 1–40. Springer Berlin Heidelberg.

Table RecognitionTable Recognition
Table Recognition
 
The Diadem Ontology
The Diadem OntologyThe Diadem Ontology
The Diadem Ontology
 
Diadem 1.0
Diadem 1.0Diadem 1.0
Diadem 1.0
 


SAE: Structured Aspect Extraction

  • 1. Meltwater Meetup Budapest - 7 Sep. 2016 Omer Gunes and Tim Furche Structured Aspect Extraction Giorgio Orsi University of Birmingham University of Oxford
  • 2. Aspect Extraction (AE) Identifying relevant features of an explicit or implicit entity of interest The Sony Xperia XZ is the new headliner with top-of-the-line hardware, a bigger display, a new and improved camera, squared design, and, of course, water-proofing. Sony Xperia XZ Entity (explicit) Aspects new headliner top-of-the-line hardware bigger display new and improved camera squared design water-proofing [Zhang and Liu, 2014]
  • 3. Sentiment Analysis Aspect (entity) based new headliner top-of-the-line hardware bigger display new and improved camera squared design water-proofing 0.218 The Sony Xperia XZ is the new headliner with top-of-the-line hardware, a bigger display, a new and improved camera, squared design, and, of course, water-proofing. 0.476 0.476 0.476 Sony Xperia XZ 0.476 0.641 0.350 course 0.341
  • 4. ⟨ headliner, yes ⟩ ⟨ hardware, top-of-the-line ⟩ ⟨ display, { yes, bigger } ⟩ ⟨ camera, { yes, new, improved } ⟩ ⟨ design, squared ⟩ ⟨ water-proofing, yes ⟩ The Sony Xperia XZ is the new headliner with top-of-the-line hardware, a bigger display, a new and improved camera, squared design, and, of course, water-proofing. Aspect extraction vs attribute extraction Knowledge Base Construction Basically, you want the attribute (i.e., aspect term) names and factual values ⟨ OEM, Sony ⟩ ⟨ model, Xperia XZ ⟩ [Shin et al., 2015]
  • 5. Structured Aspect Extraction (SAE) Extends AE with fine-grained extraction and typing of complex (i.e., hierarchical) aspects. Example: Victorian two bedroom mid terrace property. Aspect term extraction (ATE): Victorian two bedroom mid terrace property. Segmentation: ⟨ { Victorian, ⟨ { two }, bedroom ⟩, mid terrace }, property ⟩. Typing and Generalisation: ⟨ { JJ, ⟨ { CD }, bedroom ⟩, mid terrace }, property ⟩. modifiers = {qualifiers, quantifiers}
  • 6. SAE: Why it is hard Victorian two bedroom mid terrace property located in Cambridge and comprising of living room with ORIGINAL!!! cupboards, and ORIGINAL!!! picture rail.Stairway off living room leads to two bedrooms. Noisy unstructured text (NUT) bedroom mid terrace picture rail.Stairway cupboards Cambridge bedrooms ORIGINAL property Cambridge rail.Stairway Victorian Cambridge rail.Stairway cupboards property room bedrooms
  • 7. SAE: Why it is hard Noisy unstructured text (NUT). By the time we get to the dependency parser we have lost the battle already. The problems start with the tokenizer (e.g., picture rail.Stairway): Victorian two bedroom mid terrace property located in Cambridge and comprising of living room with ORIGINAL !!! cupboards, and ORIGINAL !!! picture rail.Stairway off living room leads to two bedrooms. They continue with the POS tagger. [POS tag overlay on the example text; token alignment not recoverable: NN NN NN VBN NNP NNP NN NN NN NN NN JJ JJ JJ CD VBG VBG NNP NNP CC CD VBZ NNP VBG]
  • 8. Unsupervised SAE Large corpus of homogeneous documents (50k ~ 250k): same domain (use a classifier), preferably no bundles. Normalisation and tagging: tokenisation (NUT specific); orthography normalisation (most common orthography); POS tagging (Hepple’s on TreeBank); NP chunking (Ramshaw – Mitchell). NP Clustering: head noun lemmatization (approx. last noun in NP); frequent head nouns -> aspect terms. Segmentation: cPMI optimal parsing of an NP -> modifiers / multi-words. Generalisation and typing: structured aspect patterns (SAP); entity, aspect term, qualifier, quantifier.
  • 9. NP Clustering Input: a large number of (normalized) NPs, e.g., Two further double bedrooms; Three further double bedrooms; A further double bedroom; Two first floor bedrooms; … Abstraction of numerical expressions + removal of non-content word prefixes { CC, DT, EX, IN, PRP, PUNC }: CD further double bedrooms; CD further double bedrooms; DT further double bedroom; CD first floor bedrooms. Filter head nouns (threshold experimentally set, but 70-75% of the corpus) and cluster them, using the Damerau-Levenshtein distance to compensate for misspellings: [ bedroom ] = { CD further double bedrooms, further double bedroom, CD first floor bedrooms }
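The clustering step on slide 9 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation (which the slides say is in Java): the number-word list, the prefix stop-word list, the naive plural stripping used as "lemmatization", and the 0.2 distance ratio borrowed from the pattern-scoring slide are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical, tiny lexicons; a real system would rely on POS tags instead.
NUMBER_WORDS = {"one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"}
PREFIX_STOPWORDS = {"a", "an", "the", "and", "or", "of", "in", "with"}


def osa_distance(a, b):
    """Damerau-Levenshtein distance (optimal string alignment variant)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition, e.g. "ab" -> "ba", counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def normalise(np_text):
    """Abstract numerals to CD and drop leading non-content words."""
    tokens = np_text.lower().split()
    tokens = ["CD" if t in NUMBER_WORDS or t.isdigit() else t for t in tokens]
    while tokens and tokens[0] in PREFIX_STOPWORDS:
        tokens = tokens[1:]
    return tokens


def head_lemma(tokens):
    """Approximate the head noun as the last token, with naive plural stripping."""
    head = tokens[-1]
    return head[:-1] if head.endswith("s") and len(head) > 3 else head


def cluster_nps(nps, max_ratio=0.2):
    """Group normalised NPs by head noun, tolerating small misspellings."""
    clusters = defaultdict(set)
    for np_text in nps:
        tokens = normalise(np_text)
        if not tokens:
            continue
        head = head_lemma(tokens)
        # Reuse an existing cluster whose head is within the edit-distance ratio.
        target = next((h for h in clusters if osa_distance(h, head) / len(h) < max_ratio), head)
        clusters[target].add(" ".join(tokens))
    return dict(clusters)


if __name__ == "__main__":
    for head, members in cluster_nps([
        "Two further double bedrooms",
        "A further double bedroom",
        "Two first floor bedrooms",
        "Two double bedrrom",
    ]).items():
        print(head, sorted(members))
```

The misspelled "bedrrom" lands in the bedroom cluster because its head is one edit away from "bedroom" (ratio 1/7, below the assumed 0.2 cut-off).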
  • 10. Segmentation Victorian two bedroom mid terrace property. Basically, we have to assign the elements of the NP modifiers to: a multi-word expression; an aspect term; find sub-patterns. Candidate parenthesizations: ⟨ Victorian ⟨ two bedroom mid ⟩ ⟨ terrace ⟩ property ⟩; ⟨ Victorian ⟨ two bedroom ⟩ ⟨ mid terrace ⟩ property ⟩; ⟨ Victorian ⟨ two bedroom ⟩ mid terrace property ⟩. Valid parenthesizations: balanced parenthesization (algorithms and data structures – DP); for each level k of the parenthesization we have at least two elements; it either terminates with a head of cluster OR it contains no head of cluster.
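One possible reading of the validity conditions on slide 10 is sketched below. The nested-list representation, the function names, and the interpretation that "contains no head of cluster" applies to all tokens inside a group are assumptions for illustration; the slides do not spell out these details.

```python
def flatten(group):
    """Yield every token in a (possibly nested) group."""
    for item in group:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


def is_valid(group, heads):
    """Check the slide's conditions on one parenthesization.

    group: nested lists of tokens, e.g. ["Victorian", ["two", "bedroom"], "property"]
    heads: set of cluster head nouns (aspect terms)
    """
    # Every level must contain at least two elements.
    if len(group) < 2:
        return False
    # Every nested level must itself be valid.
    if not all(is_valid(g, heads) for g in group if isinstance(g, list)):
        return False
    # A level either ends with a cluster head or contains no cluster head at all.
    last = group[-1]
    ends_with_head = isinstance(last, str) and last in heads
    contains_head = any(tok in heads for tok in flatten(group))
    return ends_with_head or not contains_head


if __name__ == "__main__":
    heads = {"property", "bedroom"}
    print(is_valid(["Victorian", ["two", "bedroom"], ["mid", "terrace"], "property"], heads))
    print(is_valid(["Victorian", ["two", "bedroom", "mid"], ["terrace"], "property"], heads))
```

Under this reading, the first candidate on the slide is rejected because ⟨ two bedroom mid ⟩ contains the head "bedroom" without ending in it and ⟨ terrace ⟩ has a single element.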
  • 11. Segmentation cPMI-optimal parenthesizations. Adaptation of corpus-wide Point-wise Mutual Information (cPMI) [Damani and Ghonge 2013]. […] segmentation is corpus-level significant point-wise mutual information (cPMI) (Damani and Ghonge, 2013). Our definition of cPMI uses the corpus of NPs instead of arbitrary descriptions. Let C be the set of clusters produced as described above. We denote by $f_C(t)$ the frequency of the string t in all clusters C, i.e., obtained by summing up all of the occurrences of t in all clusters. Let $0 < \delta < 1$ be the normalization factor defined as in (Damani and Ghonge, 2013), and $t \Vert w$ the concatenation of two strings t and w. We then define $cPMI_C(t, w)$ as follows: $cPMI_C(t, w) = \log \frac{f_C(t \Vert w)}{\frac{f_C(t) \cdot f_C(w)}{|C|} + \sqrt{f_C(t)} \cdot \sqrt{\frac{\ln \delta}{-2}}}$. The cPMI value is used to determine whether a token should be associated with (i) the head noun, (ii) a nested token representing the head of a different cluster, thus possibly inducing a nested structure, or (iii) an adjacent token, thus forming a multi-word expression. Example: ⟨ Victorian ⟨ two bedroom ⟩ ⟨ mid terrace ⟩ property ⟩ with $cPMI_{np} = cPMI_C(\text{Victorian}, \text{property}) + cPMI_C(\text{two bedroom}, \text{property}) + cPMI_C(\text{mid terrace}, \text{property}) + cPMI_C(\text{two}, \text{bedroom}) + cPMI_C(\text{mid}, \text{terrace})$. The parenthesization that maximises $cPMI_{np}$ becomes a (ground) structured aspect pattern (SAP): ⟨ { Victorian, ⟨ { two }, bedroom ⟩, mid terrace }, property ⟩
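To make the cPMI score on slide 11 concrete, here is a small sketch that evaluates it from raw counts and sums it over the pairs of a candidate parenthesization. The frequency table, the value of delta, the normaliser n, and the use of a space to represent concatenation are made-up assumptions for the example; only the shape of the formula follows the slide and Damani and Ghonge (2013).

```python
import math


def cpmi(f_tw, f_t, f_w, n, delta=0.9):
    """cPMI from counts: f_tw = freq of the concatenation, f_t and f_w = freq of each part.

    n plays the role of |C| on the slide; delta must satisfy 0 < delta < 1.
    """
    expected = f_t * f_w / n
    # ln(delta) is negative, so ln(delta) / -2 is positive and the root is real.
    slack = math.sqrt(f_t) * math.sqrt(math.log(delta) / -2.0)
    return math.log(f_tw / (expected + slack))


def np_score(pairs, freq, n, delta=0.9):
    """Sum cPMI over the (modifier, head) pairs induced by one parenthesization."""
    return sum(cpmi(freq[t + " " + w], freq[t], freq[w], n, delta) for t, w in pairs)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the talk.
    freq = {
        "two": 40, "bedroom": 60, "mid": 20, "terrace": 25,
        "two bedroom": 30, "mid terrace": 18, "bedroom mid": 2,
    }
    good = np_score([("two", "bedroom"), ("mid", "terrace")], freq, n=1000)
    bad = np_score([("bedroom", "mid")], freq, n=1000)
    print(f"two bedroom + mid terrace: {good:.3f}")
    print(f"bedroom mid: {bad:.3f}")
```

With these toy counts the grouping "two bedroom" and "mid terrace" scores higher than "bedroom mid", which is the behaviour the segmentation step relies on when choosing between parenthesizations.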
  • 12. Typing and Generalisation Given a (ground) SAP ⟨ { Victorian, ⟨ { two }, bedroom ⟩, mid terrace }, property ⟩: Victorian → property-qualifier; two bedroom → property-qualifier; mid terrace → property-qualifier; property → property; two → bedroom-quantifier; bedroom → property
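The modifier typing on slide 12 could look roughly like the sketch below. It only types modifiers (not head nouns), represents a SAP as a hypothetical `(modifiers, head)` tuple, and decides "quantifier" by membership in a caller-supplied word set; all three choices are assumptions, since the slide only shows the resulting labels.

```python
def surface(item):
    """Render a modifier (plain string or nested SAP) as text."""
    if isinstance(item, str):
        return item
    mods, head = item
    return " ".join([surface(m) for m in mods] + [head])


def type_modifiers(sap, quantifier_words):
    """Return (text, type) pairs for every modifier in a ground SAP.

    sap: (modifiers, head), where each modifier is a string or a nested SAP.
    """
    mods, head = sap
    typed = []
    for m in mods:
        is_quantifier = isinstance(m, str) and m in quantifier_words
        role = "quantifier" if is_quantifier else "qualifier"
        typed.append((surface(m), f"{head}-{role}"))
        # A nested SAP also qualifies its parent; recurse to type its own modifiers.
        if not isinstance(m, str):
            typed.extend(type_modifiers(m, quantifier_words))
    return typed


if __name__ == "__main__":
    sap = (["Victorian", (["two"], "bedroom"), "mid terrace"], "property")
    for text, label in type_modifiers(sap, quantifier_words={"two"}):
        print(f"{text} -> {label}")
```

For the example pattern this yields the modifier labels listed on the slide: three property-qualifiers and one bedroom-quantifier.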
  • 13. Typing and Generalisation Ground SAPs have good precision but pretty bad recall. POS-based pattern generalization: non-content words are always generalized; aspect terms are generalized only if a nested pattern with a ground head exists; qualifiers are generalized one-at-a-time. Ground SAP: ⟨ { Victorian, ⟨ { two }, bedroom ⟩, mid terrace }, property ⟩. Generalized SAPs: ⟨ { JJ, ⟨ { CD }, bedroom ⟩, mid terrace }, property ⟩; ⟨ { Victorian, ⟨ { CD }, bedroom ⟩, JJ terrace }, property ⟩; ⟨ { Victorian, ⟨ { two }, bedroom ⟩, mid JJ }, property ⟩; ⟨ { Victorian, ⟨ { two }, bedroom ⟩, JJ }, property ⟩; ⟨ { Victorian, ⟨ { two }, bedroom ⟩, mid terrace }, NN ⟩
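The "one-at-a-time" generalisation of qualifiers on slide 13 can be illustrated with the sketch below. It handles only flat, top-level modifiers and takes POS tags from a hypothetical lookup table; nested patterns, the always-generalised non-content words, and the head-noun rule from the slide are deliberately left out.

```python
def generalise_one_at_a_time(modifiers, head, pos):
    """Produce one generalised pattern per modifier token that has a known POS tag.

    modifiers: list of modifier strings, e.g. ["Victorian", "mid terrace"]
    pos: hypothetical token -> POS tag lookup (a real system would use a tagger)
    """
    patterns = []
    for i, mod in enumerate(modifiers):
        tokens = mod.split()
        for j, tok in enumerate(tokens):
            if tok not in pos:
                continue
            # Replace exactly one token by its POS tag; everything else stays ground.
            new_tokens = tokens[:j] + [pos[tok]] + tokens[j + 1:]
            new_mods = modifiers[:i] + [" ".join(new_tokens)] + modifiers[i + 1:]
            patterns.append((new_mods, head))
    return patterns


if __name__ == "__main__":
    pos = {"Victorian": "JJ", "mid": "JJ"}  # assumed tags for the example
    for mods, head in generalise_one_at_a_time(["Victorian", "mid terrace"], "property", pos):
        print(mods, head)
```

Generalising a single token per pattern keeps each generalised SAP close to a ground one, which matches the slide's concern about trading a little precision for recall.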
  • 14. Typing and Generalisation Pattern scoring [Gupta and Manning, 2014]. Score patterns on their ability to discriminate between correct and incorrect extractions. No labelled dataset available → use cluster heads as surrogate labels. We take the heads of the noun-phrase clusters as a surrogate of the set of valid aspects. The analysis is limited to aspect terms. Let T be the set of valid aspect terms as defined above, and E be the set of aspect terms produced by an SAP P. The score of P is computed as: $\nu(P) = \frac{|T|}{\sum_{e \in E} \left(1 - \max_{t \in T}\left(\frac{dist(t,e)}{len(t)} < 0.2\right)\right)} \cdot \log |T|$, with $\nu(P) \in [0, 1]$, where T is the set of reference aspect terms (cluster heads), dist(t, e) denotes the Damerau-Levenshtein edit distance between two strings t and e, and len(·) denotes the length of the string. Patterns scoring less than an experimentally set threshold are eliminated. Our method (SysName) is implemented in Java. All experiments are run on a Dell OptiPlex 9020 with two quad-core i7-4770 Intel CPUs at 3.40GHz and 32GB RAM, running Linux Mint 17 Qiana. All resources used in the evaluation are made available for replicability. Datasets and metrics: We use three groups of datasets in our evaluation (Table 1). The first two consist of the SemEval14 and SemEval15 datasets used for the aspect term extraction (ATE) and opinion target expression (OTE) subtasks of the aspect-based sentiment analysis (ABSA) task.
  • 15. Pattern Matching Pattern references: nested patterns are not repeated; they reference each other. This enables parallel SAP generalisation and matching. SAPproperty: ⟨ { JJ, #SAPbedroom , mid terrace }, property ⟩; ⟨ { Victorian, #SAPbedroom , JJ terrace }, property ⟩; ⟨ { Victorian, #SAPbedroom , mid JJ }, property ⟩; ⟨ { Victorian, #SAPbedroom , JJ }, property ⟩. SAPNN: ⟨ { Victorian, #SAPbedroom , mid terrace }, NN ⟩. SAPbedroom: ⟨ { two }, bedroom ⟩; ⟨ { CD }, bedroom ⟩. How fast? Induction: 10-14 msec / sentence. Matching: 2-3 msec / text. Bottlenecks: morphological analysis and cPMI-optimal segmentation.
  • 16. Evaluation Datasets: the SemEval OTE/ATE datasets are only useful for aspect terms. We provide SAED (Structured Aspect Extraction Dataset - http://bit.ly/2caeXf3), which consists of both NUT and (semi-) formal English texts. We provide GS annotations for 150 texts equally distributed across the six domains. The GS provides an average of 355 aspect terms, 30 quantifiers, 430 qualifiers, and 45 nested aspects per domain. Annotations were produced by 6 independent annotators ( =87%). We use standard recall, precision, and F1 score metrics. However, due to the different granularity of the output produced by the systems and of the GS annotations, the definition of a correct extraction varies slightly with each evaluation task.
    Table 1: Datasets
    DATASET   | DOMAIN      | SIZE (#texts)   | SOURCES         | CATEGORY | FORMALITY   | TYPE
    SemEval14 | restaurants | 3k + 800 GS (*) | Citysearch      | service  | NUT         | evaluative
    SemEval14 | laptops     | 3k + 800 GS (*) | N/A             | product  | NUT         | evaluative
    SemEval15 | restaurants | 254 + 96 GS     | Citysearch      | service  | NUT         | evaluative
    SemEval15 | hotels      | N/A + 30 GS     | Citysearch      | service  | NUT         | evaluative
    SAED      | chairs      | 94k + 25 GS     | Amazon, GumTree | product  | NUT         | descriptive
    SAED      | hotels      | 20k + 25 GS     | TripAdvisor     | service  | formal      | descriptive
    SAED      | real estate | 87k + 25 GS     | RightMove       | product  | semi-formal | descriptive
    SAED      | restaurants | 115k + 25 GS    | TripAdvisor     | service  | formal      | descriptive
    SAED      | shoes       | 46k + 25 GS     | Amazon, GumTree | product  | NUT         | descriptive
    SAED      | watches     | 10k + 25 GS     | Amazon, GumTree | product  | NUT         | descriptive
    Comparative evaluation – Simplified SAE: The method by (Kim et al., 2012), hence ATL, is currently the closest to SAE we are aware of. We have obtained from the authors the dataset used in their evaluation
    2 All resources are available at http://bit.ly/29YtM3K and include: the SAED dataset and GS, our reimplementations of IIITH and ATL, a compiled version of SysName, and all output files generated by all systems. 3 4 http://alt.qcri.org/semeval2015/task12/
    Systems: the SemEval 14/15 systems; IIITH [Raju et al., 2009]; ATL [Kim et al., 2012]; ATEX [Zhang and Liu, 2014]
  • 17. Evaluation ATE setting (SemEval Dataset) [Bar charts, scale 0-100, supervised vs. unsupervised (U) systems. (a) SemEval14 Dataset, Restaurants and Laptops: HIS_RD, DLIREC (U), NRC-Can, UNITOR (U), XRCE, SAP_RI, IITP, SeemGo, ATEX (U), IIITH (U), ATL (U), SysName (U). (b) SemEval15 Dataset, Restaurants and Hotels: ISISLif, LT3 (U), Elixa (U), Sentiue, UFGRS, Wnlp, V3, IIITH (U), ATL (U), ATEX (U), SysName (U). R, P, and F1 are reported for IIITH, ATL, ATEX, and SysName on SemEval 2014 and SemEval 2015.]
  • 18. Evaluation Simplified SAE setting (SAE Dataset) Correct extraction: correct aspect term + correct modifier + correct typing for the modifier (i.e., qualifier / quantifier). […] but not an implementation of the system. We have reimplemented the method and successfully reproduced the experimental results described in the original paper. Figure 1 shows a comparison between ATL and SysName on the SAED dataset. An extraction is correct if modifiers and aspect terms match exactly the GS annotations, and if modifiers are correctly typed as qualifiers or quantifiers. This is a simplified SAE setting where we do not require correct linking of modifiers to aspect terms. SysName performs 33% better than ATL on average, outperforming it in all domains. Besides being unable to extract hierarchical structures, a visible issue in ATL is the inability to establish and leverage the semantic connection between … [Figure 1: SysName vs. ATL on simplified SAE (SAED dataset); bar chart of R, P, F1 (0-100) for Chairs, Hotels, Real Estate, Restaurants, Shoes, Watches.]
  • 19. Evaluation Full SAE setting (SAE Dataset) Correct extraction: correct aspect term + correct modifier + correct typing for the modifier (i.e., X–quantifier, Y–qualifier) + correct linking (modifier-entity, sub-patterns). SAE is substantially harder than ATE/OTE and simplified SAE. […] is indeed a much more challenging task than simply identifying them. Another […] is the impact of the generalization on the performance. Generalized SAPs produce 444 […] against the 386 of the ground ones (+15%). [Figure 2: SysName vs. others in ATE; panel (c) SAED Dataset.] [Figure 3: SysName on full SAE; bar chart of R, P, F1 (0-100) for Chairs, Hotels, Real Estate, Restaurants, Shoes, Watches, comparing ATE, Simpl. SAE, and Full SAE.]
  • 20. Evaluation Effect of corpus size (SAE Dataset) The larger the corpus… the better? [Figure 4: Performance vs. corpus size (average – SAED dataset); AVG R, AVG P, AVG F1 (0-100) at 1%, 5%, 10%, 25%, 50%, 100% of the corpus; (a) SAE Task (SAE setting), (b) ATE Task (ATE setting).] […] of this experiment by domain for the ATE and SAE tasks […] draw further conclusions on the relationship between the size […] the SAPs. There is a relationship between the variety of features […] to induce good quality SAPs. For domains such as, e.g., chairs, […] from 25% of the size of the corpus we do not notice substantial […] be explained by the nature of the features in these domains that […] models of the products, types of real estate properties, etc. In the […] are much more variegated in features, e.g., restaurant and hotel […]
  • 21. Evaluation Effect of corpus size (SAE Dataset) Not necessarily… you often reach a point where more data is not going to help. [Per-domain charts of R, P, F1 (0-100) at 1%, 5%, 10%, 25%, 50%, 100% of the corpus: (a) chairs, (b) hotels, (c) realestate, (d) restaurants, (e) shoes, (f) watches.]
  • 22. What’s next Injecting supervision: several places; clustering, pattern scoring, and typing are probably the most important ones. Dynamic cut-off thresholds: use test sets to adjust corpus size and thresholds. Aspects not in NPs: named entities, relations, other grammatical forms, e.g., living room with sash windows. Automatically determine the domain: map the NP cluster heads to an existing KB (e.g., BabelNet) and use their graph for scoping.
  • 24. References [Shin et al., 2015] Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, and Christopher Ré. 2015. Incremental knowledge base construction using deepdive. PVLDB, 8(11):1310–1321. [Raju et al., 2009] S. Raju, P. Pingali, and V. Varma. 2009. An unsupervised approach to product attribute extraction. In Proc. of ECIR, pages 796–800. [Ramshaw and Mitchell, 1999] L. A. Ramshaw and M. P. Mitchell. 1999. Text chunking using transformation-based learning. In Armstrong S. et al., editors, Natural Language Processing Using Very Large Corpora, volume 11 of Text, Speech and Language Technology, pages 157–176. [Kim et al., 2012] D. S. Kim, K. Verma, and P. Z. Yeh. 2012. Building a lightweight semantic model for unsupervised information extraction on short listings. In Proc. of EMNLP, pages 1081–1092. [Zhang and Liu, 2014] Lei Zhang and Bing Liu. 2014. Aspect and Entity Extraction for Opinion Mining, pages 1–40. Springer Berlin Heidelberg.