Finding Contextual Relationships between Fashion Houses

Madiha Mubin
Department of Computer Science
Stanford University
mamubin@stanford.edu

Sushant Shankar
Department of Computer Science
Stanford University
sshankar@stanford.edu

Abstract

Understanding how companies and products are compared online can provide insights for analyzing a market. This paper proposes a method to bootstrap entities (‘players’) in a market and their relationship to each other. Our corpus consists of high-quality fashion blogs on eyewear. Our system takes the dataset, discovers competitive associations between different fashion houses, and returns weights for each association and phrases that describe them. This is done in three steps: a) looking for patterns in sentences that are of a competitive nature, b) using part-of-speech tagging and named entity recognition to extract singleton and paired entities, and c) traversing the typed dependency graph to extract contextual phrases that describe the competitive association. Among the top 10 fashion houses that we extracted, all the relations we caught were actual competitive relationships between fashion houses. Of all long contextual phrases that we extracted, 92.1% were accurate and informative.

1 Introduction

Comparative analysis is an important component of natural language understanding. While it makes intuitive sense to compare similar products or competing companies, defining the similarity is tricky. For instance, Apple Inc. and Google Inc. are compared in the mobile industry, but if Google Inc. and Facebook are compared, then it is more likely a comparison based on their social networking products. In this paper, our goal is to extract information from online blogs and articles to (i) find pairs of competitive companies in the fashion industry and (ii) extract context to provide meaning to those pairs (Figure 1). The notion of context can be extended to geographical location, various demographics, markets, and even descriptors like ‘launched a new line of prescription sunglasses’. For instance, from the sentence ‘In recent news, Gucci’s rival Prada launched a new line of retro sunglasses’, we want to extract [‘Gucci’, ‘Prada’, ‘launched a new line of retro sunglasses’].

Figure 1: Shows different types of context (e.g. demographic and geographic) that can be associated between two companies about a certain product.

In order to narrow the scope of the problem, we focused on eyewear; however, our method can generalize to other products from the fashion world.

2 Related Work

Recently, a weakly supervised method was proposed to learn comparative questions and extract comparators simultaneously (Li et al., 2010). The process begins with identifying an Indicative Extraction Pattern (IEP), a sequence that can be used to identify other comparative questions. From this initial seed, more comparative pairs are identified. For each
comparator pair extracted, all the questions containing that pair are identified, which allows recognition of more comparator pairs. The comparative questions and patterns are scored for reliability and those that are deemed reliable are stored for use as seeds to improve performance over time. Comparator patterns are generated using language rules that account for lexical, generalized, and specialized patterns. The reliability of each pattern at every iteration is computed as a weighted average of the performance of the pattern so far, in terms of the number of questions extracted by the pattern, and a look-ahead reliability score which helps reduce the problem of underestimation due to incomplete knowledge at a given iteration.

3 Methodology

In this paper, we augmented the comparator extraction logic in three ways. First, we expanded our extraction process to deal with regular sentences as well as questions that exhibited a competitive or comparative nature. Second, we used both entities in each pair that we extracted from comparative sentences to extract more information about them to inform our pipeline. Finally, using singletons and pairs of entities, we devised an algorithm to extract context from the sentences containing those entities.

For this project, we were able to accumulate a set of approximately 300 fashion-related articles by manually searching the web. In fact, the main rationale behind using fashion blogs was to find articles that explicitly talked about fashion house rivalries. One such example is http://www.christian-dior-glasses.com/articles/. These articles were blogs and reports of trends and changes in the fashion industry pertaining to eyewear from 2010 to the present. Using this data source not only allowed us to extract good quality entity pairs, but also helped us validate our content extraction pipeline as it was extracting temporally relevant information.

3.1 Pattern Generation for extraction

We focused on relations between fashion houses of a competitive nature. We looked for paragraphs in our corpus that had words with prefixes ‘rival’ or ‘compet’. We considered just looking for sentences which contained these words, but felt that the sentences around these could also contain relations and contexts of a competitive nature.

3.2 Entity Pair Extraction

Figure 2: Shows our pipeline starting from high-quality relation-rich blogs to extract patterns and using those as seeds to scrape more data which might not contain as much of the desired content.

3.2.1 Relation-Rich Tier (Pairs)

We start with our dataset to extract reliable patterns of the type (CompanyX, CompanyY, [Context]) that exhibit competitive behavior.

Once we identified our target paragraphs, we used the Stanford CoreNLP parser to extract Parts-of-Speech tags, Named Entity Recognition information and collapsed typed dependency tree structures for each sentence (Toutanova et al., 2003; de Marneffe et al., 2006). Our entity extraction method simply looked for POS NNP entities that had NER tags ‘ORGANIZATION’ or ‘PERSON’. Many of the entities we caught were fashion houses, but not all. For instance, we caught entities like ‘Tom Cruise’ from the sentence ‘Tom Cruise is a fan of Ray-Ban sunglasses’ or ‘Apple’ from the sentence ‘Apple was another attendee in an event with a fashion house’.

We noticed a common style in our dataset. Often after the entity, a sentence contained further specification of the line or type of product that was being compared or talked about, for instance, ‘Prada
sunglasses’. To extract this, we looked for POS NNS tags after any of the entities found. For instance, one of our patterns looked like: NNP (ORG) – NNS* – NNP (ORG). As we were looking only at fashion eyewear sites, these NNS tags tended to denote ‘eyeglasses’ or ‘glasses’.

We hypothesized that for sentences that contained more than one entity, all of the possible entities could be compared or grouped in some way with each other. Therefore, sentences with more than a pair of entities gave us more entities to compare. For instance, if we saw three entities (x, y, z), then we generated relations (x, y), (x, z) and (y, z), i.e. we generated all possible pairs from a list of entities. For example, from the sentence ‘Additionally, another Christian Dior glasses competitor Marchon Eyewear renewed its agreement with NIKE.’, we generated three relations: (‘Christian Dior’, ‘Marchon Eyewear’), (‘Christian Dior’, ‘Nike’) and (‘Marchon Eyewear’, ‘Nike’).

3.2.2 Relation-Poor Tier (Singletons)

Most sentences that talk about a fashion house or new line will not contain its rival or competition. However, if more is known about the context of that rival or competition in isolation, it can help us understand the context in which it is being compared to another company/line. When a sentence contains only one entity, we call that entity a singleton; the sentence may still carry contextual information about it.

Algorithm 1 Extracting Context From Sentence S
GetContext (E: entities in S, C: typed dependency list for S):

  • T ← construct a tree from C
  • Traverse from the leftmost subtree, discarding any subtree T′ of T such that there is an e ∈ E found in T′.
  • If there are subtrees left:
      – Use breadth-first search to assemble the context of the sentence, starting from the first subtree after all the entities in E have been visited.
      – R ← Concatenate the words, ignoring ‘cop’ and ‘det’ for compression, and recheck the ordering of the words against the actual sentence.
      – return R
  • else
      – return None
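A minimal Python sketch of Algorithm 1 follows. It is an illustration under stated assumptions, not the authors' implementation: the dependency triples are hand-written in the style of Stanford collapsed dependencies for the example sentence used in Section 3.3, the input is assumed to form a tree, and the function name `get_context` and its argument layout are hypothetical. Keeping the head word of the sentence and restoring a preposition from a collapsed `prep_*` relation are also assumptions, made so that the output resembles the phrases reported in Table 1.

```python
from collections import defaultdict, deque

SKIPPED_RELATIONS = {"det", "cop"}  # dropped for compression, as in Algorithm 1


def get_context(tokens, dependencies, entity_positions):
    """tokens: sentence words; token position i is tokens[i - 1].
    dependencies: (relation, governor, dependent) triples, root governor is 0.
    entity_positions: set of token positions that belong to an entity.
    Assumes the dependencies form a tree (no cycles).
    """
    children = defaultdict(list)
    for relation, governor, dependent in dependencies:
        children[governor].append((relation, dependent))

    def subtree(node):
        """All token positions at or below node."""
        found, queue = set(), deque([node])
        while queue:
            current = queue.popleft()
            found.add(current)
            queue.extend(dependent for _, dependent in children[current])
        return found

    root = children[0][0][1]
    # Top-level subtrees, left to right by the position of their head word.
    top_level = sorted(children[root], key=lambda edge: edge[1])
    # Index of the last top-level subtree that contains an entity.
    last_entity = max(
        (i for i, (_, node) in enumerate(top_level)
         if subtree(node) & entity_positions),
        default=-1,
    )
    remaining = [edge for edge in top_level[last_entity + 1:]
                 if edge[0] not in SKIPPED_RELATIONS]
    if not remaining:
        return None

    # Assumption: keep the head word unless it is itself part of an entity.
    words = {} if root in entity_positions else {root: tokens[root - 1]}
    queue = deque(remaining)
    while queue:  # breadth-first traversal of the remaining subtrees
        relation, node = queue.popleft()
        if relation.startswith("prep_"):
            # Collapsed dependencies fold the preposition into the relation
            # name; place it just before the dependent's subtree.
            words[min(subtree(node)) - 0.5] = relation[len("prep_"):]
        words[node] = tokens[node - 1]
        queue.extend(edge for edge in children[node]
                     if edge[0] not in SKIPPED_RELATIONS)
    # Restore sentence order by sorting on token position.
    return " ".join(words[position] for position in sorted(words))


tokens = ("This month Christian Dior glasses competitor Marchon "
          "released a line aiming for the teen market").split()
dependencies = [  # hand-written for illustration, not parser output
    ("root", 0, 8), ("det", 2, 1), ("tmod", 8, 2),
    ("nn", 7, 3), ("nn", 7, 4), ("nn", 7, 5), ("nn", 7, 6), ("nsubj", 8, 7),
    ("det", 10, 9), ("dobj", 8, 10), ("partmod", 10, 11),
    ("det", 15, 13), ("nn", 15, 14), ("prep_for", 11, 15),
]
entities = {3, 4, 7}  # 'Christian Dior' and 'Marchon'
print(get_context(tokens, dependencies, entities))
# prints: released line aiming for teen market
```

When the last top-level subtree contains an entity, nothing is left to traverse and the sketch returns None, which mirrors the failure case the paper reports for sentences that end with an entity.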

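The paragraph filter of Section 3.1 and the pair and singleton tiers of Sections 3.2.1 and 3.2.2 can be sketched in a few lines of Python. This is a hedged illustration only: the regular expression, the function names, and the assumption that entities arrive as a list of strings from an upstream POS/NER step are ours, not the paper's.

```python
import re
from itertools import combinations

# Seed prefixes from Section 3.1; the exact regular expression is an assumption.
SEED_PATTERN = re.compile(r"\b(?:rival|compet)\w*", re.IGNORECASE)


def is_competitive(paragraph):
    """True if any word in the paragraph starts with a seed prefix."""
    return SEED_PATTERN.search(paragraph) is not None


def entity_pairs(entities):
    """All unordered pairs of the distinct entities found in one sentence."""
    distinct = list(dict.fromkeys(entities))  # de-duplicate, keep order
    return list(combinations(distinct, 2))


def tier(entities):
    """Classify a sentence by how many distinct entities it mentions."""
    count = len(set(entities))
    if count == 0:
        return "none"
    return "singleton" if count == 1 else "pairs"


found = ["Christian Dior", "Marchon Eyewear", "Nike"]
print(entity_pairs(found))
# prints: [('Christian Dior', 'Marchon Eyewear'), ('Christian Dior', 'Nike'), ('Marchon Eyewear', 'Nike')]
```

A sentence with n distinct entities yields n(n-1)/2 relations, so the three entities above give the three relations listed in Section 3.2.1.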
3.3 Context Extraction

The context extraction method described in this paper relies on the key assumption that sentences containing multiple entities as well as context typically introduce the entities in the first part of the sentence and the context in the second part. Consider the sentence: ‘This month Christian Dior glasses competitor Marchon released a line aiming for the teen market’. Here the entities identified are ‘Christian Dior’ and ‘Marchon’ and the product being compared is ‘glasses’. As depicted in Figure 3, the entities were introduced in the left subtree of the typed dependency graph and the context is within the right subtree. This pattern was identified by manually analyzing a sample of sentences and by understanding the style of documentation of articles in our corpus.

Figure 3: Shows a collapsed typed dependency tree for the sentence ‘This month Christian Dior glasses competitor Marchon released a line aiming for the teen market’.

Following this assumption, we were able to design an algorithm (Algorithm 1) for context extraction given a pair of entities that were previously identified in the sentence by our specialized patterns.

Figure 4 shows a run of Algorithm 1. The leftmost subtree is ignored because it contains entities and the right subtree is traversed to obtain context. As the subtree is traversed, words within ‘det’ or
‘cop’ dependencies are ignored. This reduced the number of traversals without altering the context of the sentence and was especially useful for longer sentences.

Figure 4: Shows the typed dependency tree that Algorithm 1 traverses. Nodes circled in red are entities and those circled in blue are part of the context.

While a majority of sentences in our corpus followed the assumption we specified earlier, a few did not. For instance, Figure 5 shows such an example where the entity ‘Kate Spade’ appears at the end of the sentence, causing the algorithm to miss the context completely. We discuss the limitations and possible improvements to our algorithm in the Future Work section (Section 6).

Figure 5: Shows how the sentence ‘Now that time has passed, Bebe is a now famous west coast based company that rivals European companies such as Kate Spade eyeglasses’ violated the key assumption.

4 Evaluation

4.1 Paired Relations

There is no gold standard against which to evaluate relations. However, we decided to come up with a metric that captures whether we think a relation is saying something of significance. Since we have counts of the occurrence of each entity individually and the number of occurrences with each other entity, we decided that Pointwise Mutual Information (PMI) with Contextual Discounting was a metric that would capture how often an entity is seen with another entity while taking into account how often it is seen by itself.

Let P(e_i) be the probability of seeing entity e_i and P(e_i, e_j) be the probability of seeing entities e_i and e_j in a pair. We had an n × n frequency matrix f of fashion houses, so f_{ij} represents the number of times e_i appeared with e_j in our corpus.

  • pmi_{ij} = \log \frac{P(e_i, e_j)}{P(e_i)\,P(e_j)}   (assume log(0) = 0)

  • scaledpmi_{ij} = pmi_{ij} \times \frac{f_{ij}}{f_{ij} + 1} \times \frac{\min\left(\sum_{k=1}^{n} f_{kj},\ \sum_{k=1}^{n} f_{ik}\right)}{\min\left(\sum_{k=1}^{n} f_{kj},\ \sum_{k=1}^{n} f_{ik}\right) + 1}

Figure 6 below shows these adjusted PMI values for all 169 of our entities and Figure 7 shows these PMI values for the top 10 of our entities (in terms of the number of mentions). Note that the PMI values are lower for the top entities than for others. This is because, for the less mentioned entities, the times they are mentioned with another entity make up a much higher share of the total times they are mentioned.

We set about manually checking relations among the top 10 fashion houses. To do this, we chose a PMI threshold of 0.3 above which to consider relations. We can generate a graph of relations using this threshold (see Figure 8). Checking these relations, 100% of them were corroborated by another source (i.e. they were related and competed or worked together). Our relation extraction is of course restricted by our corpus and amount of data, as there are many relations between fashion houses (not to mention other entities) that we do not catch because they are elsewhere on the web or unwritten.
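The adjusted PMI and the thresholded relation graph of Section 4.1 can be sketched as follows. Several details are assumptions, since the paper does not state them: probabilities are estimated as relative frequencies (mentions over total mentions, pair counts over total pair counts), the natural logarithm is used, the frequency matrix is symmetric with a zero diagonal, and the toy counts and the names `scaled_pmi` and `relation_edges` are made up for illustration.

```python
import math


def scaled_pmi(f, mentions, i, j):
    """PMI of entities i and j with the contextual discount of Section 4.1.

    f: symmetric n x n co-occurrence counts (list of lists, zero diagonal).
    mentions: mentions[i] is the number of times entity i was seen.
    """
    if f[i][j] == 0:
        return 0.0  # the paper's convention: treat log(0) as 0
    total_mentions = sum(mentions)
    total_pairs = sum(map(sum, f)) / 2  # a symmetric matrix counts each pair twice
    p_i = mentions[i] / total_mentions
    p_j = mentions[j] / total_mentions
    p_ij = f[i][j] / total_pairs
    pmi = math.log(p_ij / (p_i * p_j))

    column_j = sum(row[j] for row in f)  # sum over k of f[k][j]
    row_i = sum(f[i])                    # sum over k of f[i][k]
    smaller = min(column_j, row_i)
    discount = f[i][j] / (f[i][j] + 1) * smaller / (smaller + 1)
    return pmi * discount


def relation_edges(f, mentions, names, threshold=0.3):
    """Pairs whose adjusted PMI is at least the threshold (0.3 in the paper)."""
    n = len(names)
    return [(names[i], names[j])
            for i in range(n) for j in range(i + 1, n)
            if scaled_pmi(f, mentions, i, j) >= threshold]


# Toy counts for three placeholder entities (not data from the paper).
f = [[0, 3, 1],
     [3, 0, 0],
     [1, 0, 0]]
mentions = [10, 5, 5]
print(relation_edges(f, mentions, ["A", "B", "C"]))
# prints: [('A', 'B')]
```

In the toy data, the pair (A, C) co-occurs only once, so both discount factors equal 1/2 and pull its score below the 0.3 threshold, while (A, B) stays above it. This is the intended effect of the discount: rare co-occurrences are not allowed to dominate the ranking.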
Figure 6: PMI with contextual discounting for all fashion houses x fashion houses. The fashion houses are ranked by the number of occurrences. Note that the top fashion houses seem to be compared with all other fashion houses. This is expected: the top fashion houses are seen as rivals by many other fashion houses and are also mentioned more, so they have more customers, suppliers, etc.

Figure 7: PMI with contextual discounting for the top 10 fashion houses.

The signature of ‘Marchon’ is particularly striking. We investigated this trend using other blogs and newspaper sources and found that Marchon is a supplier of eyewear to some of the top fashion houses as well as a competitor to others.

Figure 8: Shows the relationships between the top 10 fashion houses. These entities were connected if their PMI score was greater than or equal to 0.3.

Figure 6 also shows some interesting relationships between less frequently occurring fashion houses. Using a higher threshold (> 1), we were able to create a landscape of relationships between those fashion houses as well.

Figure 9: Shows relationships between fashion houses that were not in the top 10. These had higher PMI values and so they depict relationships with higher confidence.

4.2 Context

For many of the edges in the graph, we were able to catch contextual phrases. Table 1 shows some contextual phrases that we do catch and Table 2 shows phrases that actually have the wrong meaning. For Table 2, in the first case, the sentence was ‘In recent Prada eyeglasses news, rival companies Altair and Tommy Bahama will be teaming up to create eyewear for the upcoming 10 years’ and the second sentence was ‘Lastly, Kate Spade eyeglasses competitors Altair and Tommy Bahama grew their current partnership for creation eyeglass frames for a long time’. The sentence structures that we miss are often more complex than our algorithm can handle: each of these sentences talks about an entity x’s rival entity y who partners with entity z, and it is not right to say that x partners with z.

To evaluate the accuracy of our contexts, we first look at unique contexts. Because many contexts are short, we have many repetitions: 181 unique contexts out of 429 total contexts. We found that none of the contexts shorter than three words were
    Entity 1         Entity 2               PMI      Contextual Phrase
    Ralph Lauren     Michael Bastian        0.35     ‘Held event Monday promote his upcoming designer rimless line’
    Marcolin Group   Dolomiti               1.13     ‘released their new round eyeglasses’
    Marcolin Group   Kenneth Cole Prod.     1.20     ‘announced their licensing agreement for distribution of glasses’
    RayBan           Charmant Group         0.71     ‘told the press that a new vintage eyeglasses line was released’
    Christian Dior   S. Filo Group          1.54     ‘are partnering online sell their designer eyeglasses’
    Prada            Marchon                0.74     ‘released line aiming for teen market’
    Marchon          Prada                  0.74     ‘just launched iWear’

    Table 1: Positive Contextual Phrases extracted.

    Entity 1         Entity 2               PMI      Contextual Phrase
    Prada            Tommy Bahama           0.45     ‘teaming for up create for eyewear for the upcoming 10 years’
    Kate Spade       Tommy Bahama           0.54     ‘grew their with current partnership creation eyeglass frames long time’

    Table 2: Negative Contextual Phrases extracted.


useful. It is interesting to note that the word ‘donate’ occurred 18 times, the words ‘launched’ and ‘renewed’ 12 times, and the word ‘agreed’ 10 times. Some of these sentences did not elaborate further contextually, but others contain contextual phrases that our method does not catch.

We found 183 contexts (42.7% of total contexts) and 102 unique contexts (56.4% of total unique contexts) that are longer than two words. Out of these 102 unique contexts, we found that three (2.9%) did not carry much information at all (our method caught the right phrase, but the phrase did not mean anything, which reflects the sentence itself) and five (4.9%) were inaccurate (indicating problems with our method). This means that 92.1% of our contexts were both accurate and informative. Of the five inaccurate contexts, two arose because we did not completely put back prepositions into our context (as it would require more graph traversals), and three arose because the sentences were of a more complex structure.

5 Conclusion

We have proposed a method to discover entities and relationships between fashion houses. In addition, we have shown a method to provide contextual phrases describing these relationships. This was done by: a) looking for patterns in sentences that are of a competitive nature, b) using part-of-speech tagging and named entity recognition to extract singleton and paired entities, and c) traversing the typed dependency graph to extract contextual phrases that describe the competitive association. We showed an evaluation metric that uses Pointwise Mutual Information with Contextual Discounting to identify important or ‘surprising’ relationships, along with graph representations of the relationships between entities. We have also shown that our context extraction system is quite accurate: of all long contextual phrases that we extracted, 92.1% were accurate and informative. This method can be applied to bootstrap and learn entities and relationships in any market given the appropriate corpus.

6 Future Work

6.1 Improve contextual phrase extraction

As pointed out in our evaluation, we miss phrases that have a more complex sentence structure. In addition, we do not catch sentences where the context is mentioned before the entities are (this would require a simple modification of our algorithm where we start at the end instead of the beginning).

In addition, our contextual phrase extraction sometimes identifies phrases that are common to multiple entities, but often finds phrases that are specific to one entity; our method does not disambiguate these.

6.2 Filtering contexts

Our method finds phrases for contexts that do not provide much information. This is partially due to a method that is too simple and needs more rules
for different structures of sentences and contexts. However, it is largely because most sentences do not contain relevant information and do not say much. There needs to be a method to filter the contexts that are possibly relevant. A classifier can be built using features such as the length of the phrase and the number of modifiers in the context (as a proxy for how specific the context is), and other such features can be developed to separate an important context from a less informative one.

6.3 Types of relationship

We believe one of the most exciting contributions of the paper is the rich set of relations and contexts that we are able to generate from a relatively small corpus with just two seed prefixes to search for (‘rival’ and ‘compet’). If a more extensive lexicon is developed for this type of relation, we believe we can catch even more relations and contexts. Additionally, this paper is focused on catching relationships of a competitive or rival nature. What we found is that, since we are looking for sentences around sentences that contain ‘rival’ or ‘compet’, we are also catching relationships that are partnerships (licensing agreements, for example), or sentences that contain multiple relationships (e.g., ‘Rival of x, y, is partnering with z’). We can similarly create a lexicon for other types of relationships between companies and their products, such as partnerships, supplier, or customer.

6.4 Obtaining more data

While these high quality fashion blogs are limited in number, they give us a starting point by providing us with a set of high confidence relation triples. In future, we hope to augment our corpus by adding fashion-related articles from reliable newspapers. Most of the fashion houses compete over more than one product, and by adding information for other products, we hope to be able to recover more interesting competitive relationships.

Acknowledgments

We would like to thank Christopher Potts for helpful discussions and constructive feedback and for Typed Dependency visualization code.

References

Shasha Li et al. 2010. Comparable Entity Mining from Comparative Questions. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics:650–658.

Siddharthan. 2011. Text Simplification Using Typed Dependencies: A Comparison of the Robustness of Different Strategies. Proceedings of the 13th European Workshop on Natural Language Generation (ENLG):2–11.

Kristina Toutanova and Christopher D. Manning. 2000. Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000):63–70.

Kristina Toutanova, Dan Klein, Christopher Manning, and Yoram Singer. 2003. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In Proceedings of HLT-NAACL 2003:252–259.

Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. 2006. Generating Typed Dependency Parses from Phrase Structure Parses. In LREC 2006.

Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005):363–370.

More Related Content

Viewers also liked

Microsoft Microservices
Microsoft MicroservicesMicrosoft Microservices
Microsoft MicroservicesChase Aucoin
 
Stormshield Visibility Center
Stormshield Visibility CenterStormshield Visibility Center
Stormshield Visibility CenterNRC
 
Docker security introduction-task-2016
Docker security introduction-task-2016Docker security introduction-task-2016
Docker security introduction-task-2016Ricardo Gerardi
 
Interesting Places in Poland
Interesting Places in PolandInteresting Places in Poland
Interesting Places in Polandwojcik_agnieszka
 
A BRIEF OVERVIEW ON WILDLIFE MANAGEMENT
A BRIEF OVERVIEW ON WILDLIFE MANAGEMENTA BRIEF OVERVIEW ON WILDLIFE MANAGEMENT
A BRIEF OVERVIEW ON WILDLIFE MANAGEMENTPintu Kabiraj
 
Catálogo 15 16 elksport
Catálogo 15 16 elksportCatálogo 15 16 elksport
Catálogo 15 16 elksportElk Sport
 
Big Data Europe: Simplifying Development and Deployment of Big Data Applications
Big Data Europe: Simplifying Development and Deployment of Big Data ApplicationsBig Data Europe: Simplifying Development and Deployment of Big Data Applications
Big Data Europe: Simplifying Development and Deployment of Big Data ApplicationsBigData_Europe
 
IBM Bluemix Nice meetup #5 - 20170504 - Container Service based on Kubernetes
Finding Contextual Relationships between Fashion Houses

Madiha Mubin, Sushant Shankar
Department of Computer Science, Stanford University
mamubin@stanford.edu, sshankar@stanford.edu

Abstract

Understanding how companies and products are compared online can provide insights for analyzing a market. This paper proposes a method to bootstrap entities (‘players’) in a market and their relationships to each other. Our corpus consisted of high-quality fashion blogs on eyewear. Our system takes the dataset, discovers competitive associations between different fashion houses, and returns a weight for each association along with phrases that describe it. This is done in three steps: a) looking for patterns in sentences that are of a competitive nature, b) using Part-of-Speech tagging and Named Entity Recognition to extract singleton and paired entities, and c) traversing the Typed Dependency Graph to extract contextual phrases that describe the competitive association. Out of the top 10 fashion houses that we extracted, all the relations we caught were actual competitive relationships between fashion houses. Of all long contextual phrases that we extracted, 92.1% were accurate and informative.

1 Introduction

Comparative analysis is an important component of natural language understanding. While it makes intuitive sense to compare similar products or competing companies, defining the similarity is tricky. For instance, Apple Inc. and Google Inc. are compared in the mobile industry, but if Google Inc. and Facebook are compared, then it is more likely a comparison based on their social networking products. In this paper, our goal is to extract information from online blogs and articles to (i) find pairs of competitive companies in the fashion industry and (ii) extract context that gives meaning to those pairs (Figure 1). The notion of context can be extended to geographical location, various demographics, markets, and even descriptors like ‘launched a new line of prescription sunglasses’. For instance, from the sentence ‘In recent news, Gucci’s rival Prada launched a new line of retro sunglasses’, we want to extract [‘Gucci’, ‘Prada’, ‘launched a new line of retro sunglasses’].

Figure 1: Different types of context (e.g. demographic and geographic) that can be associated with two companies for a certain product.

In order to narrow the scope of the problem, we focused on eyewear; however, our method can generalize to other products from the fashion world.

2 Related Work

Recently, a weakly supervised method was proposed to learn comparative questions and extract comparators simultaneously (Li et al., 2010). The process begins with identifying an Indicative Extraction Pattern (IEP), a sequence that can be used to identify other comparative questions. From this initial seed, more comparative pairs are identified. For each
comparator pair extracted, all the questions containing that pair are identified, which allows recognition of more comparator pairs. The comparative questions and patterns are scored for reliability, and those that are deemed reliable are stored for use as seeds to improve performance over time. Comparator patterns are generated using language rules that account for lexical, generalized and specialized patterns. The reliability of each pattern at every iteration is computed as a weighted average of the performance of the pattern so far, in terms of the number of questions extracted by the pattern, and a look-ahead reliability score, which helps reduce the problem of underestimation due to incomplete knowledge at a given iteration.

3 Methodology

In this paper, we augmented the comparator extraction logic in three ways. First, we expanded our extraction process to deal with regular sentences as well as questions that exhibit a competitive or comparative nature. Second, we used both entities in each pair that we extracted from comparative sentences to extract more information about them to inform our pipeline. Finally, using singletons and pairs of entities, we devised an algorithm to extract context from the sentences containing those entities.

For this project, we accumulated a set of approximately 300 fashion-related articles by manually searching the web. In fact, the main rationale behind using fashion blogs was to find articles that explicitly talked about fashion house rivalries. One such example is http://www.christian-dior-glasses.com/articles/. These articles were blogs and reports of trends and changes in the fashion industry pertaining to eyewear from 2010 to the present. Using this data source not only allowed us to extract good-quality entity pairs, but also helped us validate our content extraction pipeline, as it was extracting temporally relevant information.

3.1 Pattern Generation for Extraction

We focused on relations between fashion houses of a competitive nature. We looked for paragraphs in our corpus that had words with the prefixes ‘rival’ or ‘compet’. We considered just looking for sentences which contained these words, but felt that the sentences around these could also contain relations and contexts of a competitive nature.

3.2 Entity Pair Extraction

Figure 2: Our pipeline, which starts from high-quality, relation-rich blogs to extract patterns and uses those as seeds to scrape more data that might not contain as much of the desired content.

3.2.1 Relation-Rich Tier (Pairs)

We start with our dataset to extract reliable patterns of the type (CompanyX, CompanyY, [Context]) that exhibit competitive behavior.

Once we identified our target paragraphs, we used the Stanford CoreNLP parser to extract Part-of-Speech tags, Named Entity Recognition information and collapsed typed dependency tree structures for each sentence (Toutanova, 2003; de Marneffe, 2006). Our entity extraction method simply looked for POS NNP entities that had NER tags ‘ORGANIZATION’ or ‘PERSON’. Many of the entities we caught were fashion houses, but not all. For instance, we caught entities like ‘Tom Cruise’ from the sentence ‘Tom Cruise is a fan of Ray-Ban sunglasses’ or ‘Apple’ from the sentence ‘Apple was another attendee in an event with a fashion house’.

We noticed a common style in our dataset. Often after the entity, a sentence contained further specification of the line or type of product that was being compared or talked about, for instance, ‘Prada
sunglasses’. To extract this, we looked for POS NNS tags after any of the entities found. For instance, one of our patterns looked like: NNP (ORG) – NNS* – NNP (ORG). As we were looking only at fashion eyewear sites, these NNS tags tended to denote ‘eyeglasses’ or ‘glasses’.

We hypothesized that for sentences that contained more than one entity, all of the possible entities could be compared or grouped in some way with each other. Therefore, sentences with more than a pair of entities gave us more entities to compare. For instance, if we saw three entities (x, y, z), then we generated relations (x, y), (x, z) and (y, z), i.e. we generated all possible pairs from a list of entities. For example, from the sentence ‘Additionally, another Christian Dior glasses competitor Marchon Eyewear renewed its agreement with NIKE.’, we generated three relations: (‘Christian Dior’, ‘Marchon Eyewear’), (‘Christian Dior’, ‘Nike’) and (‘Marchon Eyewear’, ‘Nike’).

3.2.2 Relation-Poor Tier (Singletons)

Most sentences that talk about a fashion house or new line will not contain its rival or competition. However, if more is known about the context of that rival or competition in isolation, it can help us understand the context in which it is being compared to another company or line. When a sentence contains only one entity, we call that entity a singleton; the sentence may still carry contextual information about it.

3.3 Context Extraction

The context extraction method described in this paper relies on the key assumption that sentences containing multiple entities as well as context typically introduce the entities in the first part of the sentence and the context in the second part. Consider the sentence: ‘This month Christian Dior glasses competitor Marchon released a line aiming for the teen market’. Here the entities identified are ‘Christian Dior’ and ‘Marchon’, and the product being compared is ‘glasses’. As depicted in Figure 3, the entities are introduced in the left subtree of the typed dependency graph and the context is within the right subtree. This pattern was identified by manually analyzing a sample of sentences and by understanding the style of documentation of articles in our corpus.

Figure 3: A collapsed typed dependency tree for the sentence ‘This month Christian Dior glasses competitor Marchon released a line aiming for the teen market’.

Following this assumption, we were able to design an algorithm (Algorithm 1) for context extraction given a pair of entities that were previously identified in the sentence by our specialized patterns.

Algorithm 1: Extracting Context From Sentence S

GetContext(E: entities in S, C: typed dependency list for S):
  • T ← construct a tree from C
  • Traverse from the leftmost subtree, discarding any subtree T' such that there is an e ∈ E found in T'.
  • If there are subtrees left:
      – Use breadth-first search to assemble the context of the sentence, starting from the first subtree after all the entities in E have been visited.
      – R ← concatenate the words, ignoring ‘cop’ and ‘det’ dependencies for compression, and recheck the ordering of the words against the actual sentence.
      – return R
  • else:
      – return None

Figure 4 shows a run of Algorithm 1. The leftmost subtree is ignored because it contains entities, and the right subtree is traversed to obtain context. As the subtree is traversed, words within ‘det’ or
‘cop’ dependencies are ignored. This reduced the number of traversals without altering the context of the sentence and was especially useful for longer sentences.

Figure 4: Shows the typed dependency tree that Algorithm 1 traverses. Nodes circled red are entities and those circled blue are part of the context.

While a majority of sentences in our corpus followed the assumption we specified earlier, there were a few that did not. For instance, Figure 5 shows such an example where the entity ‘Kate Spade’ appears at the end of the sentence, causing the algorithm to miss the context completely. We discuss the limitations and possible improvements to our algorithm in the discussion section.

Figure 5: Shows how the sentence ‘Now that time has passed , Bebe is a now famous west coast based company that rivals European companies such as Kate Spade eyeglasses’ violated the key assumption.

4 Evaluation

4.1 Paired Relations

There is no gold standard against which to evaluate relations. Instead, we came up with a metric that captures whether we think a relation says something of significance. Since we have counts of the occurrences of each entity individually and of its occurrences with each other entity, we decided that Pointwise Mutual Information (PMI) with Contextual Discounting was a metric that would capture how often an entity is seen with another entity while taking into account how often it is seen by itself.

Let $P(e_i)$ be the probability of seeing entity $e_i$ and $P(e_i, e_j)$ be the probability of seeing entities $e_i$ and $e_j$ in a pair. We had an $n \times n$ frequency matrix $f$ of fashion houses, so $f_{ij}$ represents the number of times $e_i$ appeared with $e_j$ in our corpus.

• $\mathrm{pmi}_{ij} = \log \frac{P(e_i, e_j)}{P(e_i)\,P(e_j)}$ (assume $\log(0) = 0$)

• $\mathrm{scaledpmi}_{ij} = \mathrm{pmi}_{ij} \times \frac{f_{ij}}{f_{ij} + 1} \times \frac{\min\left(\sum_{k=1}^{n} f_{kj},\ \sum_{k=1}^{n} f_{ik}\right)}{\min\left(\sum_{k=1}^{n} f_{kj},\ \sum_{k=1}^{n} f_{ik}\right) + 1}$

Figure 6 below shows these adjusted PMI values for all 169 of our entities and Figure 7 shows these PMI values for the top 10 of our entities (in terms of the number of mentions). Note that the PMI values are lower for the top entities than for others. This is because, for the less frequently mentioned entities, the number of times they are mentioned with another entity is a much higher fraction of the total number of times they are mentioned.

We then manually checked the relations for the top 10 fashion houses. To do this, we chose a PMI threshold of 0.3 above which to consider relations. We can generate a graph of relations using this threshold (see Figure 8). All of the relations we checked were corroborated by another source (i.e., the two houses were related and either competed or worked together). Our relation extraction is of course restricted by our corpus and amount of data, as there are many relations between fashion houses (not to mention other entities) that we do not catch because they are elsewhere on the web or unwritten.
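As a rough illustration of the discounted PMI score and the thresholding step described in this section, the following Python sketch computes the score for every pair from a frequency matrix and keeps the pairs at or above a threshold. It is not the code used for the reported results. Several details are assumptions made only for this sketch: probabilities are estimated as fractions of sentences (`counts[i] / num_sentences` and `f[i][j] / num_sentences`), the logarithm is natural, diagonal entries are skipped, and the names `scaled_pmi`, `relation_edges`, `counts` and `num_sentences` as well as the toy numbers are hypothetical.

```python
import math


def scaled_pmi(f, counts, num_sentences):
    """Discounted PMI for every ordered pair of entities.

    f[i][j]       -- times entity i was paired with entity j (the matrix f)
    counts[i]     -- sentences mentioning entity i (assumed estimate for P(e_i))
    num_sentences -- corpus size used as the normaliser (assumed)
    """
    n = len(f)
    row_sums = [sum(row) for row in f]                             # sum_k f_ik
    col_sums = [sum(f[k][j] for k in range(n)) for j in range(n)]  # sum_k f_kj
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or f[i][j] == 0:
                continue  # log(0) = 0 convention: unseen pairs score 0
            p_ij = f[i][j] / num_sentences
            p_i = counts[i] / num_sentences
            p_j = counts[j] / num_sentences
            pmi = math.log(p_ij / (p_i * p_j))  # natural log (assumed base)
            m = min(col_sums[j], row_sums[i])
            discount = (f[i][j] / (f[i][j] + 1)) * (m / (m + 1))
            scores[i][j] = pmi * discount
    return scores


def relation_edges(scores, threshold=0.3):
    """Unordered pairs (i, j), i < j, whose score meets the threshold."""
    n = len(scores)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if scores[i][j] >= threshold]


# Toy counts for three hypothetical entities (not taken from the corpus).
f = [[0, 4, 0],
     [4, 0, 1],
     [0, 1, 0]]
scores = scaled_pmi(f, counts=[5, 6, 2], num_sentences=20)
edges = relation_edges(scores)
```

In this toy example the pair seen together only once is pulled well below its raw PMI by the two discount factors, which is the intended effect of the rescaling: rare co-occurrences are trusted less than frequent ones.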
Figure 6: PMI with contextual discounting for all pairs of fashion houses. The fashion houses are ranked by the number of occurrences. Note that the top fashion houses seem to be compared with all other fashion houses. This is expected (the top fashion houses are seen as rivals by many other fashion houses and are also mentioned more, so have more customers, suppliers, etc.).

Figure 7: PMI with contextual discounting for the top 10 fashion houses.

The signature of ‘Marchon’ is particularly striking. We investigated this trend using other blogs and newspaper sources and found out that Marchon is a supplier of eyewear to some of the top fashion houses as well as a competitor to others.

Figure 8: Shows the relationships between the top 10 fashion houses. These entities were connected if their PMI score is greater than or equal to 0.3.

Figure 6 also shows some interesting relationships between less frequently occurring fashion houses. Using a higher threshold (PMI > 1), we were able to create a landscape of relationships between those fashion houses as well.

Figure 9: Shows relationships between fashion houses that were not in the top 10. These had higher PMI values and so they depict relationships with higher confidence.

4.2 Context

For many of the edges in the graph, we were able to catch contextual phrases. Table 1 shows some contextual phrases that we do catch and Table 2 shows phrases that actually have the wrong meaning. For Table 2, in the first case, the sentence was ‘In recent Prada eyeglasses news, rival companies Altair and Tommy Bahama will be teaming up to create eyewear for the upcoming 10 years’ and the second sentence was ‘Lastly , Kate Spade eyeglasses competitors Altair and Tommy Bahama grew their current partnership for creation eyeglass frames for a long time’. The sentence structures that we miss are often more complex than our algorithm can handle: such a sentence talks about an entity x’s rival entity y who partners with entity z, and it is not right to say that x partners with z.

To evaluate the accuracy of our contexts, we first look at unique contexts. Due to many short contexts, we have many repetitions: 181 unique contexts out of 429 total contexts. We found that none of the contexts that were less than word length 3 were
useful. It is interesting to note that the word ‘donate’ occurred 18 times, the words ‘launched’ and ‘renewed’ 12 times, and the word ‘agreed’ 10 times. Some of these sentences did not elaborate further contextually, but others contain contextual phrases that our method does not catch.

We found 183 contexts (42.7% of total contexts) and 102 unique contexts (56.4% of total unique contexts) that are longer than two words. Out of these 102 unique contexts, we found that three (2.9%) did not carry much information at all (our method caught the right phrase, but the phrase did not mean anything, which is a reflection of the sentence) and five (4.9%) were inaccurate (indicating problems with our method). This means that 92.1% of our contexts were both accurate and informative. Of the five inaccurate contexts, two arose because we did not completely put back prepositions into our context (as it would require more graph traversals) and three arose because the sentences were of a more complex structure.

Table 1: Positive Contextual Phrases extracted.

| Entity 1 | Entity 2 | PMI | Contextual Phrase |
| Ralph Lauren | Michael Bastian | 0.35 | ‘Held event Monday promote his upcoming designer rimless line’ |
| Marcolin Group | Dolomiti | 1.13 | ‘released their new round eyeglasses’ |
| Marcolin Group | Kenneth Cole Prod. | 1.20 | ‘announced their licensing agreement for distribution of glasses’ |
| RayBan | Charmant Group | 0.71 | ‘told the press that a new vintage eyeglasses line was released’ |
| Christian Dior | S. Filo Group | 1.54 | ‘are partnering online sell their designer eyeglasses’ |
| Prada | Marchon | 0.74 | ‘released line aiming for teen market’ |
| Marchon | Prada | 0.74 | ‘just launched iWear’ |

Table 2: Negative Contextual Phrases extracted.

| Entity 1 | Entity 2 | PMI | Contextual Phrase |
| Prada | Tommy Bahama | 0.45 | ‘teaming for up create for eyewear for the upcoming 10 years’ |
| Kate Spade | Tommy Bahama | 0.54 | ‘grew their with current partnership creation eyeglass frames long time’ |

5 Conclusion

We have proposed a method to discover entities and relationships between fashion houses. In addition, we have shown a method to provide contextual phrases describing these relationships. This was done by: a) looking for patterns in sentences that are of a competitive nature, b) using Parts of Speech tagging and Named Entity Recognition to extract singleton and paired entities, and c) traversing the Typed Dependency Graph to extract contextual phrases that describe the competitive association. We showed an evaluation metric that uses Pointwise Mutual Information with Contextual Discounting to identify important or ‘surprising’ relationships, along with graph representations of the relationships between entities. We have also shown that our context extraction system is quite accurate: of all long contextual phrases that we extracted, 92.1% were accurate and informative. This method can be applied to bootstrap and learn entities and relationships in any market given the appropriate corpus.

6 Future Work

6.1 Improve contextual phrase extraction

As pointed out in our evaluation, we miss phrases that have a more complex sentence structure. In addition, we do not catch sentences where the context is mentioned before the entities are (this would require a simple modification of our algorithm where we start at the end instead of the beginning).

Furthermore, our contextual phrase extraction sometimes identifies phrases that are common to multiple entities, but often finds phrases that are specific to one entity; our method does not disambiguate these.

6.2 Filtering contexts

Our method finds phrases for contexts that do not provide much information. This is partially due to a method that is too simple and needs more rules
for different structures of sentences and contexts. However, it is largely because most sentences do not contain relevant information and do not say much. There needs to be a method to filter for the contexts that are possibly relevant. A classifier could be built to distinguish an important context from a less informative one, using features such as the length of the phrase, the number of modifiers in the context (as a proxy for how specific the context is), and other such features.

6.3 Types of relationship

We believe one of the most exciting contributions of the paper is the rich set of relations and contexts that we are able to generate from a relatively small corpus with just two seed prefixes to search for (‘rival’ and ‘compete’). If a more extensive lexicon is developed for this type of relation, we believe we can catch even more relations and contexts. Additionally, this paper is focused on catching relationships of a competitive or rival nature. What we found is that since we are looking for sentences around sentences that contain ‘rival’ or ‘compete’, we are also catching relationships that are partnerships (licensing agreements, for example), or sentences that contain multiple relationships (e.g., ‘Rival of x, y, is partnering with z’). We can similarly create a lexicon for other types of relationships, such as partnership, supplier, customer, or other types of relationships between companies and their products.

6.4 Obtaining more data

While these high quality fashion blogs are limited in number, they give us a starting point by providing us with a set of high confidence relation triples. In the future, we hope to augment our corpus by adding fashion-related articles from reliable newspapers. Most of the fashion houses compete over more than one product, and by adding information for other products, we hope to be able to recover more interesting competitive relationships.

Acknowledgments

We would like to thank Christopher Potts for helpful discussions and constructive feedback and for Typed Dependency visualization code.

References

Shasha Li et al. 2010. Comparable Entity Mining from Comparative Questions. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics:650–658.

Siddharthan. 2011. Text Simplification Using Typed Dependencies: A Comparison of the Robustness of Different Strategies. In Proceedings of the 13th European Workshop on Natural Language Generation (ENLG):2–11.

Kristina Toutanova and Christopher D. Manning. 2000. Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000):63–70.

Kristina Toutanova, Dan Klein, Christopher Manning, and Yoram Singer. 2003. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In Proceedings of HLT-NAACL 2003:252–259.

Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. 2006. Generating Typed Dependency Parses from Phrase Structure Parses. In LREC 2006.

Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005):363–370.