In these slides we present a model intended to discriminate creative from non-creative news articles. To build the classifier, we combined nine different measures in a stepwise logistic regression model. The resulting model was tested in two experiments: the first tried to discriminate between news articles about the US 2012 Elections from different newspapers and articles on the same subject taken from The Onion (a website providing satirical news), while the second evaluated the capacity of the model to generalize over different topics and text genres. The experiments showed that the system achieves 80% accuracy, but the lack of true positives in the second experiment raised the question of whether we really identified creativity or in fact detected satire (since the training corpus was built on the assumption that the satirical news from The Onion are also creative).
3. Introduction (I)
• Goal: Automatically identify creativity in a
text.
• How?
– Define the elements that characterize a creative
text.
– Determine the most important features that
explain creativity.
– Build a model for automatic creativity detection (a
classifier).
Creativity Detection in Texts, ICIW 2013, 26.06.2013
4. Introduction (II)
• Definitions of Creativity:
– The ability to transcend traditional ideas, rules,
patterns, relationships, or the like, and to create
meaningful new ideas, forms, methods,
interpretations, etc. (Zhu, Xu, Khot, 2009).
– Creativity is typically thought of as the act or quality
of an unpredictable departure from the rules of
regular word formation (Renouf, 2007).
• Linguistic creativity = creativity in texts; it captures
“new and creative ways of expressing a
given idea” (Veale, 2011).
5. Other approaches
• Manual identification: 21 creative writers, 4 human judges, 105
rated tuples of the form (word, sentence, creativity) (Zhu, Xu, Khot, 2009).
• A machine learning algorithm using a linear regression model with 17
features (Zhu, Xu, Khot, 2009).
• Jordanous (2012) – SPECS: a three-step procedure for determining
whether a computational system can be defined as creative or not.
• Understanding and using metaphors (Kovecses, 2011; Veale, 2006)
and analogies (Veale, 2006), or explaining the appearance of new
words from already existing ones (Lehrer, 2007).
• Creativity detection in song lyrics (Hu and Yu, 2011) – uses three
measures for identifying mood and creativity in lyrics.
6. Creativity Measures
• A computational creativity measure should address
two aspects:
– Novelty: to what extent is an item different from the existing
samples of its genre?
– Quality: how good is the item really?
• We tried to capture these two criteria through nine
different measures: Type-to-Token Ratio, Word Norms
Fraction, Google Similarity Distance, Explicit Semantic
Analysis, Number of Named Entities, Named Entities
Score, WordNet Similarity, Coherence measure, and
Latent Semantic Analysis (LSA) measures.
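As a concrete illustration, the simplest of the nine measures, the Type-to-Token Ratio, can be sketched in a few lines. The whitespace tokenization below is a simplification of the full NLP pipeline the system actually used:

```python
# Minimal sketch of the Type-to-Token Ratio (TTR): the number of
# distinct word types divided by the total number of tokens.
# A lower TTR means more repetition, a higher TTR more lexical variety.

def type_to_token_ratio(text: str) -> float:
    tokens = text.lower().split()  # naive whitespace tokenization
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

print(type_to_token_ratio("the cat sat on the mat"))  # 5 types / 6 tokens
```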
7. Experiment Methodology (I)
• Extract the news articles
from the Web and save
them into the database.
• Apply NLP Preprocessing
techniques.
• Apply NLP Categorization
and Tagging techniques.
[Pipeline diagram]
– Corpus Acquisition: the articles’ URLs are collected from the
Web; Text Extraction turns URLs and HTML into plain text,
yielding the corpus and its statistics.
– Text Preprocessing: Normalize Text, Segmentation,
Tokenization, Stemming; outputs tokens & stems.
– Categorization and Tagging: Part-of-Speech Tagging, Named
Entities Recognition, Chunking; outputs tokens, sentences,
and named entities for Text Understanding.
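The preprocessing stage can be sketched as follows. This is a hypothetical toy implementation: the naive sentence splitter and suffix-stripping "stemmer" stand in for the proper tools (e.g. a trained tokenizer and a Porter-style stemmer) that a real pipeline would use:

```python
import re

# Toy sketch of the preprocessing chain from the diagram:
# normalize -> segment -> tokenize -> stem.

def normalize(text: str) -> str:
    # collapse whitespace and lowercase
    return re.sub(r"\s+", " ", text).strip().lower()

def segment(text: str) -> list:
    # naive sentence segmentation on ., ! and ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

def tokenize(sentence: str) -> list:
    return re.findall(r"[a-z']+", sentence)

def stem(token: str) -> str:
    # crude suffix stripping; a stand-in for a real stemmer
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

text = "The candidates debated.  Voters were watching closely!"
for sentence in segment(normalize(text)):
    print([stem(t) for t in tokenize(sentence)])
```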
8. Experiment Methodology (II)
• Compute the value of
each measure for the
given text.
• Use Stepwise Logistic
Regression to select the
measures that best
describe creativity and
build the Classification
Model.
[Pipeline diagram]
– Input: the categorized text produced by the Categorization
and Tagging stage, plus external resources (Wikipedia,
Google Search, WordNet).
– Compute Measures: Type-to-Token Ratio, Word Norms
Fraction, Google Similarity Distance, Explicit Semantic
Analysis, Number of Named Entities, Named Entities Score,
WordNet Similarity, Coherence measure, LSA measures.
– Output: results in CSV & ARFF formats.
9. The Corpus
• 185 news articles on the US 2012 Elections:
– 67 articles from The Onion, and
– 118 articles from 12 news sites from all over the
world: UK (BBC, Wired, The Independent, The Sun),
Canada (CBC), Australia (News.com.au, The Australian,
Sydney Morning Herald), USA (Foxnews and
Huffington Post), South Africa (News24), and New
Zealand (The NZ Herald).
• We made the assumption that The Onion articles
are creative.
10. Experiments (I)
• Two experiments:
A. Assess the quality of the classifier on the news
articles that we extracted, and
B. Measure the capacity of the classifier to adapt to
different kinds of texts.
• A – Classifier Evaluation
– Estimate the coefficients of the 9 measures in the
logistic regression model.
– Apply feature selection.
11. Experiments (II)
Attribute                          Beta        P value
Constant                           1.830477    0.246
WordNet Similarity                -9.779       0
Named Entities Score               2.484793    0.0059
LSA cos. similarity (sentences)    3.445448    0.0001
Number of Named Entities          -2.89686     0.0053
Word Norms Fraction                3.585301    0.0378
Google Path Similarity            -3.25499     0.0538
12. Experiments (III)
• Therefore, the classifier for discriminating
creative from non-creative text is given by:
– Pr(Y = 1 | X1, ..., X9) = F(1.83 + 3.585 * X2 - 3.255 * X3 -
2.897 * X5 + 2.485 * X6 - 9.779 * X7 + 3.445 * X9)
• where F is the logistic function and X2, X3, X5, X6, X7, X9 are
the scores obtained by each text for Word Norms
Fraction, Google Similarity Distance, Number of
Named Entities, Named Entities Score, WordNet
Similarity, and the LSA measure.
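With the coefficients fixed, scoring a new text is a single logistic-function evaluation over the selected measures. The sketch below uses the rounded betas from the regression table; the feature names and input scores are illustrative, not the system's actual values:

```python
import math

# Sketch of the fitted classifier: the logistic function F applied to
# a linear combination of the six selected measures.

BETAS = {
    "word_norms_fraction": 3.585,    # X2
    "google_similarity": -3.255,     # X3
    "num_named_entities": -2.897,    # X5
    "named_entities_score": 2.485,   # X6
    "wordnet_similarity": -9.779,    # X7
    "lsa_similarity": 3.445,         # X9
}
INTERCEPT = 1.83

def p_creative(scores: dict) -> float:
    """Probability that a text is creative, given its measure scores."""
    z = INTERCEPT + sum(BETAS[name] * scores[name] for name in BETAS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic function F

# made-up feature values for demonstration
scores = {name: 0.5 for name in BETAS}
print(round(p_creative(scores), 3))
```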
13. Results (I)
• The obtained model was tested in a 10-fold
cross-validation:
• The accuracy for this experiment was 80.54%,
which is quite high, considering the difficulty
of this task.
Values prediction / Confusion matrix (rows = predicted, columns = real):

Predicted        Creative   Non-creative   Sum   Precision   Recall
Creative             46          15         61    0.754      0.6866
Non-creative         21         103        124    0.8306     0.8729
All                  67         118        185
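The precision, recall, and accuracy figures follow directly from the confusion-matrix counts and can be checked mechanically:

```python
# Recompute the evaluation metrics from the confusion-matrix counts
# (creative = positive class).

tp, fp = 46, 15   # predicted creative: correct / incorrect
fn, tn = 21, 103  # predicted non-creative: missed creative / correct

precision = tp / (tp + fp)               # 46 / 61
recall = tp / (tp + fn)                  # 46 / 67
accuracy = (tp + tn) / (tp + fp + fn + tn)  # 149 / 185

print(f"precision={precision:.4f} recall={recall:.4f} accuracy={accuracy:.4f}")
# precision=0.7541 recall=0.6866 accuracy=0.8054
```

The accuracy of 80.54% reported on the slide is exactly (46 + 103) / 185.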
14. Experiments (IV)
• B – Adaptability Experiment
– We used the built classifier to evaluate 20 book reviews taken
from the SFU Review Corpus (Taboada, Anthony and Voll, 2006).
– The reviews were independently rated by 3 master’s students:
1 = creative text, 2 = mildly creative text, 3 = non-creative text.
– The inter-rater agreement (Kappa statistic) was too low (observed
agreement Po = 0.45), so we collapsed the ratings into a binary
classification:
• mildly creative = creative: Po = 0.633; taking the majority
class, 12 out of 20 reviews were considered creative.
• mildly creative = non-creative: Po = 0.733; taking the majority
class, 4 out of 20 reviews were considered creative.
– Since there are usually more non-creative texts than creative
ones, we adopted the second mapping (mildly creative =
non-creative).
– The classifier labeled all the reviews as non-creative: 80%
accuracy, but it missed all 4 positive samples (only 1 of which
was rated creative by all 3 students).
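The observed agreement Po used above can be computed as the average pairwise agreement between the raters. The sketch below uses made-up labels, not the actual SFU-review judgments:

```python
from itertools import combinations

# Observed agreement Po for multiple raters: the mean, over all pairs
# of raters, of the fraction of items on which the pair agrees.

def observed_agreement(ratings: list) -> float:
    # ratings: one list of labels per rater, aligned by item
    pairs = list(combinations(ratings, 2))
    total = sum(
        sum(a == b for a, b in zip(r1, r2)) / len(r1) for r1, r2 in pairs
    )
    return total / len(pairs)

# hypothetical judgments from three raters over four reviews
r1 = ["creative", "non-creative", "non-creative", "creative"]
r2 = ["creative", "non-creative", "creative", "creative"]
r3 = ["non-creative", "non-creative", "non-creative", "creative"]
print(observed_agreement([r1, r2, r3]))
```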
15. Conclusions (I)
• We presented a model for discriminating creative from non-creative news
articles that was built combining nine different measures.
• The model could be improved by removing or changing the assumption
that The Onion articles are always creative.
• The feature selection revealed the following conclusions:
– The lack of creativity was best correlated with Word Norms Fraction, which was
expected considering the definitions of creativity and of Word Norms Fraction.
Google Similarity Distance was in the same situation.
– The analysis of named entities showed that they are a sign of a creative text as
long as not too many distinct entities are used.
– WordNet Similarity was the best evidence for creative texts, while LSA was
similar to Word Norms Fraction and Google Similarity Distance
in providing a measure of text “usualness”, and therefore gave evidence of
non-creative texts. They also have similar weights in the final classifier. ESA had
no influence on the built classifier.
– Less coherent texts were expected to be more creative, but the coherence score
was found to have no influence on identifying creativity.
16. Conclusions (II)
• The second experiment revealed that there are “levels” of
creativity: satirical news articles may, in general, be more
creative than book reviews.
• Both experiments reached around 80% accuracy, suggesting that
the classifier may adapt well. However, the lack of true positive
examples in the second experiment makes us cautious about
stating this clearly.
• The classifier performed reasonably well at differentiating
articles from The Onion from those of other news websites:
– Did we really identify creativity, or did we in fact detect satire?
– Increasing the size of the data set, and testing it further, could
shed some light on whether either of the two assumptions
stands and which of them is more adequate.