Creativity Detection in Texts
Author: Costin-Gabriel Chiru
Scientific advisor: Ştefan Trăuşan-Matu
Politehnica University of Bucharest
Faculty of Automatic Control and Computers
Department of Computers
Costin-Gabriel CHIRU
Politehnica University of Bucharest
E-mail: costin.chiru@cs.pub.ro
Content
• Introduction
• Creativity Measures
• Experiment Methodology
• Experiments and Results
• Conclusions
Creativity Detection in Texts, ICIW 2013, 26.06.2013
Introduction (I)
• Goal: Automatically identify creativity in a
text.
• How?
– Define the elements that characterize a creative
text.
– Determine the most important features that
explain creativity.
– Build a model for automatic creativity detection (a
classifier).
Introduction (II)
• Definitions of Creativity:
– The ability to transcend traditional ideas, rules,
patterns, relationships, or the like, and to create
meaningful new ideas, forms, methods,
interpretations, etc. (Zhu, Xu, Khot, 2009).
– Creativity is typically thought of as the act or quality of an unpredictable departure from the rules of regular word formation (Renouf, 2007).
• Linguistic creativity = creativity in texts; it measures “new and creative ways of expressing a given idea” (Veale, 2011).
Other approaches
• Manual identification: 21 creative writers and 4 human judges → 105 rated tuples of the form (word, sentence, creativity) (Zhu, Xu, Khot, 2009).
• Machine learning: a linear regression model with 17 features (Zhu, Xu, Khot, 2009).
• SPECS (Jordanous, 2012): a three-step procedure for determining whether a computational system can be considered creative.
• Work on understanding and using metaphors (Kovecses, 2011; Veale, 2006) and analogies (Veale, 2006), or on explaining how new words arise from existing ones (Lehrer, 2007).
• Creativity detection in song lyrics (Hu and Yu, 2011): uses three measures for identifying mood and creativity in a lyric.
Creativity Measures
• A computational creativity measure should address
two aspects:
– Novelty: To what extent does an item differ from the existing samples of its genre?
– Quality: How good is the item really?
• We tried to capture these two criteria through nine different measures: Type-to-Token Ratio, Word Norms Fraction, Google Similarity Distance, Explicit Semantic Analysis, Number of Named Entities, Named Entities Score, WordNet Similarity, Coherence measure, and Latent Semantic Analysis (LSA) measures.
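As an illustration of the simplest of these measures, the Type-to-Token Ratio can be sketched as the number of distinct word forms divided by the total number of words. This is a minimal sketch, not the authors' implementation: the regex tokenizer, the lowercasing, and the function name `type_token_ratio` are assumptions made for the example.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return distinct word forms (types) divided by total words (tokens)."""
    # Assumption: tokens are lowercase alphabetic runs, optionally with an
    # inner apostrophe (e.g. "don't"); the slides do not specify a tokenizer.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0  # Avoid dividing by zero on empty input.
    return len(set(tokens)) / len(tokens)


sample = "The cat sat on the mat because the mat was warm."
print(round(type_token_ratio(sample), 3))  # 8 types / 11 tokens
```

A higher ratio means fewer repeated words, which is one rough proxy for lexical variety.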
Experiment Methodology (I)
• Extract the news articles
from the Web and save
them into the database.
• Apply NLP Preprocessing
techniques.
• Apply NLP Categorization
and Tagging techniques.
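The preprocessing steps (normalization, segmentation, tokenization, stemming) can be sketched with the standard library alone. This is a toy sketch under stated assumptions: the slides do not say which tools were used, the sentence splitter is a naive regex, and `naive_stem` is a hypothetical stand-in for a real stemmer such as Porter's.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace runs into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> list[str]:
    """Naive segmentation: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def tokenize(sentence: str) -> list[str]:
    """Lowercased word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", sentence.lower())


def naive_stem(token: str) -> str:
    """Toy suffix stripper (hypothetical); keeps at least three characters."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


raw = "  Voters  queued early.\nThe candidates debated taxes!  "
sentences = split_sentences(normalize(raw))
tokens = [tokenize(s) for s in sentences]
stems = [[naive_stem(t) for t in sent] for sent in tokens]
print(sentences)
print(stems)
```

In practice an NLP toolkit would replace each of these functions; the sketch only shows the order of the steps and the shape of the data passed between them.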
[Figure: processing pipeline, with the following stages and labels]
• Corpus Acquisition: Web, articles' URLs, text extraction (URLs, HTML, plain text), corpus & statistics
• Text Preprocessing: normalize text, segmentation, tokenization, stemming (outputs: tokens & sentences, tokens & stems)
• Text Understanding / Categorization and Tagging: part-of-speech tagging, named entities recognition, chunking (output: named entities)
Experiment Methodology (II)
• Compute the value of
each measure for the
given text.
• Use Stepwise Logistic
Regression to select the
measures that best
describe creativity and
build the Classification
Model.
[Figure: measure computation stage]
• Input: categorized text from the Categorization and Tagging stage
• External resources: Wikipedia, Google Search, WordNet
• Compute Measures: Type-to-Token Ratio, Word Norms Fraction, Google Similarity Distance, Explicit Semantic Analysis, Number of Named Entities, Named Entities Score, WordNet Similarity, Coherence measure, LSA measures
• Output: results in CSV & ARFF formats
The Corpus
• 185 news articles on the US election, taken from:
– 67 articles from The Onion, and
– 118 articles from 12 news sites from all over the
world: UK (BBC, Wired, The Independent, The Sun),
Canada (CBC), Australia (News.com.au, The Australian,
Sydney Morning Herald), USA (Foxnews and
Huffington Post), South Africa (News24), and New
Zealand (The NZ Herald).
• We made the assumption that The Onion articles
are creative.
Experiments (I)
• Two experiments:
A. Classifier evaluation: to assess its quality, the classifier was tested on the news articles that we extracted.
B. Adaptability: intended to measure the capacity of the classifier to adapt to different kinds of texts.
• A – Classifier Evaluation
– Identify the mixing parameters (weights) of the 9 measures for the logistic regression model.
– Apply feature selection.
Experiments (II)
Attribute | Beta | P value
Constant | 1.830477 | 0.246
WordNet Similarity | -9.779 | 0
Named Entities Score | 2.484793 | 0.0059
LSA cos. Similarity sentences | 3.445448 | 0.0001
Number of Named Entities | -2.89686 | 0.0053
Word Norms Fraction | 3.585301 | 0.0378
Google Path Similarity | -3.25499 | 0.0538
Experiments (III)
• Therefore, the classifier for discriminating creative from non-creative text is given by:
– Pr(Y = 1 | X1, ..., X9) = F(1.83 + 3.585 * X2 + 3.255 * X3 - 2.897 * X5 + 2.485 * X6 - 9.779 * X7 + 3.445 * X9)
• Where F(z) = 1 / (1 + e^(-z)) is the logistic function, and X2, X3, X5, X6, X7, X9 are the scores obtained by each text for Word Norms Fraction, Google Similarity Distance, Number of Named Entities, Named Entities Score, WordNet Similarity, and LSA, respectively.
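The formula above can be evaluated directly once the six scores are known. The sketch below is illustrative only: F is assumed to be the standard logistic function 1 / (1 + e^(-z)), the dictionary keys and the name `predict_proba` are hypothetical, and the weights are copied from the formula as printed (the coefficient table lists the Google term with the opposite sign, so that weight in particular is an assumption). Which class Y = 1 denotes is not stated on the slide.

```python
import math

# Intercept and weights copied from the printed formula (not from the table).
INTERCEPT = 1.83
WEIGHTS = {
    "word_norms_fraction": 3.585,         # X2
    "google_similarity_distance": 3.255,  # X3 (sign differs in the table)
    "num_named_entities": -2.897,         # X5
    "named_entities_score": 2.485,        # X6
    "wordnet_similarity": -9.779,         # X7
    "lsa": 3.445,                         # X9
}


def logistic(z: float) -> float:
    """Standard logistic function, assumed to be the F of the formula."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(scores: dict[str, float]) -> float:
    """Pr(Y = 1 | scores) under the linear model above."""
    z = INTERCEPT + sum(w * scores[name] for name, w in WEIGHTS.items())
    return logistic(z)


# Hypothetical feature values, chosen only to exercise the function.
example = {name: 0.0 for name in WEIGHTS}
example["wordnet_similarity"] = 0.3
print(predict_proba(example))
```

Because the WordNet Similarity weight is large and negative, raising that score while holding the others fixed lowers the predicted probability.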
Results (I)
• The obtained model was tested in a 10-fold
cross-validation:
• The accuracy for this experiment was 80.54%,
which is quite high, considering the difficulty
of this task.
Values prediction: confusion matrix (rows: predicted class; columns: real class)
Predicted | Real: Creative | Real: Non-creative | Sum | Precision | Recall
Creative | 46 | 15 | 61 | 0.754 | 0.6866
Non-creative | 21 | 103 | 124 | 0.8306 | 0.8729
All | 67 | 118 | 185 | |
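The reported accuracy, precision, and recall can be recomputed from the four counts in the confusion matrix. This sketch assumes that rows are predictions and columns are real labels, which is consistent with the column totals of 67 and 118 matching the corpus; the variable and function names are illustrative.

```python
# Counts from the confusion matrix: outer key = predicted, inner key = real.
matrix = {
    "creative": {"creative": 46, "non-creative": 15},
    "non-creative": {"creative": 21, "non-creative": 103},
}
labels = list(matrix)

total = sum(sum(row.values()) for row in matrix.values())
# Accuracy: correctly classified articles over all articles.
accuracy = sum(matrix[c][c] for c in labels) / total


def precision(label: str) -> float:
    """Correct predictions of `label` over all predictions of `label`."""
    return matrix[label][label] / sum(matrix[label].values())


def recall(label: str) -> float:
    """Correct predictions of `label` over all real instances of `label`."""
    return matrix[label][label] / sum(matrix[p][label] for p in labels)


print(f"accuracy = {accuracy:.4f}")
for label in labels:
    print(f"{label}: precision = {precision(label):.4f}, recall = {recall(label):.4f}")
```

Recall is noticeably lower for the creative class than for the non-creative class, so the errors are concentrated in missed creative articles.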
Experiments (IV)
• B – Adaptability Experiment
– We used the built classifier to evaluate 20 book reviews taken from The SFU Review Corpus (Taboada, Anthony and Voll, 2006).
– The reviews were independently evaluated by 3 master's students: 1 = creative texts, 2 = mildly creative texts, 3 = non-creative texts
– The inter-rater agreement (Kappa statistic) was too low (observed agreement was Po = 0.45) → we considered binary classification:
• mildly creative texts = creative → Po = 0.633; taking the majority class, 12 out of 20 were considered creative
• mildly creative texts = non-creative → Po = 0.733; taking the majority class, 4 out of 20 were considered creative
– Since there are usually more non-creative texts than creative ones, we chose the second option (mildly creative texts = non-creative)
– The classifier considered all the reviews to be non-creative → 80% accuracy (it missed the 4 positive samples, only 1 of which was considered creative by all 3 students).
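The effect of merging the "mildly creative" class can be illustrated with a small computation of observed agreement and majority labels. This is a sketch under assumptions: Po is taken to be the mean fraction of agreeing rater pairs per item (the observed-agreement term of Fleiss' kappa), which the slide does not confirm, and the ratings below are hypothetical, not the study's data.

```python
from collections import Counter
from itertools import combinations


def observed_agreement(ratings):
    """Mean over items of the fraction of rater pairs giving the same label."""
    per_item = []
    for item in ratings:
        pairs = list(combinations(item, 2))
        per_item.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(per_item) / len(per_item)


def majority_label(item):
    """Most frequent label; unambiguous for 3 raters and 2 classes."""
    return Counter(item).most_common(1)[0][0]


def merge(ratings, mapping):
    """Relabel every rating through `mapping` (e.g. to collapse classes)."""
    return [[mapping[r] for r in item] for item in ratings]


# Hypothetical ratings: five reviews, three raters, labels 1/2/3 as above.
ratings = [[1, 2, 2], [3, 3, 3], [2, 3, 3], [1, 1, 2], [3, 2, 3]]

# Mildly creative (2) counted as non-creative.
strict = merge(ratings, {1: "creative", 2: "non-creative", 3: "non-creative"})
# Mildly creative (2) counted as creative.
lenient = merge(ratings, {1: "creative", 2: "creative", 3: "non-creative"})

print(observed_agreement(ratings), observed_agreement(strict))
print([majority_label(item) for item in strict].count("creative"))
print([majority_label(item) for item in lenient].count("creative"))
```

Collapsing three classes into two can only keep or raise the pairwise agreement, which is the pattern reported above; the choice of mapping then determines how many texts end up labelled creative by majority vote.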
Conclusions (I)
• We presented a model for discriminating creative from non-creative news
articles that was built combining nine different measures.
• The model could be improved by removing or changing the assumption that The Onion articles are always creative.
• The feature selection led to the following conclusions:
– The lack of creativity was best correlated with Word Norms Fraction, which was expected considering the definitions of creativity and of Word Norms Fraction. Google Similarity Distance behaved the same way.
– Named Entities analysis showed that they are signs of a creative text as long as
not too many distinct such entities are used.
– Wordnet Similarity was the best evidence for creative texts, while LSA was
similar to the measures of Word Norms Fraction and Google Similarity Distance
in providing a measure for text “usualness” and therefore giving evidence of
non-creative texts. They also have similar weights in the final classifier. ESA had
no influence in the built classifier.
– Less coherent texts were expected to be more creative but coherence score
was found to have no influence in identifying creativity.
Conclusions (II)
• The second experiment revealed that there are “levels” of creativity: satirical news articles may, in general, be more creative than book reviews.
• Both experiments had around 80% accuracy, which suggests that the classifier may adapt well to other kinds of text. However, the lack of true positives in the second experiment makes us cautious about stating this firmly.
• The classifier performed reasonably well at differentiating articles from The Onion from those of other news websites:
– Did we really identify creativity, or did we in fact detect satire?
– Increasing the size of the data set and testing further could shed some light on whether either of the two assumptions holds and which of them is more adequate.
Questions
Thank you very much!

Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 

Creativity detection in texts

  • 1. Creativity Detection in Texts
      Author / Scientific supervisor: Costin-Gabriel Chiru • Ştefan Trăuşan-Matu
      Politehnica University of Bucharest, Faculty of Automatic Control and Computers, Department of Computers
      Costin-Gabriel CHIRU, E-mail: costin.chiru@cs.pub.ro
  • 2. Content
      • Introduction
      • Creativity Measures
      • Experiment Methodology
      • Experiments and Results
      • Conclusions
      Creativity Detection in Texts, ICIW 2013, 26.06.2013
  • 3. Introduction (I)
      • Goal: automatically identify creativity in a text.
      • How?
          – Define the elements that characterize a creative text.
          – Determine the most important features that explain creativity.
          – Build a model for automatic creativity detection (a classifier).
  • 4. Introduction (II)
      • Definitions of creativity:
          – The ability to transcend traditional ideas, rules, patterns, relationships, or the like, and to create meaningful new ideas, forms, methods, interpretations, etc. (Zhu, Xu, Khot, 2009).
          – Creativity is typically thought of as the act or the quality of an unpredictable departure from the rules of regular word formation (Renouf, 2007).
      • Linguistic creativity is creativity in texts; it measures “new and creative ways of expressing a given idea” (Veale, 2011).
  • 5. Other approaches
      • Manual identification: 21 creative writers, 4 human judges → 105 rated tuples of the form (word, sentence, creativity) (Zhu, Xu, Khot, 2009).
      • Machine learning: a linear regression model with 17 features (Zhu, Xu, Khot, 2009).
      • Jordanous (2012): SPECS, a three-step procedure for determining whether a computational system can be considered creative.
      • Work on understanding and using metaphors (Kovecses, 2011; Veale, 2006) and analogies (Veale, 2006), or on explaining how new words appear from existing ones (Lehrer, 2007).
      • Creativity detection in song lyrics (Hu and Yu, 2011): uses three measures for identifying mood and creativity in a lyric.
  • 6. Creativity Measures
      • A computational creativity measure should address two aspects:
          – Novelty: to what extent is an item different from the existing samples of its genre?
          – Quality: how good is the item really?
      • We tried to capture these two criteria through nine different measures: Type-to-Token Ratio, Word Norms Fraction, Google Similarity Distance, Explicit Semantic Analysis (ESA), Number of Named Entities, Named Entities Score, Wordnet Similarity, Coherence measure, and Latent Semantic Analysis (LSA) measures.
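The slides name Type-to-Token Ratio without giving a formula. A minimal sketch, assuming the usual definition (number of distinct word forms divided by the total number of words) and a simple regex tokenizer that is not taken from the talk:

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct word forms (types) divided by total words (tokens)."""
    # Assumption: lowercase the text and treat runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


sample = "the cat sat on the mat and the dog sat too"
ttr = type_token_ratio(sample)  # 8 distinct words out of 11
print(f"TTR = {ttr:.3f}")
```

A higher ratio means less repetition of the same words; the value depends on text length, so comparing texts of very different sizes may need some normalization.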
  • 7. Experiment Methodology (I)
      • Extract the news articles from the Web and save them into the database.
      • Apply NLP preprocessing techniques.
      • Apply NLP categorization and tagging techniques.
      [Figure: processing pipeline with four stages. Corpus Acquisition (Web articles' URLs, text extraction from HTML, corpus and statistics); Text Preprocessing (normalize text, segmentation, tokenization, stemming; produces tokens, sentences and stems); Categorization and Tagging (part-of-speech tagging, named entities recognition, chunking; produces named entities); Text Understanding.]
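The preprocessing stage above (normalize, segment, tokenize) can be sketched with the standard library alone. This is an illustrative stand-in, not the toolchain used in the talk; the function names and the splitting rules are assumptions, and stemming is left out because it needs a dedicated stemmer:

```python
import re


def normalize(text: str) -> str:
    # Assumed minimal normalization: collapse whitespace runs into one space.
    return re.sub(r"\s+", " ", text).strip()


def segment(text: str) -> list[str]:
    # Naive sentence segmentation: split on whitespace that follows ., ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def tokenize(sentence: str) -> list[str]:
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"[A-Za-z0-9']+", sentence.lower())


raw = "Voters went to the polls.   Turnout was high!\nResults are expected soon."
sentences = segment(normalize(raw))
tokens = [tokenize(s) for s in sentences]
print(sentences)
print(tokens)
```

A real pipeline would handle abbreviations, quotes and numbers more carefully; the point here is only the order of the steps.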
  • 8. Experiment Methodology (II)
      • Compute the value of each measure for the given text.
      • Use Stepwise Logistic Regression to select the measures that best describe creativity and build the classification model.
      [Figure: the Compute Measures stage takes the categorized text from Categorization and Tagging, uses Wikipedia, Google Search and Wordnet as external resources, computes the nine measures, and writes the results in CSV and ARFF formats.]
  • 9. The Corpus
      • 185 news articles on the US election:
          – 67 articles from The Onion, and
          – 118 articles from 12 news sites from all over the world: UK (BBC, Wired, The Independent, The Sun), Canada (CBC), Australia (News.com.au, The Australian, Sydney Morning Herald), USA (Foxnews and Huffington Post), South Africa (News24), and New Zealand (The NZ Herald).
      • We made the assumption that The Onion articles are creative.
  • 10. Experiments (I)
      • Two experiments:
          A. To assess the quality of our classifier, it was tested on the news articles that we extracted.
          B. Intended to measure the capacity of the classifier to adapt to different kinds of texts.
      • A – Classifier Evaluation
          – Identify the mixing parameters of the 9 measures for the logistic regression model.
          – Apply feature selection.
  • 11. Experiments (II)
      Attribute                          Beta        P value
      Constant                            1.830477   0.246
      WordNet Similarity                 -9.779      0
      Named Entities Score                2.484793   0.0059
      LSA cos. Similarity sentences       3.445448   0.0001
      Number of Named Entities           -2.89686    0.0053
      Word Norms Fraction                 3.585301   0.0378
      Google Path Similarity             -3.25499    0.0538
  • 12. Experiments (III)
      • Therefore, the classifier for discriminating creative from non-creative text is given by:
          Pr(Y = 1 | X1, ..., X9) = F(1.83 + 3.585 * X2 + 3.255 * X3 - 2.897 * X5 + 2.485 * X6 - 9.779 * X7 + 3.445 * X9)
      • Where F is the logistic function and X2, X3, X5, X6, X7, X9 are the scores obtained by each text for Word Norms Fraction, Google Similarity Distance, Number of Named Entities, Named Entities Score, Wordnet Similarity, and LSA, respectively.
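To make the formula on slide 12 concrete, here is a minimal sketch that evaluates it for one text. It assumes F is the standard logistic function 1 / (1 + e^(-z)); the dictionary keys are hypothetical names, the coefficients are copied from the formula as printed (the Google term is positive there and negative in the coefficient table), and the scale of each score is not given in the slides:

```python
import math

# Intercept and weights copied from the slide's formula; key names are illustrative.
INTERCEPT = 1.83
WEIGHTS = {
    "word_norms_fraction": 3.585,         # X2
    "google_similarity_distance": 3.255,  # X3
    "number_of_named_entities": -2.897,   # X5
    "named_entities_score": 2.485,        # X6
    "wordnet_similarity": -9.779,         # X7
    "lsa": 3.445,                         # X9
}


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(scores: dict[str, float]) -> float:
    """Pr(Y = 1 | scores) for one text, given its six selected measure scores."""
    z = INTERCEPT + sum(w * scores[name] for name, w in WEIGHTS.items())
    return logistic(z)


# Hypothetical input: every score set to zero leaves only the intercept.
zero_scores = {name: 0.0 for name in WEIGHTS}
p_zero = predict_proba(zero_scores)

# Raising only the Wordnet Similarity score lowers the output (weight -9.779).
p_high_wordnet = predict_proba({**zero_scores, "wordnet_similarity": 0.5})
print(round(p_zero, 4), round(p_high_wordnet, 4))
```

The slides do not state which class Y = 1 denotes, so the output should be read only as the probability of the class coded as 1 in the original model.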
  • 13. Results (I)
      • The obtained model was tested in a 10-fold cross-validation.
      • The accuracy for this experiment was 80.54%, which is quite high, considering the difficulty of this task.
      Confusion matrix (rows: predicted class, columns: real class)
      Predicted \ Real   Creative   Non-creative   Sum   Precision   Recall
      Creative              46          15           61    0.754      0.6866
      Non-creative          21         103          124    0.8306     0.8729
      All                   67         118          185
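The figures on slide 13 can be recomputed from the four cells of the confusion matrix. A short check, reading rows as predicted classes and columns as real classes (variable names are illustrative):

```python
# Cells of the confusion matrix, with "creative" as the positive class.
tp, fp = 46, 15    # predicted creative: really creative / really non-creative
fn, tn = 21, 103   # predicted non-creative: really creative / really non-creative

total = tp + fp + fn + tn
accuracy = (tp + tn) / total

precision_creative = tp / (tp + fp)
recall_creative = tp / (tp + fn)
precision_non_creative = tn / (tn + fn)
recall_non_creative = tn / (tn + fp)

print(f"accuracy = {accuracy:.4f}")
print(f"creative: precision = {precision_creative:.4f}, recall = {recall_creative:.4f}")
print(f"non-creative: precision = {precision_non_creative:.4f}, recall = {recall_non_creative:.4f}")
```

The recall columns divide by the real totals (67 and 118), while the precision columns divide by the predicted totals (61 and 124).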
  • 14. Experiments (IV)
      • B – Adaptability Experiment
          – We used the built classifier to evaluate 20 book reviews taken from The SFU Review Corpus (Taboada, Anthony and Voll, 2006).
          – The reviews were independently evaluated by 3 master's students: 1 = creative texts, 2 = mildly creative texts, 3 = non-creative texts.
          – The inter-rater agreement (Kappa statistic) was too low (observed agreement was Po = 0.45) → we considered binary classification:
              • mildly creative texts = creative → Po = 0.633; taking the majority class, 12 out of 20 were considered creative
              • mildly creative texts = non-creative → Po = 0.733; taking the majority class, 4 out of 20 were considered creative
          – Since there are usually more non-creative texts than creative ones, we chose the second situation (mildly creative texts = non-creative).
          – The classifier labeled all the reviews as non-creative → 80% accuracy (it missed the 4 positive samples, only 1 of which was considered creative by all 3 students).
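The binarization and majority vote on slide 14 can be sketched as follows. The ratings below are invented for illustration (the talk's actual ratings are not in the slides), and computing Po as the mean share of agreeing rater pairs per item is an assumption about how the observed agreement was obtained:

```python
from collections import Counter
from itertools import combinations


def observed_agreement(ratings):
    """Mean fraction of rater pairs that agree, averaged over items."""
    per_item = []
    for item in ratings:
        pairs = list(combinations(item, 2))
        per_item.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(per_item) / len(per_item)


def binarize(ratings, mild_is_creative):
    # Scale from the slide: 1 = creative, 2 = mildly creative, 3 = non-creative.
    def to_label(r):
        return mild_is_creative if r == 2 else r == 1
    return [tuple(to_label(r) for r in item) for item in ratings]


def majority(item):
    return Counter(item).most_common(1)[0][0]


# Hypothetical ratings: 5 reviews, 3 raters each.
ratings = [(1, 2, 3), (2, 2, 3), (3, 3, 3), (1, 1, 2), (2, 3, 3)]

po_three_class = observed_agreement(ratings)
binary = binarize(ratings, mild_is_creative=False)
po_binary = observed_agreement(binary)
creative_count = sum(majority(item) for item in binary)

# Arithmetic from the slide: all 20 reviews predicted non-creative, 4 are creative.
accuracy_all_negative = (20 - 4) / 20
print(po_three_class, round(po_binary, 3), creative_count, accuracy_all_negative)
```

Merging the middle category into one side raises the observed agreement in this toy data as well, which is the effect the slide reports.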
  • 15. Conclusions (I)
      • We presented a model for discriminating creative from non-creative news articles, built by combining nine different measures.
      • The model could be improved by removing or changing the assumption that The Onion articles are always creative.
      • The feature selection led to the following conclusions:
          – The lack of creativity was best correlated with Word Norms Fraction, which was expected considering the definitions of creativity and of Word Norms Fraction. Google Similarity Distance was in the same situation.
          – Named Entities analysis showed that named entities are signs of a creative text as long as not too many distinct entities are used.
          – Wordnet Similarity was the best evidence for creative texts, while LSA was similar to Word Norms Fraction and Google Similarity Distance in providing a measure of text “usualness” and therefore giving evidence of non-creative texts. These three measures also have similar weights in the final classifier. ESA had no influence in the built classifier.
          – Less coherent texts were expected to be more creative, but the coherence score was found to have no influence in identifying creativity.
  • 16. Conclusions (II)
      • The second experiment revealed that there are “levels” of creativity: satirical news articles may, in general, be more creative than book reviews.
      • Both experiments had around 80% accuracy, which suggests that the classifier may adapt well to other kinds of texts. However, the lack of true positive examples in the second experiment makes us cautious about stating this firmly.
      • The classifier performed reasonably well at differentiating articles from The Onion from those of other news websites:
          – Did we really identify creativity, or did we in fact detect satire?
          – Increasing the size of the data set and testing the classifier further could shed some light on whether either of the two assumptions stands and which of them is more adequate.
  • 17. Questions
      Thank you very much!