IFITT PhD Seminar 2015. Text Mining Ideas & Examples
1. 7/28/2015
J. A. Mazanec, http://raptor.mazanec.com:3000 1
J. A. Mazanec
Modul University Vienna
J. A. Mazanec, https://raptor.mazanec.com 2
Contents
Purpose of the introductory presentation
Analyzing the projected images of destinations
Classifying customer reviews via significant word items
Extracting topics from positive versus negative reviews
Destinations under similarity competition
Theoretical underpinnings
Choosing attributes: affective image connotations
Survey data vs. Internet sources
Retrieving co-occurrences
(Dis)similarity – Normalized Google Distance
Visualizing with …
Dendrograms and
Image maps
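A dendrogram is the merge history of an agglomerative clustering run on the (dis)similarity matrix. The sketch below shows one way to produce that history in plain Python. It assumes average linkage, which the slides do not specify, and the destination labels A to D and their distances are made up for illustration.

```python
def average_linkage(labels, dist):
    """Agglomerative clustering with average linkage.

    dist is a symmetric dict of dicts keyed by label.
    Returns the merge history as (cluster_1, cluster_2, height) tuples,
    which is the information a dendrogram draws.
    """
    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [dist[a][b] for a in clusters[i] for b in clusters[j]]
                height = sum(pairs) / len(pairs)  # mean pairwise distance
                if best is None or height < best[0]:
                    best = (height, i, j)
        height, i, j = best
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical distances between four unnamed destinations.
labels = ["A", "B", "C", "D"]
pair_distances = {
    ("A", "B"): 0.1, ("A", "C"): 0.6, ("A", "D"): 0.7,
    ("B", "C"): 0.5, ("B", "D"): 0.8, ("C", "D"): 0.2,
}
dist = {label: {} for label in labels}
for (a, b), d in pair_distances.items():
    dist[a][b] = d
    dist[b][a] = d

merges = average_linkage(labels, dist)
for left, right, height in merges:
    print(left, right, round(height, 2))
```

Destinations that merge at a low height share a similar connotative profile and can be read as close competitors.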
Wayne Chase's system of emotionally positive connotations
Adoration Amazement Admiration Appreciation Affection
Amusement Bliss Amorousness Astonishment Cheer
Comfort Devotion Eagerness Delight Contentment
Fondness Enthusiasm Ecstasy Friendliness Excitement
Elation Gladness Infatuation Exhilaration Enjoyment
Gratitude Kindliness Exuberance Euphoria Hope
Liking Fun Exultation Peacefulness Love
Glee Happiness Lust Hilarity Joy
Relief Passion Merriment Jubilation Satisfaction
Tenderness Mirth Pleasure Serenity Trust
Surprise Pride Thankfulness Warmth Thrill
Rapture Wonder Well-being
Normalized Google Similarity Distance (Cilibrasi & Vitanyi 2007)
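As a rough illustration, the Normalized Google Distance of Cilibrasi & Vitanyi can be computed from search hit counts as sketched below. The function name, the parameter names, and the example counts are assumptions made for this sketch. In practice the counts would come from search queries and n would be an estimate of the number of indexed pages.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """NGD from hit counts.

    fx, fy: number of pages containing term x / term y
    fxy:    number of pages containing both terms
    n:      assumed total number of indexed pages
    """
    if min(fx, fy, fxy) <= 0:
        # No hits or no co-occurrence: treat the terms as maximally distant.
        return float("inf")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = normalized_google_distance(fx=1000, fy=100, fxy=10, n=10**6)
print(round(example, 3))  # prints 0.5
```

A value near 0 means the two terms almost always appear together; larger values mean they rarely co-occur.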
Destination countries with similar connotative environment
Destination countries in connotative Google space
Exercises
In-class (group) work
Choose destinations and decide on the attributes
Retrieve co-occurrence counts from the Internet with Google queries
Generate dendrograms and maps
Evaluate results
Comment on relative competitiveness
Classifying online customer reviews
Objective: significant word items
Underlying hypothesis
Practical usage: identify symptomatic words as early warning signal
Analytical method: Penalized Support Vector Machines
Demo and exercises
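To illustrate how a penalized SVM singles out significant word items, here is a minimal sketch of a linear SVM with hinge loss and an L1 penalty, trained by proximal subgradient descent. It is not the algorithm of the penalizedSVM R package listed in the references. The reviews, the vocabulary, and the settings lam, lr and epochs are all made up for the example.

```python
def vectorize(text, vocab):
    """Bag-of-words count vector for one review."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in vocab]


def soft_threshold(value, threshold):
    """Shrink a weight toward zero; this is the L1 penalty step."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def train_l1_svm(X, y, lam=0.05, lr=0.1, epochs=100):
    """Full-batch subgradient step on the hinge loss, then soft-thresholding."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # hinge loss is active for this review
                for j in range(p):
                    grad_w[j] -= yi * xi[j]
                grad_b -= yi
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n  # the intercept is not penalized
    return w, b


# Hypothetical toy reviews: +1 = positive, -1 = negative.
vocab = ["dirty", "clean", "room", "hotel"]
reviews = ["clean room", "clean hotel", "dirty room", "dirty hotel"]
labels = [1, 1, -1, -1]

X = [vectorize(text, vocab) for text in reviews]
w, b = train_l1_svm(X, labels)
weights = dict(zip(vocab, w))


def predict(text):
    """Classify a review as positive (+1) or negative (-1)."""
    score = sum(wj * xj for wj, xj in zip(w, vectorize(text, vocab))) + b
    return 1 if score >= 0 else -1


print(weights)
print(predict("dirty hotel room"))
```

In this toy setting the penalty leaves "room" and "hotel" at exactly zero, so only "clean" and "dirty" remain as symptomatic words.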
SVM: Support Vector Machine (Meyer, 2012)
Extracting topics from positive & negative online reviews
Objective: explore customers' use of language
Underlying hypothesis
Practical use: automatic document annotation; structure of customer language
Analytical method: Latent Dirichlet Allocation
Demo and exercises
LDA basics (Blei, 2012)
topic := probability distribution over a fixed vocabulary
distribution over topics
per-document distribution over topics
all documents share the same set of topics
topics, per-document topic distributions, per-document per-word topic assignments = hidden structure (that likely generated the observed documents)
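The generative story behind these bullets can be sketched in a few lines: draw a per-document distribution over topics, then for each word draw a topic assignment and a word from that topic. The vocabulary, the two topics, the alpha value, and the seed below are hypothetical. Fitting LDA means inverting this process, which the sketch does not attempt.

```python
import random

# Hypothetical fixed vocabulary and two topics (each a distribution over it).
vocab = ["staff", "friendly", "breakfast", "room", "noisy", "bed"]
topics = [
    [0.4, 0.4, 0.1, 0.05, 0.025, 0.025],   # a "service" topic
    [0.025, 0.025, 0.05, 0.4, 0.2, 0.3],   # a "room" topic
]


def generate_document(topics, vocab, alpha, n_words, rng):
    """Generate one document following the LDA generative process."""
    k = len(topics)
    # Per-document distribution over topics: theta ~ Dirichlet(alpha),
    # sampled by normalizing independent Gamma draws.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(gammas)
    theta = [g / total for g in gammas]

    assignments, words = [], []
    for _ in range(n_words):
        z = rng.choices(range(k), weights=theta)[0]     # topic assignment
        word = rng.choices(vocab, weights=topics[z])[0]  # word from that topic
        assignments.append(z)
        words.append(word)
    return theta, assignments, words


rng = random.Random(42)
theta, assignments, words = generate_document(topics, vocab, 0.5, 8, rng)
print(theta)
print(list(zip(assignments, words)))
```

Only the words are observed in real data; theta and the assignments are the hidden structure that inference has to recover.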
Latent topics (Blei, 2012)
Graphical model for LDA (Blei, 2012)
Compute the hidden topic structure (= posterior distribution = conditional distribution of the hidden variables given the documents)
β: topic distribution over words
θ: proportion for topic k in document d
z: topic assignment for word n in document d
w: word n in document d
α, η: Dirichlet hyperparameters for the topic proportions θ and the topics β
The generative process and posterior distribution
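For reference, the joint distribution of the generative process and the resulting posterior can be written as follows, using the notation of Blei (2012): K topics, D documents, N words per document. This is reproduced from that article as an aid and may differ in layout from what the original slide showed.

```latex
% Joint distribution of hidden and observed variables
p(\beta_{1:K}, \theta_{1:D}, z_{1:D}, w_{1:D})
  = \prod_{i=1}^{K} p(\beta_i)
    \prod_{d=1}^{D} p(\theta_d)
    \left( \prod_{n=1}^{N} p(z_{d,n} \mid \theta_d)\,
           p(w_{d,n} \mid \beta_{1:K}, z_{d,n}) \right)

% Posterior: hidden structure given the observed documents
p(\beta_{1:K}, \theta_{1:D}, z_{1:D} \mid w_{1:D})
  = \frac{p(\beta_{1:K}, \theta_{1:D}, z_{1:D}, w_{1:D})}{p(w_{1:D})}
```

The denominator, the marginal probability of the observed words, cannot be computed exactly for realistic corpora, which is why the posterior is approximated in practice.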
References
Becker, N., Werft, W., Toedt, G., Lichter, P. and Benner, A. (2009). penalizedSVM: a R-package for feature selection SVM classification. Bioinformatics 25(13): 1711–1712.
Blei, D. (2012). Probabilistic Topic Models. Communications of the ACM 55(4): 77–84.
Grün, B. and Hornik, K. (2011). topicmodels: An R Package for Fitting Topic Models. Journal of Statistical Software 40(13): 1–30.
Mazanec, J. A. (2010). Tourism-Receiving Countries in Connotative Google Space. Journal of Travel Research 49 (Nov): 501–512.
Meyer, D. (2011). Support Vector Machines: The Interface to libsvm in Package e1071. University of Technology, Vienna.