This talk presents the tree cloud, a new visualization tool that displays the words of a text (newspaper article, blog content, political speech) as a kind of improved tag cloud: http://www.treecloud.org.
1. IFCS 2009
Dresden – 17/03/2009
Visualising a text with a tree cloud
Philippe Gambette, Jean Véronis
2. Outline
• Tag and word clouds
• Enhanced tag clouds
• Tree clouds
• Construction steps
• Quality control
3. Tag clouds
• Built from a set of tags
• Font size related to frequency
• Gained popularity with Flickr
Figure: what is considered the first tag cloud, from D. Coupland, Microserfs, HarperCollins, Toronto, 1995
Figure: Flickr's all-time most popular tags
5. Word clouds
• Built from a set of words from a text
• Font size related to frequency
• Gained popularity with Wordle
Figure: GoogleImage(obama inaugural address wordle)
9. Enhanced tag / word clouds
Add information from the text:
• color intensity to express recency in Amazon
• shared tags in red on del.icio.us
• group together cooccurring tags on the same line
Hassan-Montero & Herrero-Solana, InScit'06
• optimize blank space and semantic proximity
Kaser & Lemire, WWW'07
• “topigraphy”: 2D placement according to cooccurrence
Fujimura, Fujimura, Matsubayashi, Yamada & Okuda, WWW'08
14. Extract semantic information from a text
• literature analysis:
philological approach: only consider the text
Brody
• discourse analysis:
tree analysis or cooccurrence graph, geodesic projection
Brunet (Hyperbase), Viprey (Astartex)
• text mining:
semantic graph
Grimmer (Wordmapper)
• natural language processing:
word sense disambiguation
Véronis (Hyperlex)
Examples shown:
- Mayaffre, Quand travail, famille et patrie cooccurrent dans le discours de Nicolas Sarkozy, JADT'08
- Brunet, Les séquences (suite), JADT'08
- Barry, Viprey, Approche comparative des résultats d'exploration textuelle des discours de deux leaders africains Keita et Touré, JADT'08
- Peyrat-Guillard, Analyse du discours syndical sur l’entreprise, JADT'08
- Véronis, HyperLex: Lexical Cartography for Information Retrieval, 2004 (disambiguation of the word “barrage”: dam, play-off, roadblock, police cordon)
20. Tag cloud + tree = tree cloud
Built with TreeCloud and SplitsTree (Huson 1998; Huson & Bryant 2006)
TreeCloud is GPL-licensed, written in Python, and available at http://www.treecloud.org
21. The first tree cloud
Tree cloud of the blog posts containing “Laurence Ferrari”
from 25/11/2007 to 10/12/2007, by Jean Véronis
http://aixtal.blogspot.com/2007/12/actu-une-ferrari-dans-un-arbre.html
22. Building a tree cloud – extracting the words
Extract words with frequency:
• stoplist?
Figure: tree clouds built without a stoplist and with a stoplist
23. Building a tree cloud – extracting the words
Extract words with frequency:
• lemmatization?
• group words with similar meaning?
No lemmatization is sometimes interesting.
Figure: 70 most frequent words in Obama's campaign speeches, winsize=30, distance=dice, NJ-tree.
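The word extraction step can be sketched as follows. This is a minimal illustration, not TreeCloud's actual code: the tokenizer, the function names and the tiny stoplist are all assumptions made for the example, and no lemmatization is attempted.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stoplist, for illustration only.
STOPLIST = frozenset({"the", "a", "an", "of", "and", "to", "in", "is", "we"})


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def most_frequent_words(text, n, stoplist=frozenset()):
    """Return the n most frequent words outside the stoplist, with counts."""
    counts = Counter(w for w in tokenize(text) if w not in stoplist)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "The change we need is the change we believe in"
    print(most_frequent_words(sample, 3))            # stop words dominate
    print(most_frequent_words(sample, 3, STOPLIST))  # content words only
```

Passing an empty stoplist keeps function words such as "the" in the ranking, which is the comparison the stoplist slide makes.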
24. Building a tree cloud – dissimilarity matrix
Many semantic distance formulas based on cooccurrence
Pipeline: text → sliding window S (width w, sliding step s) → cooccurrence matrices O11, O12, O21, O22 → semantic dissimilarity matrix
Available formulas: chi squared, mutual information, liddell, dice, jaccard, gmean, hyperlex, minimum sensitivity, odds ratio, zscore, log likelihood, poisson-stirling...
Evert, The Statistics of Word Cooccurrences, PhD thesis, 2005
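A minimal sketch of this step, under stated assumptions: windows of width w are moved by step s over the token list, O11 counts the windows containing both words, O12 and O21 the windows containing only the first or only the second, and O22 the windows containing neither. The Dice dissimilarity is taken here as 1 minus the Dice coefficient; the function names are hypothetical and the exact formulas used by TreeCloud may differ.

```python
from itertools import combinations


def sliding_windows(tokens, width, step=1):
    """Yield the set of distinct words in each window of the given width."""
    last_start = max(len(tokens) - width, 0)
    for start in range(0, last_start + 1, step):
        yield set(tokens[start:start + width])


def contingency_tables(tokens, words, width, step=1):
    """Return {(a, b): (O11, O12, O21, O22)} counted over sliding windows."""
    words = sorted(set(words))
    n_windows = 0
    alone = {w: 0 for w in words}                   # windows containing w
    both = {p: 0 for p in combinations(words, 2)}   # windows containing a and b
    for window in sliding_windows(tokens, width, step):
        n_windows += 1
        present = [w for w in words if w in window]  # stays sorted
        for w in present:
            alone[w] += 1
        for pair in combinations(present, 2):
            both[pair] += 1
    tables = {}
    for (a, b), o11 in both.items():
        o12 = alone[a] - o11
        o21 = alone[b] - o11
        o22 = n_windows - o11 - o12 - o21
        tables[(a, b)] = (o11, o12, o21, o22)
    return tables


def dice_dissimilarity(table):
    """1 - Dice coefficient; 1.0 when neither word occurs in any window."""
    o11, o12, o21, _ = table
    denom = 2 * o11 + o12 + o21
    return 1.0 if denom == 0 else 1.0 - 2 * o11 / denom


if __name__ == "__main__":
    toks = "a b c a b d".split()
    tabs = contingency_tables(toks, ["a", "b", "c"], width=2)
    for pair, table in tabs.items():
        print(pair, table, round(dice_dissimilarity(table), 3))
```

Any other formula from the list can be plugged in the same way, since they are all functions of the four counts.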
26. Building a tree cloud – dissimilarity matrix
Transformations needed on the dissimilarity:
• transform similarity into dissimilarity
• linear normalization for positive matrices, to get distances in [0,1]
• affine normalization for matrices with positive or negative numbers, to get distances in [α,1] (for example α=0.1)
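The three transformations above can be sketched as follows. The slides do not give the exact formulas, so these are assumptions: similarities are turned into dissimilarities by subtracting from the maximum, the linear normalization divides by the maximum, and the affine normalization maps the off-diagonal range onto [α,1] while keeping the diagonal at 0.

```python
def similarity_to_dissimilarity(matrix):
    """Assumed conversion: subtract every similarity from the largest one."""
    top = max(max(row) for row in matrix)
    return [[top - x for x in row] for row in matrix]


def linear_normalization(matrix):
    """Scale a non-negative matrix so that its largest entry becomes 1."""
    top = max(max(row) for row in matrix)
    if top == 0:
        return [list(row) for row in matrix]
    return [[x / top for x in row] for row in matrix]


def affine_normalization(matrix, alpha=0.1):
    """Map off-diagonal entries affinely onto [alpha, 1]; diagonal stays 0."""
    n = len(matrix)
    off = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    lo, hi = min(off), max(off)
    span = hi - lo
    result = []
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                row.append(0.0)
            elif span == 0:
                row.append(1.0)  # all values equal: arbitrary choice
            else:
                row.append(alpha + (1 - alpha) * (matrix[i][j] - lo) / span)
        result.append(row)
    return result


if __name__ == "__main__":
    scores = [[0, -2, 2], [-2, 0, 0], [2, 0, 0]]
    print(affine_normalization(scores, alpha=0.1))
```

Keeping the smallest distance at α rather than 0 avoids zero distances between distinct words, which is presumably why the slide suggests α=0.1.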
27. Building a tree cloud – tree reconstruction
Many existing methods:
• Neighbor-Joining
Saitou & Nei, 1987
• Addtree variants
Barthelemy & Luong, 1987
• Quartet heuristic
Cilibrasi & Vitanyi, 2007
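As an illustration of the first method, here is a compact sketch of the classical Neighbor-Joining algorithm on a small distance matrix. It is a textbook version written for readability, not the implementation used by TreeCloud or SplitsTree; the internal node names "n0", "n1", ... are an arbitrary choice and must not clash with leaf labels.

```python
def neighbor_joining(labels, dist):
    """Return the edges (node, node, length) of the unrooted NJ tree.

    labels: list of leaf names; dist: symmetric matrix (list of lists)
    with a zero diagonal, indexed in the same order as labels.
    """
    nodes = list(labels)
    d = {(a, b): dist[i][j]
         for i, a in enumerate(labels) for j, b in enumerate(labels)}
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        total = {a: sum(d[a, b] for b in nodes) for a in nodes}
        # Pick the pair minimizing the Q criterion.
        pairs = ((a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:])
        a, b = min(
            pairs,
            key=lambda p: (n - 2) * d[p] - total[p[0]] - total[p[1]],
        )
        new = f"n{counter}"
        counter += 1
        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * d[a, b] + (total[a] - total[b]) / (2 * (n - 2))
        len_b = d[a, b] - len_a
        edges.append((a, new, len_a))
        edges.append((b, new, len_b))
        # Distances from the new node to every remaining node.
        rest = [c for c in nodes if c not in (a, b)]
        for c in rest:
            d[new, c] = d[c, new] = 0.5 * (d[a, c] + d[b, c] - d[a, b])
        d[new, new] = 0.0
        nodes = rest + [new]
    a, b = nodes
    edges.append((a, b, d[a, b]))
    return edges


if __name__ == "__main__":
    names = ["A", "B", "C", "D"]
    matrix = [[0, 3, 5, 8],
              [3, 0, 6, 9],
              [5, 6, 0, 5],
              [8, 9, 5, 0]]
    for edge in neighbor_joining(names, matrix):
        print(edge)
```

On a matrix that fits a tree exactly, as in the usage example, the algorithm recovers that tree and its branch lengths; on real cooccurrence data it returns the tree it builds greedily, with no guarantee of fit.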
28. Building a tree cloud – tree decoration
Choice of word sizes:
• computed directly from frequency (apply a log!)
or
• computed from frequency ranking (exponential distribution)
or
• statistical significance with respect to a reference corpus
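The first option (sizes computed from the frequency, after applying a log) can be sketched like this. The size bounds and the linear mapping of log-frequencies onto them are assumptions chosen for the example; the ranking-based and reference-corpus options are not shown.

```python
import math


def log_sizes(freqs, min_size=10.0, max_size=40.0):
    """Map word frequencies to font sizes, linearly in log(frequency).

    freqs: dict word -> frequency (positive integers).
    The least frequent word gets min_size, the most frequent max_size.
    """
    logs = {w: math.log(f) for w, f in freqs.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = hi - lo
    if span == 0:
        return {w: max_size for w in logs}
    return {w: min_size + (max_size - min_size) * (v - lo) / span
            for w, v in logs.items()}


if __name__ == "__main__":
    print(log_sizes({"change": 100, "hope": 10, "dam": 1}))
```

Without the log, a word that is a hundred times more frequent than another would be drawn a hundred times larger, which is why the slide insists on it.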
29. Building a tree cloud – tree decoration
Colors: chronology (from old to recent)
Figure: 150 most frequent words in Obama's campaign speeches, winsize=30, distance=oddsratio, color=chronology, NJ-tree.
30. Building a tree cloud – tree decoration
Colors: dispersion (from sparse to dense), measured as the standard deviation of the position
Too blue?
Figure: 150 most frequent words in Obama's campaign speeches, winsize=30, distance=oddsratio, color=dispersion, NJ-tree.
31. Building a tree cloud – tree decoration
Colors: normalized dispersion (from sparse to dense), measured as the standard deviation of the position divided by the word frequency
Too red?
Figure: 150 most frequent words in Obama's campaign speeches, winsize=30, distance=oddsratio, color=norm-dispersion, NJ-tree.
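The quantities behind these color schemes can be sketched as follows. The slides define dispersion as the standard deviation of the position and the normalized variant as that value divided by the word frequency; taking chronology as the mean relative position, and using the population standard deviation, are assumptions made for this example.

```python
from statistics import mean, pstdev


def position_statistics(tokens, words):
    """Return per-word chronology, dispersion and normalized dispersion.

    Positions are relative, from 0.0 (start of text) to 1.0 (end of text).
    """
    n = len(tokens)
    positions = {w: [] for w in words}
    for i, tok in enumerate(tokens):
        if tok in positions:
            positions[tok].append(i / max(n - 1, 1))
    stats = {}
    for w, pos in positions.items():
        if not pos:
            continue  # word absent from the text
        spread = pstdev(pos)
        stats[w] = {
            "chronology": mean(pos),                # assumed: mean position
            "dispersion": spread,                   # std. dev. of position
            "norm_dispersion": spread / len(pos),   # divided by frequency
        }
    return stats


if __name__ == "__main__":
    toks = "a b a c b".split()
    for word, values in position_statistics(toks, ["a", "b", "c"]).items():
        print(word, values)
```

Each value can then be mapped to a color scale, for instance between the two extremes shown in the legends.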
32. Building a tree cloud – tree decoration
Edge color or thickness: quality of the induced cluster.
Figure: 150 most frequent words in Obama's campaign speeches, winsize=30, distance=oddsratio, color=chronology, NJ-tree.
34. Quality control
Is there an objective quality measure of tree clouds?
What is the best method to build a tree cloud from my data?
How much does the tree cloud vary under small changes?
Bootstrap to evaluate:
- stability of the result
- robustness of the method
Still, is there a more direct method?
Arboricity, to show whether the distance matrix fits a tree, which should imply stability?
Guénoche & Garreta, 2001
Guénoche & Darlu, 2009
37. Quality control – bootstrap
• Randomly delete words with probability 50%.
• Build the tree cloud of the original text and of the altered text.
• Compute the similarity of both trees (1 - normalized Robinson-Foulds distance).
Figure: similarity (axis from 0.5 to 0.95) for each distance formula: mi, dice, gmean, ms, zscore, poissonstirling, chisquared, liddell, jaccard, hyperlex, oddsratio, loglikelihood. 4 altered versions of 10 of Obama's speeches, 3000 words on average, width=30, NJ-tree.
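The two ends of this procedure can be sketched as follows, with the tree construction in between left out. The representation of trees as nested tuples of leaf names is an assumption made for the example, and the normalization by 2(n - 3) assumes that both trees are binary and have the same n leaves.

```python
import random


def delete_words(tokens, probability=0.5, rng=random):
    """Return a copy of tokens, each one dropped with the given probability."""
    return [t for t in tokens if rng.random() >= probability]


def leaves(tree):
    """Set of leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of the tree, seen as an unrooted tree."""
    all_leaves = leaves(tree)
    anchor = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        # Represent each bipartition by the side without the anchor leaf.
        side = below if anchor not in below else all_leaves - below
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def rf_similarity(tree_a, tree_b):
    """1 - normalized Robinson-Foulds distance (same leaf set assumed)."""
    n = len(leaves(tree_a))
    max_distance = 2 * (n - 3)
    if max_distance <= 0:
        return 1.0
    differing = splits(tree_a) ^ splits(tree_b)
    return 1.0 - len(differing) / max_distance


if __name__ == "__main__":
    original = (("A", "B"), ("C", ("D", "E")))
    altered = (("A", "C"), ("B", ("D", "E")))
    print(rf_similarity(original, altered))
    print(delete_words("yes we can yes we can".split(), 0.5, random.Random(1)))
```

In practice the altered text may no longer contain the same most frequent words as the original, so the two trees would first have to be restricted to their common leaves; that step is not shown here.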
40. Perspectives
• Make the tool available on a web interface http://www.treecloud.org
• Evaluate tree clouds for discourse analysis
• Build the daily tree cloud of people popular on blogs, with http://labs.wikio.net
41. Thank you for your attention!
Tree cloud of the words appearing twice or more in the IFCS 2009 call for papers, lemmatization, width=20, distance=dice, NJ-tree.
http://www.treecloud.org - http://www.splitstree.org
42. Tree clouds focused on one word
Tree cloud of the neighborhood of “McCain” in
Obama's campaign speeches
43. Tree clouds focused on one word
Tree cloud of the neighborhood of “Bush” in
Obama's campaign speeches
44. Tree clouds focused on one word
Tree cloud of the neighborhood of “world” in
Obama's campaign speeches