2. Lexicometry methodology 1980
Raw text word counts (graphical word form)
Contrastive analysis: factorial correspondence analysis (FCA), ascending hierarchical classification (AHC)
Collocations
Textometry methodology 2003
XML encoded texts
Tagged texts
TXM platform 2007 → French Research Agency grant, 2007-2010
XML TEI
NLP Automatic tagging
Open-source development model
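The lexicometric starting point above (counting graphical word forms in raw text, then contrasting parts of a corpus) can be sketched in a few lines of Python. This is only an illustration and not TXM code: the two-part toy corpus, the function names and the choice to lowercase forms are all assumptions made for the example.

```python
import re
from collections import Counter


def word_forms(text):
    """Split raw text into lowercase graphical word forms (a simplification)."""
    return re.findall(r"\w+", text.lower())


def count_by_part(parts):
    """Return one frequency table (Counter) per named part of the corpus."""
    return {name: Counter(word_forms(text)) for name, text in parts.items()}


# Hypothetical two-part toy corpus, not taken from any real data set.
parts = {
    "part_a": "The cat sat on the mat. The cat slept.",
    "part_b": "A dog sat on the rug. The dog barked.",
}

tables = count_by_part(parts)

# Print a small form-by-part contingency table, the kind of input that
# contrastive methods such as FCA or AHC start from.
vocabulary = sorted(set().union(*tables.values()))
for form in vocabulary:
    print(form, [tables[name][form] for name in parts])
```

A Counter returns 0 for unseen forms, so every part gets a value for every word form without special handling.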
3. TXM today (1/2)
Full range of text analysis tools
– Qualitative: word lists, KWIC concordances, text edition reading & navigation
– Quantitative: FCA, AHC, collocates, specificity
– Corpus configurations: sub-corpus & partition building
Countable units defined through the CQP full-text search engine (word patterns)
Statistical models based on R environment
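To make the KWIC (keyword in context) concordance mentioned among the qualitative tools concrete, here is a minimal Python sketch. It is not how TXM implements concordances; the function name `kwic`, the `width` parameter and the sample sentence are assumptions chosen for illustration.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence.

    Matching is case-insensitive; `width` is the number of tokens kept
    on each side of the keyword.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Hypothetical sample text, already split into tokens.
tokens = "the cat sat on the mat and the dog sat on the rug".split()

# Align the keyword in a central column, as concordance displays usually do.
for left, kw, right in kwic(tokens, "sat"):
    print(f"{left:>20}  {kw}  {right}")
```

Keeping the left and right contexts as separate fields makes it easy to sort the concordance on either side of the pivot.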
4. TXM today (2/2)
Large spectrum of input formats
– TXT (Unicode) > XML > TEI (BVH, BFM, etc.)
– Speech transcriptions (timing, speech turns, audio/video...)
– Aligned corpora (translation or versioning)
Two end user applications
– TXM RCP - Cross-platform desktop (Windows, Mac OS X, Linux)
– TXM GWT - Web Portals
User community (French speaking)
Developer community (Lyon, Besançon, Caen)
5. TXM introduction workshop
TXM 0.7.2 (0.7.5 this week)
Brown corpus (Kucera et al.)
TreeTagger English model
Brown TXT and XML sources (import)
Main concepts & tools
CQL queries
XML import
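As a rough idea of what a CQL query over tagged text does, the Python sketch below matches a sequence of per-token property constraints, in the spirit of a query such as `[pos="JJ"] [pos="NN.*"]` (an adjective followed by a noun). It is a toy analogue, not the CQP engine: the property name `pos`, the hand-tagged tokens and the function `match_pattern` are assumptions for illustration, and the property names in an actual TXM corpus depend on how it was imported and tagged.

```python
import re


def match_pattern(tokens, pattern):
    """Return every token span matching a sequence of property constraints.

    tokens:  list of dicts such as {"word": "dog", "pos": "NN"}.
    pattern: list of dicts mapping a property name to a regular expression;
             each regex must match the whole property value. An empty dict
             matches any token.
    """
    hits = []
    n = len(pattern)
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(
            re.fullmatch(regex, tok.get(prop, ""))
            for tok, constraint in zip(window, pattern)
            for prop, regex in constraint.items()
        ):
            hits.append([tok["word"] for tok in window])
    return hits


# Hypothetical hand-tagged tokens (Penn-style tags, chosen for the example).
tokens = [
    {"word": "The", "pos": "DT"},
    {"word": "quick", "pos": "JJ"},
    {"word": "fox", "pos": "NN"},
    {"word": "chased", "pos": "VBD"},
    {"word": "lazy", "pos": "JJ"},
    {"word": "dogs", "pos": "NNS"},
]

# Rough analogue of the CQL query [pos="JJ"] [pos="NN.*"]
pattern = [{"pos": "JJ"}, {"pos": "NN.*"}]
print(match_pattern(tokens, pattern))
```

Each match is a candidate unit that can then be counted, listed in a concordance, or compared across sub-corpora.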