This is a slightly modified version of the slides presented at AACL 2018, Atlanta, Georgia.
All the graphs on the slides are created by CasualConc using R.
The document discusses text mining, including defining it as the extraction of information from unstructured text using computational methods. It covers topics such as structured vs unstructured data, common text mining practice areas like information retrieval and document clustering, and challenges in text mining including ambiguity in language. Pre-processing techniques for text mining are also outlined, such as normalization, tokenization, stemming and removing stop words to clean and prepare text for analysis.
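To make these pre-processing steps concrete, here is a minimal, dependency-free Python sketch of such a pipeline; the tiny stop-word list and the suffix-stripping stemmer are illustrative stand-ins for real resources such as NLTK's stop-word lists and the Porter stemmer.

```python
import re

# Illustrative stop-word list; real pipelines use a fuller resource.
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "in", "to",
              "is", "are", "for", "were"}

def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def tokenize(text: str) -> list[str]:
    """Crude word tokenizer: keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text)

def stem(token: str) -> str:
    """Toy suffix-stripping stemmer (not a real Porter stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text: str) -> list[str]:
    """Normalize, tokenize, drop stop words, and stem."""
    return [stem(t) for t in tokenize(normalize(text))
            if t not in STOP_WORDS]

print(preprocess("The miners were mining unstructured texts for information."))
# -> ['miner', 'min', 'unstructur', 'text', 'information']
```

As the crude stems show, the point of these steps is to conflate surface variants before counting, not to produce dictionary forms.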
AntConc is a freeware corpus analysis toolkit designed for use in technical writing classrooms. It includes tools like a concordancer, word frequency generator, cluster analysis, and more. It has an intuitive interface and works on Windows, Linux, and Unix systems. Future updates will improve speed, add new features like viewing collocates, and better support annotated data.
This document discusses the use of concordancers in corpus linguistics and language teaching. A concordancer is a tool that allows users to search electronic texts and analyze word combinations and frequencies. The document provides examples of concordancer programs and discusses how they can be used by students, language teachers, and researchers. It then summarizes two articles that used concordancers - one to analyze metaphoric expressions used by doctors and patients, and another to teach medical students how to write academic research descriptions.
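As a rough illustration of what a concordancer does, the following Python sketch prints keyword-in-context (KWIC) lines for a search term; the whitespace tokenizer and fixed context width are arbitrary simplifications of what real tools offer.

```python
def kwic(text: str, keyword: str, width: int = 30) -> None:
    """Print each occurrence of `keyword` with `width` characters of context."""
    pos = 0
    for token in text.split():                 # naive whitespace tokenizer
        start = text.index(token, pos)         # locate token in the raw text
        pos = start + len(token)
        if token.lower().strip('.,;:!?"') == keyword.lower():
            left = text[max(0, start - width):start]
            right = text[pos:pos + width]
            print(f"{left:>{width}} [{token}] {right}")

kwic("Corpus tools help a lot. A corpus is a body of texts, "
     "and corpus analysis reveals patterns of use.", "corpus")
```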
This document summarizes a workshop on automatic bibliographic metadata extraction. It discusses the goals of metadata extraction, including improving efficiency of manual metadata creation and revision. Use cases presented include suggesting metadata to researchers during paper deposit and analyzing citations to assess research impact. Standards around document formats, metadata schemas, and citation styles were highlighted as important to metadata extraction. Existing tools for extracting metadata from various document formats were also presented. Issues around text accessibility, localization, and imperfect suggestions were noted. Recommendations included developing a customizable and retrainable system to simplify metadata operations and identification of incorrect records.
AntConc is a free corpus analysis software toolkit developed in 2002 for technical writing classrooms. It contains various tools like a concordancer, word lists, and word clusters to analyze corpora. The concordancer is its central tool and displays keywords in context. It has features like sorting, filtering, and linking to original files. While effective for learning, its limitations include inability to handle large corpora or annotated XML data. Future developments may address these limitations.
AntConc is a free corpus analysis software first released in 2002. It was developed using PERL programming language to enable easy porting between Windows and Linux environments. Based on user feedback, new versions added more features to the concordancer tool, including searching via regular expressions, sorting results, and viewing search terms in original files. Future improvements could better handle annotated data like XML.
Argument extraction from news, blogs and social media (Shubhangi Tandon)
This presentation explains the pioneering Argument Extraction paper by Theodosis Goudas, Christos Louizos, Georgios Petasis, and Vangelis Karkaletsis on Argument Extraction from News, Blogs, and Social Media (published by Springer International Publishing, 2014).
This document discusses and provides information on four different concordancing tools that can be used for educational purposes: AntConc, AdTAT, Saffron, and TextSTAT. It provides the websites for each tool and briefly describes their functions, such as generating word frequency lists and concordances, analyzing texts in different languages and encodings, and performing textual searches using regular expressions. The document concludes by thanking the reader.
- CLDR (Common Locale Data Repository) is a project hosted by Unicode Consortium to collect and maintain locale data for software localization in XML format. It aims to provide common locale data that is freely available.
- CLDR 1.3 was released in June 2005, containing data for 296 locales including dates/time formats, numbers/currencies, translations, and more. New features included additional timezone translations and currency codes.
- Future releases will focus on enhancing existing data, improving structure/tools, and incorporating vetted data from experts worldwide. The process aims to resolve conflicts and get broad agreement through an open committee approach.
Survey On Building A Database Driven Reverse Dictionary (Editor IJMTER)
Reverse dictionaries are widely used reference works organized by concepts, phrases, or the definitions of words. This paper describes the many challenges inherent in building a reverse lexicon and maps the problem to the well-known abstract similarity problem. Standard web search engines are basic versions of such a system; they take advantage of huge scale, which permits inferring general interest in documents from link information. The paper presents a study of a database-driven reverse dictionary using three large-scale datasets, namely person names, general English words, and biomedical concepts, and analyzes difficulties arising in the use of documents produced by a reverse dictionary.
AntConc is a freeware corpus analysis toolkit designed for use in technical writing classrooms. It has a small memory footprint and is compatible with Windows, Linux, and Unix operating systems. AntConc provides several text analysis tools including concordance, keyword lists, word clusters, and original file viewing. Users can perform complex searches using regular expressions and wildcards. Search results can be sorted and formatted for copying or saving. However, AntConc is best suited for small specialized corpora and has limited statistics and annotation handling capabilities.
This document discusses key concepts in programming languages including names, keywords, variables, and binding. It defines names as strings that identify entities, noting conventions like starting with letters. Keywords and reserved words are discussed, with keywords being redefinable and reserved words not allowed as names. Variables are defined by their name, address, value, type, lifetime and scope. Binding is the association between an attribute and entity, which can be static or dynamic.
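A two-line Python illustration of the last distinction mentioned, dynamic binding:

```python
x = 42            # the name x is bound to an int object
x = "forty-two"   # rebound to a str at run time: dynamic type binding
# In a statically bound language such as C, a declaration like `int x;`
# fixes the type of x at compile time and the rebinding would be rejected.
```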
This document defines and summarizes key terms in corpus linguistics. It discusses bootstrapping, the Brill tagger, competence-performance dichotomy, computational linguistics, computer assisted language learning, corpus linguistics, extensible markup language, Penn Treebank, Kolhapur Corpus, Hyderabad Corpus, Text Encoding Initiative, Unicode, Linguistic Data Consortium, and alignment.
This document defines and describes various terms and concepts related to corpus linguistics and natural language processing. It defines acronyms for various corpora and projects. It also defines key concepts like alignment, annotation, ambiguity, balanced corpora, concordancing, part-of-speech tagging, and probabilistic tagging using n-grams.
Presented by Christoph Goller, Chief Scientist, IntraFind Software AG
If you want to search in a multilingual environment with high-quality, language-specific word normalization, handle mixed-language documents, add phonetic search for names, or need a semantic search that distinguishes between a search for the color "brown" and a person with the surname "Brown", then in all these cases you have to deal with different types of terms. I will show why it makes much more sense to attach types (prefixes) to Lucene terms instead of relying on different fields or even different indexes for different kinds of terms. Furthermore, I will show what queries to such a typed index look like and why, e.g., SpanQueries are needed to correctly treat compound words and phrases or to realize a reasonable phonetic search. The Analyzers and the QueryParser described are available as plugins for Lucene, Solr, and Elasticsearch.
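The idea of typed terms can be illustrated outside Lucene with a toy Python index in which each term carries its type as a prefix; the `word:`/`sem:`/`phon:` prefixes and the analyzer output below are invented for illustration and are not the plugin's actual notation.

```python
from collections import defaultdict

# Toy typed index: the term type is encoded as a prefix on the term itself,
# so one index can hold word forms, semantic tags, and phonetic keys.
index = defaultdict(set)

def add(doc_id: int, typed_terms: list[str]) -> None:
    for term in typed_terms:
        index[term].add(doc_id)

# Hypothetical analyzer output for two documents.
add(1, ["word:brown", "sem:color"])               # "...painted it brown..."
add(2, ["word:brown", "sem:person", "phon:BRN"])  # "...Mr. Brown..."

# A query for the *person* Brown skips the color sense.
print(index["word:brown"] & index["sem:person"])  # {2}
```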
Detailed presentation on various analytical tools widely used in Corpus Linguistics for corpora analysis including WORDCRUNCHER, LEXA, CWB , TACT, MICROCONCORD etc.
Anti-plagiarism tools for our repositories (Jan Mach)
The presentation focuses on a test and comparative analysis of systems for detecting duplicates (so-called anti-plagiarism systems) used for repositories of higher education theses and dissertations in the Czech Republic. A text corpus containing the most frequent sources of plagiarism was created for the test, and the modifications made by plagiarists were simulated. The success of duplicate detection by the most important anti-plagiarism systems was verified experimentally, and a comparative analysis and verification of the stipulated hypotheses were performed. The evaluation was also performed on the author's own prototype application using the Google search engine.
Information retrieval systems use indexes and inverted indexes to quickly search large document collections by mapping terms to their locations. Boolean retrieval uses an inverted index to process Boolean queries by intersecting postings lists to find documents that contain sets of terms. Key aspects of information retrieval systems include precision, recall, and ranking search results by relevance.
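A minimal Python sketch of the idea: build an inverted index that maps each term to a sorted postings list of document IDs, then answer an AND query by intersecting two lists. The three sample documents are invented.

```python
from collections import defaultdict

docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
}

# Inverted index: term -> sorted postings list of doc IDs.
index = defaultdict(list)
for doc_id in sorted(docs):
    for term in set(docs[doc_id].split()):
        index[term].append(doc_id)

def intersect(p1: list[int], p2: list[int]) -> list[int]:
    """Merge-style intersection of two sorted postings lists."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i]); i += 1; j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out

# Boolean query: sales AND july
print(intersect(index["sales"], index["july"]))  # [2, 3]
```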
Realization of natural language interfaces using lazy functional programming (unyil96)
The document discusses research on using lazy functional programming (LFP) to build natural language interfaces (NLIs). LFP involves delaying evaluation of function arguments until needed. Over 45 researchers have investigated using LFP for NLI design and implementation due to similarities between some linguistic theories and LFP theories. The research has resulted in over 60 papers on using LFP for natural language processing tasks like syntactic and semantic analysis. The paper provides a comprehensive survey of this research area at the intersection of computer science and computational linguistics.
The task of keyword extraction is to automatically identify a set of terms that best describe a document. Automatic keyword extraction establishes a foundation for various natural language processing applications: information retrieval, automatic indexing and classification of documents, automatic summarization, high-level semantic description, etc. Although keyword extraction applications usually work on single documents (a document-oriented task), keyword extraction is also applicable to more demanding tasks, i.e. extraction from a whole collection of documents, from an entire web site, or from tweets on Twitter. In the era of big data, an effective and efficient method for automatic keyword extraction from huge amounts of multi-topic textual sources is of high importance.
We proposed a novel Selectivity-Based Keyword Extraction (SBKE) method, which extracts keywords from source text represented as a network. The node selectivity value is calculated from a weighted network as the average weight distributed on the links of a single node, and is used in the procedure of keyword candidate ranking and extraction. Selectivity slightly outperforms extraction based on standard centrality measures; therefore selectivity and its modification, generalized selectivity, are included in the SBKE method as node centrality measures. Selectivity-based extraction does not require linguistic knowledge, as it is derived purely from the statistical and structural information of the network, so it can easily be ported to new languages and used in a multilingual scenario. The true potential of the proposed SBKE method lies in its generality, portability, and low computational cost, which positions it as a strong candidate for preparing collections that lack human annotations for keyword extraction. The portability of SBKE was tested on Croatian, Serbian, and English texts: it was developed on Croatian news and ported to extraction from parallel abstracts of scientific publications in Serbian and English.
The constructed parallel corpus of scientific abstracts with annotated keywords allows a better comparison of the method's performance across languages, since we have a controlled experimental environment and data. The achieved keyword extraction results, measured with an F1 score, are 49.57% for English and 46.73% for Serbian if we disregard keywords that are not present in the abstracts. If we evaluate against the whole keyword set, the F1 scores are 40.08% and 45.71%, respectively. This work shows that SBKE can easily be ported to a new language, domain, and type of text (in the sense of its structure). Still, there is a drawback: the method can extract only words that appear in the text.
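A simplified reading of the selectivity measure can be sketched as follows: build a weighted co-occurrence network from adjacent words, then score each node by its strength divided by its degree, i.e. the average weight on its links. This is an illustration of the idea, not the authors' implementation.

```python
from collections import Counter, defaultdict

def selectivity_scores(tokens: list[str]) -> dict[str, float]:
    """selectivity(v) = strength(v) / degree(v): the average weight
    on the links of node v in the word co-occurrence network."""
    weights = Counter()
    for a, b in zip(tokens, tokens[1:]):      # adjacent-word co-occurrence
        if a != b:
            weights[frozenset((a, b))] += 1
    strength = defaultdict(float)
    degree = defaultdict(int)
    for edge, w in weights.items():
        for node in edge:
            strength[node] += w
            degree[node] += 1
    return {v: strength[v] / degree[v] for v in strength}

tokens = ("keyword extraction builds a network and "
          "keyword extraction ranks nodes").split()
for word, score in sorted(selectivity_scores(tokens).items(),
                          key=lambda kv: -kv[1]):
    print(f"{word}: {score:.2f}")   # 'keyword' and 'extraction' rank highest
```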
The document discusses various techniques for improving searches, including using search engines, databases, quoted phrases, synonyms, Boolean operators, and advanced search options. It explains that search engines can provide quick access to information but may also return unreliable or unrelated results, while databases contain focused, reliable information. The document outlines techniques such as quoted phrases, word variations, and the Boolean operators AND, OR, and NOT to improve search results. It also describes using the "within" operator and taking advantage of advanced search modes.
This document provides guidance on information retrieval and literary searches. It outlines search purposes such as improving search quality, preparing for assignments, and understanding market needs. It then describes how to begin a search by defining topics, using reference sources to define terms, and forming search strategies using Boolean operators. Examples of search strategies are provided. The document also discusses searching different fields such as electronic resources, databases, and print materials. It provides tips for using search tools like quotation marks, wildcards, and truncation. Finally, it covers limiting search results by fields like subject, author, and document type.
This document summarizes an article about adaptive information extraction. It discusses how information extraction research has grown with the increasing availability of online text sources. However, one drawback of information extraction is its domain dependence. To address this, machine learning techniques have been used to develop adaptive information extraction systems that can be applied to new domains with less manual adaptation. The document provides an overview of information extraction and different machine learning approaches used for adaptive information extraction.
What are the basics of analysing a corpus? Chapter 10, Routledge (RajpootBhatti5)
This document provides an overview of the basics of analyzing a corpus through various techniques including frequency analysis, normalization, keyword analysis, and concordance analysis. It explains that frequency lists show how often words occur, normalization adjusts for corpus size differences, keyword analysis finds statistically significant words compared to a reference corpus, and concordance analysis displays keywords in context to better understand usage. The document serves as an introduction to basic corpus analysis methods and tools.
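To make two of these techniques concrete, the sketch below computes a per-million normalized frequency and a standard two-corpus log-likelihood keyness score; the word counts and corpus sizes are invented.

```python
import math

def normalized_freq(count: int, corpus_size: int, per: int = 1_000_000) -> float:
    """Frequency per `per` words, so corpora of different sizes compare fairly."""
    return count * per / corpus_size

def log_likelihood(a: int, b: int, n1: int, n2: int) -> float:
    """Two-corpus log-likelihood keyness: a and b are a word's counts in the
    study and reference corpora of sizes n1 and n2."""
    e1 = n1 * (a + b) / (n1 + n2)   # expected count in the study corpus
    e2 = n2 * (a + b) / (n1 + n2)   # expected count in the reference corpus
    ll = 0.0
    if a:
        ll += a * math.log(a / e1)
    if b:
        ll += b * math.log(b / e2)
    return 2 * ll

# "learner" occurs 120 times in a 50k-word study corpus and
# 180 times in a 1M-word reference corpus (invented numbers).
print(normalized_freq(120, 50_000))                      # 2400.0 per million
print(round(log_likelihood(120, 180, 50_000, 1_000_000), 1))
```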
The document discusses different types of programming languages including declarative languages which are fact-oriented and do not consider sequence of execution, logic programming languages which use symbolic logic for programming, and functional languages which perform all computations through function applications. It also covers database languages which include DDL for data definition, DML for data manipulation like insertion and deletion, and DCL for data control through operations like commit and rollback.
Haystack 2018 - Algorithmic Extraction of Keywords, Concepts, and Vocabularies (Max Irwin)
Presentation as given to the Haystack Conference, which outlines research and techniques for automatic extraction of keywords, concepts, and vocabularies from text corpora.
The document discusses machine translation (MT) between Arabic and English. It covers several key topics:
1. It outlines the challenges of Arabic natural language processing and MT, including the differences between Modern Standard Arabic and dialects and a lack of annotated resources.
2. It describes different types of MT systems like direct translation engines and those using linguistic knowledge architectures. It also discusses the importance of dictionaries.
3. It discusses common MT problems such as ambiguity and differences between languages.
4. It proposes a small prototype Arabic to English MT model to demonstrate basic techniques like normalization, tokenization, stemming, and the use of a parser and transformation rules (a word-level sketch follows below).
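A word-based prototype of this kind might look like the sketch below; the three-entry lexicon and the single VSO-to-SVO reordering rule are invented simplifications of the techniques the summary names.

```python
# Toy word-based Arabic -> English translator (invented mini-lexicon).
LEXICON = {"ولد": "boy", "كتاب": "book", "قرأ": "read"}

def translate_word(word: str) -> str:
    # Crude stemming: strip the definite article "ال" (al-).
    if word.startswith("ال") and word[2:] in LEXICON:
        return "the " + LEXICON[word[2:]]
    return LEXICON.get(word, word)        # pass unknown words through

def translate(sentence: str) -> str:
    words = [translate_word(w) for w in sentence.split()]
    # Transformation rule: Arabic is typically verb-initial (VSO);
    # move the verb after the subject for English SVO order.
    if len(words) >= 2:
        words[0], words[1] = words[1], words[0]
    return " ".join(words)

print(translate("قرأ الولد الكتاب"))      # -> "the boy read the book"
```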
Information retrieval chapter 2 - Text Operations.ppt (SamuelKetema1)
This document discusses text operations for information retrieval systems, including tokenization, stemming, and removing stop words. It explains that tokenization breaks text into discrete tokens or words. Stemming reduces words to their root form by removing affixes like prefixes and suffixes. Stop words, which are very common words like "the" and "of", are filtered out since they provide little meaning. The goal of these text operations is to select more meaningful index terms to represent document contents for retrieval tasks.
Terminology management as fitness v.2 (ITI Russia)
The document discusses effective terminology management. It emphasizes that terminology must be managed in a unified storage system to ensure consistency across translations. A proper terminology management system allows terms to be gathered and stored independently of translation tools so they can be reused. It also highlights the importance of language-specific handling of terms and integrating terminology checking with quality assurance. The work of those managing terminology is essential to the process of making terms fit for localization of products.
Learning Usage of English KWICly with WebLEAP/DSR (Takashi Yamanoue)
WebLEAP/DSR is a new implementation of the WebLEAP tool that helps English language learners learn usage by analyzing web corpora. It allows users to input sentences and see frequency graphs and keyword-in-context examples from search engines. The tool also allows domain specification to focus analysis. Examples show how it can help estimate appropriate prepositions and compare differences between UK and US English. The system records user interactions for computer-assisted language learning. Further research topics include improving precision and analyzing regional differences and collaborative writing.
NLP Tasks and Applications.ppt (Kumari Naveen)
The document discusses various aspects of the natural language processing (NLP) research community, including conferences, papers, datasets, software, and standard tasks. It notes that most NLP work is published as 9-page conference papers which are presented at major annual conferences like ACL and EMNLP. It describes how the ACL conference had over 2000 attendees pre-COVID and over 3000 papers submitted in 2022, with about 20% accepted. It also outlines different "tracks" at conferences for specialized topics and lists various institutions, datasets, and software in the NLP field.
The document discusses cooperative translation between contributors with different language skills and technical skills. It describes the objectives of the translation exercise, which include translating a course brochure and creating an online terminology glossary. It also outlines the roles of human contributors and technical tools involved, such as online dictionaries, machine translation, and blogs for collaboration.
This document summarizes a presentation about a sentiment analysis system developed for a large Korean telecommunications company. The system was designed to analyze customer feedback from call centers. It classified feedback into categories, identified trends over time, and detected complaints. The system used Korean linguistic analysis and sentiment classification. It showed the benefits of combining machine learning and rules-based approaches. However, challenges remained around data quality, lexicon development, and meeting customer expectations. Future work focused on improving the sentiment dictionary and developing a platform for ongoing natural language processing services.
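As a toy illustration of the lexicon-based side of such a system, the sketch below scores feedback against a small invented sentiment dictionary; real deployments layer machine-learned classifiers and language-specific analysis (here, Korean) on top of this.

```python
# Invented mini sentiment lexicon; real systems use curated dictionaries.
LEXICON = {"helpful": 1, "fast": 1, "friendly": 1,
           "slow": -1, "rude": -1, "broken": -2}

def classify(feedback: str) -> str:
    """Sum lexicon scores over the tokens and map the sign to a label."""
    total = sum(LEXICON.get(w.strip(".,!?").lower(), 0)
                for w in feedback.split())
    return "complaint" if total < 0 else "praise" if total > 0 else "neutral"

for text in ("The agent was friendly and fast!",
             "Service was slow and the app is broken."):
    print(classify(text), "-", text)
```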
Past, Present, and Future: Machine Translation & Natural Language Processing ... (John Tinsley)
This was a presentation given at the European Patent Office's annual Patent Information Conference in Madrid, Spain on November 10th, 2016.
In it, we give an overview of how machine translation works, latest advances in neural MT, and how this can be applied to patents and intellectual property content, not only for translations but also information extraction and other NLP applications.
Lecture 7 - Text Statistics and Document Parsing (Sean Golliher)
This document discusses various techniques for text processing and indexing documents for information retrieval systems. It covers topics like tokenization, stemming, stopwords, n-grams to identify phrases, and weighting important document elements like headers, anchor text, and metadata. The document also discusses using links between documents for link analysis and utilizing anchor text for retrieval.
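One technique mentioned above, using n-grams to identify candidate phrases, can be sketched as follows; the frequency threshold of 2 is arbitrary.

```python
from collections import Counter

def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-token sequences."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ("information retrieval systems index documents so that "
          "information retrieval queries run fast").split()

# Count bigrams and keep those above the threshold as candidate phrases.
counts = Counter(ngrams(tokens, 2))
phrases = [" ".join(g) for g, c in counts.items() if c >= 2]
print(phrases)  # ['information retrieval']
```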
02 Text Operations.pdf (beshahashenafe20)
The document discusses five key text operations for information retrieval: 1) lexical analysis, 2) stop word elimination, 3) stemming, 4) term selection, and 5) thesaurus construction. It describes challenges in text operations like tokenization and normalization. Specifically, it covers issues in identifying valid tokens, determining a list of stop words, and conflating word variants through stemming algorithms like affix removal. The overall goal is to preprocess text for indexing terms to improve retrieval performance.
Machine translation from English to Hindi (Rajat Jain)
Machine translation is a part of natural language processing. The suggested algorithm is word-based. We have implemented translation from English to Hindi.
Submitted by Garvita Sharma (10103467, B3) and Rajat Jain (10103571, B6).
This document discusses tools that teachers already have access to in their classrooms that can help support diverse learners, including tools in common software programs like operating systems, word processors, and web resources. It emphasizes that the cornerstone of Universal Design for Learning is flexibility, and teachers have flexible digital tools like text-to-speech, spelling and grammar checks, and highlighting features in the software they currently use everyday. It also provides examples of free tools and resources teachers can try to further support students' access to curriculum.
Enriching the semantic web tutorial session 1 (Tobias Wunner)
The document discusses challenges and opportunities in natural language processing for the multilingual semantic web. It provides examples of how content on the web and semantic web exhibits linguistic variations within and across languages. It also summarizes several NLP applications like information extraction and natural language generation that utilize ontologies, and notes that these applications require domain and multilingual adaptation of lexicons and extraction rules. The document argues that efficient adaptation and sharing of linguistic resources between ontology-based NLP applications is needed.
The document describes language-independent methods for clustering similar contexts without using syntactic or lexical resources. It discusses representing contexts as vectors of lexical features and clustering them based on similarity. Feature selection involves identifying unigrams, bigrams, and co-occurrences based on frequency or association measures. Contexts can then be represented in first-order or second-order feature spaces and clustered. Applications include word sense discrimination, document clustering, and name discrimination.
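The first-order representation described here can be sketched by turning each context into a bag-of-words vector and comparing contexts with cosine similarity; a real system would add feature selection and an actual clustering step on top of the similarity scores.

```python
import math
from collections import Counter

def vectorize(context: str) -> Counter:
    """First-order representation: a bag of unigram features."""
    return Counter(context.lower().split())

def cosine(v1: Counter, v2: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v1[f] * v2[f] for f in v1)
    norm = (math.sqrt(sum(c * c for c in v1.values()))
            * math.sqrt(sum(c * c for c in v2.values())))
    return dot / norm if norm else 0.0

# Three contexts of the ambiguous word "bank" (invented examples).
contexts = [
    "the bank approved the loan application",
    "the bank raised its interest rates",
    "they walked along the river bank at dusk",
]
vectors = [vectorize(c) for c in contexts]
for i in range(len(vectors)):
    for j in range(i + 1, len(vectors)):
        print(i, j, round(cosine(vectors[i], vectors[j]), 2))
```

The two financial contexts come out more similar to each other than to the river context, which is exactly the signal a sense-discrimination clustering would exploit.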
Big Data Spain 2017 - Deriving Actionable Insights from High Volume Media St... (Apache OpenNLP)
Media analysts have to deal with analyzing high volumes of real-time news feeds and social media streams, which is often a tedious process because they need to write search profiles for entities. Python tools like NLTK do not scale to large production data sets and cannot be plugged into a distributed, scalable framework like Apache Flink. Apache Flink, being a streaming-first engine, is ideally suited for ingesting multiple streams of news feeds, social media, blogs, etc., and for doing streaming analytics on the various feeds. Natural language processing tools like Apache OpenNLP can be plugged into Flink streaming pipelines to perform common NLP tasks like named entity recognition (NER), chunking, and text classification. In this talk, we'll build a real-time media analyzer which does named entity recognition on the individual incoming streams, calculates the co-occurrences of the named entities and aggregates them across multiple streams, indexes the results into a search engine, and makes the results queryable for actionable insights. We'll also show how to handle multilingual documents when calculating co-occurrences. NLP practitioners will come away from this talk with a better understanding of how the various Apache OpenNLP components can help in processing large streams of data feeds and can easily be plugged into a highly scalable and distributed framework like Apache Flink.
This document provides guidance on effective technical documentation. It discusses planning documentation by determining the objective, intended audience, necessary content and approximate length. It also covers tips for clear writing style such as using active voice and avoiding contractions. The goals of technical documentation are clarity, comprehensiveness, conciseness and correctness.
AACL 2018 - Going Beyond Simple Word-list Creation Using CasualConc
1. Going beyond simple word-list creation using CasualConc
Yasuhiro IMAO
Osaka University, Japan
casualconc@gmail.com
AACL 2018 at Georgia State University, Atlanta GA
2. A few questions
How many of you are Mac users?
How many of you have used CasualConc?
3. A few observations
Through attending presentations / reading papers
Methods of analysis
Use of statistics
Hugely depend on
the access to the resources
the tools one uses
specialized applications
programming skills
someone who can write scripts
4. To advance the field
more easy-to-use and accessible tools are necessary
5. Current Situation
AntConc and Antxxx
and other small, specialized applications
WordSmith Tools / Monoconc Pro
The gold standard?
7. A little bit of background
I started developing a concordancer around 2005
I released the first, limited version around 2008
It is a Mac native app!
KWIC, Word/n-gram Lists, Collocation
10. Small Scale Corpus Research
Building your own specialized corpus
Possibly adding annotations (POS, syntactic, etc.)
Which tools to use?
11. A suggestion (not the answer)
I have developed a few companion apps
CasualTranscriber (transcription helper)
CasualTextractor (text extractor/editor)
CasualTagger (tagging helper)
CasualPConc (parallel concordancer)
37. Sample
ICNALE - Writing
Written learner English corpus
College students in Asian countries/regions
JPN, CHN, HKG, IDN, KOR, PAK, PHI, SIN, THA, TWN
Two topics
Ave. 220-230 words
75. CasualConc
I just released version 2.1.0 with the updated manual
The manual runs to over 250 pages and is full of screenshots
It is FREEWARE
Downloadable from
https://sites.google.com/site/casualconc
Or just google ‘casualconc’