Are the results of your corpus research really reliable?
Automatic search result analysis on GICR
Tatiana Shavrina, Daniil Selegey
AINL FRUCT, SPb, 12.11.2016
Big Corpora Problem:
1. Billions of words, mostly from social media
2. Getting just the IPM (instances per million) and KWIC search results doesn’t tell you whether the results are biased
3. Many metatext attributes (URLs, document IDs, author IDs, region, gender, genre, etc.) are all potential sources of bias
Users need corpus tools that show the full statistics of the search results, so they can check whether the results are homogeneous with the whole corpus.
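The homogeneity check described above can be sketched as follows. This is a minimal illustration, not the GICR implementation. It assumes you have, for one metatext attribute such as region, the number of search hits per attribute value and the corpus size (in tokens) per value. All function names and numbers are hypothetical. For each value, the sketch compares that value's IPM with the IPM over the whole corpus. It also computes a Pearson chi-square goodness-of-fit statistic against the corpus distribution.

```python
"""Sketch: are search hits distributed over one metatext attribute
(e.g. region) the same way the corpus itself is?

Hypothetical helper names and toy numbers; not the GICR implementation.
"""
from typing import Dict, Tuple


def ipm(hits: int, tokens: int) -> float:
    """Instances per million tokens."""
    return hits / tokens * 1_000_000


def representation_ratios(hits_by_value: Dict[str, int],
                          tokens_by_value: Dict[str, int]) -> Dict[str, float]:
    """IPM within each attribute value divided by the overall IPM.

    1.0 means the value is represented in the results as in the corpus;
    values well above 1.0 are over-represented, well below 1.0 under-represented.
    Assumes every value has a non-zero token count; values that appear only
    in hits_by_value are ignored.
    """
    overall = ipm(sum(hits_by_value.values()), sum(tokens_by_value.values()))
    return {
        value: ipm(hits_by_value.get(value, 0), tokens) / overall
        for value, tokens in tokens_by_value.items()
    }


def chi_square_stat(hits_by_value: Dict[str, int],
                    tokens_by_value: Dict[str, int]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom.

    Expected hit counts assume hits are spread in proportion to the
    corpus size of each attribute value.
    """
    total_hits = sum(hits_by_value.values())
    total_tokens = sum(tokens_by_value.values())
    stat = 0.0
    for value, tokens in tokens_by_value.items():
        expected = total_hits * tokens / total_tokens
        observed = hits_by_value.get(value, 0)
        stat += (observed - expected) ** 2 / expected
    return stat, len(tokens_by_value) - 1


if __name__ == "__main__":
    # Hypothetical toy data: hits of one query and corpus tokens per region.
    hits = {"Moscow": 900, "St. Petersburg": 50, "other": 50}
    tokens = {"Moscow": 40_000_000, "St. Petersburg": 20_000_000, "other": 40_000_000}

    print(f"overall IPM: {ipm(sum(hits.values()), sum(tokens.values())):.2f}")
    for value, ratio in representation_ratios(hits, tokens).items():
        print(f"{value}: IPM ratio {ratio:.3f}")
    stat, df = chi_square_stat(hits, tokens)
    print(f"chi-square = {stat:.2f}, df = {df}")
```

With these toy numbers, Moscow accounts for 90% of the hits but only 40% of the corpus, so the statistic is very large. To turn the statistic into a p-value, one option is `scipy.stats.chi2.sf(stat, df)`. With millions of hits, almost any deviation becomes statistically significant. The chi-square test also assumes the hits are independent, whereas corpus hits tend to cluster within documents and authors. For these reasons, the per-value ratios are often more informative than the test result alone.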
Our solution:
Search results analysis right in the interface!
See you at our demo stand!

AINL 2016: Shavrina, Selegey