SchemaCMD - An XML-based storage schema for the compilation of mixed-source CMD corpora
1. SchemaCMD An XML-based storage schema for the compilation of mixed-source CMD corpora Cornelius Puschmann University of Düsseldorf [email_address] Towards a Reference Corpus of Web Genres University of Birmingham 27 July 2007
13. MySQL data structure - sources - BNC top 100 words - sub-types of corp. blogs - blogs, press eds., ... - n-grams (not computed due to cost) - POS frequencies by post - post data (via RSS/Atom) - additional post statistics - tokens (depends on types) - types (string + POS)
24. SchemaCMD An XML-based storage schema for the compilation of mixed-source CMD corpora Cornelius Puschmann University of Düsseldorf [email_address] Towards a Reference Corpus of Web Genres University of Birmingham 27 July 2007