
Transcript

  • 1. Copyright by Charles Ko Ka Shing, 2012.
    Know more about IHBCM: www.ihbcm.webs.com
    Google/Yahoo: IHBCM/Charles Ko Ka Shing
    www.charlesksko.webs.com
    Email: ihbcmcharles@gmail.com
    Originally published on SlideShare in ppt format

    Corpus Report: Comparison of the popularity of Hong Kong universities around the globe

    Aim and Objectives:
    • To introduce readers to the power of Corpus Linguistics
    • To stimulate readers’ curiosity about the use of corpora as a research methodology of any kind

    Foreword

    What is Corpus?
    Corpus (Latin plural corpora, English plural corpuses or corpora) is Latin for body. It may refer to: Habeas corpus, a legal mechanism to end detention of a suspect; Corpus delicti, a legal term meaning "body of the crime... (http://en.wikipedia.org/wiki/Corpus)
    • [Corpus in Linguistics, Applied Linguistics and Corpus Linguistics]
    – “Text corpus, in linguistics, a large and structured set of texts
    – Speech corpus, in linguistics, a large set of speech audio files”

    What is Corpus Linguistics?
    “[…] corpus linguistics is a whole system of methods and principles of how to apply corpora in language studies and teaching/learning, it certainly has a theoretical status. Yet theoretical status is not theory in itself…” (McEnery et al. 2006: 7f.)

    Introduction
    In this paper, I will use the most up-to-date corpus, the Corpus of Global Web-based English (GloWbE)¹, to compare the “fame” of 10 selected Hong Kong “universities” … in order to report, initially and informally, how often people around the world mention the names of the following selected educational institutions:
    1. The Open University of Hong Kong (OUHK)
    2. The Hong Kong Polytechnic University (PolyU)
    3. City University of Hong Kong (CityU)

    ¹ The Corpus of Global Web-based English (GloWbE), http://corpus2.byu.edu/glowbe/, which is developed on the corpus2 website, covers 20 varieties of English.
  • 2.
    4. Hong Kong Shue Yan University (SYU)
    5. Hong Kong Academy for Performing Arts (HKAPA)
    6. The Hong Kong Institute of Education (HKIEd)
    7. The Chinese University of Hong Kong (CUHK)
    8. The University of Hong Kong (HKU)
    9. Hong Kong Baptist University (HKBU)
    10. The Hong Kong University of Science & Technology (HKUST)

    Methodology
    I typed the abbreviation of each institution’s name into the corpus search, one by one, to find out which institution’s name is mentioned the most across the varieties of English (or Englishes) around the world.

    Analysis and Evaluation
    It is found that the world “status” (occurrences in the GloWbE corpus) of the University of Hong Kong is the highest, while that of Hong Kong Shue Yan University is the lowest.
    In addition, the following table shows the word frequency² of all ten selected Hong Kong institutions, in order:

    Rank   Word (word type)   Occurrences in GloWbE
    1      HKU                1124
    2      CUHK               640
    3      HKUST              590
    4      PolyU              341
    5      CityU              164
    6      OUHK               161
    7      HKIEd              78
    8      HKBU               65
    9      HKAPA              28
    10     SYU                14

    The second column gives the words, or word types, ranked from 1 (highest frequency) to 10 (lowest frequency). The third column gives the number of occurrences in the GloWbE corpus. (See more data in Appendix 2.)

    ² In computational linguistics, a frequency list is a sorted list of words (word types) together with their frequency, where frequency here usually means the number of occurrences in a given corpus.
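As a rough illustration of the frequency list described in footnote 2, the following Python sketch sorts the occurrence counts reported in the table into ranked order. The counts are copied from the report; the function name `frequency_list` and the dictionary layout are assumptions made for this example, not part of GloWbE or of the original method, which relied on manual searches.

```python
from collections import Counter

# Occurrence counts as reported in the table (one GloWbE search per abbreviation).
# The order here follows the report's list of institutions, not the ranking.
reported_counts = {
    "OUHK": 161, "PolyU": 341, "CityU": 164, "SYU": 14, "HKAPA": 28,
    "HKIEd": 78, "CUHK": 640, "HKU": 1124, "HKBU": 65, "HKUST": 590,
}

def frequency_list(counts):
    """Return (rank, word, frequency) tuples, most frequent first."""
    ordered = Counter(counts).most_common()  # sorts by count, descending
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(ordered, start=1)]

ranking = frequency_list(reported_counts)
for rank, word, freq in ranking:
    print(f"{rank:>2}. {word:<6} {freq:>5}")
```

Sorting the raw counts this way reproduces the order given in the table, from HKU at rank 1 to SYU at rank 10.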
  • 3.
    It is surprising that in the world of the GloWbE corpus, the OUHK ranks higher than the HKBU and the HKIEd, although it is generally agreed that the academic status of the OUHK is lower than that of these two tertiary institutions (e.g., at http://www.4icu.org/hk/, see figure 2 below, sorted by the 2013 university web ranking according to 4icu.org; in terms of general facilities and academic support, the OUHK does not provide more than the HKBU and the HKIEd).

    Figure 1

    However, it may also be interpreted that the OUHK’s promotion of open learning, especially during 2012-13³, has been more effective than the promotion by the two other institutions (i.e. HKBU and HKIEd). More people can therefore receive its education online, know its name, and mention it more often; hence its name occurs in the corpus more often than those of the HKBU and the HKIEd. (However, I did not look closely at each word type’s KWIC, or Key Word in Context, so the results may not be completely accurate: some occurrences may refer to things other than the selected universities, and other names may have been used to represent the HKBU and the HKIEd.⁴)

    In the future, there is certainly room for scholars to research the most mentioned university names in Hong Kong in each variety of English, adding an extra dimension to the research; researchers could even study the most mentioned names of world

    ³ The GloWbE corpus was released in April 2013, and it is a 2012-2013 corpus. (http://corpus.byu.edu/)
    ⁴ N.B. Throughout the data collection, I did not use any wildcards, and no tag set was involved.
  • 4.
    universities in the corpus, adding one more possible dimension.

    Limitation and Conclusion
    I want to clarify that this is NOT an academic report or any form of article; it only aims to stimulate readers’ curiosity about the use of corpora as a research methodology of any kind. The report may only succeed in concluding that the names of OUHK, HKIEd, HKBU, HKAPA, and SYU are not very familiar to academics around the world, or at least that they are the less familiar ones among the 10 selected institutions in the GloWbE corpus.

    Reference
    McEnery, T., Xiao, R. & Tono, Y. (2006). Corpus-based language studies: an advanced resource book. London/New York: Routledge.

    Appendices
  • 5.
    Appendix 1
    Find Corpora on Yahoo!
    http://tw.search.yahoo.com/search?fr=fp-tab-web-t&ei=UTF-8&p=corpus

    Example:
    The Corpus of Contemporary American English (COCA) is the largest freely-available corpus of English, and the only large and balanced corpus of American English. The corpus was created by Mark Davies of Brigham Young University, and it is used by tens of thousands of users every month (linguists, teachers, translators, and other researchers). COCA is also related to other large corpora that we have created.
    Source: http://corpus.byu.edu/coca/

    Appendix 2
    Some more data of HKAPA, CITYU, POLYU, HKU, OUHK
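The analysis notes that each word type's KWIC (Key Word in Context) was not inspected, so some hits may not refer to the intended institution. A minimal Python sketch of such a check is given below. It is not the GloWbE interface: the function `kwic`, its `width` parameter, and the sample sentences are all hypothetical, invented for illustration only. Like the searches in the report, it matches the exact abbreviation without wildcards, so a search for HKU does not match HKUST.

```python
import re

def kwic(text, keyword, width=30):
    """Return (left, match, right) context triples for each whole-word hit."""
    # \b word boundaries keep "HKU" from matching inside "HKUST".
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b")
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(), right))
    return lines

# Hypothetical sample text, invented for illustration (not GloWbE data).
sample = ("She studied at HKU before moving abroad. "
          "The HKU campus is on Hong Kong Island. "
          "Staff at CUHK and HKUST also attended.")

hits = kwic(sample, "HKU", width=20)
for left, word, right in hits:
    print(f"{left:>20} [{word}] {right}")
```

Reading the left and right contexts of each hit is what would let a researcher discard occurrences that do not refer to the selected university before counting them.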