1. Malay Corpus
Nurul Adilla Adree 1324422
Nur Fareena Eleesha 1322422
Nur Hanisah Hamzah 1323150
Wan Aliaa Adibah Wan Omar 1327062
2. Definition of Malay Corpus
A Malay corpus is a collection of written (or spoken) Malay texts presented in electronic form, which also serves as a basis for dictionaries.
It is a collection of Malay words used for linguistic analysis and educational purposes.
A Malay corpus provides evidence of how the language is used in real situations, from which lexicographers can write accurate and meaningful dictionary entries.
3. Objectives
To develop suitable teaching and educational materials.
To provide representative data and flexible search criteria, which ease the transfer and manipulation of data for practical use.
4. Specific educational and research functions
Providing suitable word sets for assessing reading and writing skills
Selecting suitable words for teaching and intervention activities
Selecting appropriate vocabulary for use in literature
5. The three basic morphological operations in Malay are:
Affixation
There are three types of affixes: prefixes, suffixes and infixes.
• Prefixes include me-, pe-, be-, ter-, se-, ke- and di-.
• Suffixes include -i, -kan, -nya, -lah, -kah, -mu and -ku.
• There are three infixes in Malay: -el-, -em- and -er-. Examples of infixation are geletar “shiver/tremble”, gemilang “bright” and gerigis “serrated”.
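The affix inventory above can be applied mechanically. The sketch below is a naive illustration, not a published Malay stemmer: it takes the slide's lists literally, strips at most one listed prefix and one listed suffix, and separately removes an infix that follows the first consonant. The function names are hypothetical. Real Malay stemming must also handle prefix variants such as mem-, men- and ber-, which this sketch ignores, and it will over-strip some roots.

```python
# Affix inventories exactly as listed on the slide (simplified forms).
PREFIXES = ["me", "pe", "be", "ter", "se", "ke", "di"]
SUFFIXES = ["i", "kan", "nya", "lah", "kah", "mu", "ku"]
INFIXES = ["el", "em", "er"]


def strip_affixes(word: str, min_stem: int = 3) -> str:
    """Hypothetical helper: strip at most one prefix and one suffix."""
    w = word.lower()
    # Longest match first; never leave a stem shorter than min_stem letters.
    for p in sorted(PREFIXES, key=len, reverse=True):
        if w.startswith(p) and len(w) - len(p) >= min_stem:
            w = w[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if w.endswith(s) and len(w) - len(s) >= min_stem:
            w = w[:-len(s)]
            break
    return w


def remove_infix(word: str) -> str:
    """Hypothetical helper: drop -el-, -em- or -er- after the first consonant.

    Only apply this to words already known to be infixed; a root such as
    "belum" would otherwise be damaged.
    """
    w = word.lower()
    for inf in INFIXES:
        if w[1:1 + len(inf)] == inf:
            return w[0] + w[1 + len(inf):]
    return w


print(strip_affixes("dimakan"), strip_affixes("bukunya"))  # makan buku
print(remove_infix("geletar"), remove_infix("gemilang"))   # getar gilang
```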
Reduplication
There are three basic types of reduplication: full reduplication, partial reduplication, and rhyming and chiming.
• Full reduplication: kuda-kuda “trestle” (from kuda “horse”).
• Partial reduplication: kekura “tortoise” (from the stem kura “tortoise”).
• Rhyming and chiming: lauk “dish” becomes lauk-pauk “all sorts of dishes”.
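The three reduplication patterns can be told apart with simple string checks. This is a heuristic sketch with a hypothetical function name. It only covers the shapes of the examples above (identical halves, halves differing only in their first letter, and a Ce- syllable copied onto a stem), so chiming pairs with vowel changes fall into "other", and roots that happen to begin with C + e + C (e.g. kekal) would be misread as partial reduplication.

```python
def classify_reduplication(word: str) -> tuple[str, str]:
    """Hypothetical helper: return (reduplication type, base form)."""
    w = word.lower()
    if "-" in w:
        left, right = w.split("-", 1)
        if left == right:
            return "full", left            # kuda-kuda -> kuda
        if left[1:] == right[1:] and left[0] != right[0]:
            return "rhyming", left         # lauk-pauk: only the onset differs
        return "other", left               # e.g. chiming with vowel change
    # Partial: first consonant + "e" prefixed to a stem starting with it.
    if len(w) > 4 and w[1] == "e" and w[0] == w[2]:
        return "partial", w[2:]            # kekura -> kura
    return "none", w


for example in ["kuda-kuda", "kekura", "lauk-pauk", "rumah"]:
    print(example, classify_reduplication(example))
```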
Compounding
Compounding fuses simplex words into single-word compounds.
A Malay example is adat-istiadat “customs and traditions”, which constitutes a single word but is made up of the component words adat “custom” and istiadat “custom/tradition”.
6. DEVELOPMENT OF MALAY CORPUS IN MALAYSIA
Influenced by the Brown Corpus in the 1970s.
Led by Dewan Bahasa dan Pustaka (DBP).
The project began in 1983 and involved compiling texts for language analysis in order to develop a database of two million Malay words.
Both complete old Malay texts and modern texts were included in developing the corpus.
In its early stages, the DBP corpus was designed only for researchers, e.g. the UKM-DBP corpus.
7. The corpus is also used in Kamus Dewan.
A new development is an online Malay lexical and grammar database of Malay textbooks.
Kamus Besar Bahasa Melayu Dewan is reportedly in the making.
13. Corpus System in Malaysia
• Established by a group of computer translation researchers at Universiti Sains Malaysia (USM) in 1993.
• The corpus can be searched using:
i) a keyword
ii) a keyword with the wildcard symbols (*) and (?)
ex: b*t?l = betul
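The wildcard search can be emulated by translating the pattern into a regular expression. This sketch assumes the usual wildcard meaning (* matches any run of characters, ? matches exactly one character); the USM system's actual matching rules are not described in the slides, and the word list and function names are illustrative.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a * / ? search pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # any run of characters, possibly empty
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_search(words, pattern):
    """Return the words that match the whole pattern."""
    rx = wildcard_to_regex(pattern)
    return [w for w in words if rx.fullmatch(w)]


print(wildcard_search(["betul", "batal", "betik", "buku"], "b*t?l"))
# prints ['betul', 'batal']
```

Python's `fnmatch.fnmatchcase` would behave similarly, but it also treats `[...]` as a character set; the explicit translation keeps only the two symbols listed above.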
14. EXAMPLE OF SOFTWARE FOR MALAY CORPUS
• The Malay Analyze Text (MATA) tool can analyze:
a) the word count
b) the frequency of each word
c) the list and number of root words
d) the list and number of new words
e) the number of ambiguous words.
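MATA's internals are not described in the slides. As a rough illustration only, the first four measures can be computed with a tokenizer and a counter, assuming that "root words" means word types found in a supplied root-word list and "new words" means word types missing from it. Both that reading and the lexicon passed in are hypothetical, and ambiguity detection is left out because it needs sense information.

```python
import re
from collections import Counter


def analyse(text: str, root_lexicon: set[str]) -> dict:
    """Hypothetical MATA-style summary of a Malay text."""
    # Lowercase and keep alphabetic tokens; hyphenated forms such as
    # "rumah-rumah" stay together as a single token.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    freq = Counter(tokens)
    return {
        "word_count": len(tokens),                                      # a)
        "frequency": freq,                                              # b)
        "root_words": sorted(w for w in freq if w in root_lexicon),     # c)
        "new_words": sorted(w for w in freq if w not in root_lexicon),  # d)
    }


report = analyse(
    "Rumah itu besar. Rumah-rumah di kampung itu kecil.",
    {"rumah", "itu", "besar", "di", "kampung", "kecil"},
)
print(report["word_count"], report["frequency"].most_common(1), report["new_words"])
```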
15. RESEARCHERS AND ORGANIZATIONS OF THE MALAY CORPUS
• The Australian National University (ANU).
• The Malaysian Language Planning Agency (Dewan
Bahasa dan Pustaka).
• Professor Ahmad Murad Merican
16. Example of Malay Corpus
1. ADAT RAJA MELAYU
AYAPAN – frequency 3
Ex : Dipertuan pun keluarlah ke balairung hendak memberi ayapan
akan mereka sekalian. Pertama yang diangkat terenang air
HEMATAN – frequency 1
Ex: faedahnya yang engkau dapat yang demikian itu? Pada hematan
aku, terlebih terutamanya engkau pergi berlari-lari dengan ....
17. 2. SALASILAH MELAYU DAN BUGIS
JONGOS – frequency 1
Ex : Maka menyuruhlah Jeneral Himhoff itu kepada jongosnya
mengangkatkan baginda itu air teh. Telah sudah diletakkan oleh
jongos itu di atas meja di hadapan baginda air teh itu maka Jeneral
pun bertanya kepada baginda Mayor itu, "Apa khabarnya Tuan
Mayor datang ini?" Maka jawab baginda, "Adapun sahaya
SAYOGIANYA – frequency 2
Ex : Maka sayogianya kita sediakan memang-memang akan bekal kita di
akhirat yang baka itu supaya kita tiada menyesal pada negeri akhirat.
Syahadan apabila sudah selesailah daripada pekerjaan
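Entries like these pair a word's frequency with keyword-in-context (KWIC) lines. The sketch below shows how such a concordance can be produced; it is not the tool behind the figures above, and the frequencies quoted on the slides come from the full texts, whereas the sample here is a single excerpt. The function name and context width are assumptions.

```python
import re


def concordance(text: str, keyword: str, width: int = 30):
    """Return (frequency, KWIC lines) for whole-word matches of keyword."""
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    rx = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in rx.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return len(lines), lines


sample = ("Maka sayogianya kita sediakan memang-memang akan bekal kita di "
          "akhirat yang baka itu supaya kita tiada menyesal pada negeri akhirat.")
count, lines = concordance(sample, "kita")
print(count)  # 3
for line in lines:
    print(line)
```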
19. Title: Malay Interrogative Knowledge Corpus
Year: 2011
Authors: Fatimah Sidi, Marzanah A. Jabar, Mohd Hasan Selamat, Abdul Azim Abdul Ghani, Md Nasir Sulaiman, Salmi Baharom
URL: http://thescipub.com/pdf/10.3844/ajebasp.2011.171.176
Purpose of study:
• To investigate the availability of Malay knowledge representation in online sources of Malay documents
• To produce Malay knowledge representation in a knowledge-base system
• To identify knowledge from unstructured documents
20. RESEARCH QUESTION, FRAMEWORK & METHODOLOGY, FINDINGS
Research question:
How can Malay knowledge representation be identified and extracted from unstructured documents?
Framework of analysis: Interrogative Knowledge Identification Framework
Methods:
1. Create attributes for the corpus
2. Extract lexicons from the document collection
3. Verify the lexicon entries
4. Insert the lexicon entries
5. Extend the ambiguous words encountered
6. Refer to the opinion of a Malay language expert
Findings:
• The only interrogative element that showed significant accuracy in identifying knowledge is ‘why’.
• The interrogative elements ‘what’ and ‘who’ showed significant accuracy in identifying and extracting information.
• The reason for the differences is the quality of the various formats and styles of Malay writing.
21. Example of entries of MalayIK-Corpus
Root Word | Lexicon | Grammatical Information | Interrogative Element | Status
Rumah (house) | Rumah (house) | Kata nama am benda (noun) | Apa (what) | 1 (noun/adj)
Sejak (since) | Sejak (since) | Kata sendi nama masa (preposition) | Bila (when) | 2 (stop word)
Selidik (research) | Penyelidik (researcher) | Kata nama am orang (noun) | Siapa (who) | 1 (noun/adj)
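The table's columns map naturally onto a record type. The sketch below stores the three example entries and looks up the interrogative element for tokens of a sentence. The class, field and function names are hypothetical and only mirror the table headings; the study's actual storage format and identification procedure are not given in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    root_word: str
    lexicon: str           # surface form as it appears in documents
    grammatical_info: str
    interrogative: str     # interrogative element the entry is linked to
    status: int            # 1 = noun/adj, 2 = stop word (as in the table)


ENTRIES = [
    LexiconEntry("rumah", "rumah", "kata nama am benda", "apa", 1),
    LexiconEntry("sejak", "sejak", "kata sendi nama masa", "bila", 2),
    LexiconEntry("selidik", "penyelidik", "kata nama am orang", "siapa", 1),
]


def tag_interrogatives(tokens, entries=ENTRIES):
    """Pair each token found in the lexicon with its interrogative element."""
    index = {e.lexicon: e for e in entries}
    return [(t, index[t].interrogative) for t in tokens if t in index]


print(tag_interrogatives(["penyelidik", "itu", "tinggal", "di", "rumah", "sejak", "1990"]))
```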
22. Conclusion
• The development of the MalayIK-Corpus is important for identifying knowledge in Malay documents and for providing Malay knowledge representation in a knowledge-base system. It can therefore potentially increase the sharing and reuse of the knowledge in these documents within the community.
23. REFERENCES
• Granger, S. (2010). Corpus-based approaches to contrastive linguistics and translation studies.
• Merican, A. M. (2017, June 28). Spirit of the Malay Concordance Project. New Straits Times. Retrieved December 1, 2017, from https://www.nst.com.my/education/2017/06/252769/spirit-malay-concordance-project
• What is a corpus? | Oxford Dictionaries. (n.d.). Retrieved December 3, 2017, from https://en.oxforddictionaries.com/explore/what-is-a-corpus
• Abdul Rahim, H. (2014). Corpora in language research in Malaysia. Kajian Malaysia, 32(Supp. 1), 1-16. Retrieved December 1, 2017, from http://web.usm.my/km/32(Supp.1)2014/KM%2032%20Supp%201%202014%20-%20Art%201(1-16).pdf
• Malay Concordance Project, Australian National University. http://mcp.anu.edu.au/Q/mcp.html
• Dewan Bahasa dan Pustaka. http://lamanweb.dbp.gov.my/index.php/pages/view/76?mid=61