Malay Corpus
Nurul Adilla Adree 1324422
Nur Fareena Eleesha 1322422
Nur Hanisah Hamzah 1323150
Wan Aliaa Adibah Wan Omar 1327062
Definition of Malay Corpus
A Malay corpus is a collection of texts of written (or spoken) language presented in
electronic form and dictionary.
It's a collection of Malay words used for linguistic analyses and educational
purposes.
A Malay corpus provides the evidence of how language is used in real situations,
from which lexicographers can write accurate and meaningful dictionary entries.
Objectives
To develop suitable teaching and educational
materials.
To provide representative data and flexible
search criteria, which ease the transfer and
manipulation of data for practical use.
Specific educational and
research functions
• The provision of suitable word sets for
assessing reading and writing skills
• The selection of suitable word choices for
teaching and intervention activities
• The selection of appropriate vocabulary to be
used in literature.
The three basic morphological
operations in Malay are:
1. Affixation
There are three types of affixes: prefixes, suffixes and infixes.
• Prefixes include me-, pe-, be-, ter-, se-, ke- and di-.
• Suffixes include -i, -kan, -nya, -lah, -kah, -mu and -ku.
• There are three infixes in Malay: -el-, -em- and -er-. Examples of infixation are geletar “shiver/tremble”, gemilang “bright” and gerigis “serrated”.
2. Reduplication
There are three basic types of reduplication: full reduplication, partial reduplication, and rhyming and chiming.
• Full reduplication: kuda-kuda “trestle” (from kuda “horse”).
• Partial reduplication: kekura “tortoise” (from the stem kura “tortoise”).
• Rhyming and chiming: lauk “dish” becomes lauk-pauk “all sorts of dishes”.
3. Compounding
Compounding fuses simplex words together into single-word compounds.
A Malay example of this is adat-istiadat “customs and traditions”, which constitutes a single word but is made up of the component words adat “custom” and istiadat “custom/tradition”.
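The three reduplication patterns above can be told apart with a few string checks. The following is only a rough sketch: the function name classify_reduplication and its rules are illustrative assumptions, not part of any Malay corpus tool, and the partial-reduplication test is a surface heuristic that will also flag unrelated words such as memang.

```python
def classify_reduplication(word: str) -> str:
    """Guess the reduplication type of a Malay word from its surface form.

    Hypothetical helper for illustration; the rules are simplified assumptions.
    """
    word = word.lower()
    if "-" in word:
        left, right = word.split("-", 1)
        if left == right:
            # e.g. kuda-kuda: both halves identical
            return "full"
        if len(left) == len(right) and left[1:] == right[1:]:
            # e.g. lauk-pauk: only the first letter differs
            return "rhyming"
        # e.g. adat-istiadat: hyphenated, but not a reduplication by these rules
        return "other-hyphenated"
    # e.g. kekura: first consonant + "e" + stem (kura)
    if len(word) > 3 and word[1] == "e" and word[0] == word[2]:
        return "partial"
    return "none"


for example in ["kuda-kuda", "kekura", "lauk-pauk", "adat-istiadat"]:
    print(example, "->", classify_reduplication(example))
```

A real analyser would check the candidate stem against a dictionary of root words instead of relying on spelling alone.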
DEVELOPMENT OF MALAY CORPUS IN
MALAYSIA
Influenced by the Brown Corpus in the 1970s.
Was led by Dewan Bahasa dan Pustaka (DBP).
The project began in 1983 and involved the
compilation of texts for language analysis to
develop a database of two million Malay words.
Inclusion of complete old Malay texts as well as
modern texts in developing the corpus.
In its early stages, the DBP corpus was designed
only for researchers, e.g., the UKM-DBP corpus.
The corpus is also used in Kamus Dewan.
New development of an online Malay lexical
and grammar database of Malay textbooks.
Kamus Besar Bahasa Melayu Dewan is
reportedly in the making.
THE AVAILABLE WEBSITE FOR MALAY
CORPUS
Corpus System in Malaysia
• Established by the group of researchers for
Computer Translation at Universiti Sains
Malaysia (USM) in 1993.
• The corpus can be searched by using:
i) Keyword
ii) Keyword + wildcard symbols (*), (?)
ex: b*t?l = betul
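The keyword-plus-symbol search can be imitated with shell-style wildcards, where * stands for any run of characters and ? for exactly one character. This is a minimal sketch using Python's standard fnmatch module; the function wildcard_search and the sample word list are assumptions made for illustration, and the sketch does not reproduce how the USM system is actually implemented.

```python
from fnmatch import fnmatchcase


def wildcard_search(pattern, words):
    """Return the words that match a pattern containing * and ? wildcards.

    Hypothetical helper: * matches any run of characters, ? exactly one.
    """
    pattern = pattern.lower()
    return [w for w in words if fnmatchcase(w.lower(), pattern)]


# Illustrative word list (an assumption, not taken from the corpus itself)
word_list = ["betul", "batal", "bantal", "buku", "bentuk"]
matches = wildcard_search("b*t?l", word_list)
print(matches)
```

With this sample list the pattern b*t?l matches betul, but also batal and bantal, which shows that a wildcard query usually returns a set of candidates and not a single word.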
EXAMPLE OF SOFTWARE FOR MALAY
CORPUS
• The Malay Analyze Text (MATA) can analyze:
a) the word count
b) the frequency of each word
c) the list and number of the root words
d) the list and number of the new words
e) the number of ambiguous words.
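The first two analyses, word count and word frequency, can be sketched with the Python standard library. This does not reproduce MATA itself: the tokenising rule (lower-cased letters, with hyphenated reduplications kept as one token) and the function name word_statistics are assumptions for illustration. Root-word, new-word and ambiguity counts would need a Malay lexicon and are left out.

```python
import re
from collections import Counter


def word_statistics(text):
    """Return (total word count, frequency table) for a piece of text.

    Assumed tokenisation: runs of letters, keeping hyphenated forms
    such as memang-memang together as a single token.
    """
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return len(tokens), Counter(tokens)


# Sample sentence taken from the corpus example in this deck
sample = ("Maka sayogianya kita sediakan memang-memang akan bekal kita di "
          "akhirat yang baka itu supaya kita tiada menyesal pada negeri akhirat.")
total, freq = word_statistics(sample)
print(total)
print(freq.most_common(3))
```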
THE RESEARCHER AND
ORGANIZATION OF MALAY CORPUS
• The Australian National University (ANU).
• The Malaysian Language Planning Agency (Dewan
Bahasa dan Pustaka).
• Professor Ahmad Murad Merican
Example of Malay Corpus
1. ADAT RAJA MELAYU
AYAPAN – frequency 3
Ex: Dipertuan pun keluarlah ke balairung hendak memberi ayapan akan mereka sekalian. Pertama yang diangkat terenang air
HEMATAN – frequency 1
Ex: faedahnya yang engkau dapat yang demikian itu? Pada hematan aku, terlebih terutamanya engkau pergi berlari-lari dengan ....
2. SALASILAH MELAYU DAN BUGIS
JONGOS – frequency 1
Ex: Maka menyuruhlah Jeneral Himhoff itu kepada jongosnya mengangkatkan baginda itu air teh. Telah sudah diletakkan oleh jongos itu di atas meja di hadapan baginda air teh itu maka Jeneral pun bertanya kepada baginda Mayor itu, "Apa khabarnya Tuan Mayor datang ini?" Maka jawab baginda, "Adapun sahaya
SAYOGIANYA – frequency 2
Ex: Maka sayogianya kita sediakan memang-memang akan bekal kita di akhirat yang baka itu supaya kita tiada menyesal pada negeri akhirat. Syahadan apabila sudah selesailah daripada pekerjaan
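Entries like the ones above pair a keyword with its frequency and a stretch of surrounding text, which is the usual keyword-in-context (concordance) view. The sketch below shows one way such output could be produced. The function name concordance, the whole-word matching rule and the context width are assumptions for illustration, not the method used by the corpus these examples come from.

```python
import re


def concordance(text, keyword, width=20):
    """Return each whole-word hit of keyword with `width` characters of context.

    Assumption: only exact word forms count, so jongosnya is not a hit for jongos.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        hits.append(text[start:end])
    return hits


# Passage taken from the SALASILAH MELAYU DAN BUGIS example in this deck
passage = ("Maka menyuruhlah Jeneral Himhoff itu kepada jongosnya mengangkatkan "
           "baginda itu air teh. Telah sudah diletakkan oleh jongos itu di atas meja")
hits = concordance(passage, "jongos")
print(len(hits))
for line in hits:
    print("...", line, "...")
```

Whether affixed forms such as jongosnya should count towards the frequency of jongos is a design choice. Counting them would need the affix handling described in the morphology slide.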
Literature Review
Title: Malay Interrogative Knowledge Corpus
Year: 2011
Authors: Fatimah Sidi, Marzanah A. Jabar, Mohd Hasan Selamat, Abdul Azim Abdul Ghani, Md Nasir Sulaiman, Salmi Baharom
URL: http://thescipub.com/pdf/10.3844/ajebasp.2011.171.176
Purpose of study:
• To investigate the availability of Malay knowledge representation in online sources of Malay documents
• To produce Malay knowledge representation in a knowledge-base system
• To identify knowledge from unstructured documents
RESEARCH QUESTION
How to identify and extract Malay knowledge representation from unstructured documents?
FRAMEWORK & METHODOLOGY
Framework of analysis: Interrogative Knowledge Identification Framework
Methods:
1. Create attributes for the corpus
2. Extract lexicons from the document collection
3. Verify the lexicon entries
4. Insert the lexicon entries
5. Extend ambiguous words encountered
6. Refer to the opinion of a Malay language expert
FINDINGS
• The only interrogative element which has shown significant accuracy in identifying knowledge is ‘why’.
• The interrogative elements ‘what’ and ‘who’ have shown significant accuracy in identifying and extracting information.
• The reasons for the differences: the quality of the various formats and styles of Malay writing.
Example of entries of MalayIK-Corpus
Root Word | Lexicon | Grammatical Information | Interrogative Element | Status
Rumah (house) | Rumah (house) | Kata nama am benda (noun) | Apa (what) | 1 (noun/adj)
Sejak (since) | Sejak (since) | Kata sendi nama masa (preposition) | Bila (when) | 2 (stop word)
Selidik (research) | Penyelidik (researcher) | Kata nama am orang (noun) | Siapa (who) | 1 (noun/adj)
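The sample entries above could be held in a small record structure so that they can be looked up by interrogative element. The sketch below is only one possible representation: the class LexiconEntry, its field names and the helper entries_for are illustrative assumptions and are not taken from the MalayIK-Corpus paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    """One corpus entry; field names are hypothetical."""
    root_word: str
    lexicon: str
    grammatical_info: str
    interrogative: str
    status: int  # 1 = noun/adj, 2 = stop word, as listed in the sample entries


ENTRIES = [
    LexiconEntry("rumah", "rumah", "kata nama am benda (noun)", "apa", 1),
    LexiconEntry("sejak", "sejak", "kata sendi nama masa (preposition)", "bila", 2),
    LexiconEntry("selidik", "penyelidik", "kata nama am orang (noun)", "siapa", 1),
]


def entries_for(interrogative, entries=ENTRIES):
    """Return the entries linked to an interrogative element such as 'siapa' (who)."""
    return [e for e in entries if e.interrogative == interrogative.lower()]


print(entries_for("Siapa"))
print([e.lexicon for e in ENTRIES if e.status == 2])
```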
Conclusion
• The development of MalayIK-Corpus is
important to identify knowledge in Malay
documents and to provide Malay knowledge
representation in a knowledge-base system.
Thus, it can potentially increase the sharing
and reuse of the knowledge in documents
among the community.
REFERENCES
• Granger, S. (2010). Corpus-based approaches to contrastive linguistics and translation studies.
• Merican, A. M. (2017, June 28). Spirit of the Malay Concordance Project. NEW STRAITS TIMES. Retrieved December 1, 2017, from https://www.nst.com.my/education/2017/06/252769/spirit-malay-concordance-project
• What is a corpus? | Oxford Dictionaries. (n.d.). Retrieved December 03, 2017, from https://en.oxforddictionaries.com/explore/what-is-a-corpus
• Abdul Rahim, H. (2014). CORPORA IN LANGUAGE RESEARCH IN MALAYSIA. 32(1), 1-16. Retrieved December 1, 2017, from http://web.usm.my/km/32(Supp.1)2014/KM%2032%20Supp%201%202014%20-%20Art%201(1-16).pdf
• http://mcp.anu.edu.au/Q/mcp.html
• http://lamanweb.dbp.gov.my/index.php/pages/view/76?mid=61

Adilla's group corpus linguistic sec2

  • 1. Malay Corpus Nurul Adilla Adree 1324422 Nur Fareena Eleesha 1322422 Nur Hanisah Hamzah 1323150 Wan Aliaa Adibah Wan Omar 1327062
  • 2. Definition of Malay Corpus AMalaycorpusisacollectionoftextsofwritten(orspoken)language presentedin electronicformanddictionary. It’sacollectionofMalaywordsusedforlinguisticanalyses andeducational purposes MalayCorpusprovidestheevidenceofhowlanguage isusedinrealsituations, fromwhichlexicographers canwriteaccurateandmeaningfuldictionaryentries.
  • 3. Objectives Developed suitable teaching and educational materials. The representativeness of its data and the flexibility of the search criteria, which hence ease the transferring and manipulation of data for the practical use.
  • 4. Specific educational and research functions  The provision of suitable word set for assessing reading and writing skills  The selection of suitable word choice for teaching and intervention activities  The selection of appropriate vocabulary to be used in literature.
  • 5. The three basic morphological operations of Malay Corpus are:  Affixation There are three types of affixes: prefixes, suffixes and infixes. • Prefixes include me-, pe- , be-, ter-, se-, ke and di-, • Suffixes include -i, -kan, - nya, lah, -kah, -mu and - ku. • There are three infixes in Malay: -el-, -em- and -er- . Examples of infixation are geletar “shiver/tremble”,gemila ng“bright”andgerigis“ser rated”.  Reduplication Three basic types of reduplication: full duplication, partial duplication, and rhyming and chiming. • Full duplication : kuda- kuda “trestle” (from kuda “horse”) whilst • Partial re-duplication kekura “tortoise” (from the stem kura “tortoise”). • Rhyming and chiming : lauk “dish” becomes lauk-pauk “all sorts of dishes”  Compounding Compounding fuses simplex words together into single- word compounds A Malay example of this is adat-istiadat “customs and traditions”, which constitutes a single word but is made up of the component words adat “custom” and istiadat “custom/tradition”.
  • 6. DEVELOPMENT OF MALAY CORPUS IN MALAYSIA Influenced by the Brown Corpus in the 1970s. Was lead by Dewan Bahasa dan Pustaka (DBP) The project began in 1983 and involved the compilation of texts for language analysis to develop a database of two million Malay words. Inclusion of complete old Malay texts as and modern texts in developing the corpus. In its early stages, the DBP corpus was designed only for researchers. E:g, UKM-DBP corpus.
  • 7. The corpus also use in Kamus Dewan. New development of an online Malay lexical and grammar database of Malay textbooks. Kamus Besar Bahasa Melayu Dewan, is reportedly in the making.
  • 8. THE AVAILABLE WEBSITE FOR MALAY CORPUS
  • 9.
  • 10.
  • 11.
  • 12.
  • 13. Corpus System in Malaysia • Established by the group of Researcher for Computer Translation at University Sains Malaysia (USM) in 1993. • The method searching corpus are by using : i) Keyword ii)Keyword + any symbols (*),(?) ex : b*t?l = betul
  • 14. EXAMPLE OF SOFTWARE FOR MALAY CORPUS
      • The Malay Analyze Text (MATA) software can report:
          a) the word count
          b) the frequency of each word
          c) the number of root words
          d) the number of new words
          e) the number of ambiguous words
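The first two outputs listed for MATA, word count and word frequency, can be illustrated with Python's standard library. This sketch is not MATA's implementation: the tokenizer (lowercased alphabetic words, with hyphenated forms such as kuda-kuda kept as one token) is an assumption, and the remaining outputs (root words, new words, ambiguous words) would need a Malay lexicon and stemmer that are not shown here. The sample sentence is taken from the Salasilah Melayu dan Bugis example later in the deck.

```python
import re
from collections import Counter

# Assumed tokenizer: alphabetic words, keeping hyphenated forms together.
TOKEN_RE = re.compile(r"[a-z]+(?:-[a-z]+)*")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def analyze(text: str) -> tuple[int, Counter]:
    """Return the total word count and a per-word frequency table."""
    tokens = tokenize(text)
    return len(tokens), Counter(tokens)


sample = ("Maka menyuruhlah Jeneral Himhoff itu kepada jongosnya "
          "mengangkatkan baginda itu air teh.")
word_count, frequencies = analyze(sample)
print(word_count)
print(frequencies["itu"])
```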
  • 15. RESEARCHERS AND ORGANIZATIONS INVOLVED IN MALAY CORPUS
      • The Australian National University (ANU).
      • The Malaysian Language Planning Agency (Dewan Bahasa dan Pustaka).
      • Professor Ahmad Murad Merican.
  • 16. Example of Malay Corpus
      1. ADAT RAJA MELAYU
          • AYAPAN (frequency 3). Ex: Dipertuan pun keluarlah ke balairung hendak memberi ayapan akan mereka sekalian. Pertama yang diangkat terenang air
          • HEMATAN (frequency 1). Ex: faedahnya yang engkau dapat yang demikian itu? Pada hematan aku, terlebih terutamanya engkau pergi berlari-lari dengan ....
  • 17.
      2. SALASILAH MELAYU DAN BUGIS
          • JONGOS (frequency 1). Ex: Maka menyuruhlah Jeneral Himhoff itu kepada jongosnya mengangkatkan baginda itu air teh. Telah sudah diletakkan oleh jongos itu di atas meja di hadapan baginda air teh itu maka Jeneral pun bertanya kepada baginda Mayor itu, "Apa khabarnya Tuan Mayor datang ini?" Maka jawab baginda, "Adapun sahaya
          • SAYOGIANYA (frequency 2). Ex: Maka sayogianya kita sediakan memang-memang akan bekal kita di akhirat yang baka itu supaya kita tiada menyesal pada negeri akhirat. Syahadan apabila sudah selesailah daripada pekerjaan
  • 19. TITLE: Malay Interrogative Knowledge Corpus
      • YEAR: 2011
      • AUTHOR: Fatimah Sidi, Marzanah A. Jabar, Mohd Hasan Selamat, Abdul Azim Abdul Ghani, Md Nasir Sulaiman, Salmi Baharom
      • URL: http://thescipub.com/pdf/10.3844/ajebasp.2011.171.176
      • PURPOSE OF STUDY:
          • To investigate the availability of Malay knowledge representation in online sources of Malay documents.
          • To produce Malay knowledge representation in a knowledge-base system.
          • To identify knowledge from unstructured documents.
  • 20. RESEARCH QUESTION: How to identify and extract Malay knowledge representation from unstructured documents?
      • FRAMEWORK & METHODOLOGY
          • Framework of analysis: Interrogative Knowledge Identification Framework
          • Methods:
              1. Create attributes for the corpus.
              2. Extract lexicons from the document collection.
              3. Verify the lexicon entries.
              4. Insert the lexicon entries.
              5. Extend ambiguous words encountered.
              6. Refer to the opinion of a Malay language expert.
      • FINDINGS
          • The only interrogative element which has shown a significant accuracy in identifying knowledge is ‘why’.
          • The interrogative elements ‘what’ and ‘who’ have shown significant accuracy in identifying and extracting information.
          • The reasons for the differences: the quality of the various formats and styles of Malay writing.
  • 21. Example of entries of MalayIK-Corpus
      Root Word | Lexicon | Grammatical Information | Interrogative Element | Status
      Rumah (house) | Rumah (house) | Kata nama am benda (noun) | Apa (what) | 1 (noun/adj)
      Sejak (since) | Sejak (since) | Kata sendi nama masa (preposition) | Bila (when) | 2 (stop word)
      Selidik (research) | Penyelidik (researcher) | Kata nama am orang (noun) | Siapa (who) | 1 (noun/adj)
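One way to picture these entries is as simple records with one field per column. The sketch below is illustrative only: the class name, field names and helper functions are hypothetical, and the status codes (1 for noun/adj, 2 for stop word) are read directly off the slide rather than from the MalayIK-Corpus itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    root_word: str
    lexicon: str
    grammatical_info: str
    interrogative_element: str
    status: int  # per the slide: 1 = noun/adj, 2 = stop word


# The three example entries shown on the slide.
ENTRIES = [
    LexiconEntry("rumah", "rumah", "kata nama am benda (noun)", "apa", 1),
    LexiconEntry("sejak", "sejak", "kata sendi nama masa (preposition)", "bila", 2),
    LexiconEntry("selidik", "penyelidik", "kata nama am orang (noun)", "siapa", 1),
]


def entries_for(interrogative: str, entries=ENTRIES) -> list[LexiconEntry]:
    """Return the entries tagged with the given interrogative element."""
    wanted = interrogative.lower()
    return [e for e in entries if e.interrogative_element == wanted]


def non_stop_words(entries=ENTRIES) -> list[str]:
    """Return the lexicon forms whose status is 1 (not a stop word)."""
    return [e.lexicon for e in entries if e.status == 1]


print([e.lexicon for e in entries_for("siapa")])
print(non_stop_words())
```

Keeping the root word and the lexicon form as separate fields mirrors the slide, where selidik and penyelidik differ while rumah and sejak are unchanged.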
  • 22. Conclusion
      • The development of MalayIK-Corpus is important for identifying knowledge in Malay documents and for providing Malay knowledge representation in a knowledge-base system. It could therefore increase the sharing and reuse of the knowledge in documents among the community.
  • 23. REFERENCES
      • Granger, S. (2010). Corpus-based approaches to contrastive linguistics and translation studies.
      • Merican, A. M. (2017, June 28). Spirit of the Malay Concordance Project. New Straits Times. Retrieved December 1, 2017, from https://www.nst.com.my/education/2017/06/252769/spirit-malay-concordance-project
      • What is a corpus? | Oxford Dictionaries. (n.d.). Retrieved December 3, 2017, from https://en.oxforddictionaries.com/explore/what-is-a-corpus
      • Abdul Rahim, H. (2014). Corpora in language research in Malaysia. 32(1), 1-16. Retrieved December 1, 2017, from http://web.usm.my/km/32(Supp.1)2014/KM%2032%20Supp%201%202014%20-%20Art%201(1-16).pdf
      • http://mcp.anu.edu.au/Q/mcp.html
      • http://lamanweb.dbp.gov.my/index.php/pages/view/76?mid=61