SlideShare a Scribd company logo
1 of 4
Download to read offline
 

           Exploiting Cross‐linguistic Similarities in Zulu and Xhosa 
                         Computational Morphology 
                         Laurette Pretorius                                Sonja Bosch 
                        School of Computing                       Department of African Languages 
                     University of South Africa &                    University of South Africa 
                       Meraka Institute, CSIR                          Pretoria, South Africa 
                       Pretoria, South Africa                          boschse@unisa.ac.za
                         pretol@unisa.ac.za
 
Summary 
 
       •   An existing Zulu morphological analyser prototype (ZulMorph) serves as basis for the 
           bootstrapping of a Xhosa analyser.  
       •   The investigation is structured around the morphotactics and the morphophonological 
           alternations of the languages involved.  
       •   Special attention is given to the so‐called “open” class, which represents the word root 
           lexicons for specifically nouns and verbs.  
       •   The acquisition and coverage of these lexicons prove to be crucial for the success of the 
           analysers under development.   
       •   The bootstrapped morphological analyser is applied to parallel test corpora and the results 
           are discussed.  
       •   A variety of cross‐linguistic effects is illustrated with examples from the corpora. 
 
Background on ZulMorph 
 
Xerox finite‐state toolkit:  
lexc for modelling the morphotactics;  
xfst (regular expression language) for modelling morphophonological alternations 
Word roots include 15 800 nouns, 7 600 verbs, 408 relatives, 47 adjectives, 2 735 ideophones, 176 
conjunctions.  
      
    Morphotactics (lexc)   Affixes  for  all  parts‐of‐speech  Word roots (e.g.         Rules  for  legal  combinations  and 
                             (e.g. subject & object concords,  nouns, verbs,            orders  of  morphemes  (e.g.  u‐ya‐
                             noun  class  prefixes,  verb  relatives,                   ngi‐thand‐a  and  not  *ya‐u‐a‐
                             extensions etc.)                   ideophones)             thand‐ngi)  
    Morpho‐                  Rules that determine the form of each morpheme 
    phonological             (e.g. ku‐lob‐w‐a > ku‐lotsh‐w‐a, u‐mu‐lomo > u‐m‐lomo)  
    alternations    (xfst)  
                                  Table 1: Zulu Morphological Analyser Components 

Morphotactics 
We distinguish between so‐called closed and open classes:  
   • The  open  class  accepts  the  addition  of  new  items  by  means  of  processes  such  as  borrowing, 
        coining, compounding and derivation. In the context of this paper, the open class represents word 
        roots including verb roots and noun stems.  
   • The closed class represents affixes that model the fixed morphological structure of words, as well 
        as items such as conjunctions, pronouns etc.  
 
 
 
 
 


We focus on Xhosa affixes that differ from their Zulu counterparts. A few examples are given in Table 2. 
    
Morpheme                      Zulu                                    Xhosa 
Noun Class Prefixes 
Class 1 and 3 um(u)‐          full form umu‐ with monosyllabic        um‐ with all noun stems: um‐ntu, 
                              noun stems, shortened form with         um‐fana 
                              polysyllabic noun stems: 
                              umu‐ntu, um‐fana 
Class 2a                      o‐: o‐baba                              oo‐: oo‐bawo 
Class 9                       in‐ with all noun stems:                i‐ with noun stems beginning with 
                              in‐nyama                                h, i, m, n, ny: i‐hambo  
                                                                       
Class 10                      izin‐  with  monosyllabic  and  iin‐ with polysyllabic stems: 
                              polysyllabic stems.                     iin‐dlebe 
                              izin‐ja;  izin‐dlebe 
Contracted subject concords (future tense). Examples: 
1ps                           ngo‐                                    ndo‐ 
2ps, Class 1 & 3              wo‐                                     uyo‐ 
Class 4 & 9                   yo‐                                     iyo‐ 
              Table 2. Examples of variations in Zulu and Xhosa ‘closed’ morpheme information 
 
The bootstrapping process is iterative and new information regarding dissimilar morphological 
constructions is incorporated systematically in the morphotactics component. Similarly, rules are 
adapted in a systematic manner. 
 

Morphophonological alternations 
Differences in morphophonological alternations between Zulu and Xhosa are exemplified in  
Table 3. 
 
                      Zulu                                                  Xhosa 
Class  10  class  prefix  izin‐  occurs  before  Class  10  class  prefix  izin‐  changes  to  iin‐  before 
monosyllabic  as  well  as  polysyllabic  stems,  polysyllabic stems, e.g. izinja, iindlebe 
e.g. izinja, izindlebe                                Adverb prefix na + ii > nee; e.g. neendlebe (na‐iin‐
Adverb prefix na + i > ne, e.g. nezindlebe (na‐ ndlebe) 
izin‐ndlebe) 
Palatalisation  with  passive,  diminutive  &  Palatalisation  with  passive,  diminutive  &  locative 
locative formation:                                   formation: 
 b   > tsh                                            b > ty                    
‐hlab‐w‐a > ‐hlatsh‐w‐a, intaba‐ana > intatsh‐ ‐hlab‐w‐a > ‐hlaty‐w‐a, intaba‐ana > intaty‐a na 
ana, indaba > endatsheni                              ihlobo > ehlotyeni 
ph > sh                                               ph > tsh                  
‐boph‐w‐a            > ‐bosh‐w‐a,                     –boph‐w‐a                 > ‐botsh‐w‐a,  
iphaphu‐ana      > iphash‐ana                         iphaphu‐ana       > iphatsh‐ana, 
iphaphu              > ephasheni                      usapho                > elusatsheni 
                         Table 3. Examples of variations in Zulu and Xhosa morphophonology 
 
 

 

The word root lexicons 
Zulu lexicon:  
    • Based on an extensive word list dating back to the mid 1950s, but significant 
        improvements and additions are regularly made.  
    • At present the Zulu word roots include noun stems with class information (15 759), verb 
        roots (7 567), relative stems (406), adjective stems (48), ideophones (1 360), conjunctions 
        (176).  
Xhosa lexicon:  
    • Noun stems with class information (4 959) and verb roots (5 984) extracted from various 
        recent prototype paper dictionaries whereas relative stems (27), adjective stems (17), 
        ideophones (30) and conjunctions (28) were only included as representative samples at 
        this stage. 
 

A computational approach to cross‐linguistic similarity 
    •   Application of the bootstrapped morphological analyser to parallel test corpora 
    •   Guesser variant of the morphological analyser that uses typical word root patterns for 
        identifying potential new word roots 
     
                                          Bootstrapping procedure 
                                                   




                                                                                        
        
        
        
The language resources chosen to illustrate this point are parallel corpora in the form of the South 
African Constitution. 
        
         
         
 
     
     
     
    Results of the application of the bootstrapped morphological analyser to this corpus are as follows: 
                                                                   
                                                              
                                Zulu Statistics 
                                Corpus size:    7057 types 
                                Analysed:        5748 types (81.45 %) 
                                Failures:        1309 types (18.55%) 
                                Failures analysed by guesser: 1239 types 
                                Failures not analysed by guesser: 70 types 
                                 
                                Xhosa Statistics 
                                Corpus size:   7423 types 
                                Analysed:        5380 types (72.48 %) 
                                Failures:        2043 types (27.52%) 
                                Failures analysed by guesser: 1772 types 
                                Failures not analysed by guesser: 271 types 
                                                              
             
Examples from the Zulu corpus: 
The analysis of the Zulu word ifomu ‘form’ uses the Xhosa noun stem –fomu (9/10) in the Xhosa lexicon in 
the absence of the Zulu stem: 

ifomu i[NPrePre9]fomu[Xh][NStem]
 
Examples from the Xhosa corpus: 
The analysis of the Xhosa words bephondo ‘of the province’ and esikhundleni ‘in the office’ use the Zulu 
noun stems –phondo (5/6) and –khundleni (7/8) respectively in the Zulu lexicon: 

bephondo ba[PossConc14]i[NPrePre5]li[BPre5]phondo[NStem]

esikhundleni e[LocPre]i[NPrePre7]si[BPre7]khundla[NStem]ini[LocSuf]

Examples of the guesser output from the Zulu corpus: 
The compound noun –shayamthetho (7/8) ‘legislature’ is not listed in the Zulu lexicon, but was guessed 
correctly:  

isishayamthetho i[NPrePre7]si[BPre7]shayamthetho-Guess[NStem]
                                         

Conclusion and Future Work 
        •   Zulu  informed  Xhosa  in  the  sense  that  the  systematically  developed  grammar  for  ZulMorph  was 
            directly available for the Xhosa analyser development, which significantly reduced the development 
            time for the Xhosa prototype compared to that for ZulMorph. 
        •   To a lesser extent, Xhosa also informed Zulu by providing a current (more up to date) Xhosa lexicon. 
            In addition, the guesser variant was employed in identifying possible new roots in the test corpora, 
            both for Zulu and for Xhosa. 
        •   Bootstrapping  morphological  analysers  for  languages  that  exhibit  significant  structural  and  lexical 
            similarities may be fruitfully exploited for developing analysers for lesser‐resourced languages.  
        •   Future  work  includes  the  application  of  the  approach  followed  in  this  work  to  the  other  Nguni 
            languages, namely Swati and Ndebele (Southern and Zimbabwe); the application to larger corpora, 
            and the subsequent construction of stand‐alone versions. Finally, the combined analyser could also 
            be used for (corpus‐based) quantitative studies in cross‐linguistic similarity.  

More Related Content

More from Guy De Pauw

Tagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusTagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusGuy De Pauw
 
A Corpus of Santome
A Corpus of SantomeA Corpus of Santome
A Corpus of SantomeGuy De Pauw
 
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Guy De Pauw
 
Compiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTCompiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTGuy De Pauw
 
The Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionThe Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionGuy De Pauw
 
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingLearning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingGuy De Pauw
 
Issues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken IrishIssues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken IrishGuy De Pauw
 
How to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsHow to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsGuy De Pauw
 
Towards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersTowards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersGuy De Pauw
 
The PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentThe PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentGuy De Pauw
 
A System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersA System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersGuy De Pauw
 
IFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation SystemIFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation SystemGuy De Pauw
 
A Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription SystemA Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription SystemGuy De Pauw
 
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...Guy De Pauw
 
Towards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersTowards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersGuy De Pauw
 
Human Language Technologies for Ethiopian Languages: Challenges and Future Di...
Human Language Technologies for Ethiopian Languages: Challenges and Future Di...Human Language Technologies for Ethiopian Languages: Challenges and Future Di...
Human Language Technologies for Ethiopian Languages: Challenges and Future Di...Guy De Pauw
 
Amharic document clustering
Amharic document clusteringAmharic document clustering
Amharic document clusteringGuy De Pauw
 
Modeling Improved Syllabification Algorithm for Amharic
Modeling Improved Syllabification Algorithm for AmharicModeling Improved Syllabification Algorithm for Amharic
Modeling Improved Syllabification Algorithm for AmharicGuy De Pauw
 
Ge'ez Verbs Morphology and Declaration Model
Ge'ez Verbs Morphology and Declaration ModelGe'ez Verbs Morphology and Declaration Model
Ge'ez Verbs Morphology and Declaration ModelGuy De Pauw
 
Do we need linguistic knowledge for speech technology applications in African...
Do we need linguistic knowledge for speech technology applications in African...Do we need linguistic knowledge for speech technology applications in African...
Do we need linguistic knowledge for speech technology applications in African...Guy De Pauw
 

More from Guy De Pauw (20)

Tagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusTagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News Corpus
 
A Corpus of Santome
A Corpus of SantomeA Corpus of Santome
A Corpus of Santome
 
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
 
Compiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTCompiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFST
 
The Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionThe Database of Modern Icelandic Inflection
The Database of Modern Icelandic Inflection
 
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingLearning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
 
Issues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken IrishIssues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken Irish
 
How to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsHow to build language technology resources for the next 100 years
How to build language technology resources for the next 100 years
 
Towards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersTowards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound Analysers
 
The PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentThe PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource Development
 
A System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersA System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá Characters
 
IFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation SystemIFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation System
 
A Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription SystemA Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription System
 
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
 
Towards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersTowards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound Analysers
 
Human Language Technologies for Ethiopian Languages: Challenges and Future Di...
Human Language Technologies for Ethiopian Languages: Challenges and Future Di...Human Language Technologies for Ethiopian Languages: Challenges and Future Di...
Human Language Technologies for Ethiopian Languages: Challenges and Future Di...
 
Amharic document clustering
Amharic document clusteringAmharic document clustering
Amharic document clustering
 
Modeling Improved Syllabification Algorithm for Amharic
Modeling Improved Syllabification Algorithm for AmharicModeling Improved Syllabification Algorithm for Amharic
Modeling Improved Syllabification Algorithm for Amharic
 
Ge'ez Verbs Morphology and Declaration Model
Ge'ez Verbs Morphology and Declaration ModelGe'ez Verbs Morphology and Declaration Model
Ge'ez Verbs Morphology and Declaration Model
 
Do we need linguistic knowledge for speech technology applications in African...
Do we need linguistic knowledge for speech technology applications in African...Do we need linguistic knowledge for speech technology applications in African...
Do we need linguistic knowledge for speech technology applications in African...
 

Recently uploaded

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Exploiting Cross-Linguistic Similarities in Zulu and Xhosa Computational Morphology

  • 1.   Exploiting Cross‐linguistic Similarities in Zulu and Xhosa  Computational Morphology  Laurette Pretorius  Sonja Bosch  School of Computing  Department of African Languages  University of South Africa &  University of South Africa  Meraka Institute, CSIR  Pretoria, South Africa  Pretoria, South Africa  boschse@unisa.ac.za pretol@unisa.ac.za   Summary    • An existing Zulu morphological analyser prototype (ZulMorph) serves as basis for the  bootstrapping of a Xhosa analyser.   • The investigation is structured around the morphotactics and the morphophonological  alternations of the languages involved.   • Special attention is given to the so‐called “open” class, which represents the word root  lexicons for specifically nouns and verbs.   • The acquisition and coverage of these lexicons prove to be crucial for the success of the  analysers under development.    • The bootstrapped morphological analyser is applied to parallel test corpora and the results  are discussed.   • A variety of cross‐linguistic effects is illustrated with examples from the corpora.    Background on ZulMorph    Xerox finite‐state toolkit:   lexc for modelling the morphotactics;   xfst (regular expression language) for modelling morphophonological alternations  Word roots include 15 800 nouns, 7 600 verbs, 408 relatives, 47 adjectives, 2 735 ideophones, 176  conjunctions.     Morphotactics (lexc)   Affixes  for  all  parts‐of‐speech  Word roots (e.g.  Rules  for  legal  combinations  and  (e.g. subject & object concords,  nouns, verbs,  orders  of  morphemes  (e.g.  u‐ya‐ noun  class  prefixes,  verb  relatives,  ngi‐thand‐a  and  not  *ya‐u‐a‐ extensions etc.)   ideophones)   thand‐ngi)   Morpho‐ Rules that determine the form of each morpheme  phonological  (e.g. ku‐lob‐w‐a > ku‐lotsh‐w‐a, u‐mu‐lomo > u‐m‐lomo)   alternations    (xfst)   Table 1: Zulu Morphological Analyser Components  Morphotactics  We distinguish between so‐called closed and open classes:   • The  open  class  accepts  the  addition  of  new  items  by  means  of  processes  such  as  borrowing,  coining, compounding and derivation. In the context of this paper, the open class represents word  roots including verb roots and noun stems.   • The closed class represents affixes that model the fixed morphological structure of words, as well  as items such as conjunctions, pronouns etc.      
  • 2.       We focus on Xhosa affixes that differ from their Zulu counterparts. A few examples are given in Table 2.    Morpheme  Zulu  Xhosa  Noun Class Prefixes  Class 1 and 3 um(u)‐  full form umu‐ with monosyllabic  um‐ with all noun stems: um‐ntu,    noun stems, shortened form with  um‐fana    polysyllabic noun stems:  umu‐ntu, um‐fana  Class 2a  o‐: o‐baba  oo‐: oo‐bawo  Class 9  in‐ with all noun stems:  i‐ with noun stems beginning with    in‐nyama   h, i, m, n, ny: i‐hambo     Class 10  izin‐  with  monosyllabic  and  iin‐ with polysyllabic stems:  polysyllabic stems.   iin‐dlebe  izin‐ja;  izin‐dlebe  Contracted subject concords (future tense). Examples:  1ps  ngo‐  ndo‐  2ps, Class 1 & 3  wo‐  uyo‐  Class 4 & 9  yo‐  iyo‐  Table 2. Examples of variations in Zulu and Xhosa ‘closed’ morpheme information    The bootstrapping process is iterative and new information regarding dissimilar morphological  constructions is incorporated systematically in the morphotactics component. Similarly, rules are  adapted in a systematic manner.    Morphophonological alternations  Differences in morphophonological alternations between Zulu and Xhosa are exemplified in   Table 3.    Zulu  Xhosa  Class  10  class  prefix  izin‐  occurs  before  Class  10  class  prefix  izin‐  changes  to  iin‐  before  monosyllabic  as  well  as  polysyllabic  stems,  polysyllabic stems, e.g. izinja, iindlebe  e.g. izinja, izindlebe  Adverb prefix na + ii > nee; e.g. neendlebe (na‐iin‐ Adverb prefix na + i > ne, e.g. nezindlebe (na‐ ndlebe)  izin‐ndlebe)  Palatalisation  with  passive,  diminutive  &  Palatalisation  with  passive,  diminutive  &  locative  locative formation:  formation:   b   > tsh     b > ty                     ‐hlab‐w‐a > ‐hlatsh‐w‐a, intaba‐ana > intatsh‐ ‐hlab‐w‐a > ‐hlaty‐w‐a, intaba‐ana > intaty‐a na  ana, indaba > endatsheni  ihlobo > ehlotyeni  ph > sh    ph > tsh     ‐boph‐w‐a  > ‐bosh‐w‐a,   –boph‐w‐a   > ‐botsh‐w‐a,   iphaphu‐ana      > iphash‐ana  iphaphu‐ana       > iphatsh‐ana,  iphaphu              > ephasheni  usapho                > elusatsheni  Table 3. Examples of variations in Zulu and Xhosa morphophonology   
  • 3.     The word root lexicons  Zulu lexicon:   • Based on an extensive word list dating back to the mid 1950s, but significant  improvements and additions are regularly made.   • At present the Zulu word roots include noun stems with class information (15 759), verb  roots (7 567), relative stems (406), adjective stems (48), ideophones (1 360), conjunctions  (176).   Xhosa lexicon:   • Noun stems with class information (4 959) and verb roots (5 984) extracted from various  recent prototype paper dictionaries whereas relative stems (27), adjective stems (17),  ideophones (30) and conjunctions (28) were only included as representative samples at  this stage.    A computational approach to cross‐linguistic similarity  • Application of the bootstrapped morphological analyser to parallel test corpora  • Guesser variant of the morphological analyser that uses typical word root patterns for  identifying potential new word roots    Bootstrapping procedure            The language resources chosen to illustrate this point are parallel corpora in the form of the South  African Constitution.       
  • 4.         Results of the application of the bootstrapped morphological analyser to this corpus are as follows:      Zulu Statistics  Corpus size:    7057 types  Analysed:     5748 types (81.45 %)  Failures:   1309 types (18.55%)  Failures analysed by guesser: 1239 types  Failures not analysed by guesser: 70 types    Xhosa Statistics  Corpus size:   7423 types  Analysed:    5380 types (72.48 %)  Failures:   2043 types (27.52%)  Failures analysed by guesser: 1772 types  Failures not analysed by guesser: 271 types      Examples from the Zulu corpus:  The analysis of the Zulu word ifomu ‘form’ uses the Xhosa noun stem –fomu (9/10) in the Xhosa lexicon in  the absence of the Zulu stem:  ifomu i[NPrePre9]fomu[Xh][NStem]   Examples from the Xhosa corpus:  The analysis of the Xhosa words bephondo ‘of the province’ and esikhundleni ‘in the office’ use the Zulu  noun stems –phondo (5/6) and –khundleni (7/8) respectively in the Zulu lexicon:  bephondo ba[PossConc14]i[NPrePre5]li[BPre5]phondo[NStem] esikhundleni e[LocPre]i[NPrePre7]si[BPre7]khundla[NStem]ini[LocSuf] Examples of the guesser output from the Zulu corpus:  The compound noun –shayamthetho (7/8) ‘legislature’ is not listed in the Zulu lexicon, but was guessed  correctly:   isishayamthetho i[NPrePre7]si[BPre7]shayamthetho-Guess[NStem]     Conclusion and Future Work  • Zulu  informed  Xhosa  in  the  sense  that  the  systematically  developed  grammar  for  ZulMorph  was  directly available for the Xhosa analyser development, which significantly reduced the development  time for the Xhosa prototype compared to that for ZulMorph.  • To a lesser extent, Xhosa also informed Zulu by providing a current (more up to date) Xhosa lexicon.  In addition, the guesser variant was employed in identifying possible new roots in the test corpora,  both for Zulu and for Xhosa.  • Bootstrapping  morphological  analysers  for  languages  that  exhibit  significant  structural  and  lexical  similarities may be fruitfully exploited for developing analysers for lesser‐resourced languages.   • Future  work  includes  the  application  of  the  approach  followed  in  this  work  to  the  other  Nguni  languages, namely Swati and Ndebele (Southern and Zimbabwe); the application to larger corpora,  and the subsequent construction of stand‐alone versions. Finally, the combined analyser could also  be used for (corpus‐based) quantitative studies in cross‐linguistic similarity.