SlideShare a Scribd company logo
Towards the implementation of a refined data model for a zulu machine-readable lexicon Ronell van der Merwe Laurette Pretorius Sonja Bosch 1
Goal ...  to develop .... a complete data repository (database) Bantu languages Machine Readable (MR) lexicon applications: finite state morphological analyser, .... 2
What we will be discussing .... Brief overview Lexicon update framework Data model Implementation approaches Conclusion  Future work 3
1. Introduction           - Comprehensive MR lexicon Update framework ZULU corpora Paper Dict 4
2. Lexicon Update Framework Morphological Analyser ZulMorph Corpus apply failures successes rebuild Guesser MR lexicon update new stems/roots 5
Purpose of MR Lexicon MorphologicalAnalyser Apps ZulMorph MR lexicon used Repository of all lexical information re-used Apps 6 Morphosyntactic information: Zulu morphotactics Morphophonological alternation rules Embedded stem/root lexicon
Guesser variant Phonologically possible  stems /roots Guesser identify new      candidate(s) MR lexicon update / include 7
3. Data Model - development 8 *      0 or more +     1 or more |     optional         leaf node
3.1 Verbal extensions Suffixing extensions to verb root: Applied Causative Intensive Neuter Passive Reciprocal 9 *      0 or more +     1 or more |     optional         leaf node
10
11
12
3.2 Deverbatives verb / extended verb root noun class prefix deverbative  suffix  (-o- | -i- |-a-| -e-| -u-) 13 optional: nominal suffix (Aug | Dim | Loc | Fem)
14
15
4. Towards implementation 16 Morphological Analyser ZulMorph MR lexicon The capturing of the data must be: rigorous systematic appropriately structured To ensure that: data exchange is consistent
Considerations in the choice of database... type of data different views of data different types of access to data 17
Consideration: Type of Data semi-structured allows for recursion n-depth structures  18
Consideration: views and access Rebuilding the Morphological Analyser: view:  morphological structure of all word roots/stems access:  sequential 19
XML – enabled database 20 Object-Oriented Database Relational Database <root>bon</root> <verbfeatures> ....... </verbfeatures>
Native XML database (NXD) 21 Native XML DB XQuery XUpdate <root>bon</root> <verbfeatures> ....... </verbfeatures>
XML enabled database:	Object-Oriented 22
NXD and OO Database satisfy the implementation considerations: Both ... are suitable for our type of data: semi-structured , recursive satisfy the type of access and view: sequential access to morphological information of all word roots/stems  23
Conclusion Refinement of a data model for the Zulu MR lexicon verbal extensions and deverbatives MR Lexicon embedded in an update framework Native XML and OO Databases possible approaches to implementation 24
Future work Development and evaluation of prototypes for the MR Lexicon (Zulu) Software for semi-automating the lexicon update framework Bootstrapping of MR Lexicon for other Bantu languages 25

More Related Content

What's hot

Text mining
Text miningText mining
Text mining
Koshy Geoji
 
Lesson plan proforma database management system
Lesson plan proforma database management systemLesson plan proforma database management system
Lesson plan proforma database management system
SANTOSH RATH
 
Open Data Mashups: linking fragments into mosaics
Open Data Mashups: linking fragments into mosaicsOpen Data Mashups: linking fragments into mosaics
Open Data Mashups: linking fragments into mosaics
phduchesne
 
Boolean Retrieval
Boolean RetrievalBoolean Retrieval
Boolean Retrieval
mghgk
 
Improving Document Clustering by Eliminating Unnatural Language
Improving Document Clustering by Eliminating Unnatural LanguageImproving Document Clustering by Eliminating Unnatural Language
Improving Document Clustering by Eliminating Unnatural Language
Jinho Choi
 
Role of Text Mining in Search Engine
Role of Text Mining in Search EngineRole of Text Mining in Search Engine
Role of Text Mining in Search Engine
Jay R Modi
 
Week12
Week12Week12
Week12
Esha Meher
 
Union from C and Data Strutures
Union from C and Data StruturesUnion from C and Data Strutures
Union from C and Data Strutures
Acad
 
Presentation Elpub 2013
Presentation   Elpub 2013Presentation   Elpub 2013
Presentation Elpub 2013
Helder Firmino
 
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchangeDealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
Estelle Delpech
 
Big Data & Text Mining
Big Data & Text MiningBig Data & Text Mining
Big Data & Text Mining
Michel Bruley
 
Basics
BasicsBasics
Basics
Rajendran
 
A Mathematical Approach to Ontology Authoring and Documentation
A Mathematical Approach to Ontology Authoring and DocumentationA Mathematical Approach to Ontology Authoring and Documentation
A Mathematical Approach to Ontology Authoring and Documentation
Christoph Lange
 
Some Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBASome Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBA
Patrice Bellot - Aix-Marseille Université / CNRS (LIS, INS2I)
 
Seaform Slides in VLDB 2010 PhD Workshop
Seaform Slides in VLDB 2010 PhD WorkshopSeaform Slides in VLDB 2010 PhD Workshop
Seaform Slides in VLDB 2010 PhD Workshop
Hao Wu
 
Corpora, Blogs and Linguistic Variation (Paderborn)
Corpora, Blogs and Linguistic Variation (Paderborn)Corpora, Blogs and Linguistic Variation (Paderborn)
Corpora, Blogs and Linguistic Variation (Paderborn)
Cornelius Puschmann
 
Textmining Information Extraction
Textmining Information ExtractionTextmining Information Extraction
Textmining Information Extraction
guest0edcaf
 
Primary and secondary databases ppt by puneet kulyana
Primary and secondary databases ppt by puneet kulyanaPrimary and secondary databases ppt by puneet kulyana
Primary and secondary databases ppt by puneet kulyana
Puneet Kulyana
 
Introduction of structure (2)
Introduction of structure (2)Introduction of structure (2)
Introduction of structure (2)
Jatin Sharma
 

What's hot (19)

Text mining
Text miningText mining
Text mining
 
Lesson plan proforma database management system
Lesson plan proforma database management systemLesson plan proforma database management system
Lesson plan proforma database management system
 
Open Data Mashups: linking fragments into mosaics
Open Data Mashups: linking fragments into mosaicsOpen Data Mashups: linking fragments into mosaics
Open Data Mashups: linking fragments into mosaics
 
Boolean Retrieval
Boolean RetrievalBoolean Retrieval
Boolean Retrieval
 
Improving Document Clustering by Eliminating Unnatural Language
Improving Document Clustering by Eliminating Unnatural LanguageImproving Document Clustering by Eliminating Unnatural Language
Improving Document Clustering by Eliminating Unnatural Language
 
Role of Text Mining in Search Engine
Role of Text Mining in Search EngineRole of Text Mining in Search Engine
Role of Text Mining in Search Engine
 
Week12
Week12Week12
Week12
 
Union from C and Data Strutures
Union from C and Data StruturesUnion from C and Data Strutures
Union from C and Data Strutures
 
Presentation Elpub 2013
Presentation   Elpub 2013Presentation   Elpub 2013
Presentation Elpub 2013
 
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchangeDealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
 
Big Data & Text Mining
Big Data & Text MiningBig Data & Text Mining
Big Data & Text Mining
 
Basics
BasicsBasics
Basics
 
A Mathematical Approach to Ontology Authoring and Documentation
A Mathematical Approach to Ontology Authoring and DocumentationA Mathematical Approach to Ontology Authoring and Documentation
A Mathematical Approach to Ontology Authoring and Documentation
 
Some Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBASome Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBA
 
Seaform Slides in VLDB 2010 PhD Workshop
Seaform Slides in VLDB 2010 PhD WorkshopSeaform Slides in VLDB 2010 PhD Workshop
Seaform Slides in VLDB 2010 PhD Workshop
 
Corpora, Blogs and Linguistic Variation (Paderborn)
Corpora, Blogs and Linguistic Variation (Paderborn)Corpora, Blogs and Linguistic Variation (Paderborn)
Corpora, Blogs and Linguistic Variation (Paderborn)
 
Textmining Information Extraction
Textmining Information ExtractionTextmining Information Extraction
Textmining Information Extraction
 
Primary and secondary databases ppt by puneet kulyana
Primary and secondary databases ppt by puneet kulyanaPrimary and secondary databases ppt by puneet kulyana
Primary and secondary databases ppt by puneet kulyana
 
Introduction of structure (2)
Introduction of structure (2)Introduction of structure (2)
Introduction of structure (2)
 

Viewers also liked

NLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology ConstraintsNLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology Constraints
Dimitris Kontokostas
 
Sensors, Wearables and the Internet of Things: A Revolution in the Making
Sensors, Wearables and the Internet of Things: A Revolution in the MakingSensors, Wearables and the Internet of Things: A Revolution in the Making
Sensors, Wearables and the Internet of Things: A Revolution in the Making
Matt Turck
 
Hardware Startups: The VC Perspective
Hardware Startups: The VC PerspectiveHardware Startups: The VC Perspective
Hardware Startups: The VC Perspective
Matt Turck
 
Big data landscape v 3.0 - Matt Turck (FirstMark)
Big data landscape v 3.0 - Matt Turck (FirstMark) Big data landscape v 3.0 - Matt Turck (FirstMark)
Big data landscape v 3.0 - Matt Turck (FirstMark)
Matt Turck
 
The Astonishing Resurrection of AI (A Primer on Artificial Intelligence)
The Astonishing Resurrection of AI (A Primer on Artificial Intelligence)The Astonishing Resurrection of AI (A Primer on Artificial Intelligence)
The Astonishing Resurrection of AI (A Primer on Artificial Intelligence)
Matt Turck
 
Building an AI Startup: Realities & Tactics
Building an AI Startup: Realities & TacticsBuilding an AI Startup: Realities & Tactics
Building an AI Startup: Realities & Tactics
Matt Turck
 

Viewers also liked (6)

NLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology ConstraintsNLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology Constraints
 
Sensors, Wearables and the Internet of Things: A Revolution in the Making
Sensors, Wearables and the Internet of Things: A Revolution in the MakingSensors, Wearables and the Internet of Things: A Revolution in the Making
Sensors, Wearables and the Internet of Things: A Revolution in the Making
 
Hardware Startups: The VC Perspective
Hardware Startups: The VC PerspectiveHardware Startups: The VC Perspective
Hardware Startups: The VC Perspective
 
Big data landscape v 3.0 - Matt Turck (FirstMark)
Big data landscape v 3.0 - Matt Turck (FirstMark) Big data landscape v 3.0 - Matt Turck (FirstMark)
Big data landscape v 3.0 - Matt Turck (FirstMark)
 
The Astonishing Resurrection of AI (A Primer on Artificial Intelligence)
The Astonishing Resurrection of AI (A Primer on Artificial Intelligence)The Astonishing Resurrection of AI (A Primer on Artificial Intelligence)
The Astonishing Resurrection of AI (A Primer on Artificial Intelligence)
 
Building an AI Startup: Realities & Tactics
Building an AI Startup: Realities & TacticsBuilding an AI Startup: Realities & Tactics
Building an AI Startup: Realities & Tactics
 

Similar to Towards the implementation of a refined data model for a Zulu machine-readable lexicon

INTELLIGENT QUERY PROCESSING IN MALAYALAM
INTELLIGENT QUERY PROCESSING IN MALAYALAMINTELLIGENT QUERY PROCESSING IN MALAYALAM
INTELLIGENT QUERY PROCESSING IN MALAYALAM
ijcsa
 
2011linked science4mccuskermcguinnessfinal
2011linked science4mccuskermcguinnessfinal2011linked science4mccuskermcguinnessfinal
2011linked science4mccuskermcguinnessfinal
Deborah McGuinness
 
07 04-06
07 04-0607 04-06
07 04-06
Gouranga123
 
Using Semantic and Domain-based Information in CLIR Systems
Using Semantic and Domain-based Information in CLIR SystemsUsing Semantic and Domain-based Information in CLIR Systems
Using Semantic and Domain-based Information in CLIR Systems
Mauro Dragoni
 
Accessing database using nlp
Accessing database using nlpAccessing database using nlp
Accessing database using nlp
eSAT Journals
 
Accessing database using nlp
Accessing database using nlpAccessing database using nlp
Accessing database using nlp
eSAT Publishing House
 
Sem facet paper
Sem facet paperSem facet paper
Sem facet paper
DBOnto
 
OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries
OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual DictionariesOpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries
OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries
INESC-ID (Spoken Language Systems Laboratory - L2F)
 
RDF2Rule PRESENTATION
RDF2Rule PRESENTATIONRDF2Rule PRESENTATION
RDF2Rule PRESENTATION
Efrah Shakir
 
Research Developments and Directions in Speech Recognition and ...
Research Developments and Directions in Speech Recognition and ...Research Developments and Directions in Speech Recognition and ...
Research Developments and Directions in Speech Recognition and ...
butest
 
April 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early Adopters
April 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early AdoptersApril 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early Adopters
April 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early Adopters
National Information Standards Organization (NISO)
 
WORD RECOGNITION MASLP
WORD RECOGNITION MASLPWORD RECOGNITION MASLP
WORD RECOGNITION MASLP
HimaniBansal15
 
WP3 Further specification of Functionality and Interoperability - Gradmann
WP3 Further specification of Functionality and Interoperability - GradmannWP3 Further specification of Functionality and Interoperability - Gradmann
WP3 Further specification of Functionality and Interoperability - Gradmann
Europeana
 
Reasoning on the Semantic Web
Reasoning on the Semantic WebReasoning on the Semantic Web
Reasoning on the Semantic Web
Yannis Kalfoglou
 
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
cseij
 
Benchmarking Versioning for Big Linked Data
Benchmarking Versioning for Big Linked DataBenchmarking Versioning for Big Linked Data
Benchmarking Versioning for Big Linked Data
Graph-TA
 
Presentation of HOBBIT's versioning benchmark at Graph-TA
Presentation of HOBBIT's versioning benchmark at Graph-TAPresentation of HOBBIT's versioning benchmark at Graph-TA
Presentation of HOBBIT's versioning benchmark at Graph-TA
Holistic Benchmarking of Big Linked Data
 
Matching and merging anonymous terms from web sources
Matching and merging anonymous terms from web sourcesMatching and merging anonymous terms from web sources
Matching and merging anonymous terms from web sources
IJwest
 
Recent advances in LVCSR : A benchmark comparison of performances
Recent advances in LVCSR : A benchmark comparison of performancesRecent advances in LVCSR : A benchmark comparison of performances
Recent advances in LVCSR : A benchmark comparison of performances
IJECEIAES
 
Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)
kevig
 

Similar to Towards the implementation of a refined data model for a Zulu machine-readable lexicon (20)

INTELLIGENT QUERY PROCESSING IN MALAYALAM
INTELLIGENT QUERY PROCESSING IN MALAYALAMINTELLIGENT QUERY PROCESSING IN MALAYALAM
INTELLIGENT QUERY PROCESSING IN MALAYALAM
 
2011linked science4mccuskermcguinnessfinal
2011linked science4mccuskermcguinnessfinal2011linked science4mccuskermcguinnessfinal
2011linked science4mccuskermcguinnessfinal
 
07 04-06
07 04-0607 04-06
07 04-06
 
Using Semantic and Domain-based Information in CLIR Systems
Using Semantic and Domain-based Information in CLIR SystemsUsing Semantic and Domain-based Information in CLIR Systems
Using Semantic and Domain-based Information in CLIR Systems
 
Accessing database using nlp
Accessing database using nlpAccessing database using nlp
Accessing database using nlp
 
Accessing database using nlp
Accessing database using nlpAccessing database using nlp
Accessing database using nlp
 
Sem facet paper
Sem facet paperSem facet paper
Sem facet paper
 
OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries
OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual DictionariesOpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries
OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries
 
RDF2Rule PRESENTATION
RDF2Rule PRESENTATIONRDF2Rule PRESENTATION
RDF2Rule PRESENTATION
 
Research Developments and Directions in Speech Recognition and ...
Research Developments and Directions in Speech Recognition and ...Research Developments and Directions in Speech Recognition and ...
Research Developments and Directions in Speech Recognition and ...
 
April 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early Adopters
April 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early AdoptersApril 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early Adopters
April 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early Adopters
 
WORD RECOGNITION MASLP
WORD RECOGNITION MASLPWORD RECOGNITION MASLP
WORD RECOGNITION MASLP
 
WP3 Further specification of Functionality and Interoperability - Gradmann
WP3 Further specification of Functionality and Interoperability - GradmannWP3 Further specification of Functionality and Interoperability - Gradmann
WP3 Further specification of Functionality and Interoperability - Gradmann
 
Reasoning on the Semantic Web
Reasoning on the Semantic WebReasoning on the Semantic Web
Reasoning on the Semantic Web
 
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
 
Benchmarking Versioning for Big Linked Data
Benchmarking Versioning for Big Linked DataBenchmarking Versioning for Big Linked Data
Benchmarking Versioning for Big Linked Data
 
Presentation of HOBBIT's versioning benchmark at Graph-TA
Presentation of HOBBIT's versioning benchmark at Graph-TAPresentation of HOBBIT's versioning benchmark at Graph-TA
Presentation of HOBBIT's versioning benchmark at Graph-TA
 
Matching and merging anonymous terms from web sources
Matching and merging anonymous terms from web sourcesMatching and merging anonymous terms from web sources
Matching and merging anonymous terms from web sources
 
Recent advances in LVCSR : A benchmark comparison of performances
Recent advances in LVCSR : A benchmark comparison of performancesRecent advances in LVCSR : A benchmark comparison of performances
Recent advances in LVCSR : A benchmark comparison of performances
 
Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)Named Entity Recognition using Hidden Markov Model (HMM)
Named Entity Recognition using Hidden Markov Model (HMM)
 

More from Guy De Pauw

Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...
Guy De Pauw
 
Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...
Guy De Pauw
 
Resource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech TaggingResource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech Tagging
Guy De Pauw
 
Natural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageNatural Language Processing for Amazigh Language
Natural Language Processing for Amazigh Language
Guy De Pauw
 
POS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik LanguagePOS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik Language
Guy De Pauw
 
The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)
Guy De Pauw
 
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Guy De Pauw
 
Tagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusTagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News Corpus
Guy De Pauw
 
A Corpus of Santome
A Corpus of SantomeA Corpus of Santome
A Corpus of Santome
Guy De Pauw
 
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Guy De Pauw
 
Compiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTCompiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFST
Guy De Pauw
 
The Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionThe Database of Modern Icelandic Inflection
The Database of Modern Icelandic Inflection
Guy De Pauw
 
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingLearning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Guy De Pauw
 
Issues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken IrishIssues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken Irish
Guy De Pauw
 
How to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsHow to build language technology resources for the next 100 years
How to build language technology resources for the next 100 years
Guy De Pauw
 
Towards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersTowards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound Analysers
Guy De Pauw
 
The PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentThe PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource Development
Guy De Pauw
 
A System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersA System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá Characters
Guy De Pauw
 
IFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation SystemIFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation System
Guy De Pauw
 
A Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription SystemA Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription System
Guy De Pauw
 

More from Guy De Pauw (20)

Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...
 
Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...
 
Resource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech TaggingResource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech Tagging
 
Natural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageNatural Language Processing for Amazigh Language
Natural Language Processing for Amazigh Language
 
POS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik LanguagePOS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik Language
 
The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)
 
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
 
Tagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusTagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News Corpus
 
A Corpus of Santome
A Corpus of SantomeA Corpus of Santome
A Corpus of Santome
 
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
 
Compiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTCompiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFST
 
The Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionThe Database of Modern Icelandic Inflection
The Database of Modern Icelandic Inflection
 
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingLearning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
 
Issues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken IrishIssues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken Irish
 
How to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsHow to build language technology resources for the next 100 years
How to build language technology resources for the next 100 years
 
Towards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersTowards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound Analysers
 
The PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentThe PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource Development
 
A System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersA System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá Characters
 
IFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation SystemIFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation System
 
A Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription SystemA Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription System
 

Recently uploaded

Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 

Recently uploaded (20)

Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 

Towards the implementation of a refined data model for a Zulu machine-readable lexicon

  • 1. Towards the implementation of a refined data model for a zulu machine-readable lexicon Ronell van der Merwe Laurette Pretorius Sonja Bosch 1
  • 2. Goal ... to develop .... a complete data repository (database) Bantu languages Machine Readable (MR) lexicon applications: finite state morphological analyser, .... 2
  • 3. What we will be discussing .... Brief overview Lexicon update framework Data model Implementation approaches Conclusion Future work 3
  • 4. 1. Introduction - Comprehensive MR lexicon Update framework ZULU corpora Paper Dict 4
  • 5. 2. Lexicon Update Framework Morphological Analyser ZulMorph Corpus apply failures successes rebuild Guesser MR lexicon update new stems/roots 5
  • 6. Purpose of MR Lexicon MorphologicalAnalyser Apps ZulMorph MR lexicon used Repository of all lexical information re-used Apps 6 Morphosyntactic information: Zulu morphotactics Morphophonological alternation rules Embedded stem/root lexicon
  • 7. Guesser variant Phonologically possible stems /roots Guesser identify new candidate(s) MR lexicon update / include 7
  • 8. 3. Data Model - development 8 * 0 or more + 1 or more | optional leaf node
  • 9. 3.1 Verbal extensions Suffixing extensions to verb root: Applied Causative Intensive Neuter Passive Reciprocal 9 * 0 or more + 1 or more | optional leaf node
  • 10. 10
  • 11. 11
  • 12. 12
  • 13. 3.2 Deverbatives verb / extended verb root noun class prefix deverbative suffix (-o- | -i- |-a-| -e-| -u-) 13 optional: nominal suffix (Aug | Dim | Loc | Fem)
  • 14. 14
  • 15. 15
  • 16. 4. Towards implementation 16 Morphological Analyser ZulMorph MR lexicon The capturing of the data must be: rigorous systematic appropriately structured To ensure that: data exchange is consistent
  • 17. Considerations in the choice of database... type of data different views of data different types of access to data 17
  • 18. Consideration: Type of Data semi-structured allows for recursion n-depth structures 18
  • 19. Consideration: views and access Rebuilding the Morphological Analyser: view: morphological structure of all word roots/stems access: sequential 19
  • 20. XML – enabled database 20 Object-Oriented Database Relational Database <root>bon</root> <verbfeatures> ....... </verbfeatures>
  • 21. Native XML database (NXD) 21 Native XML DB XQuery XUpdate <root>bon</root> <verbfeatures> ....... </verbfeatures>
  • 23. NXD and OO Database satisfy the implementation considerations: Both ... are suitable for our type of data: semi-structured , recursive satisfy the type of access and view: sequential access to morphological information of all word roots/stems 23
  • 24. Conclusion Refinement of a data model for the Zulu MR lexicon verbal extensions and deverbatives MR Lexicon embedded in an update framework Native XML and OO Databases possible approaches to implementation 24
  • 25. Future work Development and evaluation of prototypes for the MR Lexicon (Zulu) Software for semi-automating the lexicon update framework Bootstrapping of MR Lexicon for other Bantu languages 25