The document describes a system for converting Arabic numbers to their Yoruba lexical equivalents. It discusses Yoruba numerals and their derivation using addition, subtraction and multiplication. A computational model is presented using pushdown automata to capture the number conversion. The system was implemented in Python and evaluated using Mean Opinion Score testing. Examples of number conversions like 19679 are provided to demonstrate the system.
Ancient Mesopotamian houses called ziggurats were made of mud bricks dried under the sun and were roughly 90 square meters. Egyptian houses were also constructed of mud bricks and called pers, sometimes resembling mansions with 25-30 rooms. Greek homes lacked a specific name but used sun-dried mud bricks and terra cotta or wooden roofs and were one to two stories tall. Modern Canadian houses include duplexes, apartments, and detached homes made of materials like wood, stone, metal and glass with civilians as residents, and duplex prices can reach millions depending on location.
Ancient Greek civilization began between the Ionian Sea and the Aegean Sea in a mountainous region called Hellas. This terrain influenced the development of independent city-states rather than large kingdoms. Two early civilizations, the Minoans and Mycenaeans, preceded ancient Greek civilization. The Archaic period saw the rise of poleis and the spread of Greek culture through colonization. The Classical period was defined by the growth of Athenian democracy and Spartan oligarchy and wars with Persia. The Hellenistic period began with the conquests of Philip and Alexander the Great, spreading Greek culture further and dividing Greece into successor kingdoms.
The Beginnings of Ancient Rome: About 750 B.C., the villages joined together to form a city called Rome. It was ruled by kings for more than 200 years. Eventually, Rome became a republic, and the people elected representatives. These representatives formed the Senate, Rome's most powerful body of government.
The Middle Kingdom of Egypt lasted from around 2030 to 1640 BCE. After a period of decentralization following the Old Kingdom, two kings helped regain order and centralize power under the pharaoh once more. Egypt engaged in increased trade during this period and a new writing system was developed. The economy was based on agriculture along the fertile Nile River valley. Art from this period depicted more realistic human figures and stories from Egyptian mythology and the afterlife. Architecture consisted primarily of simpler pyramids and temples. Eventually, foreign influences weakened royal power, leading to the Second Intermediate Period.
Ancient Greece developed out of two early civilizations - the Minoan civilization on Crete and the Mycenaean civilization on the Greek mainland. Geographic factors like mountains and islands led to the rise of independent city-states across Greece. These city-states experimented with different forms of government, with Athens developing the first democratic system and Sparta developing a totalitarian military state. The Persian Wars in the 5th century BC united the Greek city-states against a common enemy but also intensified rivalry between Athens and Sparta for dominance over Greece.
The document discusses the major developments of the Industrial Revolution from the late 18th century to early 20th century, including inventions like the steam engine, cotton gin, and sewing machine that drove mechanization and transformed manufacturing. Key figures like Samuel Slater helped bring industrialization to America by establishing early factories. The innovations of the Industrial Revolution fundamentally changed economies and societies around the world by increasing production and changing the way goods were made.
The most important contribution made by the ancient civilizations of the Fertile Crescent was the development of bureaucracy by the Persians. By creating a standardized system of administration, taxation, and record keeping to manage their vast empire, the Persians established organizational practices that are still used in governments worldwide today. The bureaucratic structures developed by the Persians have had an enduring influence and allow modern societies to function effectively at a large scale.
Technological Tools for Dictionary and Corpora Building for Minority Language... (Guy De Pauw)
This project aims to build and maintain a lexicographical resource for French-based Creole languages through three main steps:
1) Compiling existing lexicographical resources like dictionaries into an electronic format
2) Creating corpora of Creole language texts from literary, educational and journalistic sources online
3) Maintaining the dictionary by analyzing the corpora to identify unknown words and improve the database through an iterative process.
The results will be a lexicographical database detailing variations in French-based Creoles and annotated corpora for linguistic research.
Semi-automated extraction of morphological grammars for Nguni with special re... (Guy De Pauw)
This document summarizes research that semi-automatically extracted a morphological grammar for Southern Ndebele, an under-resourced language, from a general Nguni morphological analyzer bootstrapped from a Zulu analyzer. The Southern Ndebele analyzer produced surprisingly good results, showing significant similarities across Nguni languages that can accelerate documentation and resource development for these languages. The project followed best practices for encoding resources to ensure sustainability, access, and adaptability to future formats and platforms.
Resource-Light Bantu Part-of-Speech Tagging (Guy De Pauw)
This document proposes a bag-of-substrings approach to part-of-speech tagging for under-resourced Bantu languages using available digital dictionaries and word lists instead of large annotated corpora. Experimental results showed the technique established a low-resource, high accuracy method for bootstrapping POS tagging that compares favorably to state-of-the-art data-driven approaches. The method extracts substring features from words to train a maximum entropy classifier and bootstrap POS tagging for Bantu languages that lack extensive annotated resources.
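As a rough illustration of what a bag-of-substrings representation can look like, the sketch below collects every substring of a word up to a maximum length. The function name `substring_features`, the `max_len` parameter, and the `^`/`$` boundary markers are assumptions made for this example and are not taken from the summarized work; the resulting feature sets would then be passed to a classifier such as a maximum entropy model.

```python
def substring_features(word, max_len=4):
    """Return the set of all substrings of `word` up to `max_len` characters.

    The word is padded with '^' and '$' (an assumption of this sketch) so that
    prefixes and suffixes are distinguishable from word-internal substrings.
    """
    padded = f"^{word}$"
    features = set()
    for start in range(len(padded)):
        # Substring lengths run from 1 to max_len, clipped at the end of the word.
        for end in range(start + 1, min(start + max_len, len(padded)) + 1):
            features.add(padded[start:end])
    return features


if __name__ == "__main__":
    # Print the features of a short sample word in a stable order.
    print(sorted(substring_features("ki", max_len=2)))
```

Because the features come from the word form alone, a dictionary or word list with part-of-speech labels is enough to build training examples, which is the low-resource setting the summary describes.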
Natural Language Processing for Amazigh Language (Guy De Pauw)
The document discusses natural language processing challenges for the Amazigh (Berber) language. It outlines Amazigh language characteristics like its writing system and complex phonology/morphology. It then describes the current state of Amazigh NLP technology, including Tifinaghe encoding, optical character recognition tools, basic processing tools like transliterators and stemmers, and language resources like corpora and dictionaries. Finally, it proposes future directions such as developing larger corpora, machine translation systems, and growing human resources for Amazigh language technology.
POS Annotated 50m Corpus of Tajik Language (Guy De Pauw)
This document presents a new 50+ million word corpus of the Tajik language, the largest available. It was created by crawling over a dozen Tajik news websites and other sources. The texts were joined and cleaned to remove duplicates. The corpus was then annotated with morphological analysis of Tajik using a new analyzer created by modifying an existing one to be faster and allow lemmatization. The analyzer recognizes over 87% of words and tags them with part of speech. This annotated corpus containing lemmas, tags and frequencies is available online through the Sketch Engine for researchers.
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ... (Guy De Pauw)
This document describes using a metagrammar called XMG to formally capture morphological generalizations of verbs in the Ikota language. It provides an XMG specification for Ikota verbal morphology that describes subject, tense, verb root, aspect, active voice, and proximity. This specification can automatically derive a lexicon of fully inflected verb forms. The methodology allows for quickly testing ideas and validating results against language data.
Tagging and Verifying an Amharic News Corpus (Guy De Pauw)
This document summarizes an Amharic news corpus tagging and verification project. It discusses the Amharic language background, the corpus creation from Ethiopian news sources, the manual tagging process, previous tagging experiments, and the current efforts to clean and re-tag the corpus which involves removing errors and inconsistencies from the original tagging. Baseline tagging performance on the corpus using different part-of-speech tagsets ranges from 58.3% to 90.8% correct depending on the tagset and machine learning approach used.
This document describes the process of constructing a corpus of spoken and written Santome, a Portuguese-related creole language spoken in Sao Tome and Principe. The corpus contains over 184,000 words from written sources like newspapers and books, as well as transcribed spoken recordings. Efforts were made to standardize the orthography and develop part-of-speech tags for annotation. Metadata is encoded for each text, and the corpus will be made available through a concordancing tool to allow searches while copyright permissions are obtained. The goal is for this and related Gulf of Guinea creole corpora to enable comparative linguistic research.
Automatic Structuring and Correction Suggestion System for Hungarian Clinical... (Guy De Pauw)
The document describes a system for automatically structuring and correcting Hungarian clinical records. The system separates clinical records into structured XML elements, tags metadata, and separates text from tables. It also corrects spelling errors using language models and weighted edit distances to generate and score candidate corrections. Evaluation showed the system could provide the right correction in the top 5 suggestions for 99% of errors. Areas for improvement include handling insertion/deletion errors and using larger language resources to better handle non-standard usage.
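A minimal sketch of ranking correction candidates by weighted edit distance follows. It covers only the distance part: the language-model scoring mentioned in the summary is left out, and the names `weighted_edit_distance`, `rank_candidates`, `sub_cost`, and `indel_cost` are hypothetical, as are the example weights.

```python
def weighted_edit_distance(source, target, sub_cost=None, indel_cost=1.0):
    """Edit distance where each substitution (a, b) may carry its own cost.

    `sub_cost` maps (source_char, target_char) pairs to a cost; pairs that are
    not listed cost 1.0. Insertions and deletions cost `indel_cost`.
    """
    sub_cost = sub_cost or {}
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * indel_cost
    for j in range(1, cols):
        dist[0][j] = j * indel_cost
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = source[i - 1], target[j - 1]
            substitution = 0.0 if a == b else sub_cost.get((a, b), 1.0)
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,        # delete a from source
                dist[i][j - 1] + indel_cost,        # insert b into source
                dist[i - 1][j - 1] + substitution,  # match or substitute
            )
    return dist[-1][-1]


def rank_candidates(misspelling, lexicon, sub_cost=None, top_n=5):
    """Return the `top_n` lexicon entries closest to `misspelling`."""
    return sorted(
        lexicon,
        key=lambda word: (weighted_edit_distance(misspelling, word, sub_cost), word),
    )[:top_n]


if __name__ == "__main__":
    # Hypothetical weight: typing 'e' for 'a' is treated as a cheap error.
    weights = {("e", "a"): 0.5}
    print(rank_candidates("cet", ["dog", "cot", "cat"], weights, top_n=2))
```

Lower substitution costs for frequently confused characters push the likely intended word toward the top of the list, which is what a top-5 suggestion evaluation measures.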
Compiling Apertium Dictionaries with HFST (Guy De Pauw)
This document discusses compiling Apertium dictionaries with HFST to leverage generalised compilation formulas and get more applications from fewer language descriptions. Compiling Apertium dictionaries natively in HFST provides benefits like uniform compilation across tools, improved resulting automata using HFST algorithms, and integrated complex finite-state morphology features. Additional applications like spellcheckers can also be automatically generated from the dictionaries.
The Database of Modern Icelandic Inflection (Guy De Pauw)
The Database of Modern Icelandic Inflection (DMII) is a database that stores the full inflectional forms of Icelandic words. It contains over 5.8 million inflected forms. The DMII aims to represent Icelandic inflection accurately without overgeneration by including all inflected forms and variants. A rule-based system was not feasible due to insufficient data and the tendency for rules to overgenerate. The DMII supports language technology projects and is accessible online for the general public.
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming (Guy De Pauw)
This document discusses learning Amharic verb morphology using inductive logic programming (ILP). Amharic verbs are complex, conveying information about subject, object, tense, aspect, mood and more through affixation, reduplication and compounding. The authors apply ILP to learn morphological rules from a training set of 216 Amharic verbs. They achieve 86.9% accuracy on a test set of 1,784 verb forms. Key challenges include a lack of similar examples in the training data and learning inappropriate alternation rules. This work contributes to advancing the automatic learning of morphology for under-resourced languages like Amharic.
Issues in Designing a Corpus of Spoken Irish (Guy De Pauw)
The document discusses the design of a corpus of spoken Irish. It outlines the linguistic background of Irish and issues with existing spoken language resources. It then describes the pilot corpus, including data collection from podcasts and conversations. Transcription guidelines were adapted from CHAT and LDC conventions to balance accuracy with transcription speed. The goal is to create a large, balanced corpus to support research and language preservation.
How to build language technology resources for the next 100 years (Guy De Pauw)
The document discusses how to build sustainable language technology resources for lesser-resourced languages over the next 100 years. It outlines a vision of linguistic diversity and language survival. Key challenges include limited resources, small language communities, and technological limitations. Approaches proposed to work around these include minimizing redundant work, maximizing reuse of resources, building user and developer communities, and preparing resources to work with future technologies. Specific topics covered are types of language technology resources, issues around character encoding, text input methods, and future-proofing keyboard layouts and recognition technologies for many languages.
Towards Standardizing Evaluation Test Sets for Compound Analysers (Guy De Pauw)
The document proposes standardizing test sets for evaluating compound analyzers by establishing parameters for a standard test set. It discusses evaluating compound analyzers on different sized test sets containing compound words, non-compound words, and error words. Experiments compare analyzer performance on test sets of varying sizes, finding sizes below 250 words are too small and sizes above 1250 words show no significant differences in results. The proposed standard test set consists of 500 examples each of compounds, non-compounds, and errors for a total of 1500 words.
The PALDO Concept - New Paradigms for African Language Resource Development (Guy De Pauw)
The document discusses new paradigms for developing African language resources through the Pan African Living Dictionary Online (PALDO) project. The paradigms include open community participation under scholarly supervision, paying for data development and making the data freely available, and linking monolingual dictionaries for multiple languages by concept to create rich resources for each language.
A System for the Recognition of Handwritten Yorùbá Characters (Guy De Pauw)
1. The document presents a system for recognizing handwritten Yoruba characters using a Bayesian classifier and decision tree approach.
2. Key stages of the system include preprocessing, segmentation, feature extraction, Bayesian classification, decision tree processing, and result fusion.
3. The system was tested on independent and non-independent character samples, achieving recognition rates of 91.18% and 94.44% respectively.
IFE-MT: An English-to-Yorùbá Machine Translation System (Guy De Pauw)
The document describes the development of an English to Yoruba machine translation system called IFE-MT. It discusses the theoretical and practical issues in building the system, including the differences between the languages. It outlines the data collection and annotation process. It also describes the software tools and modules used to implement the system and demonstrates its capabilities. The system is being further developed by expanding the database and evaluating the translations.
Bilingual Data Mining for the English-Amharic Statistical Machine Translation... (Guy De Pauw)
This document discusses experiments conducted on developing an English to Amharic statistical machine translation system. It summarizes the collection and processing of two parallel corpora: (1) An ENA news corpus containing over 35,000 documents and 500,000 words aligned at the sentence level. (2) A parliamentary corpus containing over 1,200 documents and 500,000 words aligned at the sentence level. The document outlines challenges in aligning the corpora and reports on automatic alignment results. It concludes that further increasing corpus size and integrating linguistic knowledge can help improve translation quality.
How to Get CNIC Information System with Paksim Ga.pptx (danishmna97)
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered:
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
Technological Tools for Dictionary and Corpora Building for Minority Language...Guy De Pauw
This project aims to build and maintain a lexicographical resource for French-based Creole languages through three main steps:
1) Compiling existing lexicographical resources like dictionaries into an electronic format
2) Creating corpora of Creole language texts from literary, educational and journalistic sources online
3) Maintaining the dictionary by analyzing the corpora to identify unknown words and improve the database through an iterative process.
The results will be a lexicographical database detailing variations in French-based Creoles and an annotated corpora for linguistic research.
Semi-automated extraction of morphological grammars for Nguni with special re...Guy De Pauw
This document summarizes research that semi-automatically extracted a morphological grammar for Southern Ndebele, an under-resourced language, from a general Nguni morphological analyzer bootstrapped from a Zulu analyzer. The Southern Ndebele analyzer produced surprisingly good results, showing significant similarities across Nguni languages that can accelerate documentation and resource development for these languages. The project followed best practices for encoding resources to ensure sustainability, access, and adaptability to future formats and platforms.
Resource-Light Bantu Part-of-Speech TaggingGuy De Pauw
This document proposes a bag-of-substrings approach to part-of-speech tagging for under-resourced Bantu languages using available digital dictionaries and word lists instead of large annotated corpora. Experimental results showed the technique established a low-resource, high accuracy method for bootstrapping POS tagging that compares favorably to state-of-the-art data-driven approaches. The method extracts substring features from words to train a maximum entropy classifier and bootstrap POS tagging for Bantu languages that lack extensive annotated resources.
Natural Language Processing for Amazigh LanguageGuy De Pauw
The document discusses natural language processing challenges for the Amazigh (Berber) language. It outlines Amazigh language characteristics like its writing system and complex phonology/morphology. It then describes the current state of Amazigh NLP technology, including Tifinaghe encoding, optical character recognition tools, basic processing tools like transliterators and stemmers, and language resources like corpora and dictionaries. Finally, it proposes future directions such as developing larger corpora, machine translation systems, and growing human resources for Amazigh language technology.
POS Annotated 50m Corpus of Tajik LanguageGuy De Pauw
This document presents a new 50+ million word corpus of the Tajik language, the largest available. It was created by crawling over a dozen Tajik news websites and other sources. The texts were joined and cleaned to remove duplicates. The corpus was then annotated with morphological analysis of Tajik using a new analyzer created by modifying an existing one to be faster and allow lemmatization. The analyzer recognizes over 87% of words and tags them with part of speech. This annotated corpus containing lemmas, tags and frequencies is available online through the Sketch Engine for researchers.
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...Guy De Pauw
This document describes using a metagrammar called XMG to formally capture morphological generalizations of verbs in the Ikota language. It provides an XMG specification for Ikota verbal morphology that describes subject, tense, verb root, aspect, active voice, and proximity. This specification can automatically derive a lexicon of fully inflected verb forms. The methodology allows for quickly testing ideas and validating results against language data.
Tagging and Verifying an Amharic News CorpusGuy De Pauw
This document summarizes an Amharic news corpus tagging and verification project. It discusses the Amharic language background, the corpus creation from Ethiopian news sources, the manual tagging process, previous tagging experiments, and the current efforts to clean and re-tag the corpus which involves removing errors and inconsistencies from the original tagging. Baseline tagging performance on the corpus using different part-of-speech tagsets ranges from 58.3% to 90.8% correct depending on the tagset and machine learning approach used.
This document describes the process of constructing a corpus of spoken and written Santome, a Portuguese-related creole language spoken in Sao Tome and Principe. The corpus contains over 184,000 words from written sources like newspapers and books, as well as transcribed spoken recordings. Efforts were made to standardize the orthography and develop part-of-speech tags for annotation. Metadata is encoded for each text, and the corpus will be made available through a concordancing tool to allow searches while copyright permissions are obtained. The goal is for this and related Gulf of Guinea creole corpora to enable comparative linguistic research.
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Guy De Pauw
The document describes a system for automatically structuring and correcting Hungarian clinical records. The system separates clinical records into structured XML elements, tags metadata, and separates text from tables. It also corrects spelling errors using language models and weighted edit distances to generate and score candidate corrections. Evaluation showed the system could provide the right correction in the top 5 suggestions for 99% of errors. Areas for improvement include handling insertion/deletion errors and using larger language resources to better handle non-standard usage.
Compiling Apertium Dictionaries with HFSTGuy De Pauw
This document discusses compiling Apertium dictionaries with HFST to leverage generalised compilation formulas and get more applications from fewer language descriptions. Compiling Apertium dictionaries natively in HFST provides benefits like uniform compilation across tools, improved resulting automata using HFST algorithms, and integrated complex finite-state morphology features. Additional applications like spellcheckers can also be automatically generated from the dictionaries.
The Database of Modern Icelandic InflectionGuy De Pauw
The Database of Modern Icelandic Inflection (DMII) is a database that stores the full inflectional forms of Icelandic words. It contains over 5.8 million inflected forms. The DMII aims to represent Icelandic inflection accurately without overgeneration by including all inflected forms and variants. A rule-based system was not feasible due to insufficient data and the tendency for rules to overgenerate. The DMII supports language technology projects and is accessible online for the general public.
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingGuy De Pauw
This document discusses learning Amharic verb morphology using inductive logic programming (ILP). Amharic verbs are complex, conveying information about subject, object, tense, aspect, mood and more through affixation, reduplication and compounding. The authors apply ILP to learn morphological rules from a training set of 216 Amharic verbs. They achieve 86.9% accuracy on a test set of 1,784 verb forms. Key challenges include a lack of similar examples in the training data and learning inappropriate alternation rules. This work contributes to advancing the automatic learning of morphology for under-resourced languages like Amharic.
Issues in Designing a Corpus of Spoken IrishGuy De Pauw
1. A Number to Yorùbá Text Transcription System
Akinadé Olugbẹ́nga Oláwálé and Ọdẹ́jọbí Ọdẹ́túnjí Ajàdí
Computer Sci. & Engr. Department
Ọbáfẹ́mi Awólọ́wọ̀ University, Ilé-Ifẹ̀
AGIS 2011 Conference, Addis Ababa, Ethiopia
01 December, 2011
2. Presentation Outline
Introduction
Numbers and Numerals
Normalisation of Numbers for TTS
Objectives
The Yorùbá numeral system
The Yorùbá numeral generation methodology
Software Implementation
Evaluation
Overview
3. Numbers and Numerals
A number is an abstract concept that is represented by symbols within a numeral system.
A numeral system is concerned with the written representation of spoken positive whole numbers.
The Hindu-Arabic numeral system has been adopted as a universally accepted representation for numbers and mathematical expressions.
The Hindu-Arabic numeral system has ten symbols, i.e. 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9.
4. Numbers and Numerals
Numbers can take different formats, which include:
Cardinal Numbers: 3, 10, Ẹ̀ta, Ẹ̀wá
Ordinal Numbers: 3rd, 10th, Ìkẹta, Ìkẹwàá
Monetary Value: $300, Naira Mẹ́wàá
Phone Numbers: 234802234****
Percentage: 10%, Ìdá mẹ́ẹ̀wá nínú ọgọ́run
Yorùbá has many dialects with varying degrees of linguistic variation (Fabunmi, 2010), but their speakers can all communicate using Standard Yorùbá (SY).
Our focus is therefore to develop a system that can convert cardinal numbers to their SY lexical forms.
5. Normalisation of Numbers for TTS
Text normalisation is one of the important steps in high-level speech synthesis.
Numbers, abbreviations, symbols, etc. are converted into their lexical (textual) equivalents.
Normalisation of numbers is thus an important stage in this step.
6. Objectives
To specify and design a computational model to capture the conversion of numbers to their Standard Yorùbá (SY) lexical equivalents
To implement software for the number to SY text transcription model designed above
To evaluate the implemented system
9. The Yorùbá numeral system
Yorùbá numerals are vigesimal, i.e. based on 20 (Ekundayo, 1977), (Zaslavsky, 2000), (Longe, 2009).
Yorùbá has 16 basic lexical items, which serve as the basic building blocks (Ekundayo, 1977):
1 = ọ̀kan, 2 = èjì, 3 = ẹ̀ta, 4 = ẹ̀rin, 5 = àrún, 6 = ẹ̀fà, 7 = èje, 8 = ẹ̀jọ, 9 = esan, 10 = ewa, 20 = ogún, 30 = ogbon, 200 = igba, 300 = odunrun, 400 = irínwó, 20,000 = oke
Special positional words exist for addition (lé), subtraction (dín) and multiplication (ọ̀nà).
10. What a Yorùbá Speaker/Hearer knows about Yorùbá numerals
All numbers can be represented within the Yorùbá numerals.
There are subgroups within the Yorùbá numerals based on their syntactic derivational rules.
Linguistic skills (contraction, vowel harmony, elision and euphonic assimilation) are required for the representation of some numerals.
There exist multiple representations for numerals with low functional load.
The largest number that has a single word is 20,000 (ọ̀kẹ́).
Higher numbers are derived from 20,000 (ọ̀kẹ́).
Subtraction has a heavier functional load than addition.
11. Subgroups of Yoruba Numerals
1 - 14: Behave as decimal.
15 - 199: Derived with 20 as the multiplicative base.
200 - 1999: Derived with 200 as the multiplicative base.
2000 - 19999: Derived with 2000 as the multiplicative base.
20000 and above: Derived with 20000 as the multiplicative base.
So, from the above, the multiplicative bases of Yoruba numerals are 20, 200, 2000 and 20000,
which can be represented as $2 \times 10^1$, $2 \times 10^2$, $2 \times 10^3$, $2 \times 10^4$.
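The subgroup boundaries above can be sketched as a small lookup. This is only an illustration, not the authors' implementation: the function name `multiplicative_base` is hypothetical, and the sketch assumes the last subgroup starts at 20000.

```python
from typing import Optional


def multiplicative_base(n: int) -> Optional[int]:
    """Return the multiplicative base for n, or None for the decimal-like range 1-14.

    Hypothetical helper: the bases are 2 * 10**k for k = 1..4.
    """
    if n < 1:
        raise ValueError("expected a positive whole number")
    if n <= 14:
        return None
    # Try the largest base first; the base-20 subgroup starts at 15, not at 20.
    for k in (4, 3, 2, 1):
        base = 2 * 10 ** k
        lower_bound = 15 if k == 1 else base
        if n >= lower_bound:
            return base
    return None  # unreachable for n >= 15, kept for completeness


if __name__ == "__main__":
    for value in (7, 15, 199, 200, 19669, 20000):
        print(value, multiplicative_base(value))
```

Under these assumptions 19669 falls in the subgroup whose base is 2000.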
12. SY Numerals Derivation
Three of the four basic arithmetical operations (addition, subtraction and multiplication) are employed for the derivation of an infinite set of SY numerals from the sixteen vocabulary items (Ekundayo, 1977), (Zaslavsky, 1999).
A single number can be generated from multiple subtractions.
Number    Yorùbá                                             Derivation
15        ẹ̀ẹ̀dógún                                            20-5
65        àádọ́rin                                            (4*20)-10-5
565       àádọ́rin lé lẹ̀ẹ́dẹ́gbẹ̀ta                              (3*200)-100+(4*20)-10-5
17,565    àádọ́rin lé lẹ̀ẹ́dẹ́gbẹ̀ta lé lẹ̀ẹ́dẹ́gbàasán               (9*2000)-1000+(3*200)-100+(4*20)-10-5
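The arithmetic in the Derivation column can be checked mechanically. The sketch below simply restates each derivation as a list of signed terms and confirms that the terms sum to the number; the names `DERIVATIONS` and `check_derivations` are illustrative and not taken from the authors' software.

```python
# Each derivation from the table, written as signed terms.
DERIVATIONS = {
    15: [20, -5],
    65: [4 * 20, -10, -5],
    565: [3 * 200, -100, 4 * 20, -10, -5],
    17565: [9 * 2000, -1000, 3 * 200, -100, 4 * 20, -10, -5],
}


def check_derivations(table):
    """Map each number to True when its signed terms add up to it."""
    return {number: sum(terms) == number for number, terms in table.items()}


if __name__ == "__main__":
    for number, ok in check_derivations(DERIVATIONS).items():
        print(number, "consistent" if ok else "inconsistent")
```

Note how 65, 565 and 17,565 share the tail `(4*20)-10-5`: larger numerals reuse the derivation of their lower-order part.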
13. Special Type of Subtraction
A special type of subtraction exists when the number subtracted is 5, 10, 100 or 1000.
This brings about the eedin phenomenon.
For eedin(A), we assume an implied subtraction from A of the following kinds:
1. 5 iff A = 20 or 30
2. 10 iff A = 60, 80, 100, ..., 200
3. 100 iff A = 600, 800, 1000, ..., 2000
4. 1000 iff A = 4000, 6000, 8000, ..., 20000
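The four cases above can be read as a lookup from A to the amount that is implicitly subtracted. The following is a minimal sketch of that reading; the function name `implied_subtrahend` is hypothetical, and the step sizes (20, 200 and 2000) are inferred from the listed sequences.

```python
from typing import Optional


def implied_subtrahend(a: int) -> Optional[int]:
    """Return the amount implicitly subtracted from a, or None if no rule applies."""
    if a in (20, 30):
        return 5
    if 60 <= a <= 200 and a % 20 == 0:        # 60, 80, 100, ..., 200
        return 10
    if 600 <= a <= 2000 and a % 200 == 0:     # 600, 800, 1000, ..., 2000
        return 100
    if 4000 <= a <= 20000 and a % 2000 == 0:  # 4000, 6000, 8000, ..., 20000
        return 1000
    return None


def eedin(a: int) -> int:
    """Value denoted by eedin(a) under the rules above."""
    amount = implied_subtrahend(a)
    if amount is None:
        raise ValueError(f"no implied subtraction is defined for {a}")
    return a - amount


if __name__ == "__main__":
    for base in (20, 80, 600, 20000):
        print(base, "->", eedin(base))
```

For example, `eedin(80)` evaluates to 70 and `eedin(20000)` to 19000.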
14. Methodology
A review of the theory, process and computation underlying the SY numerals was conducted by consulting the appropriate literature.
A computational model was formulated using an automata-theoretic approach. The proposed model was captured with a set of Push-Down Automata (PDA) using the Java Formal Language and Automata Package (JFLAP), a simulation tool for experimenting with formal languages and automata theory.
The model was implemented using the Python programming language.
Evaluation of the system was carried out using the Mean Opinion Score (MOS), applied here as a Turing-style test of computational intelligence.
15. Software Design
[Flowchart. Stages shown: Start; Arabic number (input); decision "Is number in basic numerals?" (Yes / No); Decomposition of number to basic units; Translation of number to Yoruba; Morphological analysis; Yoruba numeral (output); Stop.]
16. Push Down Automata
A non-deterministic PDA is defined as a 7-tuple (Q, Σ, Γ, q0, F, δ, z0), where
Q is a finite set of states
Σ is a finite input alphabet
Γ is a finite stack alphabet
z0 ∈ Γ is the initial symbol on top of the stack
q0 ∈ Q is the initial state
F ⊆ Q is the set of final states
δ, the set of transitions, is a finite subset of
(Q × (Σ ∪ {ε, #}) × Γ) × (Q × (Γ ∪ {ε}))
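To make the definition concrete, here is a minimal sketch of a non-deterministic PDA simulator. It is not the authors' JFLAP model: the function `accepts`, the toy automaton `TOY` and its language (a^n b^n followed by the end marker #) are assumptions chosen only for illustration, and a transition is allowed to push a string of stack symbols, as in the usual textbook formulation.

```python
from collections import deque


def accepts(transitions, start, start_stack, finals, word, max_steps=10_000):
    """Breadth-first search over configurations (state, input position, stack).

    transitions maps (state, symbol or "" for epsilon, stack top) to a list of
    (next state, symbols to push); the stack top is replaced by those symbols.
    """
    frontier = deque([(start, 0, (start_stack,))])
    seen = set()
    steps = 0
    while frontier and steps < max_steps:
        steps += 1
        config = frontier.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, pos, stack = config
        if pos == len(word) and state in finals:
            return True
        if not stack:
            continue
        top, rest = stack[0], stack[1:]
        # Try an epsilon move and, if input remains, a move on the next symbol.
        symbols = [""] + ([word[pos]] if pos < len(word) else [])
        for sym in symbols:
            for nxt, push in transitions.get((state, sym, top), ()):
                frontier.append((nxt, pos + len(sym), tuple(push) + rest))
    return False


# Toy PDA: accepts a^n b^n # for n >= 1, with Z as the initial stack symbol.
TOY = {
    ("q0", "a", "Z"): [("q0", "AZ")],
    ("q0", "a", "A"): [("q0", "AA")],
    ("q0", "b", "A"): [("q1", "")],
    ("q1", "b", "A"): [("q1", "")],
    ("q1", "#", "Z"): [("qf", "Z")],
}

if __name__ == "__main__":
    for w in ("ab#", "aabb#", "aab#", "ab"):
        print(repr(w), accepts(TOY, "q0", "Z", {"qf"}, w))
```

The end marker # plays the same role as in the transition definition above: it lets the automaton recognise the end of its input explicitly.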
18. Processing of 67
Generate magnitude:
60 + 7#
Decompose to vigesimal:
4*20-10-3#
PDA processing of string:
ẹta dín aadín ogún ẹrin
Apply linguistic skills:
ẹta dín aadín ọgọrin
ẹta dín àádọ́rin
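The "decompose to vigesimal" step can be sketched for the base-20 range. The rule used here (units 1 to 4 are added to the decade below, units 5 to 9 are subtracted from the decade above, and odd decades are the next multiple of 20 minus 10) is an assumption that reproduces the 67 example; it is not taken from the authors' code, and the names `decade_terms` and `decompose` are hypothetical. The sketch stops at 194 because beyond that the nearest decade is 200, which has its own basic word.

```python
def decade_terms(decade: int) -> str:
    """Express a multiple of ten (20..190) with 20 as the multiplicative base."""
    if decade in (20, 30):       # basic lexical items, no derivation needed
        return str(decade)
    if decade % 20 == 0:         # even decade: k*20
        return f"{decade // 20}*20"
    # odd decade: the next multiple of 20, minus 10
    return f"{(decade + 10) // 20}*20-10"


def decompose(n: int) -> str:
    """Return an arithmetic expression for n in the assumed vigesimal style."""
    if not 15 <= n <= 194:
        raise ValueError("this sketch only covers 15..194")
    unit = n % 10
    if unit == 0:
        return decade_terms(n)
    if unit <= 4:                # add 1..4 to the decade below
        return f"{decade_terms(n - unit)}+{unit}"
    # subtract 5..1 from the decade above
    return f"{decade_terms(n - unit + 10)}-{10 - unit}"


if __name__ == "__main__":
    for value in (15, 65, 67):
        print(value, "=", decompose(value))
```

Each expression evaluates back to its input, which gives a cheap consistency check before any words are generated.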
19. Example: Processing of 19669
Ekundayo (1977) presented 7 canonical representations for 19669, but we were able to produce 3 more forms for 19669. These are:
ẹẹdẹgbààwá ó lé ọta-lé-lẹgbẹta ó lé mẹsán
ẹẹdẹgbààwá ó lé ọrin-lé-lẹgbẹta dín mọkanlàá
ẹẹdẹgbààwá ó lé ójì-lé-lẹgbẹta ó lé mọkan-dín-lọgbọn
ẹẹdẹgbààwá ó lé ẹgbẹta ó lé mọkan-dín-laadọrin
ẹẹdẹgbààwá ó lé ẹẹdẹgbẹrin ó dín mọkan-lé-lọgbọn
ọkẹ ó dín irínwó ó lé mọkan-dín-laadọrin
ọkẹ ó dín ọdúnrún ó dín mọkan-lé-lọgbọn
ọkẹ ó dín ojì-lé-lọdúnrún ó lé mẹsán
ẹ́gbèjì-dín-lọgọrún ó lé mọkan-dín-laadọrin
ẹẹdẹ́gbọkan-dín-lọgọrún ó dín mọkan-lé-lọgbọn
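One way to see why so many forms are possible is to write each one as signed arithmetic terms. The breakdown below is a reading of the ten forms listed above, in the same order, and should be treated as an assumption rather than as the authors' analysis; `FORMS` and `all_forms_consistent` are illustrative names.

```python
TARGET = 19669

# Signed terms for each of the ten forms, in the order listed above
# (an assumed reading of the Yoruba phrases, not the authors' data).
FORMS = [
    [19000, 660, 9],
    [19000, 680, -11],
    [19000, 640, 29],
    [19000, 600, 69],
    [19000, 700, -31],
    [20000, -400, 69],
    [20000, -300, -31],
    [20000, -340, 9],
    [19600, 69],
    [19700, -31],
]


def all_forms_consistent(forms, target):
    """True when every list of signed terms adds up to the target."""
    return all(sum(terms) == target for terms in forms)


if __name__ == "__main__":
    print(all_forms_consistent(FORMS, TARGET))
```

The forms differ in which anchor is chosen (19000, 20000, 19600 or 19700) and in whether the remainder is added to it or subtracted from it.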
22. Grammar for Yorùbá Numerals
This is a slight modification of the grammar discussed by Hurford (2006), made to accommodate subtraction:
NUMBER → PHRASE (NUMBER) #Addition
PHRASE → DIGIT
PHRASE → {REDUCE | SUB} PHRASE #Subtraction
PHRASE → M PHRASE #Multiplication
In these rules, curly braces indicate that either of the enclosed options can be used, and parentheses indicate that their content can be left out.
23. Parse tree for 19669
[Figure: parse tree] 19669 = (2000*10 - 1000) + (200*3) + ((20*4 - 10) - 1)
24. Parse tree for 19669
[Figure: parse tree] 19669 = (2000*10 - 1000) + (40 + 200*3) + (30 - 1)
25. Ongoing Work
Evaluation of the software is ongoing.
Efforts are being made to develop the software for handheld devices.
26. References
Ekundayo, S. A. (1977). Vigesimal numeral derivational morphology: Yorùbá grammatical competence epitomized. Anthropological Linguistics, 19(9):436–453.
Fabùnmi, F. A. (2010). Vigesimal numerals on Ifẹ̀ (Togo) and Ifẹ̀ (Nigeria) dialects of Yorùbá. Linguistik online, 43.
Longe, O. (2009). A Yorùbá Decimal Number System. Bookbuilders, Ibadan.