This document discusses efforts to digitize Philippine languages through the eWika project. The project aims to build a Philippine corpus containing text, speech, and sign language data across the country's 171 languages and 17 regions. It also outlines challenges in developing needed language resources and tools, such as lexicons, morphological analyzers, part-of-speech taggers, and translation rules, given the lack of existing digital language data for many Philippine languages. The goal is to connect the islands through language by creating a web-based application to facilitate collaborative development and sharing of corpus resources.
An engaging workshop intended to showcase community efforts to implement LGR Procedure for current and potential Generation Panel members. The workshop will also discuss how Generation Panels of related scripts should coordinate with each other going forward.
The document outlines the agenda for OneGlobal Inc.'s annual meeting. The morning session will include an introduction and year in review. Revisions to Spanish, Chinese, and Japanese language programs are discussed. The afternoon will cover title lists, competitive analyses, new electronic materials, and acquiring new titles. Product revisions, analyses, and improvements are also on the agenda.
Language policy and language planning in Morocco younes Anas
This document provides an overview of the various languages spoken in Morocco, including their histories, statuses, and representations in media and education. The major languages discussed are Standard Arabic, Tamazight (Berber), Hassaniya Arabic, Spanish, Darija (Moroccan Arabic), and French. It outlines the policies of Arabization and increasing recognition of Berber, and notes the ongoing influences and roles of multiple languages in Moroccan society and identity.
Kiva offers a translation program where volunteers can translate entrepreneurs' profiles from other languages into English to be posted on Kiva's micro-lending website. Volunteers need high proficiency in a foreign language like Arabic, French, Portuguese, or Russian as well as native English writing skills. Translating for at least two hours per week for six months helps empower individuals to lend directly to entrepreneurs around the world.
Language planning, policy and implementation in South Africa Self employed
- South Africa has 11 official languages and recognizes several unofficial ones. The official languages are a result of politics to balance ethnic diversity.
- Nearly 25 languages are used daily by over 45 million people in South Africa, with Zulu, Xhosa, and Afrikaans being the most commonly spoken first languages.
- The language policy aims to promote multilingualism and the development of all languages, though implementing this policy fully faces challenges.
This document discusses different types of companies and the funding sources available to them. It categorizes companies as normal growth, high growth, extreme high growth, or social venture companies. Each category has different funding needs and options. Common funding sources include friends and family, accelerators, angels, angel groups, micro VCs, and VC funds. Later stages may involve strategic corporate investors. The document provides details on typical investment sizes, stages, and tradeoffs between angel and VC funding. It also lists several angel groups active in New England.
The document describes the data that contribute to models of the Earth's internal structure, including pressure, temperature, and density. It explains how these factors vary with depth and presents two main models: a physical one based on properties and a geochemical one based on composition.
The document provides information about the IIFA Awards ceremonies that were held from 2000-2006. Some key details include:
- The inaugural IIFA ceremony in 2000 was held at the Millennium Dome in London. Hum Dil De Chuke Sanam won several major awards.
- Ceremonies were later held in locations like South Africa, Malaysia, and the Netherlands. Winners each year included films like Lagaan, Devdas, Kal Ho Naa Ho, and Black.
- Technical categories recognized best cinematography, music direction, choreography, and other film crafts each year.
- Special awards honored lifetime achievements and contributions to the Indian film industry.
The document summarizes several medical negligence cases handled by Fieldfisher solicitors. It includes:
- A £24 million settlement for a girl whose brain was accidentally injected with glue instead of dye during a procedure at Great Ormond Street Hospital.
- A £1.5 million award for a woman who suffered brain damage after Ealing Hospital A&E failed to properly diagnose and treat her aneurysm, leading to a stroke.
- A £50,000 settlement for a woman whose appendicitis was missed by University College London Hospital, resulting in her appendix perforating and requiring multiple hospital visits.
- Several other medical negligence cases involving delayed diagnoses, surgical errors, and nursing negligence resulting in injuries.
Bhopal, the capital of the state of Madhya Pradesh, is presented as India's fastest-growing state capital, with a unique combination of high population and scenic beauty. It is an education hub with more than 100 engineering colleges, more than 5 medical colleges, and 10 universities. It has distinctive sports facilities and a large SAI Center, and is positioned as an Indian cultural hub. The city also adjoins 4 large industrial areas and has developed facilities for an IT park and an electronics park.
The document provides information about drug addiction. It defines what a drug is and classifies drugs as stimulants, depressants, tranquilizers, and others. It describes the characteristics and negative effects of drug addiction on health, behavior, and reproductive capacity. It also includes statistics on the use of drugs such as alcohol, tobacco, marijuana, and cocaine among young people.
A short presentation about the process of oil extraction, the prime oil extraction regions, natural hazards, and a few case studies of past oil spills that caused great harm to the marine ecosystem and wasted precious oil.
This document summarizes the main characteristics of lymphoid neoplasms. It describes the differences between leukemias and lymphomas, and classifies lymphoid neoplasms into five categories according to the cell of origin. It also explains the patterns of dissemination and homing of neoplastic lymphoid cells, as well as the molecular factors involved.
This document provides details about typical cross-sections of roads and highways, including pavement surfaces and drainage elements. It discusses the importance of friction between wheels and pavement, pavement smoothness, light reflection characteristics, and drainage. It also describes typical layers in flexible pavements like seal coats, surface courses, binder courses, and subgrades. Finally, it outlines other cross-section elements such as shoulders, medians, footpaths, barriers, and bus bays.
This document analyzes three segments of County Route 553 in Pittsgrove, New Jersey to determine their level of service and propose upgrades to achieve an LOS of A. Initial analysis found LOS B for MP 27.98 and LOS C for MP 29.89 and 31.53. Upgrades proposed include adding a passing lane at MP 27.98 and widening shoulders and increasing speed limits at MP 29.89 and 31.53. A benefit-cost analysis found positive ratios for all segments, supporting the economic viability of the proposed upgrades.
Applied Linguistics session 111 0_07_12_2021 Applied linguistics challenges.pdf Dr. Badriya Al Mamari
Applied linguistics is a branch of linguistics that applies linguistic theories and methods to solve language-related problems. It originated in the 1950s and draws from various fields like sociology, psychology, and computing. Applied linguistics covers areas like second language teaching, language disorders, and the use of technology for language learning. It aims to improve language efficiency and address issues like how best to teach languages based on social and cultural factors. Corpora, or large electronic collections of authentic texts, are an important tool used in applied linguistics research to study language quantitatively and qualitatively.
Multilingualism and language choice in Sub-Saharan Africa Manuela Noske
This document discusses language use in Sub-Saharan Africa. It notes that the region is highly multilingual with over a thousand languages spoken. While colonial powers established some European languages, indigenous African languages are increasingly taking on official status. Most Africans are multilingual, switching between local, national, and international languages depending on context. Some languages like Swahili have emerged as lingua francas across multiple countries. Effective localization requires understanding these complex language environments and dynamics.
The document discusses the implementation of the three-language formula in Tamil Nadu, India and its effects on the Tamil language. Specifically, it notes that due to the three language teaching system, the Tamil language has lost some of its originality. Many students are moving to English medium schools and as a result, younger generations do not properly know Tamil - they can speak Tamil slang but do not understand its full grammatical structure. The document calls for reforms to strengthen the teaching of Tamil as the regional mother tongue language.
A Two-Speed Language Evolution - Protolang Torun - September 2011 Olaf Witkowski
1) The document discusses a two-speed model of language evolution based on r/K selection strategies from biology. r-strategist words spread widely and are useful in unpredictable environments, while K-strategist words are specialized for stable contexts.
2) It proposes the concept of a linguistic carrying capacity, determined by limits of individual memory and the transmission channel. Above this capacity, a K-strategy becomes more efficient than an r-strategy.
3) Agent-based simulations are suggested to model language transmission between generations of learners and observe the emergence of r/K tendencies, helping to validate hypotheses about factors influencing carrying capacity.
The document describes a project to develop dynamic syllabi for teaching historical languages through eLearning. It discusses the need to support localization for learners of different languages and the challenges of internationalization. It describes the user experience design for the eLearning platform, including how to introduce users to the system, provide goals and feedback, and visualize learning progress. It also discusses using games to cover different tasks involved in digital editing projects, like transcription, translation, and annotation. Finally, it explains how a graph database is used to store and query the interrelated linguistic data from digital editing projects in a scalable way that is optimized for performance.
Laura Welcher - The Rosetta Project and The Language Commons longnow
This document provides information about the Rosetta Project and its goal of creating a 10,000 year library of all human languages. It discusses the motivation to preserve languages and cultural knowledge for future generations. Specific initiatives described include creating Rosetta Disks with parallel text in multiple languages, building an open digital collection of language resources, and developing the proposed Language Commons Encyclopedia of Human Language to aggregate information on all 6,900 human languages. The role of the Long Now Foundation in supporting these initiatives is also outlined.
This document discusses improving access to historic public broadcasting content through speech-to-text transcription, crowdsourcing, and machine learning. It describes the American Archive of Public Broadcasting's (AAPB) efforts to transcribe over 72,000 digitized television and radio programs with incomplete metadata using automated speech recognition supplemented by a crowdsourcing game to correct transcripts. The corrected transcripts would then be used to improve search and access to programs in the AAPB collection. It also discusses using audio waveform analysis and the HiPSTAS project to enable new types of scholarly research on spoken word collections.
Role of Language Engineering to Preserve Endangered Language Dr. Amit Kumar Jha
Language engineering can help preserve endangered languages by developing tools like language translators, speech generation systems, and language teaching systems. If these tools are applied to endangered languages, they may help prevent the languages from going extinct by increasing the number of people who understand and use them. Key applications of language engineering that could support endangered languages include speech synthesis, machine translation, speech recognition, text-to-speech conversion, and language teaching systems. Developing digital language documentation and creating transcription tools can also help endangered languages survive by making more materials available to study and learn them.
LIWC-ing at Texts for Insights from Linguistic Patterns Shalin Hai-Jew
Since the mid-1990s, researchers have been using the Linguistic Inquiry and Word Count (LIWC, pronounced “luke”) software tool to explore various text corpora for hidden insights from linguistic patterns. The LIWC tool has evolved over the years. In parallel, research using computational text analysis has evolved and shed light on deception, threat assessment, personality, predictive analytics, and other areas. This presentation will highlight some of the applications of LIWC in the research literature and showcase the tool on some original text sets.
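As a rough illustration of the word-count idea behind tools of this kind, the sketch below counts how often words from a few categories occur in a text and reports each category as a share of all tokens. It is not the LIWC software or its dictionary: the `CATEGORIES` word lists and the `category_rates` function are hypothetical, made up only for this example.

```python
import re
from collections import Counter

# Hypothetical toy categories for illustration only; NOT the real LIWC dictionary.
CATEGORIES = {
    "first_person": {"i", "me", "my", "we", "our"},
    "negation": {"no", "not", "never"},
}


def category_rates(text):
    """Return, per category, the fraction of tokens that belong to it."""
    # Lowercase the text and split it into simple alphabetic tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for tok in tokens:
        for cat, words in CATEGORIES.items():
            if tok in words:
                counts[cat] += 1
    total = len(tokens)
    # Guard against empty input so the rates are always defined.
    return {cat: (counts[cat] / total if total else 0.0) for cat in CATEGORIES}


if __name__ == "__main__":
    print(category_rates("I did not take my keys"))
```

Reporting rates rather than raw counts is a design choice made here so that texts of different lengths can be compared.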
2014 EVA/Minerva Jerusalem International Conference on Digitisation of Cultural Heritage
http://2014.minervaisrael.org.il
http://www.digital-heritage.org.il
Localization - It's Big in Japan 20070408 Jon Ashley
Case study from a 2006 Localization project that successfully brought an internal corporate HR site from US to Japan, and the UX process followed to do so.
Understanding Community Needs: Scalable SMS Processing for UNICEF Nigeria and... Idibon1
This document summarizes a project involving UNICEF, Idibon, and NLP for minority languages. It discusses:
- UNICEF's U-Report program which collects SMS messages in countries like Nigeria and Burundi.
- Idibon's collaboration with UNICEF to automatically label these messages by urgency, category, and language in real time.
- The challenges of creating NLP tools for minority languages which often lack large text corpora for training models.
Presentation by project directors Barbara E. Bullock and Almeida Jacqueline Toribio at the 24th Conference on Spanish in the United States, March 2013 in McAllen, Texas.
Access, Skills and Development in Africa: Local Knowledge in Local Languages McNulty Consulting
This document discusses the Ulwazi Programme, a case study of a South African initiative to improve access to information, skills, and development in rural communities. The key points are:
1) The Ulwazi Programme is a wiki-based initiative run by eThekwini Municipality libraries to collect and share local knowledge and histories in local languages.
2) It aims to preserve indigenous knowledge, build digital skills, develop a sustainable digital library of local content, and promote social inclusion.
3) The program model involves collaboration between communities, libraries, and open-source technology to collect and validate community knowledge and make it accessible online.
4) It has experienced significant growth in visits, content,
Cultural Identities in Wikipedia (Wikimania 2016) Marc Miquel
Unlike on most social network platforms, Wikipedia editors are not encouraged to disclose personal traits, hobbies or affiliations. In fact, I think the identity issue has not been discussed enough. Since the project is dedicated to promoting a common good, there is no content ownership, and personal aspects become uncomfortable, or partly taboo. However, I argue that identity matters in building a Wikipedian reputation, and that editors' identities are tightly related to the content. As a Wikipedian, would you contribute equally if you couldn't choose the topics?
In this presentation I want to address the creation process and composition of Wikipedia language editions as a matter of identity. Our research on the issue has shown us that an identity-based motivation allows editors to reconcile their Wikipedian identity in the community with their other identities. Therefore, in order to act congruently with each of these identities, they contribute content related to them. To assess the influence of this type of motivation, we developed a method and identified articles related to each Wikipedia language edition's Cultural Identities. The results on 40 Wikipedias show that this kind of content represents almost a quarter of each language edition. We analyze the content in terms of topical coverage and find that different specific topics emerge as important for each edition, although the most important topics are generally Geography, People and Culture. Inspecting how articles related to each language edition's cultural identities are exported to other languages, we show relationships between Wikipedias.
The selection of articles reflecting each Wikipedia language edition's cultural identities is a rich source for research, but can also be a useful base for establishing an intercultural exchange between Wikipedia language editions. We propose that the diversity of content across languages be seen as an asset, and that the spread of content specific to a language edition be facilitated by automatic tools. The main point is to recognize the power of identity as a motivator for action and as a driver for change. Finally, we present a project called Wikiidentities in which we will disseminate the results of the research, make the datasets available, and provide some ideas and debate on how identities can be key to bridging the culture gap in any Wikipedia.
This document discusses using SignWriting (SW) to represent lexical entries in a dictionary of Peruvian Sign Language (LSP). It notes that there is almost no prior research or published dictionaries on LSP. The planned LSP dictionary (DALSP) will include a video, gloss, translation, description, and SW transcription for each sign. Using SW will allow for deeper analysis of LSP's phonological features and help visualize that sign languages can be written phonologically like spoken languages. However, challenges include students needing training in SW conventions and the system potentially engaging the Peruvian Deaf community's interest in a writing system for LSP.
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...PECB
Denis is a dynamic and results-driven Chief Information Officer (CIO) with a distinguished career spanning information systems analysis and technical project management. With a proven track record of spearheading the design and delivery of cutting-edge Information Management solutions, he has consistently elevated business operations, streamlined reporting functions, and maximized process efficiency.
Certified as an ISO/IEC 27001: Information Security Management Systems (ISMS) Lead Implementer, Data Protection Officer, and Cyber Risks Analyst, Denis brings a heightened focus on data security, privacy, and cyber resilience to every endeavor.
His expertise extends across a diverse spectrum of reporting, database, and web development applications, underpinned by an exceptional grasp of data storage and virtualization technologies. His proficiency in application testing, database administration, and data cleansing ensures seamless execution of complex projects.
What sets Denis apart is his comprehensive understanding of Business and Systems Analysis technologies, honed through involvement in all phases of the Software Development Lifecycle (SDLC). From meticulous requirements gathering to precise analysis, innovative design, rigorous development, thorough testing, and successful implementation, he has consistently delivered exceptional results.
Throughout his career, he has taken on multifaceted roles, from leading technical project management teams to owning solutions that drive operational excellence. His conscientious and proactive approach is unwavering, whether he is working independently or collaboratively within a team. His ability to connect with colleagues on a personal level underscores his commitment to fostering a harmonious and productive workplace environment.
Date: May 29, 2024
Tags: Information Security, ISO/IEC 27001, ISO/IEC 42001, Artificial Intelligence, GDPR
Strategies for Effective Upskilling is a presentation by Chinwendu Peace in a Your Skill Boost Masterclass organised by the Excellence Foundation for South Sudan on 8 and 9 June 2024, from 1 PM to 3 PM each day.
Leveraging Generative AI to Drive Nonprofit Innovation | TechSoup
In this webinar, participants learned how to use Generative AI to streamline operations and elevate member engagement. Amazon Web Services experts presented customer-specific use cases and explored low/no-code tools that are quick and easy to deploy through Amazon Web Services (AWS).
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx | EduSkills OECD
Iván Bornacelly, Policy Analyst at the OECD Centre for Skills, OECD, presents at the webinar 'Tackling job market gaps with a skills-first approach' on 12 June 2024
Temple of Asclepius in Thrace. Excavation results | Krassimira Luka
The temple and the sanctuary around it were dedicated to Asklepios Zmidrenus. This name has been known since 1875, when an inscription dedicated to him was discovered in Rome. The inscription is dated to 227 AD and was left by soldiers originating from the city of Philippopolis (modern Plovdiv).
2. Machine Translation
• Automate translation
• A study under Natural Language Processing
[Diagram: Sentence in SOURCE LANGUAGE → MT System → Sentence in TARGET LANGUAGE]
3. ENG-FIL MT System Project
• 3-year project
• started last year
• funded by DOST-PCASTRD
• composition:
– 6 faculty members of College of Computer Studies
– 15 computer science majors
– assisted by the Filipino Dept and Dept in English & Applied Linguistics of DLSU-M
4. Agenda
• Architecture of the MT System
• Linguistic resources
• Demo of the Translation Engine
• Results for English to Japanese translation
5. Architectural Design of the Program
Language Resources:
• Lexicon (electronic dictionary)
• Morphological Analyzer & Generator
• Part-of-Speech tagger
• Grammar
• Corpus (Tagged)
[Architecture diagram labels: Source Text, Target Text, User Interface, Translator Engine, MT: Example-based, MT: Rule-based, Output Modeller]
6. Challenge!
• Language resources
– Quality of translation is dependent on it.
– Built from almost non-existent digital forms
– manual vs. automatic construction
7. Lexicon Builder
• Used IsaWika! database as initial lexicon
• Created a lexicon extraction program to automatically determine candidate translation pairs from corpora
• Currently contains about 23,000 entries
• Co-occurring words are likely translations
• Challenge: Lexical resources
– parallel corpora
– part-of-speech tagger
8. Morphological Analyzer
• Initially collected morphological rules from grammar books
• Developed an example-based morphological phenomenon learner
– learn from <inflected word, root-word>
– example: <kumakain, kain>
• Challenge : Lexical resources
– lexicon
– part-of-speech tagger
– morphological rules Generator
9. Part-Of-Speech Tagger
• automatic association of parts-of-speech to words in a document
• existing Filipino tagger achieves < 80% accuracy
• Challenge : Lexical resource
– tagged parallel corpora
– lexicon
– morphological analyzer
– grammar
10. Grammar
• Derived manually
• Challenge: Free word order in sentence formation.
The man bought an umbrella from the store.
• Bumili ang lalaki ng payong sa tindahan.
• Bumili sa tindahan ng payong ang lalaki.
• Ang lalaki ay bumili ng payong sa tindahan.
11. Corpora
• used by the lexicon extractor and part-of-speech tagger, example-based MT
• came from translation works of DLSU English majors, verified by linguists
• consists of 207,000 words, 5000 of which are tagged
12. Translation Rules
• currently learned from the corpora
• disadvantages
– garbage-in-garbage-out
– comprehensiveness
• need for linguist-verified rules
13.
14. Bringing it home …
• 171 Philippine Languages (SIL)
• No Philippine Corpora
• Unfortunately, today, the Philippines has one of the highest rates of dying languages (Solfed Foundation Inc)
• “Without our language, we have no culture, we have no identity, we are nothing.” (Thorsson)
15. eWika: Digitalization of Philippine Languages
• Build the Philippine Corpus
• Build software tools to study or use the corpus
– Across Languages
– Across Regions
– Across Forms and Genres
– Across Land and Sea
16. Across Languages
• 171 Philippine Languages (SIL List)
• Summer Institute of Linguistics
http://www.ethnologue.com/
• Major languages
• Near extinction languages
• How about the languages in-between?
17. Filipino Sign Language
• The History of Sign Language in the Philippines: Piecing Together the Puzzle (Abat & Martinez, 9th Phil Linguistics Congress, 2006)
• Deaf individuals: handicapped vs members of a linguistic minority
• Sign languages as true languages
19. Across Regions
• e-Wika: Connecting the Philippine Islands through Language
• 17 Regions: Ilocos Region (Region I), Cagayan Valley (Region II), Central Luzon (Region III), CALABARZON (Region IV-A), MIMAROPA (Region IV-B), Bicol Region (Region V), Western Visayas (Region VI), Central Visayas (Region VII), Eastern Visayas (Region VIII), Zamboanga Peninsula (Region IX), Northern Mindanao (Region X), Davao Region (Region XI), SOCCSKSARGEN (Region XII), Caraga (Region XIII), Autonomous Region in Muslim Mindanao (ARMM), Cordillera Administrative Region (CAR), National Capital Region (NCR) (Metro Manila)
20.
21. Across Boundaries
• Across Time: historical, contemporary
• Across Languages
• Across Regions
• Across Forms and Genres
• Across Land and Sea
22. Across Forms and Genres
• In various forms:
• Text
• Speech: speech to text system (ongoing project)
• Video: Filipino sign language
• In various Genres: categories of entries in the corpus
23. Across Boundaries
• Across Time: historical, contemporary
• Across Languages
• Across Regions
• Across Forms and Genres
• Across Land and Sea
24. Across Land and Sea
• Web-based application: c/o Solomon See
(upload, download, tools)
• Contributors (Main players)
• Verify-ers
• Facilitators
• Server: DLSU-M commits to host the server for the next three years.
• Terms of Use: Research purposes.
25.
• The dream of building Philippine language resources and tools
• Many many many major hurdles to overcome
• Language Resources, Tools, & Peopleware: Needed
Editor's Notes
Good morning. I’m happy to be here today. I am representing our group from DLSU-Manila. I belong to a team of computer scientists developing a hybrid, bidirectional English-Filipino machine translation system. I would say that we are specialists in different types of languages. You are specialists in natural languages, while we specialize in artificial programming languages.
For discussion purposes, let me define what a machine translation system is. It is a computer program that aims to automate part or ultimately all of the processes of translating documents written in one natural language to another. This study on computational linguistics falls under the computer science area of natural language processing, which is under the area of artificial intelligence.
We are developing an English-Filipino, Filipino-English machine translation system. This is a 3-year project funded by the Department of Science and Technology’s Philippine Council on Advanced Science and Technology Research and Development, or DOST-PCASTRD. We recently completed our first year. Our group is composed of 6 faculty members from the College of Computer Studies of De La Salle-Manila. We have about 10 research assistants, undergraduate and graduate students whose theses are related to this project. On the linguistic side, we consult our colleagues from the Filipino Department and the Department of English and Applied Linguistics.
My presentation today will focus on the following. I will briefly describe the architecture of the machine translation system, followed by the challenges we are facing regarding the linguistic resources needed by the machine translator. Then I’ll show you the actual machine translation program we developed. Finally, I will present the results we got when we applied this system to the Japanese language.
---------Switch to the program-----------
This is what our system looks like. We place here the sentences to be translated. It can translate from English to Filipino and vice versa. When we click this button, it will perform the translation and then show the results here. Let’s consider the sentence: “The cat is happy.”
This is the architectural design of the system. The input goes through the user interface, which talks to the translator engine. The engine is supposed to use 2 approaches to translate, namely the example-based approach and the rule-based approach. The rule-based translation engine uses a database of rules for language representation and translation created by linguists and other experts. On the other hand, the example-based translation engine automatically learns such information from sample text translations. Our program currently uses only the example-based approach, for reasons I will explain later. To be able to translate, the engine needs certain resources. First is the lexicon, or the English-Filipino bilingual electronic dictionary, to translate the words in the sentence. It needs a morphological analyzer and generator to conjugate words when needed. It also needs a part-of-speech tagger to determine how a word is being used within the sentence. Next, it needs to know the grammar of the languages to understand and form valid sentences. Since the example-based engine learns from sample translations, it needs a corpus of correctly translated sentences.
The accuracy of the system's translations is largely dependent on the comprehensiveness and correctness of the language resources for Filipino and English-Filipino translation. Language resources such as the grammar, lexicon, morphological information, and the corpora are literally built from almost non-existent digital forms. Linguistic information on Philippine languages is available, but so far the focus has been on theoretical linguistics and little has been done about the computational aspects of these languages. We address both the manual construction of these language resources and their automatic extraction. We report here the building of these various language resources, the problems associated with them, and the solutions provided.
The lexicon (or dictionary) is a collection of source words with their corresponding translations in the target language, and their features (such as part-of-speech tag, sample sentences, and semantic information). Since languages are constantly evolving, it is imperative that the project provide some way to detect and capture new words, and possibly new meanings of words, in the languages considered in this study. New terms can be added to the base lexicon through a computer program that automatically extracts new dictionary entries from documents in English and Filipino. To be able to do its job, the lexicon extractor needs a parallel corpus of ENG-FIL translated documents and a part-of-speech tagger that assigns a part-of-speech tag to each word. The lexicon extractor currently has an accuracy rate of about 57%.
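The co-occurrence idea behind lexicon extraction can be sketched as follows. This is a minimal illustration and not the project's extractor: it assumes a sentence-aligned parallel corpus and scores word pairs with the Dice coefficient, which is only one common choice. The toy corpus and the names `dice_scores` and `best_candidate` are hypothetical and introduced here for illustration.

```python
from collections import Counter
from itertools import product

# Toy sentence-aligned corpus (illustrative only, not project data).
TOY_CORPUS = [
    ("the man bought an umbrella", "bumili ang lalaki ng payong"),
    ("the man is happy", "masaya ang lalaki"),
    ("the cat is happy", "masaya ang pusa"),
]


def dice_scores(parallel_sentences):
    """Score every co-occurring (source, target) word pair with Dice."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in parallel_sentences:
        # Count each word once per sentence pair.
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in pair_count.items()
    }


def best_candidate(word, scores):
    """Return the highest-scoring target word for a source word, if any."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


SCORES = dice_scores(TOY_CORPUS)
print(best_candidate("cat", SCORES))
print(best_candidate("man", SCORES))
```

On this toy data, "umbrella" ties between several Filipino words because it occurs in only one sentence pair. That kind of ambiguity is what a larger parallel corpus and part-of-speech tags help to resolve.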
Since the dictionary will not contain all the words in the English and Filipino languages, there is a need to supplement the dictionary with a morphological analyzer that determines the root word of a word not found in the dictionary, as well as a morphological generator to conjugate words when needed. With this subsystem, it is no longer a requirement to have separate entries for the different forms of a word. We initially collected morphological rules from grammar books. Realizing that not all the rules for the Filipino language are there, we decided to develop a morphological phenomenon learner. Based on sample <inflected word, root word> pairs, the learner learns the morphological rules of a language. We currently have a morphological generator that can generate the different forms of a verb. Unfortunately, it still cannot determine the specific form of the word needed in a translation.
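Learning from <inflected word, root word> pairs can be pictured with a small sketch. It is an assumption-laden illustration, not the project's learner: it only handles forms where the root appears unchanged at the end of the inflected word, and it keeps the prefix templates that are consistent with every example. The slot names `R1` and `R2` (the root's first and second letters) and all function names are hypothetical.

```python
from itertools import product


def hypotheses(inflected, root):
    """All prefix templates that could explain one <inflected, root> pair."""
    if len(root) < 2 or not inflected.endswith(root):
        return set()
    prefix = inflected[: -len(root)]
    options = []
    for ch in prefix:
        # Each prefix letter is either literal or a copy of a root letter.
        choice = {ch}
        if ch == root[0]:
            choice.add("R1")
        if ch == root[1]:
            choice.add("R2")
        options.append(sorted(choice))
    return set(product(*options))


def learn(pairs):
    """Keep only the templates consistent with every training pair."""
    common = None
    for inflected, root in pairs:
        found = hypotheses(inflected, root)
        common = found if common is None else common & found
    return common or set()


def generate(template, root):
    """Apply a learned template to a root (morphological generation)."""
    slots = {"R1": root[0], "R2": root[1]}
    return "".join(slots.get(sym, sym) for sym in template) + root


def analyze(word, templates):
    """Return candidate roots for a word (morphological analysis)."""
    roots = set()
    for i in range(len(word) - 1):
        root = word[i:]
        if any(generate(t, root) == word for t in templates):
            roots.add(root)
    return roots


TEMPLATES = learn([("kumakain", "kain"), ("sumusulat", "sulat")])
print(TEMPLATES)
print(generate(next(iter(TEMPLATES)), "bili"))
print(analyze("bumibili", TEMPLATES))
```

From the two examples, the only surviving template copies the root's first letter, adds "um", then copies the root's second letter. The same template can then generate "bumibili" from "bili" and recover "bili" from "bumibili".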
This subsystem is supposed to automatically associate a part-of-speech with each word, to determine how the word is being used in the sentence. The current Filipino part-of-speech tagger achieves less than 80% accuracy. This is still unacceptable when it is used by the other lexical resources, because the errors will propagate. The part-of-speech tagger needs a tagged parallel corpus, the lexicon, the morphological analyzer (MA), and the grammar of the languages. At this point, I would like to point out that the lexical resources are completely dependent on each other. To build the part-of-speech tagger, we need an MA. The MA, on the other hand, needs the tagger. The part-of-speech tagger needs a comprehensive lexicon; to build a comprehensive lexicon, we need a good part-of-speech tagger. Thus our challenge becomes a chicken-and-egg problem. Where do we start?
The Filipino grammar is being derived manually with the help of linguists. In the absence of a complete grammar, we are currently relying on the part-of-speech tagger. One of the major challenges of the Filipino language is its free word order in sentence formation. Because of this, one sentence in English can be translated into various sentences in Filipino. For instance, the English sentence “The man bought an umbrella from the store” can be translated into many different Filipino sentences while maintaining the semantics of the original English sentence, among them: Bumili ang lalaki ng payong sa tindahan. Bumili sa tindahan ng payong ang lalaki. Ang lalaki ay bumili ng payong sa tindahan.
Because of this free word order phenomenon in Filipino sentences, it is difficult to capture rules for the Filipino language that represent all the possible combinations the language allows. This means that the number of production rules needed to represent the Filipino grammar is considerably larger than for its English counterpart.
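The effect of free word order on the number of sentence patterns can be illustrated with a short sketch. It assumes, purely for illustration, that the three marked phrases may follow the verb in any order; which orders are actually acceptable is a question for linguists, and the function name `verb_initial_orders` is hypothetical.

```python
from itertools import permutations

VERB = "bumili"
PHRASES = ["ang lalaki", "ng payong", "sa tindahan"]


def verb_initial_orders(verb, phrases):
    """List every verb-initial ordering of the given phrases."""
    return [" ".join((verb, *order)) for order in permutations(phrases)]


ORDERS = verb_initial_orders(VERB, PHRASES)
for sentence in ORDERS:
    print(sentence)
print(len(ORDERS), "verb-initial orderings")
```

Under this assumption, n phrases yield n! verb-initial orderings, before counting inverted forms such as "Ang lalaki ay bumili ...". A grammar or rule base that lists each ordering separately grows accordingly.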
A corpus of English and Filipino documents is needed by the lexicon extractor, the part-of-speech tagger, and the example-based machine translator. A monolingual Filipino corpus of about 4,000 words with specific and linguist-verified POS tags was gathered from various domains such as children’s books, the Bible, and news articles. We currently have a bilingual parallel English-Filipino corpus of 207,000 words, drawn from students' translation work checked by their translation teachers, from books, and from online articles; only 5,000 words in the Filipino documents are tagged and verified by linguists. Unfortunately, our project encountered inconsistencies in the tags that our automatic tools assigned to words in the two languages, so verification has to be done tediously by human evaluators. This particular problem has to be assessed and addressed in more detail.
To address the need to build reliable Filipino corpora while minimizing manual encoding, automatic methods for corpus creation were explored. We developed AutoCor, which performs automatic acquisition and classification of corpora of documents in closely related languages, specifically three Philippine languages: Bicolano, Cebuano and Tagalog.
Due to the absence of translation rules, our system currently learns automatically how translation is done through examples found in a corpus of translated documents. The system can learn incrementally when new translated documents are added to the knowledge base; thus, any changes to the language can also be accommodated through updates to the example translations. This means it can handle translation of documents from various domains. The principle of garbage-in-garbage-out applies here: if the example translations are faulty, the learned rules will also be faulty. That is why, although human linguists do not have to specify and come up with the translation rules, a linguist still has to verify the translated documents first and, consequently, the learned rules, for accuracy. Unfortunately, the rules learned by the systems we developed are still not readable and understandable to expert linguists and have to be translated into a form that is comprehensible to them.
It is not only the quality of the collection of translations that affects the overall performance of the system, but also the quantity. The collection of translations has to be comprehensive so that the resulting translation system can translate as many sentences as possible. The challenge here is coming up with a quantity of examples that is sufficient for accurate translation of documents.
With more data, a new problem arises: the knowledge base can grow so large that accessing it and searching for applicable rules during translation takes a tremendous amount of time and, in the extreme, becomes impractical. Exponential growth of the knowledge base may also happen because of the free word order of Filipino sentence construction, such that one English sentence can be translated into several Filipino sentences. When all these combinations are part of the translation examples, the system learns and extracts a translation rule for each combination, causing the knowledge base to grow. Thus, algorithms that generalize rules are being considered to remove the specificity of the extracted translation rules and so reduce the size of the rule knowledge base.
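One way to picture rule generalization is to replace an aligned word pair with a shared slot, so that several specific examples collapse into one rule. The sketch below is an assumption made for illustration and is not the algorithm used in the project; the slot marker `<X>`, the toy lexicon, and the function names are all hypothetical.

```python
SLOT = "<X>"

# Toy bilingual lexicon and example translations (illustrative only).
LEXICON = {"cat": "pusa", "man": "lalaki", "dog": "aso"}
EXAMPLES = [
    ("the cat is happy", "masaya ang pusa"),
    ("the man is happy", "masaya ang lalaki"),
]


def generalize(src, tgt, lexicon):
    """Replace one aligned word pair with a shared slot marker."""
    src_tokens, tgt_tokens = src.split(), tgt.split()
    for src_word, tgt_word in lexicon.items():
        if src_word in src_tokens and tgt_word in tgt_tokens:
            src_tokens[src_tokens.index(src_word)] = SLOT
            tgt_tokens[tgt_tokens.index(tgt_word)] = SLOT
            break  # this sketch abstracts at most one slot per rule
    return " ".join(src_tokens), " ".join(tgt_tokens)


def translate(sentence, rules, lexicon):
    """Translate with the first generalized rule that matches, else None."""
    tokens = sentence.split()
    for src_pattern, tgt_pattern in sorted(rules):
        pattern = src_pattern.split()
        if len(pattern) != len(tokens):
            continue
        filler, matched = None, True
        for expected, actual in zip(pattern, tokens):
            if expected == SLOT:
                filler = actual
            elif expected != actual:
                matched = False
                break
        if not matched:
            continue
        if filler is None:
            return tgt_pattern
        if filler in lexicon:
            return tgt_pattern.replace(SLOT, lexicon[filler])
    return None


# Two specific examples collapse into a single generalized rule.
RULES = {generalize(src, tgt, LEXICON) for src, tgt in EXAMPLES}
print(RULES)
print(translate("the dog is happy", RULES, LEXICON))
```

Here the two examples reduce to one rule, and that rule also covers a sentence that was never in the examples, provided the lexicon knows the new word. The trade-off is that over-general rules can produce wrong translations, which is one reason the learned rules still need verification by linguists.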
Let me now demonstrate to you how the REAL Translation system or Rule Extraction Applied in Language Translation system learns translation rules through example.
171 Philippine Languages (SIL)
No Philippine Corpora: NNLPRS, workshops
Unfortunately, today, the Philippines has one of the highest rates of dying languages (Solfed Foundation Inc)
In the 1800s, Ornolfor Thorsson, an adviser of the President of Iceland, said, “Without our language, we have no culture, we have no identity, we are nothing.”
Ornolfor Thorsson said this when the Icelandic language was in danger of disappearing after years of Norwegian colonialism.
Throughout this entire century, the progressive global philosophy regarding deafness and deaf people has risen beyond a medical/infirmity model and moved towards a cultural/linguistic framework. Deaf individuals are no longer viewed simply as hearing impaired or handicapped, but rather as Deaf, that is, as members of a cultural and linguistic minority.
My first personal encounter with the group was during our first consultative workshop towards building the Philippine corpus. Despite the super typhoon that later hit Taiwan as well last August 2007, the workshop was well attended by at least 10 members of the Philippine Federation for the Deaf. They are an enthusiastic, very active, driven community, and they have a cause. Personally, I felt humbled by their presence. I felt that my world was so small, and when I met them, I felt that I should enlarge my coast (as the Bible puts it).
This new linguistic framework is largely due to the emergence of sign linguistics as a discipline. The documentation and consequent acceptance of sign languages as true languages have been key to the recognition of Deaf communities. Deaf individuals of various nations throughout the world, including the Philippines, now draw from the strength of this collective identity for advocacies in various aspects of their lives.
The history of manual communication in general in the Philippines, and the emergence and development of Filipino Sign Language (FSL) as the linguistic entity and sociocultural symbol of the Filipino Deaf community, are matters of great importance to Deaf individuals as well as to the community at large.