Scientific article: We propose a novel knowledge-rich approach to measuring the similarity between a pair of words. The algorithm is tailored to Bulgarian and Russian and takes into account the orthographic and the phonetic correspondences between the two Slavic languages: it combines lemmatization, hand-crafted transforma-tion rules, and weighted Levenshtein distance. The experimental results show an 11-pt interpolated average precision of 90.58%, which represents a sizeable improvement over two classic rivaling approaches.
Unsupervised Extraction of False Friends from Parallel Bi-Texts Using the Web...Svetlin Nakov
Scientific paper: False friends are pairs of words in two languages
that are perceived as similar, but have different
meanings, e.g., Gift in German means poison in
English. In this paper, we present several unsupervised
algorithms for acquiring such pairs
from a sentence-aligned bi-text. First, we try different
ways of exploiting simple statistics about
monolingual word occurrences and cross-lingual
word co-occurrences in the bi-text. Second, using
methods from statistical machine translation, we
induce word alignments in an unsupervised way,
from which we estimate lexical translation probabilities,
which we use to measure cross-lingual
semantic similarity. Third, we experiment with
a semantic similarity measure that uses the Web
as a corpus to extract local contexts from text
snippets returned by a search engine, and a bilingual
glossary of known word translation pairs,
used as “bridges”. Finally, all measures are combined
and applied to the task of identifying likely
false friends. The evaluation for Russian and
Bulgarian shows a significant improvement over
previously-proposed algorithms.
Automatic Identification of False Friends in Parallel Corpora: Statistical an...Svetlin Nakov
Scientific article: False friends are pairs of words in two languages that are perceived as similar but have different meanings. We present an improved algorithm for acquiring false friends from sentence-level aligned parallel corpus based on statistical observations of words occurrences and co-occurrences in the parallel sentences. The results are compared with an entirely semantic measure for cross-lingual similarity between words based on using the Web as a corpus through analyzing the wordsâ local contexts extracted from the text snippets returned by searching in Google. The statistical and semantic measures are further combined into an improved algorithm for identification of false friends that achieves almost twice better results than previously known algorithms. The evaluation is performed for identifying cognates between Bulgarian and Russian but the proposed methods could be adopted for other language pairs for which parallel corpora and bilingual glossaries are available.
Nakov S., Nakov P., Paskaleva E., Improved Word Alignments Using the Web as a...Svetlin Nakov
Nakov P., Nakov S., Paskaleva E., Improved Word Alignments Using the Web as a Corpus, Proceedings of the International Conference RANLP 2007 (Recent Advances in Natural Language Processing), pp. 400-405, ISBN 978-954-91743-7-3, Borovets, Bulgaria, 27-29 September 2007
Unsupervised Extraction of False Friends from Parallel Bi-Texts Using the Web...Svetlin Nakov
Scientific paper: False friends are pairs of words in two languages
that are perceived as similar, but have different
meanings, e.g., Gift in German means poison in
English. In this paper, we present several unsupervised
algorithms for acquiring such pairs
from a sentence-aligned bi-text. First, we try different
ways of exploiting simple statistics about
monolingual word occurrences and cross-lingual
word co-occurrences in the bi-text. Second, using
methods from statistical machine translation, we
induce word alignments in an unsupervised way,
from which we estimate lexical translation probabilities,
which we use to measure cross-lingual
semantic similarity. Third, we experiment with
a semantic similarity measure that uses the Web
as a corpus to extract local contexts from text
snippets returned by a search engine, and a bilingual
glossary of known word translation pairs,
used as “bridges”. Finally, all measures are combined
and applied to the task of identifying likely
false friends. The evaluation for Russian and
Bulgarian shows a significant improvement over
previously-proposed algorithms.
Automatic Identification of False Friends in Parallel Corpora: Statistical an...Svetlin Nakov
Scientific article: False friends are pairs of words in two languages that are perceived as similar but have different meanings. We present an improved algorithm for acquiring false friends from sentence-level aligned parallel corpus based on statistical observations of words occurrences and co-occurrences in the parallel sentences. The results are compared with an entirely semantic measure for cross-lingual similarity between words based on using the Web as a corpus through analyzing the wordsâ local contexts extracted from the text snippets returned by searching in Google. The statistical and semantic measures are further combined into an improved algorithm for identification of false friends that achieves almost twice better results than previously known algorithms. The evaluation is performed for identifying cognates between Bulgarian and Russian but the proposed methods could be adopted for other language pairs for which parallel corpora and bilingual glossaries are available.
Nakov S., Nakov P., Paskaleva E., Improved Word Alignments Using the Web as a...Svetlin Nakov
Nakov P., Nakov S., Paskaleva E., Improved Word Alignments Using the Web as a Corpus, Proceedings of the International Conference RANLP 2007 (Recent Advances in Natural Language Processing), pp. 400-405, ISBN 978-954-91743-7-3, Borovets, Bulgaria, 27-29 September 2007
Natural Language Generation from First-Order ExpressionsThomas Mathew
In this paper I discuss an approach for generating natural language from a first-order logic representation. The approach shows that a grammar definition for the natural language and a
lambda calculus based semantic rule collection can be applied for bi-directional translation using an overgenerate-and-prune mechanism.
SYNTACTIC ANALYSIS BASED ON MORPHOLOGICAL CHARACTERISTIC FEATURES OF THE ROMA...kevig
This paper gives complete guidelines for authors submitting papers for the AIRCC Journals. This paper refers to the syntactic analysis of phrases in Romanian, as an important process of natural language processing. We will suggest a real-time solution, based on the idea of using some words or groups of words that indicate grammatical category; and some specific endings of some parts of sentence. Our idea is based on some characteristics of the Romanian language, where some prepositions, adverbs or some specific endings can provide a lot of information about the structure of a complex sentence. Such characteristics can be found in other languages, too, such as French. Using a special grammar, we developed a system (DIASEXP) that can perform a dialogue in natural language with assertive and interogative sentences about a “story” (a set of sentences describing some events from the real life).
This presentation gives hints and guidelines how to find a job in the field of computer programming, software development and software engineering. It suggests ways to find job offers, hints how to apply for a job, how to write cover letter, how to research the company, how to distinguish between good and bad companies, how to write a CV, how to prepare for the interview and how to send your documents.
The author Svetlin Nakov (http://www.nakov.com) has long experience in training and hiring computer programmers, software engineers and IT professionals and conducting IT interviews.
Как да организираме свободен безплатен курс?Svetlin Nakov
В презентацията “Свободни курсове за обучение” д-р Светлин Наков споделя неговия 10-годишен опит с организирането на свободни курсове: събиране на екип от лектори, осигуряване на учебна зала, изграждане на инфраструктура за курс (сайт, регистрация, форум, форма за домашни и проекти, хранилище за учебни материали), изготвяне на учебно съдържание (учебна програма, лекции, демонстрации, упражнения, домашни, проекти), определяне на ръководител на курса, изготвяне на график, проследяване на графика и лекторите, запис и качване на видеозаписи от учебните занятия, провеждане на изпити, оценяване и награждаване.
Improved Word Alignments Using the Web as a CorpusSvetlin Nakov
Scientific article: We propose a novel method for improving word alignments in a parallel sentence-aligned bilingual corpus based on the idea that if two words are translations of each other then so should be many words in their local contexts. The idea is formalised using the Web as a corpus, a glossary of known word translations (dynamically augmented from the Web using bootstrapping), the vector space model, linguistically motivated weighted minimum edit distance, competitive linking, and the IBM models. Evaluation results on a Bulgarian-Russian corpus show a sizable improvement both in word alignment and in translation quality.
Natural Language Generation from First-Order ExpressionsThomas Mathew
In this paper I discuss an approach for generating natural language from a first-order logic representation. The approach shows that a grammar definition for the natural language and a
lambda calculus based semantic rule collection can be applied for bi-directional translation using an overgenerate-and-prune mechanism.
SYNTACTIC ANALYSIS BASED ON MORPHOLOGICAL CHARACTERISTIC FEATURES OF THE ROMA...kevig
This paper gives complete guidelines for authors submitting papers for the AIRCC Journals. This paper refers to the syntactic analysis of phrases in Romanian, as an important process of natural language processing. We will suggest a real-time solution, based on the idea of using some words or groups of words that indicate grammatical category; and some specific endings of some parts of sentence. Our idea is based on some characteristics of the Romanian language, where some prepositions, adverbs or some specific endings can provide a lot of information about the structure of a complex sentence. Such characteristics can be found in other languages, too, such as French. Using a special grammar, we developed a system (DIASEXP) that can perform a dialogue in natural language with assertive and interogative sentences about a “story” (a set of sentences describing some events from the real life).
This presentation gives hints and guidelines how to find a job in the field of computer programming, software development and software engineering. It suggests ways to find job offers, hints how to apply for a job, how to write cover letter, how to research the company, how to distinguish between good and bad companies, how to write a CV, how to prepare for the interview and how to send your documents.
The author Svetlin Nakov (http://www.nakov.com) has long experience in training and hiring computer programmers, software engineers and IT professionals and conducting IT interviews.
Как да организираме свободен безплатен курс?Svetlin Nakov
В презентацията “Свободни курсове за обучение” д-р Светлин Наков споделя неговия 10-годишен опит с организирането на свободни курсове: събиране на екип от лектори, осигуряване на учебна зала, изграждане на инфраструктура за курс (сайт, регистрация, форум, форма за домашни и проекти, хранилище за учебни материали), изготвяне на учебно съдържание (учебна програма, лекции, демонстрации, упражнения, домашни, проекти), определяне на ръководител на курса, изготвяне на график, проследяване на графика и лекторите, запис и качване на видеозаписи от учебните занятия, провеждане на изпити, оценяване и награждаване.
Improved Word Alignments Using the Web as a CorpusSvetlin Nakov
Scientific article: We propose a novel method for improving word alignments in a parallel sentence-aligned bilingual corpus based on the idea that if two words are translations of each other then so should be many words in their local contexts. The idea is formalised using the Web as a corpus, a glossary of known word translations (dynamically augmented from the Web using bootstrapping), the vector space model, linguistically motivated weighted minimum edit distance, competitive linking, and the IBM models. Evaluation results on a Bulgarian-Russian corpus show a sizable improvement both in word alignment and in translation quality.
Tools for Developers - presentation by Svetlin Nakov at SofiaDev.NET user group meeting (26-April-2012).
Table of Contents:
Integrated Development Environments (IDEs)
Source Control Systems
Code Generation Tools
Logging Tools
Unit Testing Tools
Automated Testing Tools
Bug Tracking / Issue Tracking Systems
Code Analysis Tools
Code Decompilation Tools
Code Obfuscators
Code Profilers
Refactoring Tools
Automated Build Tools
Continuous Integration Tools
Documentation Generators
Project Planning and Management Tools
Project Hosting and Team Collaboration Sites
Deployment in the Public Clouds
For more information see http://www.nakov.com/blog/2012/04/20/tools-for-developers-sofiadev-seminar-26-april-2012/
Академия на Телерик - безплатни курсове 2011Svetlin Nakov
Представяне на безплатните курсове и обучения по програмиране и разработка на софтуер в академията на Телерик за софтуерни инженери (към 2011 г.). Презентацията включва:
- Представяне на софтуерната корпорация Телерик
- Представяне на Telerik Academy
- Програмата "Софтуерна академия" за безплатно обучение и работа в Телерик (Telerik Software Academy)
- Училищна академия по разработка на софтуер (Telerik School Academy)
- Академия по програмиране за деца (Telerik Kids Academy)
- Безплатни студентски курсове по програмиране и софтуерни технологии (Telerik Students Courses)
- Семинари и състезания по разработка на софтуер в Академията на Телерик
- Трейнърите в Академията за софтуерни инженери на Телерик
Java 7 - New Features - by Mihail Stoynov and Svetlin NakovSvetlin Nakov
Java 7 - New Features
Introduction and Chronology
Compressed 64-bit Object Pointers
Garbage-First GC (G1)
Dynamic Languages in JVM
Java Modularity – Project Jigsaw
Language Enhancements (Project Coin)
Strings in Switch
Automatic Resource Management (ARM)
Improved Type Inference for Generic Instance Creation
Improved Type Inference for Generic Instance Creation
Simplified Varargs Method Invocation
Collection Literals
Indexing Access Syntax for Lists and Maps
Language Support for JSR 292
Underscores in Numbers
Binary Literals
Closures for Java
First-class Functions
Function Types
Lambda Expressions
Project Lambda
Extension Methods
Upgrade Class-Loader Architecture
Method to close a URLClassLoader
Unicode 5.1
JSR 203: NIO.2
SCTP (Stream Control Transmission Protocol)
SDP (Sockets Direct Protocol)
Семинар "Стартиране на ИТ кариера" - http://academy.telerik.com/seminars/it-career
Какво търсят работодателите в ИТ фирмите? Какви личностни качества, какви умения? Какви въпроси ще ви зададат на интервю? Какво очакват да намерят във вашето CV и cover letter.
Лектор: Ивайло Христов, Komfo.
Grow your own software engineers. Stop complaining. Train your developers. Establish a training center and produce your IT professionals. This is how we do it at Telerik Software Academy. A talk by Svetlin Nakov at the Bulgarian HR association annual conference, Nessebar - May 2012.
Да си отгледаме кадри! Стига сте мрънкали, че няма кадри! Създайте си специалистите сами!
Светлин Наков разказва как в софтуерната академия на Телерик отглеждат висококвалифицирани софтуерни инженери и ИТ специалисти за да позволят на компанията да расте бързо с качествени и талантливи разработчици. Презентация на годишната конференция на БАУЧР (българската HR асоциация) в Несебър (май, 2012 г.)
Automatic Acquisition of Synonyms Using the Web as a CorpusSvetlin Nakov
Scientific article: We present an original algorithm for automatic acquisition of synonyms from text. The algorithm measures the semantic similarity between pairs of words by comparing their local contexts extracted from the Web by series of queries against the Google search engine. The results show 11pt average precision of 63.16%.
Finding a Job in the IT Industry Seminar - OpeningSvetlin Nakov
Seminar "Finding a Job in the IT Industry Seminar" - Opening. Sofiq, 12th January 2012. Seminar topics:
How to Search for a Job in the IT Industry?
How to Write a Resume (CV)?
How to Write a Cover Letter?
How to Pass an Interview?
Cognate or False Friend? Ask the Web! (Distinguish between Cognates and False...Svetlin Nakov
Scientific article: We propose a novel unsupervised semantic method for distinguishing cognates from false friends. The basic intuition is that if two words are cognates, then most of the words in their respective local contexts should be translations of each other. The idea is formalised using the Web as a corpus, a glossary of known word translations used as cross-linguistic âbridgesâ, and the vector space model. Unlike traditional orthographic similarity measures, our method can easily handle words with identical spelling. The evaluation on 200 Bulgarian-Russian word pairs
shows this is a very promising approach.
Как да напишем некомерсиална книга с екип от 30 автора?Svetlin Nakov
Искате ли да напишете книга? Когато пишете сам, това не е чак толкова трудно. Обаче, когато книгата се състои от 30 теми и търсите автор за всяка от тях, нещата са много по-различни. Добавете и факта, че на вашите 30 автора не им плащате, а те искат да участват доброволно, защото принадлежат на определена социална група. Каква е рецептата за успех на такова начинание? Как да управлявате такъв проект? Как да намерите кандидат-автори? Как да ги мотивирате авторите и контролирате, за да не ви подведат?
Guidelines how to search for a job in the IT industry and how to prepare and send a job application. The presentation covers:
Where to Search for a Job?
Define Your Goals
Prepare for Starting a Job
Search Channels: Job Sites, Career Centers, etc.
The Job Application Process
Research the Employer
Prepare CV, Cover Letter, Endorsements, etc.
Preparation for an Interview
Go to an Interview
This presentation introduces the principles of high-quality programming code construction during the software development process. The quality of the code is discussed in its most important characteristics – correctness, readability and maintainability. The principles of construction of high-quality class hierarchies, classes and methods are explained. Two fundamental concepts – “loose coupling” and “strong cohesion” are defined and their effect on the construction of classes and subroutines is discussed. Some advices for correctly dealing with the variables and data are given, as well as directions for correct naming of the variables and the rest elements of the program. Best practices for organization of the logical programming constructs are explained. Attention is given also to the “refactoring” as a technique for improving the quality of the existing code. The principles of good formatting of the code are defined and explained. The concept of “self-documenting code” as a programming style is introduced.
Софтуерен университет - качествено обучение безплатно (OpenFest 2012)Svetlin Nakov
Софтуерният университет е инициатива за изграждане на истинско висше образование за софтуерни инженери в България - качествено, отговарящо на нуждите на индустрията и пазара на труда, напълно безплатно и с акредитиране диплома. В презентацията д-р Светлин Наков обяснява концепцията за "свободен софтуерен университет" и напредъка по проекта. Официален сайт: http://softuni.bg.
Nakov S., Nakov P., Paskaleva E., Cognate or False Friend? Ask the Web!Svetlin Nakov
Nakov S., Nakov P., Paskaleva E., Cognate or False Friend? Ask the Web!, Proceedings of the International Workshop "Acquisition and Management of Multilingual Lexicons", part of the International Conference RANLP 2007, pp. 55-62, ISBN 978-954-452-004-5, Borovets, Bulgaria, 30 September 2007
International Refereed Journal of Engineering and Science (IRJES)irjes
The core of the vision IRJES is to disseminate new knowledge and technology for the benefit of all, ranging from academic research and professional communities to industry professionals in a range of topics in computer science and engineering. It also provides a place for high-caliber researchers, practitioners and PhD students to present ongoing research and development in these areas.
Morpheme, morphological analysis and morphemic analysissyerencs
Structure of morphological analysis and morphemic analysis. The morpheme refers to either a class of forms or an abstraction from the concrete forms of language. A morpheme is internally indivisible, it cannot be further subdivided or analyzed into smaller meaningful unit. It is also externally transportable; it has positional mobility or free distribution, occurring in various context.
Morphemes are represented which curly brace { } using capital letters for lexemes or descriptive designations for types of morphemes.
A CONTRASTIVE STUDY OF THE NEGATION MORPHEMES ( ENGLISH, KURDISH AND ARABIC)kevig
This paper is studying the problems facing languages (English, Kurdish, and Arabic) due to the lack of
symmetry between the negation morphemes in these languages, which differ in their numbers, the specifics
of their uses, and the multiplicity of their meanings, at the two levels: general language and science terms.
The negation morphemes in the Arabic and Kurdish languages are located at the beginning of the word as
prefixes. That is, neither of them has infixes or suffixes that indicate the negative (morphologically), that is,
at the level of the word or the lexical unit, and are grammatically at the level of the sentence. These
negation morphemes may enter the middle of the sentence in Kurdish, but in Arabic, the negation
morphemes are located at the forefront of the sentence. Most of the negation affixes in English are also
prefixes, and a few are the suffixes denoting negation, the most prominent of which is (-less); English is
also devoid of negation infixes. The lack of equivalence between the negation morphemes between
quantitative and qualitative languages leads to chaos and disorder when transferred between languages.
This necessitates the need to establish rules regulating the work of these multiple morphemes for the
function of negation, at morphological structure, syntactic, semantic features, and the nature of their uses,
at the level of the language itself, and at the level of contrastive between the sending and receiving
languages.
A CONTRASTIVE STUDY OF THE NEGATION MORPHEMES ( ENGLISH, KURDISH AND ARABIC)kevig
This paper is studying the problems facing languages (English, Kurdish, and Arabic) due to the lack of
symmetry between the negation morphemes in these languages, which differ in their numbers, the specifics
of their uses, and the multiplicity of their meanings, at the two levels: general language and science terms.
The negation morphemes in the Arabic and Kurdish languages are located at the beginning of the word as
prefixes. That is, neither of them has infixes or suffixes that indicate the negative (morphologically), that is,
at the level of the word or the lexical unit, and are grammatically at the level of the sentence. These
negation morphemes may enter the middle of the sentence in Kurdish, but in Arabic, the negation
morphemes are located at the forefront of the sentence. Most of the negation affixes in English are also
prefixes, and a few are the suffixes denoting negation, the most prominent of which is (-less); English is
also devoid of negation infixes. The lack of equivalence between the negation morphemes between
quantitative and qualitative languages leads to chaos and disorder when transferred between languages.
This necessitates the need to establish rules regulating the work of these multiple morphemes for the
function of negation, at morphological structure, syntactic, semantic features, and the nature of their uses,
at the level of the language itself, and at the level of contrastive between the sending and receiving
languages.
Най-търсените направления в ИТ сферата за 2024Svetlin Nakov
Най-търсените направления в ИТ сферата?
д-р Светлин Наков, съосновател на СофтУни
София, май 2024 г.
Какво се случва на пазара на труда в ИТ сектора?
Какви са прогнозите за ИТ сектора за напред?
Защо има смисъл да учиш програмиране и ИТ през 2024?
Каква е ролята на AI в ИТ професиите?
Как да започна работа като junior?
Upskill програмите на СофтУни
BG-IT-Edu: отворено учебно съдържание за ИТ учителиSvetlin Nakov
Отворено учебно съдържание по програмиране и ИТ за учители
Безплатни учебни курсове и ресурси за ИТ учители
Разработени курсове към 03/2024 г. в BG-IT-Edu
https://github.com/BG-IT-Edu
Качествени учебни курсове (учебно съдържание) за ИТ учители: презентации + примери + упражнения + проекти + задачи за изпитване + judge система + насоки за учителите
Достъпни безплатно, под отворен лиценз CC-BY-NC-SA
Разработени от СофтУни Фондацията, по инициатива и под надзора на д-р Светлин Наков
Научете повече тук: https://nakov.com/blog/2024/03/27/bg-it-edu-open-education-content-for-it-teachers/
Светът на програмирането през 2024 г.
Продължава ли бумът на технологичните професии? Кои професии ще се търсят? Как да започна?
Прогнозата на д-р Светлин Наков за бъдещето на софтуерните професии в България
Има ли смисъл да учиш програмиране през 2024?
Какво се търси на пазара на труда?
Ще продължи ли търсенето на програмисти и през 2024?
Все още ли е най-търсената професия в технологиите?
Ролята на AI в сферата на софтуерните разработчици
Какво се случва на пазара на труда?
Има ли спад в търсенето на програмисти?
Как да започна с програмирането?
Видео от събитието сме качили във FB: https://fb.com/events/346653434644683
AI Tools for Business and Startups
Svetlin Nakov @ Innowave Summit 2023
Artificial Intelligence is already here!
AI Tools for Business: Where is AI Used?
ChatGPT and Bard in Daily Tasks
ChatGPT and Bard for Creativity
ChatGPT and Bard for Marketing
ChatGPT for Data Analysis
DALL-E for Image Generation
Learn more at: https://nakov.com/blog/2023/11/25/ai-for-business-and-startups-my-talk-at-innowave-summit-2023/
AI Tools for Scientists - Nakov (Oct 2023)Svetlin Nakov
Инструменти с изкуствен интелект в помощ на изследователите
Д-р Светлин Наков @ Anniversary Scientific Session dedicated to the 120th anniversary of the birth of John Atanasoff
Изкуствен интелект при стартиране и управление на бизнес
Семинар във FinanceAcademy.bg
Д-р Светлин Наков
Изкуственият интелект (ИИ) е вече тук!
Къде се ползва ИИ?
ChatGPT – демо
Bard – демо
Claude – демо
Bing Chat – демо
Perplexity – демо
Bing Image Create – демо
Bulgarian Tech Industry - Nakov at Dev.BG All in One Conference 2023Svetlin Nakov
IT industry in Bulgaria: key factors for success and the future. Deep tech, science, innovation, and education and how we can achieve more as an industry?
Dr. Svetlin Nakov
Innovation and Inspiration Manager @ SoftUni
Contents:
How big is the IT industry in Bulgaria?
Number of software professionals in Bulgaria: according to historical data from BASSCOM
Share of the software industry in GDP
Why does Bulgaria have such a successful IT industry?
Education for the tech industry: school education in software professions and profiles (2022/2023)
Education for the tech industry: Students in university in IT specialties (2022/2023)
Education for the IT industry: Learners at SoftUni (2022/2023)
Evolution of the Bulgarian software industry
How much can the industry grow?
Trends in the IT industry: AI progress, the IT market in Bulgaria, deep tech, science, and innovation
AI in the software industry
How to achieve more as an industry? education, deep tech, science, and innovation, entrepreneurship
Introduction
The IT industry in Bulgaria is one of the most successful in the country. It has grown rapidly in recent years and is now a major contributor to the economy. In this talk, Dr. Nakov explores the key factors behind the success of the Bulgarian IT industry, as well as its future prospects.
AI Tools for Business and Personal LifeSvetlin Nakov
A talk at LeaderClass.BG, Sofia, August 2023
by Svetlin Nakov, PhD
The artificial intelligence (AI) is here!
Where is AI used?
ChatGPT - demo
Bing Chat - demo
Bard - demo
Claude - demo
Bing Image Create - demo
Playground AI - demo
In this talk the speaker explains and demonstrates some AI tools for the business and personal life:
ChatGPT: a large language model that can generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.
Bing Chat: an Internet-connected AI chatbot that can search Internet and answer questions.
Bard: a large language model from Google AI, trained on a massive dataset of text and code, similar to ChatGPT.
Claude: A large AI chatbot, similar to ChatGPT, powerful in document analysis.
Bing Image Create: a tool that can generate images based on text descriptions.
Playground AI: image generator and image editor, based on generative AI.
Дипломна работа: учебно съдържание по ООП - Светлин НаковSvetlin Nakov
Дипломна работа на тема
"Учебно съдържание по обектно-ориентирано програмиране в профилираната подготовка по информатика"
Дипломант: д-р Светлин Наков
Специалност: Педагогика на обучението по информатика и информационни технологии в училище (ПОИИТУ)
Степен: магистър
Пловдивски университет "Паисий Хилендарски"
Факултет по математика и информатика (ФМИ)
Катедра “Компютърни технологии”
Настоящата дипломна работа има за цел да подпомогне българските ИТ учители от системата на средното образование в профилираните гимназии и паралелки, като им предостави безплатно добре разработени учебни програми и качествено учебно съдържание за преподаване в първия и най-важен модул от профил “Софтуерни и хардуерни науки”, а именно “Модул 1. Обектно-ориентирано проектиране и програмиране”.
Чрез изграждането на качествени учебни програми и ресурси за преподаване и пренасяне на добре изпитани образователни практики от автора на настоящата дипломна работа (д-р Светлин Наков) към българските ИТ учители целим значително да подпомогнем учителите в тяхната образователна кауза и да повишим качеството на обучението по програмиране в профилираните гимназии с профил “Софтуерни и хардуерни науки”.
Резултатите от настоящата дипломна работа са вече внедрени в практиката и разработените учебни ресурси се използват реално от стотици ИТ учители в България в ежедневната им работа. Това е една от основните цели и тя вече е изпълнена, още преди защитата на настоящата дипломна работа.
Прочетете повече в блога на д-р Наков: https://nakov.com/blog/2023/07/08/free-learning-content-oop-nakov/
Дипломна работа: учебно съдържание по ООПSvetlin Nakov
Презентация за защита на
Дипломна работа на тема
"Учебно съдържание по обектно-ориентирано програмиране в профилираната подготовка по информатика"
Дипломант: д-р Светлин Наков
Пловдивски университет "Паисий Хилендарски"
Факултет по математика и информатика (ФМИ)
Катедра “Компютърни технологии”
Защитена на: 8 юли 2023 г.
Научете повече в блога на д-р Наков: https://nakov.com/blog/2023/07/08/free-learning-content-oop-nakov
Свободно ИТ учебно съдържание за учители по програмиране и ИТSvetlin Nakov
В тази сесия разказвам за училищното образование по програмиране и ИТ, за професионалните и профилираните гимназии, свързани с програмиране и ИТ, за STEM кабинетите, за българските ИТ учители и тяхната подготовка, за проблемите, с които те се сблъскват, и как можем да им помогнем чрез проекта „Свободно учебно съдържание по програмиране и ИТ“: https://github.com/BG-IT-Edu.
Open Fest 2021, 14 август, София
В света се засилва тенденцията за установяване на STEAM образованието като двигател на научно-техническия прогрес чрез развитие на интердисциплинарни умения в сферата на природните науки, математиката, информационните технологии, инженерните науки и изкуствата в училищна възраст. С масовото изграждане на STEAM лаборатории в българските училища се изостря недостига на добре подготвени STEAM и ИТ учители.
Вярвайки в идеята, че българската ИТ общност може да помогне за решаването на проблема, през 2020 г. по инициатива на СофтУни Фондацията стартира проект за създаване на безплатно учебно съдържание по програмиране и ИТ за учители в подкрепа на училищното технологично образование. Проектът е със свободен лиценз в GitHub: https://github.com/BG-IT-Edu. Учителите получават безплатно богат комплект от съвременни учебни материали с високо качество: презентации, постъпкови ръководства, задачи за упражнения и практически проекти, окомплектовани с насоки, подсказки и решения, безплатна система за автоматизирано тестване на решенията и други учебни ресурси, на български и английски език.
Създадени са голяма част от учебните курсове за професиите "Приложен програмист", "Системен програмист" и "Програмист" в професионалните гимназии. Амбицията на проекта е да се създадат свободни учебни материали и за обученията в профил "Софтуерни и хардуерни науки" в профилираните гимназии.
Целта на проекта “Свободно ИТ учебно съдържание за учители” е да подпомогне българския ИТ учител с качествени учебни материали, така че да преподава на добро ниво и със съвременни технологии и инструменти, за да положи основите на подготовката на бъдещите ИТ специалисти и дигитални лидери на България.
В лекцията разказвам за училищното образование по програмиране и ИТ, за професионалните и профилираните гимназии, свързани с програмиране и ИТ, за STEM кабинетите, за българските ИТ учители и тяхната подготовка, за проблемите с които те се срещат, за липсата на учебници и учебни материали по програмиране, ИТ и по техническите дисциплини и как можем да помогнем на ИТ учителите.
A public talk "AI and the Professions of the Future", held on 29 April 2023 in Veliko Tarnovo by Svetlin Nakov. Main topics:
AI is here today --> take attention to it!
- ChatGPT: revolution in language AI
- Playground AI – AI for image generation
AI and the future professions
- AI-replaceable professions
- AI-resistant professions
AI in Education
Ethics in AI
In this seminar Dr. Svetlin Nakov talks about the programming languages, their popularity, available jobs and trends for 2022-2023.
Modern software development uses dozens of programming languages, along with hundreds of technology frameworks, libraries, and software tools.
This talk will review the most popular programming languages on the labor market: JavaScript, Java, C#, Python, PHP, C++, Go, Swift. It will be briefly stated what each of them is, what it is used for and what is its demand in the IT industry.
Agenda:
The Most Used Programming Languages in 2022:
- Python, Java, JavaScript, C#, C++, PHP
Jobs by Programming Languages in 2022:
- Jobs Worldwide by Programming Language
- Jobs in Bulgaria by Programming Language
Programming Languages Trends for 2023
- Language Popularity Rankings from Stack Overflow, GitHub, PYPL, IEEE, TIOBE, Etc.
Become a Software Developer: How To Start?
IT Professions and How to Become a DeveloperSvetlin Nakov
IT Professions and Their Future
The landscape of IT professions in the tech industry: software developer, front-end, back-end, AI, cloud, DevOps, QA, Java, JavaScript, Python, C#, C++, digital marketing, SEO, SEM, SMM, project manager, business analyst, CRM / ERP consultant, design / UI / UX expert, Web designer, motion designer, etc.
Industry 4.0 and the future of manufacturing, smart cities and digitalization of everything.
What are the most in-demand professions on LinkedIn? Why the best jobs in the world are related to software development and IT?
How to learn coding and start a tech job?
Why anyone can be a software developer?
Dr. Svetlin Nakov
December 2022
GitHub Actions (Nakov at RuseConf, Sept 2022)Svetlin Nakov
Building a CI/CD System with GitHub Actions
Dr. Svetlin Nakov
September 2022
Intro to Continuous Integration (CI), Continuous Delivery (CD), Continuous Deployment (CD), and CI/CD Pipelines
Intro to GitHub Actions
Building CI workflow
Building CD workflow
Live Demo: Build CI System for JS and .NET Apps
How to Become a QA Engineer and Start a JobSvetlin Nakov
How to Become a QA Engineer and Start a Job
Svetlin Nakov, PhD
Sofia, Nov, 2022
1) Why Become а QA Engineer?
2) How to Become a Software Quality Assurance Engineer (QA)?
Learn the Fundamental QA Skills:
Manual QA skills – 30%
Software engineering skills – 20%
QA automation skills – 50%
3) Anyone Can Be a QA Engineer!
4) The QA Program @ SoftUni
https://softuni.bg/qa
Призвание и цели: моята рецепта
д-р Светлин Наков
Как да открия своето призвание в живота и да го направя своя професия?
Съдържание:
1) Светлин Наков - моята история
2) Дигиталните професии
3) Как да намеря своята професия?
Поход на вдъхновителите | Аз мога тук и сега @ Петрич (19 ноември 2022 г.)
What Mongolian IT Industry Can Learn from Bulgaria?Svetlin Nakov
In this talk the speaker Dr. Svetlin Nakov explains the growth of the Bulgarian software industry for the latest 25 years and the key events in its growth.
He gives rich statistical data about Bulgaria, its GDP, and its software industry, which generates nearly 5% of the GDP (in 2022), about the open developer positions (5K in Oct 2022) and the number of software developers in Bulgaria (54K in Oct 2022).
The growth of the number of software developers in Bulgaria, software industry's revenues and their share in the national GDP are traced back from 2022 to 2005.
Similar research about the Mongolian software industry is conducted and available data is compared to Bulgaria and USA.
An interesting parallel is given between Bulgaria and Mongolia in terms of their software engineering talent and industry growth potential.
Finally, Dr. Nakov gives his recommendations about how local Mongolian software companies can reach the global tech market and suggests to use the "outstaffing" business model, targeting the European tech industry and the Asian region.
The talk is given at the "The Future of IT" forum in Ulaanbaatar, Mongolia (October 2022).
How to Become a Software Developer - Nakov in Mongolia (Oct 2022)Svetlin Nakov
In this talk the speaker Svetlin Nakov explains the IT professions and their extremely high demand in the latest years, and gives a recipe how to become a software engineer.
He recommends to spend 1-2 years in studying and practicing software engineering, following a learning curriculum like this:
Basic Coding Course – calculations, data, conditions, loops, IDE
Fundamentals of Programming – arrays, lists, maps, nested structures, text processing, error handling, basic language APIs, problem solving
Object-Oriented Programming – classes, objects, inheritance, …
Databases and ORM – relational DB, SQL, ORM frameworks, XML, JSON
Back-End Development – HTTP, MVC, Web apps, REST
Front-End Development – HTML, CSS, JS, DOM, AJAX, JS Frameworks
Projects – Git, software engineering, teamwork
Example: https://softuni.org/learn
The talk is from the "The Future of IT" forum in Ulaanbaatar, Mongolia (October 2022).
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
The Art of the Pitch: WordPress Relationships and Sales
A Knowledge-Rich Approach to Measuring the Similarity between Bulgarian and Russian Words
1. A Knowledge-Rich Approach to Measuring the Similarity
between Bulgarian and Russian Words
Svetlin Nakov Elena Paskaleva Preslav Nakov
Faculty of Mathematics and Informatics Linguistic Modeling Laboratory Department of Computer Science
Sofia University "St. Kliment Ohridski" Bulgarian Academy of Sciences National University of Singapore
5 James Boucher Blvd, Sofia, Bulgaria 25A Acad. G. Bontchev Str., Sofia, Bulgaria 13 Computing Drive, Singapore
nakov @ fmi.uni-sofia.bg hellen @ lml.bas.bg nakov @ comp.nus.edu.sg
Abstract
2. Method
We propose a novel knowledge-rich approach to measuring the
similarity between a pair of words. The algorithm is tailored to The normalization of the Bulgarian and the Russian
Bulgarian and Russian and takes into account the orthographic words into corresponding intermediate forms has phone-
and the phonetic correspondences between the two Slavic lan- tic and morphological motivation and is performed as a
guages: it combines lemmatization, hand-crafted transformation sequence of steps, which will be described below.
rules, and weighted Levenshtein distance. The experimental re-
sults show an 11-pt interpolated average precision of 90.58%, 2.1. Transliteration from Cyrillic to Cyrillic
which represents a sizeable improvement over two classic
rivaling approaches. In a strict linguistic sense, transcription is the process of
transition from sounds to letters, i.e., from speech to text;
Keywords it is carried out generally in a monolingual context. In a
Orthographic similarity, phonetic similarity, cross-lingual bilingual context, the notion of transliteration is used to
transformation. denote the transition of sounds and their letter correspon-
dences in one language to letters in another language. The
1. Introduction term transliteration is commonly used for the transition
We propose an algorithm that measures the extent to of letters when the two languages use different alphabets.
which a Bulgarian and a Russian word are perceived as In this paper, we deal with transliteration since we work
similar by a person who is fluent in both languages. with written texts.
Leaving aside the full orthographical identity, we assume The linguistic objective of our investigation is to intro-
that words with different orthography can be also duce more formal criteria to the investigation of possible
perceived as similar when they have the same or a similar cognates between Russian and Bulgarian. By cognates
stem and inflections, as in the Bulgarian word we mean words with equal or close orthography denoting
афектирахме and the Russian аффектировались (both the same meaning; words with equal/close orthography
meaning ‘we were affected’). but different meaning are false cognates/friends. For their
further investigation in multilingual research, we need to
Bulgarian and Russian are closely related Slavonic define the exact expression of that identity/closeness by
languages with rich morphology, which motivates us to particular metrics and procedures.
study the typical orthographical correspondences between For a pair of languages from different families, the
their lexical entries (conditioned phonetically and mor- source of cognates is borrowing between them or from a
phologically), which we use to formulate and apply trans- third language. Besides borrowing, an essential source of
formation rules for bringing a Russian word close to cognates in related languages is their common protolan-
Bulgarian reading and vice versa. Our algorithm for guage. However, in the historical development of both
measuring the similarity between Bulgarian and Russian languages, three factors lead to different grapheme shape
words first reduces the Russian word to an intermediate for fully identical words: (1) language-specific phonetic
Bulgarian-sounding form and then compares it orthogra- laws and resulting changes, (2) settings of the spelling
phically to the Bulgarian word. The algorithm starts by systems regulating the sound-letter transition, and (3)
transliterating the Russian word with the Bulgarian divergence in the grammatical systems and the grammati-
alphabet, and then transforms some typical Russian cal formatives.
morphemes and word parts (e.g., prefixes, suffixes,
endings, etc.) to their Bulgarian counter-parts. Since both 2.1.1 Full coincidence (equality) of letters
Bulgarian and Russian are highly-inflectional languages, Both Russian and Bulgarian use the Cyrillic alphabet in
lemmatization is used to convert the wordforms to their their writing systems, but Russian uses two letters not
lemmata in order to reduce the differences at the morpho- present in Bulgarian: ы and э. Most other letters generally
logical level. Finally, the orthographic similarity is mea- show a full coincidence with some exceptions to be listed
sured using a modified Levenshtein distance with letter- in the following subsections. The list below presents the
specific substitution weights. full identity of Cyrillic letters in both languages in the
cognates: азбука – азбука, буква – буква, воля – воля,
2. гипс – гипс, дух – дух, езда – езда, жена – жена, закон A fundamental difference between Russian and Bulga-
– закон, истина – истина, йод – йод, кипарис – rian spellings is the treatment of double consonants.
кипарис, лак – лак, монета – монета, нож – нож, Russian allows them in every part of the word structure,
опора – опора, пост – пост, река – река, сом – сом, while in Bulgarian they are only possible at the morphe-
том – том, ум – ум, факт – факт, химия – химия, me boundary. Thus, all words borrowed from third
царь – цар ,чай – чай, шум – шум, щит – щит, юг – languages keep their double consonants in Russian, but
юг, яхта – яхта lose them in Bulgarian, e.g., процесс – процес, аффект
– афект, etc. In this way, a regular transition ll-l can be
As the above list shows, the full identity of the
formulated for all double consonants with the following
grapheme shape of cognates is manifested mainly when
stipulation of grammatical origin.
the transformed letter is in initial position.
In words of Slavic origin, consonant doubling occurs
2.1.2 Regular letter transitions mainly at the morpheme boundary, but in Russian the
phenomenon is more frequent since Russian spelling rules
Replacing Russian letters that are missing in the Bulgari- are more “phonetic”. For example, they reflect the change
an alphabet. The transitions discussed here stem from voiced-voiceless for all prefixes ending with з and
historic differences in the phonetic and the spelling sys- preceding the initial с of the next morpheme. Bulgarian
tems of the two languages. Bulgarian and Russian differ spelling is more ‘morphological’ and conservative; it
in their contemporary phonetic system mainly at the level keeps the з in writing, although it is voiceless in
of pronunciation; in the distinction of soft and hard pronunciation, e.g., рассуждение – разсъждение,
consonants. The Russian-specific letters ы and э serve to бессмертный – безсмъртен, etc. This transformation of
denote the variant of a ‘hard consonant+и/е’ while in hard-soft consonants in the final prefix position is only
Bulgarian all consonants preceding и and е are soft. This valid for the couple з-с. Thus, the Bulgarian-Russian
basic difference of the phonetic systems gives us the transition зс-сс can be formulated as regular for prefixes
regular correspondence ы-и and э-е in all Russian- only and cannot be viewed as a universal for other parts
Bulgarian cognates containing these two letters, e.g., of the word, e.g., кавказский – кавказки.
рыба – риба, поэт- поет. Next, the following general question in treating dou-
Removing a Russian letter. Another regular phonetic dif- ble consonant correspondences arises: if we want to stay
ference between the two languages, which is also related in the domain of uni- and bigram transformations, remo-
to the opposition soft/hard, is the allowed softness of a ving the second consonant in Russian can be ambiguous
consonant preceding another consonant (пальто) or in поддержать – поддържам, but буддист – будист,
final position (шесть). Such phonetic combinations are вводить – въвеждам, раввин – равин. The legal
not allowed in Bulgarian: see the corresponding палто consonant doublings in Bulgarian can be only outlined in
and шест. This regularity allows us to remove all a larger context – a window of up to five letters, contai-
Russian ь in these positions in the initial stage of the pro- ning the prefix and the next consonant, as in предд, надд,
cess of cognate comparison. подд, изз, разз, etc., where the second consonant should
be preserved. Note that these exceptions from the rule are
Partial regularity of the letter transitions. In non-initial only valid for double д, з and в – final letter of prefixes,
positions, other not so regular but repeated letter corres- and for н – first letter of the affix н, e.g., непременно –
pondences can be observed, e.g., е-я in хлеб-хляб, е-ъ in непременно, but аннотация – анотация. .
серп-сърп, о-ъ in сон-сън, у-ъ in муж-мъж, etc. The
iterativity of such transitions is due to the specific Transformations of morphological origin.
development of the spelling systems in the two languages. In addition to the divergent development of phonetic
One such example is the disappearance of some Old and spelling systems, the two languages develop different
Slavic letters and their regular replacement with different grammatical systems, both at a systemic and at a morphe-
letters in Russian and Bulgarian. The above-mentioned mic level – different categories with different graphemic
change у-ъ is due to the disappearance of an Old Slavic expressions. That divergence leads to different grapheme
letter called ‘big yus’ and its regular replacement by shapes for words that are lexically conceived as cognates,
different vowels in all contemporary Slavic languages. e.g., жены – жената, and the difference is manifested
The transition is only partially regular since not all occur- in the ending part of the word, consisting of affixes, and
rences of the letter have the same etymological origin. ending and related to grammatical forms.
The transformations are made in two directions and
2.2. Transformations of n-grams for both languages. They can consist of removal of a
letter sequence or its transformation.
The sound-letter transition legitimated by the spelling ru-
les of the two languages is specific as well; its specificity 1. Removing agglutinative morphemes.
is observed at the level of the grapheme composition of Each of the two languages has one agglutinative me-
the full cognates, i.e., those that are borrowed from third chanism of word formation (but for different parts of
languages or that are identical morphologically. speech) – the reflexive morpheme ся and сь in Russian
verb conjugation and the postpositioned article in Bulga-
Transformations originating from spelling. rian in nominal inflections (for nouns and adjectives). The
3. corresponding grammatical meanings are expressed in the Russian Bulgarian
twin language by other means (the article is totally Examples
Ending Ending
missing in Russian and the reflexivity of verbs is expres-
декорировать →
sed by a lexical element in Bulgarian – the particle се). -овать -ам
декорирам
Thus, removing these morphemes is the first step in the
process of conversion to an intermediate form, e.g., -ить, - бродить → бродя
-я
веселиться – веселить, квадратът – квадрат. Note ять блеять → блея
that the Russian agglutinative morpheme ся/сь at the end -ать -ам давать → давам
of the word are non-ambigous: all 212,000 wordforms -уть -а гаснуть – гасна
with the ending ся in our Russian grammatical dictionary
are reflexive verb forms. This is not the case with the -еть -ея белеть → белея
Bulgarian article, where only removing the morpheme ът Table 2 – Transformation of Russian verbs to Bulgarian.
for masculin is non-ambiguous, while removing та, ят
and other article morpheme can trim the stem, e.g., жена- Concerning the transformation of endings, it is impor-
та, but квадрат-а. We intentionally do not derive a tant to note that two linguistic problems are interrelated
transformation rule from the last correspondence. here: (1) the formal revelation of the morpheme
Removing Bulgarian articles depends on the accepted boundary, and (2) the correct correspondence with the
conception about the place of lemmatization in the Bulgarian ending. The existing ambiguity in resolving
algorithm – should we set the orthographic similarity for these two problems requires serious statistical investigati-
all four members of the language pair – lemmata and ons before the rules can be formulated.
wordforms – or should we measure the similarity at the With ambiguity not taken into account, the proposed
lexical level only – the lemmata. In the latter case, no re- transformation rules for Russian word endings could
moval is necessary (see 1.3) sometimes generate the wrong Bulgarian wordform, e.g.,
2. Transforming ending strings. висеть could become висея, while the correct Bulgarian
form is вися. In order to limit the negative impact of that,
There is a big group of adjectives in the two langua- we measure the similarity (1) with and (2) without
ges derived from other parts of speech and formed with applying rules for lemmatization; we then return the
the suffix н and an adjectival ending, e.g., шум – higher value of the two.
шумный, шум – шумен. When the adjective is derived
from a noun ending with н, we get a doubled н in the 2.3. Lemmatization
Russian lemma and in the Bulgarian wordforms, e.g.,
гарнизон – гарнизонный and гарнизон – гарнизонни. Bulgarian and Russian are highly-inflectional languages,
Another regular correspondence is manifested in the word i.e., they use variety of endings to express the different
derivation with the suffix ск. All these combinations of н forms of the same word. When measuring orthographic
/ нн / ск and different adjectival endings give the similarity, endings could cause major problems since they
correspondences shown in Table 1. can make two otherwise very similar words appear
somewhat different. For example, the Bulgarian word
Russian Bulgarian отправената (‘the directed’, a feminine adjective with a
Examples definite article) and the Russian word отправленному
Ending Ending
-нный -нен военный → военен (‘the directed’, a masculine adjective in dative case)
exhibit only about 50% letter overlap, but, if we ignore
-ный -ен вечный – вечен the endings, the similarity between them becomes much
-нний -нен ранний → ранен bigger. Thus, if our algorithm could safely ignore word
-ний -ен вечерний → вечерен endings when comparing words, it might perform better.
вражеский → If we could remove the ending, the similarity would
-ский -ски be measured using the stem, which is the invariable part
вражески
of the word. Unfortunately, both the ending as a letter
стрелковый –
-ый -и sequence and the location of the morpheme boundary are
стрелкови
quite ambiguous in both languages. Thus, we need to
-нной -нен стенной – стенен lemmatize the text, i.e., convert the word to its main form,
-ной -ен родной – роден the lemma. If every member of the pair of candidate
-ой -и деловой – делови cognates from L1 and L2 is represented by a wordform
(WF) and its lemma (L), then we could compare: L1 with
Table 1: Transforming Russian adjectives to Bulgarian. L2, WF1 with WF2, L1 with WF2 and WF1 with L2.
Considering these four options, we can get a better
For verbs, there are some regularities in the correspon-
estimation for the similarity not only between close
dences of the endings of the Russian infinitive and the
wordforms like the Bulgarian отправената and the Rus-
Bulgarian verb’s main form in first person singular. Table
sian отправленному, which look different orthographi-
2 below shows some examples.
cally, but have very close lemmata, but also between such
4. very different words like the Bulgarian къпейки w(а, е)=0.7; w(а, и)=0.8; w(а, о)=0.7; w(а, у)=0.6;
(‘bathing’, a gerund) and the Russian копейки (‘copeck’, а
w(а, ъ)=0.5; w(а, ю)=0.8; w(а, я)=0.5
plural feminine noun).
The lemmatization of the Bulgarian and the Russian б w(б, в)=0.8; w(б, п)=0.6
words can be done using specialized dictionaries. In the в w(в, ф)=0.6
present work, we will use two large grammatical dictiona-
ries that contain words, their lemmata, and some г w(г, х)=0.5
grammatical information. д w(д, т)=0.6
2.4. Transformation Weights w(е, и)=0.6; w(е, о)=0.7; w(е, у)=0.8; w(е, ъ)=0.5;
е
w(е, ю)=0.8; w(е, я)=0.5
Let us now come back to the transliteration rules and to
the next steps in our algorithm. There are orthographical ж w(ж, з)=0.8; w(ж, ш)=0.6
correspondences between candidate cognates that are not з w(з, с)=0.5
as undisputable as the general rules, but are still observed
w(и, й)=0.6; w(и, о)=0.8; w(и, у)=0.8; w(и, ъ)=0.8;
in the development of the languages, at least for ones with и
a proven etymological basis. As was shown above, the re- w(и, ю)=0.7; w(и, я)=0.7
gular correspondences between the languages can be due й w(й, ю)=0.7; w(й, я)=0.7
to phonetic and spelling reasons. Besides the uncon-
ditional letters transitions described above, not so regular к w(к, т)=0.8; w(к, х)=0.6
ones occur in several cases, and their existence can be м w(м, н)=0.7
taken into account when constructing the weight scale for
w(о, у)=0.6; w(о, ъ)=0.8; w(о, ю)=0.7;
measuring similarity. о
A general principle when building a weight scale is w(о, я)=0.8
that the correspondences between letters denoting conso- п w(п, ф)=0.8; w(п, х)=0.9
nants and vowels (hereinafter ‘vowels’ and ‘consonants’
only) should be measured separately. The maximal с w(с, ц)=0.6; w(с, ш)=0.9
ortographic distance between different letters is 1 (as for т w(т, ф)=0.8; w(т, х)=0.9; w(т, ц)=0.9
а-ц) and the maximal similarity has weight 0 (as for а-а).
All weight values between 0 and 1 are assigned to letter у w(у, ъ)=0.5; w(у, ю)=0.6; w(у, я)=0.8
correspondences that exist in a non-regular way in some ф w(ф, ц)=0.8
cognates (the above-mentioned correspondence у-ъ was
х w(х, ш)=0.9
due to etymological reasons). Another general admission
is that consonants and vowels with similar sequences of ц w(ц, ч)=0.8
distinctive phonetic features (differing only in the place
ч w(ч, ш)=0.9
of articulation or in the presence/absence of voice, e.g., б-
в, б-п) have lower weight distance. The same is valid for ъ w(ъ, ю)=0.8; w(ъ, я)=0.8
the pair of letters denoting a regular phonetic change, ю w(ю, я)=0.8
e.g., reduction (as in а-ъ, о-у) or softening of the prece-
ding consonant (as in у-ю, а-я). Regular correspondences Table 3– Letter substitution weights.
observed in a limited lexical sector (e.g., borrowed from
Latin and Greek) such as г-х also have a lower distance. 3. The MMEDR Algorithm
Table 3 shows the letter transformation weights,
which can be used to measure the orthographic similarity The MMEDR algorithm (modified minimum edit distance
after the Bulgarian and Russian words have been transli- ratio) measures the orthographic similarity between a pair
terated to a subset of the Cyrillic alphabet. of Bulgarian and Russian words using some general
The weights w(a, b) are used to transform the letter a phonetic and morphologically conditioned
into the letter b and vice versa. This weight function w is correspondences between the letters of the two languages
symmetric by definition, i.e., w(a, b) = w(b, a). All other in order to estimate the extent to which the two words
weights not given in Table 3 are equal to 1. would be perceived as similar by people fluent in both
In order to write the Russian words in the modified languages. It returns a value between 0 and 1, where
Bulgarian alphabet used in Table 3, we make the follow- values close to 1 express very high similarity, while 0 is
ing preliminary transformations for all Russian words: returned for completely dissimilar words. The algorithm
э → е; ы → и; ь → (empty letter); ъ → (empty letter) has been tailored for Bulgarian and Russian and thus is
not directly applicable to other pairs of languages. Howe-
Table 3 shapes the match between letters and the so- ver, the general approach can be easily adapted to other
unds they denote in Bulgarian and Russian. It further cor- languages: all that has to be changed are the rules descri-
relates weights for letter transformation that have been bing the phonetic and the morphological correspondences.
phonetically justified.
The MMEDR algorithm in steps:
5. 1. Lemmatize the Bulgarian word. Of course, there are many exceptions for the above
rules, but our experiments show that using each of them
2. Lemmatize the Russian word.
has more positive than negative effect. Initially, we tried
3. Transform the Russian word’s ending. using few more additional rules, which were subsequently
removed since they were found to be harmful.
4. Transliterate the Russian word.
5. Remove some double consonants in the Russian 3.3. Removing Double Consonants
word.
According to 1.1.3, the following substitution rules are
6. Calculate the modified Levenshtein distance using
applied for the Russian word:
suitable weights for letter substitutions.
бб → б; жж → ж; кк → к; лл → л; мм → м; пп →
7. Normalize and calculate the MMEDR value.
п; рр → р; сс → с; тт → т; фф → ф
The algorithm first tries to rewrite the Russian word
following Bulgarian letter constructions. As a result, both 3.4. Calculating the Modified Levenshtein
words are transformed into a special intermediate form Distance with Weights for Letter Substitution
and then are compared orthographically using Leven-
shtein distance with suitable weights for individual letter Given two words, the Levenshtein distance [Levenshtein,
substitutions. The above general algorithm is run in eight 1965], also known as the minimum edit distance (MED),
variants with each of steps 1, 2 and 3 being included or is defined as the minimum total number of single-letter
excluded, and the largest of the eight resulting values is substitutions, deletions and/or insertions necessary to
returned. A description of each step follows below. convert the first word into the second one. We use a
modification, which we call modified minimum edit
3.1. Lemmatizing Bulgarian and Russian distance (MMED), where the weights of all insertions and
Words deletions are fixed to 1, and the weights for single-letter
The Bulgarian word is lemmatized using a grammatical substitution are as given in Table 3.
dictionary of Bulgarian as described in Section 1.3. If the
dictionary contains no lemmata for the target word, the 3.5. Calculating MMEDR Value
original word is returned; if it contains more than one
lemma, we try using each of them in turn and we choose At this step, we calculate MMEDR value by normalizing
the one yielding the highest value in the MMEDR MMED – we divide it by the length of the longer word
algorithm. The Russian word is lemmatized in the same (the length is calculated after all transformations have
way, using a grammatical dictionary of Russian. been made in the previous steps). We use the following
formula:
3.2. Transforming the Russian Ending
MMED( wbg , wru )
At this step, we transform the endings of the Russian MMEDR( wbg , wru ) = 1 −
max( wbg , wru )
word according to Tables 1 and 2 and we remove the
agglutinative suffix ся:
3.6. Calculating the Final Result
нный → нен; ный → ен; нний → нен; ний → ен; ий
→ и; ый → и; нной → нен; ной → ен; ой → и; ский The final result is given by the maximum of the obtained
→ ски; ься → ь; овать → ам; ить → я; ять → я; values for all eight variants of the MMEDR algorithm –
ать → ам; уть → а; еть → ея with/without lemmatization of the Bulgarian word,
with/without lemmatization of the Russian word, and
The substitutions rules are applied only if the left hand- with/without transformation of the Russian word ending.
side letter sequences are at the end of the word. Rules are Note also, that lemmatization steps might result in
applied in the given order; multiple rule applications are calculating additional values for MMEDR – one for each
allowed. Note that we do not have rules for all possible possible lemma of the Russian/Bulgarian word.
endings in Russian, but only for the typical ones – object
of transformation for adjectives and verbs.
3.7. Example
Since all words have been already lemmatized in the As we will see below, the proposed MMEDR algorithm
previous step (if applied), verbs are assumed to be in yields significant improvements over classic orthographic
infinitive and adjectives in singular masculine form. similarity measures like LCSR (longest common
Adjective endings are transformed to their respective subsequence ratio, defined as the longest common letter
Bulgarian counter-parts, and reflexive verbs are turned subsequence, normalized by the length of the longer word
into non-reflexive. Nouns are not considered since they [Melamed, 1999]) and MEDR (minimum edit distance
generally have the same endings in the two languages ratio, defined as the Levenshtein distance with all weights
(after having been lemmatized) and thus need no set to 1, normalized by the length of the longer word, also
additional transformations. known as normalized edit distance /NED/ [Marzal &
6. Vidal, 1993]). This is due to the above-described steps Bulga- Rus-
which turn the Russian word into a Bulgarian-sounding # rian sian MMEDR Sim Precision Recall
one and the application of letter substitution weights that word word
reflect the closeness of the corresponding phonemes. 1 беляев беляев 1.0000 Yes 100.00% 0.68%
Let us consider for example the Bulgarian word 2 на на 1.0000 Yes 100.00% 1.37%
афектирахме and the Russian word аффектировались.
3 глава глава 1.0000 Yes 100.00% 2.05%
Using the classic Levenshtein distance, we obtain the
following: MED(афектирахме, аффектировались) = 7. канди- кан-
4 1.0000 Yes 100.00% 2.74%
дат дидат
And after normalization: MEDR=1–(7/15) = 8/15 ≈ 53%.
In contrast, with the MMEDR algorithm, we first 5 за за 1.0000 Yes 100.00% 3.42%
lemmatize the two words, thus obtaining афектирам and напо- напо-
6 1.0000 Yes 100.00% 4.11%
аффектировать respectively. We then replace the леон леоны
double Russian consonant -фф- by -ф- and the Russian 7 не не 1.0000 Yes 100.00% 4.79%
ending -овать by the first singular Bulgarian verb ending 8 ми нас 1.0000 No 87.50% 4.79%
-ам. We thus obtain the intermediate forms афектирам 9 ми мой 1.0000 Yes 88.89% 5.48%
and афектирам, which are identical, and MMEDR = 10 ми мы 1.0000 Yes 90.00% 6.16%
100%. Note that some pairs of words like афектирахме ... ... ... ... ... ... ...
and аффектировались could be neither orthographically
четвър-чет-
nor phonetically close but could be perceived as similar 93 0.9375 Yes 94.57% 59.59%
тият вертым
due to cross-lingual correspondences that are obvious to
оста-
people speaking both languages. 94 оставят 0.9286 Yes 94.62% 60.27%
ется
Let us take another example – with the Bulgarian
... ... ... ... ... ... ...
word избягам and the Russian word отбегать (both
meaning ‘to run out’), which sound similarly. Using 39998 са в 0.0000 No 0.37% 100%
Levenshtein distance: MED(избягам,отбегать) = 5 and 39999 са к 0.0000 No 0.37% 100%
thus MEDR = 1 – (5/8) = 3/8 = 37.5%. In contrast, with боядис-
40000 к 0.0000 No 0.37% 100%
the MMEDR algorithm, we first transform отбегать to вали
its intermediate form отбегам and we then calculate
Table 4 – Results of the MMEDR algorithm.
MMED(избягам, отбегам) = 0.8 + 1 + 0.5 = 2.3 and
MMEDR = 1 – (2.3/7) = 47/70 ≈ 67%, which is a much
better reflection of the similarity between the two words. 4.2. Grammatical Resources
Thus, we can conclude that, at least in the above two We used two monolingual dictionaries for lemmatization:
examples, the traditional MEDR does not work well for
the highly inflectional Bulgarian and Russian. MEDR is • A grammatical dictionary of Bulgarian, created at
based on the classic Levenshtein distance, which uses the the Linguistic Modeling Department, Institute for
same weight for all letter substitution, and thus cannot Parallel Processing, Bulgarian Academy of Sciences
distinguish small phonetic changes like replacing я with е [Paskaleva, 2002]. This electronic dictionary con-
(two phonetically very close vowels) from more tained 963,339 wordforms and 73,113 lemmata.
significant differences like replacing я with г (a vowel Each dictionary entry consisted of a wordform, a
and a consonant that are quite different). corresponding lemma, and some morphological and
grammatical information.
4. Experiments and Evaluation • A grammatical dictionary of Russian, created at
We performed several experiments in order to assess the the Institute of Russian language, Russian Academy
accuracy of the proposed MMEDR algorithm for of Sciences, based on the Grammatical Dictionary of
measuring the similarity between Bulgarian and Russian A. Zaliznyak [Zaliznyak, 1977]. The dictionary
words in a literary text. consisted of 1,390,613 wordforms and 66,101 lem-
mata. Each dictionary entry consisted of a word-
4.1. Textual Resources form, a corresponding lemma, and some morpholo-
gical and grammatical information.
We used the Russian novel The Lord of the World
(Властелин мира) by Alexander Belyayev [Belayayev,
1940a] and its Bulgarian translation by Assen Trayanov 4.3. Experimental Setup
[Belayayev, 1940b] as our test data. We extracted the first We measured the similarity between all 200x200=40,000
200 different Bulgarian words and the first 200 different Bulgarian-Russian pairs of words. Among them, 163
Russian words that occur in the novel, and we measured pairs were annotated as very similar by a linguist who
the similarity between them. was fluent in Russian and a native speaker of Bulgarian;
the remaining 39,837 were considered unrelated.
We used the MMEDR algorithm to rank the 40,000
pairs of words in decreasing order according to the
7. calculated similarity values. Ideally, the 163 pairs followed by a voiceless consonant. Thus, the
designated by the linguist would be ranked at the top. We substiution weight for з → с should probably be
can determine how well the ranking produced by our higher than for c → з. We could further extend the
algorithm does using standard measures from information rules to take into account the local context, e.g.,
retrieval, e.g. 11-point interpolated average precision changing раз- to рас- could have a different weight
[Manning et al., 2008]. than changing -з- то -с- in general.
We compared the MMEDR algorithm with two classic
• Another potential problem comes from us using only
orthographic similarity measures: LCSR and MEDR.
one linguist for the annotation, which might have
Unfortunately, we could not directly compare our results
yielded biased judgments. To assess the impact of
to those in other work, since there were no previous
the potential subjectivity, we would need judgments
publications measuring orthographic or phonetic similari-
by at least one additional linguist.
ty between words in Bulgarian and Russian.
4.4. Results 6. Related Work
Table 4 shows part of the ranking produced by the
Many algorithms have been proposed in the literature for
MMEDR algorithm. The table shows an excerpt of the
measuring the orthographic and the phonetic similarity
ranked pairs of words along with their similarity
between pairs of words from different languages.
calculated by the MMEDR algorithm, the corresponding
The simplest ones considered as orthographically
human annotation for similarity (the column "Sim"), as
close words with identical prefixes [Simard & al., 1992].
well as precision and recall calculated for all rows from
Much more popular have been orthographic similarity
the beginning to the current row.
measures based on normalized versions of the Levensh-
Table 5 shows the 11-pt interpolated average tein distance [Levenshtein, 1965], the longest common
precision for LCSR, MEDR and MMEDR. We can see subsequence [Melamed, 1999], and the Dice coefficient
that MMEDR outperforms the other two similarity [Brew and McKelvie, 1996].
measures by a large margin: 18-22% absolute difference. Somewhat less common have been phonetic similarity
measures, which compare sounds instead of letter sequen-
ces. Such an approach has been proposed for the first
11-pt interpolated
Algorithm time by [Russel, 1918]. Guy [1994] described an
average precision
algorithm for cognate identification in bilingual word lists
LCSR 69.06% based on statistics of common sound correspondences.
MEDR 72.30% Algorithms that learn the typical sound correspondences
MMEDR 90.58% between two languages automatically have also been
proposed: [Kondrak, 2000], [Kondrak, 2003] and
Table 5 – Comparison of the similarity measuring algorithms.
[Kondrak & Dorr, 2004].
Instead of applying similarity measures for symbolic
5. Discussion strings on the words directly, some researchers have first
performed transformations that reflect the typical cross-
As Tables 4 and 5 show, the MMEDR algorithm works lingual orthographic and phonetic correspondences bet-
quite well. Still, there is a lot of room for improvement: ween the target languages. This is especially important
for language pairs where some letters in the source
• Bulgarian and Russian inflectional morphologies are language are systematically substituted by other letters in
quite complex, with many exceptions that are not the target language. The idea can be extended further with
captured by our rules. This is probably a limitation substitutions of whole syllables, prefixes and suffixes.
of the general approach rather than a deficiency of For example, Koehn & Knight [2002] proposed manually
the particular rules used: if we are to capture all constructed transformation rules from German to English
exceptions, we would need to manually specify (e.g., the letters k and z are changed to c; and the ending -
them all, which would require a lot of extra manual tät is changed to -ty) in order to expand lists of
work. automatically extracted cognates.
Finally, orthographic measures like LCSR and MEDR
• The transformation rules between Bulgarian and
have gradually evolved over the years, enriched by
Russian are sometimes imprecise as well, e.g., for
machine learning techniques that automatically identify
very short words or for words of foreign origin.
templates for cross-lingual orthographic and phonetic
• While linguistically motivated, the letter-for-letter correspondences. For example, Tiedemann [1999] learned
substitution weights we used are ad hoc, and could spelling transformations from English to Swedish, while
be improved. First, while we used symmetric letter Mulloni & Pekar [2006] and Mitkov & al. [2007] learned
substitution weight in Table 3, asymmetric weights transformation templates, which represent substitutions of
might work better, e.g. the Bulgarian prefixes раз- letters sequences in one language with letter sequences in
and из- are spelled as рас- and ис- in Russian when another language.
8. 7. Conclusions and Future Work NAACL/ANLP 2000: 1st conference of the North American
Chapter of the Association for Computational Linguistics
We have described and tested a novel algorithm for mea- and 6th Conference on Applied Natural Language
suring the similarity between pairs of words based on Processing, pages 288-295, Seattle, WA, USA, 2000
transformation rules between Bulgarian and Russian. The
[Kondrak, 2003] Kondrak G. "Identifying Complex Sound
algorithm has shown very high precision and could be
Correspondences in Bilingual Wordlists". Proceedings of the
used to identify possible candidates for cognates or false
4th International Conference on Computational Linguistics
friends in text corpora. It can also be used in machine and Intelligent Text Processing (CICLING 2003), pages 432-
translation systems working on related languages where it 443, Mexico City, Mexico, 2003
could help overcome the incompleteness of translation
dictionaries used in the system. [Levenshtein, 1965] Levenshtein V. "Binary Codes Capable
There are many ways in which we could improve the of Correcting Deletions, Insertions and Reversals". Doklady
proposed algorithm. For example, we could adapt the al- Akademii Nauk SSSR, Volume 163 (4), pages 845-848,
gorithms described in [Mitkov et al., 2007] and [Bergsma Moscow, Russia, 1965
& Kondrak, 2007] to Bulgarian and Russian and try to
learn cross-lingual transformation rules for morphemes [Manning et al., 2008] Manning C., Prabhakar R. and
Schütze H. "Introduction to Information Retrieval".
and other sub-word sequences automatically. We could
Cambridge University Press, ISBN 0521865719, New York,
then try to combine MMEDR with such rules.
USA, 2008
Acknowledgments [Marzal & Vidal, 1993] Marzal A., Vidal E. "Computation of
Normalized Edit Distance and Applications". IEEE
The presented research is supported by the project FP7- Transactions on Pattern Analysis and Machine Intelligence,
REGPOT-2007-1 SISTER. Volume 15, Issue 9, pages 926-932, USA, 1993
[Melamed, 1999] Melamed D. "Bitext Maps and Alignment
8. References via Pattern Recognition". Computational Linguistics,
Volume 25 (1), pages 107-130, ISSN:0891-2017, 1999
[Belyayev, 1940a] Belyaev A. "Lord of the World" (1940),
in Russian, publisher "Onyx 21 Century", 2005, ISBN 5-329- [Mitkov et al., 2007] Mitkov R., Pekar V., Blagoev D. and
01356-9, http://lib.ru/RUFANT/BELAEW/lordwrld.txt Mulloni A. "Methods for Extracting and Classifying Pairs of
Cognates and False Friends". Machine Translation, Volume
[Belyayev, 1940b] Belyaev A. "Lord of the World" (1940), 21, Issue 1, pages 29-53, Springer Netherlands, 2007
translation from Russian to Bulgarian by A. Trayanov,
publisher "National Youth", 1977, [Mulloni & Pekar, 2006] Mulloni A. and Pekar V.
http://www.chitanka.info/lib/text/2130 "Automatic Detection of Orthographic Cues for Cognate
Recognition". Proceedings of the 5th International
[Bergsma & Kondrak, 2007] Bergsma S., Kondrak G. Conference on Language Resources and Evaluation (LREC-
"Alignment-Based Discriminative String Similarity". Procee- 06), pages 2387–2390, Genoa, Italy, 2006.
dings of the 45th Annual Meeting of the ACL, pages 656–
663, Prague, Czech Republic, 2007 [Paskaleva, 2002] Paskaleva E. “Processing Bulgarian and
Russian Resources in Unified Format”. Proceedings of the
[Brew and McKelvie, 1996] Brew C. and McKelvie D. 8th International Scientific Symposium MAPRIAL, pages
"Word-Pair Extraction for Lexicography". Proceedings of 185-194, Veliko Tarnovo, Bulgaria, 2002.
the 2nd International Conference on New Methods in
Language Processing, pages 45–55, Ankara, Turkey, 1996 [Russel, 1918] Russel R. "U.S. Patent 1,261,167", Pittsburgh,
PA, USA, 1918
[Guy, 1994] Guy J. "An Algorithm for Identifying Cognates
in Bilingual Wordlists and Its Applicability to Machine [Simard et al., 1992] Simard M., Foster G., Isabelle P.
Translation", Journal of Quantitative Linguistics, Volume 1 "Using Cognates to Align Sentences in Bilingual Corpora".
(1), pages 35-42, 1994 Proceedings of the 4th International Conference on
Theoretical and Methodological Issues in Machine
[Koehn & Knight, 2002] Koehn P., Knight K. "Learning a Translation, pages 67–81, Montreal, Canada, 1992
Translation Lexicon from Monolingual Corpora". In Procee-
dings of the Workshop of the ACL Special Interest Group on [Tiedemann, 1999] Tiedemann J. "Automatic Construction
the Lexicon (SIGLEX), pages 9-16, Philadelphia, PA, 2002. of Weighted String Similarity Measures". SIGDAT
Conference on Empirical Methods in Natural Language
[Kondrak and Dorr, 2004] Kondrak G., Dorr B. Processing and Very Large Corpora, pages 213-219, College
"Identification of Confusable Drug Names: A New Approach Park, MD, USA, 1999
and Evaluation Methodology". Proceedings of COLING
2004, pages 952–958, Geneva, Switzerland, 2004 [Zaliznyak, 1977] Zaliznyak A. "Grammatical Dictionary of
the Russian Language", publisher "Russian Language",
[Kondrak, 2000] Kondrak G. "A New Algorithm for the Moscow, Russia, 1977
Alignment of Phonetic Sequences". Proceedings of