Initiative for the development of open
NLP/NLU resources and tools
for the Serbian language
Vuk Batanović, PhD, ETF Belgrade Innovation Center
Tanja Samardžić, PhD, University of Zürich
Slobodan Marković, UNDP Serbia
In recent years, large language models
have proven successful in natural language
processing and understanding (NLP/NLU)
ChatGPT and GPT-4 have made considerable strides in natural
language processing and understanding.
The key to the success of these models is not only the vast volume
of text used for self-supervised training, but also the availability of
high-quality datasets for supervised fine-tuning for a wide range of
NLP/NLU tasks and linguistic domains.
The current "AI revolution" would not be possible without multi-year
public and corporate investments in high-quality datasets, which
are currently available predominantly for the English language.
How well is the Serbian language supported?
Getting better, but...
Even when large language models "speak Serbian" (which is not often the case),
they are not substantially fine-tuned for the Serbian market, which limits their
practical and business application.
• When working with both Cyrillic and Latin scripts, they perform worse and cost more
• They sometimes mix ekavian and iekavian pronunciation in their responses
or give answers in similar languages (Croatian, Slovenian, and Macedonian)
• They give worse answers in situations that go beyond the scope of everyday
conversational language (specific language domains)
• They give incorrect or undesirable answers in the context of Serbian culture
• They have limited expressive capabilities in Serbian (for example,
generating speech with varied emotions and support for local dialects)
• Their use is often not viable in business applications that handle large
amounts of text, require rapid response, guarantee data confidentiality, etc.
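One plausible contributor to the script-related cost mentioned in the first bullet can be sketched in a few lines. This is an illustration under stated assumptions, not something the slides describe: it assumes a model whose cost scales with UTF-8 bytes or byte-derived tokens, and the helper names `to_latin` and `utf8_len` are hypothetical. Each Serbian Cyrillic letter occupies two bytes in UTF-8, while unaccented Latin letters occupy one.

```python
# Serbian Cyrillic -> Latin mapping (30 letters, lowercase).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic to Latin (hypothetical helper)."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in CYR_TO_LAT:
            lat = CYR_TO_LAT[lower]
            # An uppercase Cyrillic letter capitalizes only the first
            # Latin letter of its equivalent, e.g. "Љ" becomes "Lj".
            out.append(lat.capitalize() if ch != lower else lat)
        else:
            # Anything outside the mapping (spaces, punctuation) is kept.
            out.append(ch)
    return "".join(out)


def utf8_len(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


cyrillic = "Добар дан"
latin = to_latin(cyrillic)
print(latin, utf8_len(cyrillic), utf8_len(latin))
```

The same greeting takes 17 bytes in Cyrillic and 9 in Latin. Actual token counts depend on each model's tokenizer, so this only indicates the direction of the effect.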
What could be better?
A greater proportion of Serbian text in training corpora
More high-quality datasets for fine-tuning models for different
language domains and NLP/NLU tasks
More datasets for model evaluation
+
Ideally, as much as possible should be freely available to the public
under a permissive license, and should cover both ekavian and
iekavian pronunciations of Serbian
Small language communities, like ours,
need to invest in language technologies
Estonia
language community: 1.16 million
Israel
language community: 6.1 million
Denmark
language community: 5.5 million
Iceland
language community: 0.33 million
Slovenia
language community: 2.5 million
Serbia?
Present situation
Global IT giants
They have little interest in developing support for the Serbian language because we are a small market
and a low priority. When they do offer something, it will be on commercial and restricted terms.
Academic community
The volume and scope of academic research in the field of NLP/NLU for Serbian are insufficient.
Furthermore, this research is typically basic research rather than work that is applied
in industry.
Serbian companies and start-ups
In theory, they are interested in meeting local market demands. However, significant upfront
investments in high-quality datasets and model development are difficult to justify in the context of
the small and low-income Serbian/regional market, resulting in a slow return on investment.
Government
In recent years, AI has been one of the government's top priorities, and significant progress has been
made. However, support for the development of language technologies remains insufficient.
Impact
There are very few Serbian NLP/NLU software products
The endangered status of the Serbian language in the digital age
Instead of having a reliable and easily accessible foundation,
Serbian IT companies and start-ups waste time and money
integrating disparate solutions and “reinventing the wheel”,
i.e. re-creating basic tools and data sets
Time goes by... Our kids are already conversing in English
with digital assistants, and the gap will only grow wider
For example, in virtual/augmented reality, voice (converted to text)
will be the primary mode of user-computer interaction
What should we do?
Initiative for the development of open NLP resources for Serbian
September 1, 2021
Vuk Batanović, PhD
ETF Belgrade Innovation Center
Tanja Samardžić, PhD
University of Zürich
Slobodan Marković
UNDP Serbia
Initiative goals
1. Create a basic set of NLP/NLU resources for
the Serbian language that are publicly and easily
accessible, under a license that permits them to
be used for any purpose (including commercial)
2. Gather and coordinate the local community
(IT industry, academic community, government)
that will contribute to the project’s implementation
by donating material resources, expertise, and
intellectual property
What do we aim to produce?*
Priority resources and tools for:
1. Improved text search, including named entities
2. Improved text understanding (recognizing
semantic similarity and generating answers to
questions)
+ all the above for ekavian and iekavian
pronunciations of Serbian
3. Creation of educational materials for software
engineers to learn how to implement NLP/NLU
for Serbian
Labeled datasets
Fine-tuned models
* after consultations with more than 40 organizations of the local IT community
What do we get?
Greater flexibility and independence – the datasets produced can be used for
training, fine-tuning, and evaluation of both closed (commercially available)
and open-source models
Lower individual investments and higher quality – instead of everyone
starting from scratch, everyone gets a reliable and high-quality foundation
from which to build, while retaining their competitive edge (because the
basic model is insufficient, each solution requires additional adaptation to
the user's needs/data, integration into business processes, continuous
support, and so on)
Faster development of high-quality NLP/NLU solutions with Serbian support
– by internal corporate IT teams, Serbian IT companies, and start-ups (which
are currently virtually non-existent in this field)
The project is being implemented by a consortium of
Initial financial and other assistance agreements were signed with
What are we doing this year, and how far have we come?
1. Selection of texts to cover the language domain (January – March)
2. Automated processing using existing tools: tokenization, lemmatization, part-of-speech tagging (April – May)
3. Pronunciation conversion: ekavian and iekavian variants (May – June)
4. Manual check and correction of the initial automated processing (May – September)
5. Evaluation of the existing models (October – November)
6. Evaluation of models fine-tuned on the new dataset (October – November)
7. Transition to business applications (December – January)
8. Results publication and preparation for a new project (December – January)
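The pronunciation conversion step can be illustrated with a minimal lookup-based sketch. The slides do not say how the initiative performs this conversion, so everything here is an assumption for illustration: the function name `to_iekavian` and the six-word table are hypothetical. Realistic conversion would need lemma and morphology information, which is presumably why the step follows automated lemmatization and is itself followed by manual checking.

```python
import re

# Tiny illustrative lexicon of ekavian -> iekavian word forms.
# A real resource would cover full inflectional paradigms.
EKAVIAN_TO_IEKAVIAN = {
    "mleko": "mlijeko",
    "lepo": "lijepo",
    "dete": "dijete",
    "reka": "rijeka",
    "pesma": "pjesma",
    "mesto": "mjesto",
}


def to_iekavian(text: str) -> str:
    """Replace known ekavian word forms with their iekavian variants."""

    def replace_word(match: re.Match) -> str:
        word = match.group(0)
        target = EKAVIAN_TO_IEKAVIAN.get(word.lower())
        if target is None:
            # Unknown words pass through unchanged.
            return word
        # Preserve sentence-initial capitalization.
        return target.capitalize() if word[0].isupper() else target

    return re.sub(r"\w+", replace_word, text)


print(to_iekavian("Lepo dete pije mleko."))
```

A word-level table like this cannot handle inflected forms or words whose "e" does not come from the historical vowel that the two pronunciations treat differently, so its output would still need the manual check listed in step 4.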
This is only the beginning.
We have broken new ground,
but there’s still much work ahead
Therefore…
If you have developed or plan to develop an NLP/NLU solution for the
Serbian language (and its variants spoken in Serbia, Montenegro,
Bosnia and Herzegovina)
If automated text processing can benefit your (e-)business (in terms
of better search, better recommendations, better customer support...)
If you want to position your organization as socially responsible and
willing to help the Serbian IT market and local tech community grow
If you want to contribute to the preservation of the Serbian language
in the digital age
Slobodan Marković
slobodan.markovic@undp.org
+381 63 387 260

办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 

[DSC Europe 23] Slobodan Markovic - NLP for Serbian.pptx

  • 1. Initiative for the development of open Vuk Batanović, PhD, ETF Belgrade Innovation Center Tanja Samardžić, PhD, University of Zürich Slobodan Marković, UNDP Serbia NLP/NLU resources and tools for the Serbian language
  • 2. In recent years, large language models have proven successful in natural language processing and understanding (NLP/NLU)
  • 3. ChatGPT and GPT-4 have made considerable strides in natural language processing and understanding. The key to the success of these models is not only the vast volume of text used for self-supervised training, but also the availability of high-quality datasets for supervised fine-tuning for a wide range of NLP/NLU tasks and linguistic domains. The current "AI revolution" would not be possible without multi-year public and corporate investments in high-quality datasets, which are currently available predominantly for the English language.
  • 4. How well is the Serbian language supported?
  • 5. Getting better, but... Even when large language models "speak Serbian" (which is not often the case), they are not substantially fine-tuned for the Serbian market, which limits their practical and business application.
      • When working with the Cyrillic and Latin scripts, they perform worse and cost more
      • They sometimes mix ekavian and iekavian pronunciation in their responses or give answers in similar languages (Croatian, Slovenian, and Macedonian)
      • They give worse answers in situations that go beyond the scope of everyday conversational language (specific language domains)
      • They give incorrect or undesirable answers in the context of Serbian culture
      • They have limited capabilities for expressing the Serbian language (for example, generating speech with varied emotions and support for local dialects)
      • Their use is often not viable in business applications that handle large amounts of text, require rapid response, must guarantee data confidentiality, etc.
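One plausible contributor to the script-related cost mentioned on this slide is text encoding: in UTF-8, every Serbian Cyrillic letter takes two bytes, while most Serbian Latin letters take one. The sketch below is only an illustration under that assumption, not a measurement from the presentation; the helper names `to_latin` and `utf8_len` are hypothetical, and real token counts depend on each model's tokenizer.

```python
# Minimal sketch: transliterate Serbian Cyrillic to Latin and compare UTF-8 sizes.
# The mapping covers the 30 letters of the Serbian Cyrillic alphabet.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic text to Serbian Latin, keeping other characters."""
    out = []
    for ch in text:
        lower = ch.lower()
        latin = CYR_TO_LAT.get(lower)
        if latin is None:
            out.append(ch)  # not a Serbian Cyrillic letter: keep unchanged
        elif ch != lower:
            out.append(latin.capitalize())  # uppercase letter, e.g. Љ -> Lj
        else:
            out.append(latin)
    return "".join(out)


def utf8_len(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


cyrillic = "Језик је важан."
latin = to_latin(cyrillic)
print(latin, utf8_len(cyrillic), utf8_len(latin))
```

The same sentence needs noticeably more bytes in Cyrillic than in Latin, which is one reason byte-oriented tokenizers may produce more tokens for Cyrillic input; whether a given model actually does so has to be checked against its own tokenizer.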
  • 6. What could be better?
      • A greater proportion of Serbian text in training corpora
      • More high-quality datasets for fine-tuning models for different language domains and NLP/NLU tasks
      • More datasets for model evaluation
      + Ideally, as much as possible should be freely available to the public under a permissive license, and should cover both ekavian and iekavian pronunciations of Serbian
  • 7. Small language communities, like ours, need to invest in language technologies
  • 14. Present situation
      • Global IT giants: They have little interest in developing support for the Serbian language because we are a small market and a low priority. When they do offer something, it is on commercial and restrictive terms.
      • Academic community: The volume and scope of academic research in the field of NLP/NLU for Serbian is insufficient. Furthermore, this research is typically basic research rather than work applied in industry.
      • Serbian companies and start-ups: In theory, they are interested in meeting local market demands. However, significant upfront investments in high-quality datasets and model development are difficult to justify in the small, low-income Serbian/regional market, where the return on investment is slow.
      • Government: In recent years, AI has been one of the government's top priorities, and significant progress has been made. However, support for the development of language technologies is insufficient.
  • 15. Impact
      • There are very few Serbian NLP/NLU software products
      • The endangered status of the Serbian language in the digital age
      • Instead of having a reliable and easily accessible foundation, Serbian IT companies and start-ups waste time and money integrating disparate solutions and “reinventing the wheel”, i.e. re-creating basic tools and datasets
  • 16. Time goes by... Our kids are already conversing in English with digital assistants, and the gap will only grow wider. For example, in virtual/augmented reality, voice (converted to text) will be the primary mode of user-computer interaction.
  • 18. Initiative for the development of open NLP resources for Serbian September 1, 2021 Vuk Batanović, PhD ETF Belgrade Innovation Center Tanja Samardžić, PhD University of Zürich Slobodan Marković UNDP Serbia
  • 19. Initiative goals 1. Create a basic set of NLP/NLU resources for the Serbian language that are publicly and easily accessible, under a license that permits them to be used for any purpose (including commercial) 2. Gather and coordinate the local community (IT industry, academic community, government) that will contribute to the project’s implementation by donating material resources, expertise, and intellectual property
  • 20. What do we aim to produce?* Priority resources and tools for:
      1. Improved text search, including named entities
      2. Improved text understanding (recognizing semantic similarity and generating answers to questions)
      + all the above for ekavian and iekavian pronunciations of Serbian
      3. Creation of educational materials for software engineers to learn how to implement NLP/NLU for Serbian
      • Labeled datasets
      • Fine-tuned models
      * after consultations with more than 40 organizations of the local IT community
  • 21. What do we get?
      • Greater flexibility and independence – we may use the produced datasets for training, fine-tuning, and evaluation of both closed (commercially available) and open-source models
      • Lower individual investments and higher quality – instead of everyone starting from scratch, everyone gets a reliable and high-quality foundation from which to build, while retaining their competitive edge (because the basic model is insufficient, each solution requires additional adaptation to the user's needs/data, integration into business processes, continuous support, and so on)
      • Faster development of high-quality NLP/NLU solutions with Serbian support – by internal corporate IT teams, Serbian IT companies, and start-ups (which are currently virtually non-existent in this field)
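  • 22. The project is being implemented by a consortium of (organization logos not included in the transcript). Initial financial and other assistance agreements were signed with (organization logos not included in the transcript).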
  • 23. What are we doing this year, and how far have we come?
      1. Selection of texts to cover the language domain (January – March)
      2. Automated processing using existing tools: tokenization, lemmatization, part-of-speech (word type) tagging (April – May)
      3. Pronunciation conversion: ekavian and iekavian variants (May – June)
      4. Manual check and correction of the initial automated processing (May – September)
      5. Evaluation of the existing models (October – November)
      6. Evaluation of models fine-tuned on the new dataset (October – November)
      7. Transition to business applications (December – January)
      8. Results publication and preparation for a new project (December – January)
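The tokenization and pronunciation-conversion steps listed on this slide can be sketched roughly as follows. This is a toy illustration and not the project's actual tooling: the regular expression, the function names `tokenize` and `to_iekavian`, and the four-word lexicon are all assumptions made for the example, and lemmatization and part-of-speech tagging are left out because they need trained tools.

```python
import re

# Hypothetical toy lexicon of ekavian -> iekavian word forms.
# A real resource would be far larger and manually verified.
EKAVIAN_TO_IEKAVIAN = {
    "mleko": "mlijeko",
    "lepo": "lijepo",
    "dete": "dijete",
    "pesma": "pjesma",
}

# Words (any Unicode letters/digits) or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def to_iekavian(tokens: list[str]) -> list[str]:
    """Replace known ekavian forms with iekavian ones, preserving initial capitals."""
    converted = []
    for token in tokens:
        replacement = EKAVIAN_TO_IEKAVIAN.get(token.lower())
        if replacement is None:
            converted.append(token)  # unknown word: leave for manual review
        elif token[0].isupper():
            converted.append(replacement.capitalize())
        else:
            converted.append(replacement)
    return converted


tokens = tokenize("Dete pije mleko.")
print(tokens)
print(to_iekavian(tokens))
```

A lexicon lookup is used here instead of a character-level rule because not every "e" changes between the two pronunciations ("pije" stays the same in both), which is presumably also why the plan includes a manual check after the automated steps.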
  • 24. This is only the beginning. We have broken new ground, but there’s still much work ahead
  • 25. Therefore…
      • If you have developed or plan to develop an NLP/NLU solution for the Serbian language (and its variants spoken in Serbia, Montenegro, Bosnia and Herzegovina)
      • If automated text processing can benefit your (e-)business (in terms of better search, better recommendations, better customer support...)
      • If you want to position your organization as socially responsible and willing to help the Serbian IT market and local tech community grow
      • If you want to contribute to the preservation of the Serbian language in the digital age