Mārcis Pinnis
Chief AI officer
Tilde
AI, don't f*#$ up my name!
Language is personal.
A little about me
Developing language technologies since 2006
Overseeing AI research at Tilde since 2019
What will I talk about?
ꟷ What are language technologies?
ꟷ How are language technologies developed today?
ꟷ Examples of when language technologies fail
ꟷ What can we do about it?
What are language technologies?
Solutions that analyze, produce, modify or respond to human texts and speech.
ꟷ Spelling and grammar checking
ꟷ Machine translation
ꟷ Speech processing
ꟷ Virtual assistants, dialog systems, etc.
ꟷ Electronic dictionaries
ꟷ Anonymization
ꟷ Terminology management
… and many, many more!
How are language technologies developed?
Language data
ꟷ The most important ingredient when developing language technologies
ꟷ Any text or speech (audio/video) produced by humans
First, we collect language data.
Then, we train models on that data.
Finally, we deploy models for use.
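The three steps can be sketched in a few lines of Python. This is a toy illustration only: the "model" is a plain word counter, and the function names `collect`, `train` and `deploy` are hypothetical stand-ins for much larger real-world components.

```python
from collections import Counter

def collect(sources):
    """Step 1: gather raw text from every source into one list of sentences."""
    return [line.strip() for src in sources for line in src if line.strip()]

def train(sentences):
    """Step 2: 'train' a toy model, here just word counts over the collected data."""
    model = Counter()
    for sentence in sentences:
        model.update(sentence.lower().split())
    return model

def deploy(model):
    """Step 3: wrap the model in a function that callers can use."""
    def knows(word):
        return model[word.lower()] > 0
    return knows

# Two tiny, made-up sources; the empty string simulates a blank line.
sources = [["Language is personal.", ""], ["Language keeps changing"]]
knows = deploy(train(collect(sources)))
print(knows("language"))     # True
print(knows("parasailing"))  # False: the word never appeared in the data
```

Anything that was not in the collected data is simply unknown to the deployed model, which is why the data matters so much.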
Language is not constant – once you train a
model on some data, it becomes outdated!
Source: https://chat.openai.com
Language data is often the main reason why
language technologies generate errors
Typical challenges with language data are:
ꟷ There is never enough data
ꟷ Data is noisy
ꟷ Data is obsolete
ꟷ Data is not in the right domain
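As one small illustration of the "noisy data" challenge, the sketch below drops empty lines, exact duplicates and lines that are mostly symbols or digits. The function name `clean` and the 0.6 threshold are assumptions made for this example; real cleaning pipelines are considerably more involved.

```python
def clean(lines, min_letter_ratio=0.6):
    """Drop empty lines, exact duplicates, and lines that are mostly non-letters."""
    seen = set()
    kept = []
    for line in lines:
        text = " ".join(line.split())  # collapse runs of whitespace
        if not text or text in seen:
            continue
        # Count letters and spaces; everything else is treated as noise.
        letters = sum(ch.isalpha() or ch.isspace() for ch in text)
        if letters / len(text) < min_letter_ratio:
            continue
        seen.add(text)
        kept.append(text)
    return kept

raw = ["Data is noisy", "Data  is noisy", "", "#### 1234 ////", "Data is obsolete"]
print(clean(raw))  # ['Data is noisy', 'Data is obsolete']
```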
How can language data become obsolete?
Language is not constant –
the focus of society keeps shifting
Source: NewsCrawl corpus (https://data.statmt.org/news-crawl)
Language use “follows”
[Figure: frequency in Latvian (LV) news, 2014–2022, of “Ukraina” (lv) / “Ukraine” (en) and of “Koronavīruss” (lv) / “Coronavirus” (en)]
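A per-year frequency of this kind can be approximated with a simple count. The sketch below is an assumption about how such a count might be computed, not the method used for the slide; the sample documents are invented for illustration and are not NewsCrawl data. Matching on a word stem is a crude way to catch inflected forms.

```python
from collections import defaultdict

def relative_frequency(docs, stem):
    """Return {year: occurrences per 1,000 tokens} for (year, text) pairs.

    A token counts as a hit when it starts with `stem`.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    stem = stem.lower()
    for year, text in docs:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += sum(tok.startswith(stem) for tok in tokens)
    return {y: 1000 * hits[y] / totals[y] for y in sorted(totals) if totals[y]}

# Invented toy documents, tagged with a publication year.
docs = [
    (2021, "weather news today"),
    (2022, "Ukraine news and Ukraine aid"),
]
print(relative_frequency(docs, "ukraine"))  # {2021: 0.0, 2022: 400.0}
```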
Language is not constant –
society is constantly advancing
Source: https://termini.gov.lv/komisija/lza-tk-23052023-sedes-protokols-nr-51175
The Terminology Commission of the Latvian Academy of Sciences
regularly introduces new terminology in Latvian, e.g.:
English term    Translation into Latvian (introduced in May 2023)
parasailing     izpletņbraukšana
backpacker      mugursomnieks
Language becomes richer
Source: https://termini.gov.lv/komisija/lza-tk-23052023-sedes-protokols-nr-51175
The Terminology Commission of the Latvian Academy of Sciences
sometimes alters existing terminology in Latvian, e.g.:
English term                             Before May 2023          Since May 2023
cooling                                  aukstumapgāde            dzesēšana
engineering and communication systems    inženierkomunikācijas    inženiersistēmas
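One way a system can follow such changes is to keep terminology in an updatable glossary rather than baked into a model. The sketch below uses the two term pairs listed above; the names `TERM_UPDATES` and `update_terms` are hypothetical. It replaces exact word forms only, so a real Latvian system would also need to handle inflected forms.

```python
import re

# Outdated term -> current term, using the pairs listed above.
TERM_UPDATES = {
    "aukstumapgāde": "dzesēšana",
    "inženierkomunikācijas": "inženiersistēmas",
}

def update_terms(text, updates=TERM_UPDATES):
    """Replace whole-word occurrences of outdated terms with their new forms."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, updates)) + r")\b")
    return pattern.sub(lambda m: updates[m.group(1)], text)

print(update_terms("aukstumapgāde"))  # dzesēšana
```

Because the glossary is plain data, a new decision by the terminology commission only requires a new dictionary entry, not a retrained model.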
Language keeps changing
Societal efforts may introduce new concepts or alter existing ones
Source of examples: https://www.auswaertiges-amt.de
In Germany, the “gender star” is being introduced in public-sector communication to express
gender neutrality
Example – gender-neutral language
Referent*in (w/m/d) in der Social-Media-Analyse (w/m/d)
/Consultant (f/m/d) in social media analysis (f/m/d)/
Die Mitarbeiter:innen stehen im Zentrum
/Employees are in the center/
Societal efforts may alter existing concepts
Source: https://likumi.lv/ta/id/331352-par-ukrainas-pilsetu-nosaukumu-atveidi-latviesu-valoda
In 2022, the State Language Center of Latvia decided that the names of 31 Ukrainian towns and cities
will be rendered in Latvian following the original Ukrainian (and not Russian) spelling.
Even if you can keep up with the pace of change, your language data will never be complete
Source: https://twitter.com/krisjaniskarins/status/1705071215520481494
Language is naturally ambiguous and sparse
Language data is often English-centric
More data is available in English and about English-speaking regions.
In other words, the data has probably never seen some “random person” from a “random place”
somewhere outside the US/UK.
If you are that “random person”, AI becomes personal!
I am such a “random person”!
Sometimes AI tends to f*%# up my name.
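The slides do not say which failures were shown, but one familiar way text pipelines damage names with diacritics is lossy character handling. The sketch below, which uses only Python's standard library, contrasts two lossy treatments with a lossless one.

```python
import unicodedata

name = "M\u0101rcis Pinnis"  # "Mārcis Pinnis", with a precomposed ā (U+0101)

# Lossy: drop every character that has no ASCII encoding.
dropped = name.encode("ascii", errors="ignore").decode("ascii")

# Lossy: decompose, then strip the combining marks (ā becomes a).
stripped = "".join(
    ch for ch in unicodedata.normalize("NFD", name)
    if not unicodedata.combining(ch)
)

# Lossless: normalize to NFC and keep the text as Unicode end to end.
preserved = unicodedata.normalize("NFC", name)

print(dropped)    # Mrcis Pinnis
print(stripped)   # Marcis Pinnis
print(preserved)  # Mārcis Pinnis
```

Only the last variant returns the name as its owner writes it.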
Language is changing!
What are our options?
For language technology developers
ꟷ Collect and don’t stop collecting data
ꟷ Source local data (collect or synthesize)
ꟷ Plan to deliver models iteratively
ꟷ Use adaptive methods to adjust to a changing language
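Iterative delivery needs a signal for when a model has fallen behind. A minimal sketch, assuming a simple word-level vocabulary: track the share of incoming words the model has never seen and flag the model for an update once it passes a threshold. The names `oov_rate` and `needs_update` and the 0.2 threshold are illustrative assumptions, not a recommended setting.

```python
def oov_rate(vocabulary, texts):
    """Share of tokens in `texts` that the model's vocabulary has never seen."""
    tokens = [tok for text in texts for tok in text.lower().split()]
    if not tokens:
        return 0.0
    unseen = sum(tok not in vocabulary for tok in tokens)
    return unseen / len(tokens)

def needs_update(vocabulary, texts, threshold=0.2):
    """Flag the model for another iteration once too much input is unfamiliar."""
    return oov_rate(vocabulary, texts) > threshold

vocab = {"language", "is", "changing"}
print(oov_rate(vocab, ["language is changing"]))         # 0.0
print(needs_update(vocab, ["parasailing is changing"]))  # True
```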
For language technology users
ꟷ Pay attention to data management processes in your organization
ꟷ (Language) data is gold – do not lose it!
ꟷ Share your (language) data openly if you want to benefit more from “free” AI services. No one except you has data in your narrow subject.
Use public infrastructure to do that:
ꟷ European Language Resource Coordination (ELRC-SHARE)
ꟷ European Language Grid (ELG)
Takeaways
Language technologies are integral to our day-to-day activities with computers
ꟷ we become more productive
ꟷ we can access more information
ꟷ we can reach wider audiences
Language technologies are not 100% precise
ꟷ Languages are complex and constantly changing
ꟷ There will always be cases where they fail
However, if we develop our systems to expect such changes, we can effectively mitigate errors
(and make our customers happier).
Thank you!
Mārcis Pinnis
Chief AI officer
Tilde

Developing Language Technologies in a Changing World
