Solving the world’s
(localization) problems
Eemeli Aro Mozilla), Ujjwal Sharma Igalia
Ujjwal Sharma
from New Delhi, India
based out of A Coruña, Galiza
OSS zealot, open web maximalist
love dogs, (masochistic) videogames
work at Igalia
Eemeli Aro
● From Helsinki, Finland
● Staff Software Engineer at Mozilla
● Maintainer of messageformat and yaml on npm
● Working on localization since 2012
What's wrong with current solutions?
A world of silos and monoliths
- L10n frameworks are primarily chosen based on their developer front-end
- The message format/syntax is incidental
- Each framework provides all the answers
- A solution needs to be picked early (often it isn't…), and switching to
another can be very costly
A world of limitations
- Dynamic messages vary with many aspects of language
  - Plural case
  - Grammatical gender
  - Personal gender
  - Vowel sounds: English a/an, French le/l'
  - Prepositions: in a car, on a bus
- Messages often vary in more than one dimension
- Variance depends on language
  - English he/she vs. Finnish hän (but oh so many suffixes)
Inflection is hard.
Even in English.
Variance is multi-dimensional
Solving the problem for the web
- Explicitly identified as an interesting problem to solve in 2013, but no
sufficiently good format was identified then or later.
- In 2019, TC39-TG2 formed the MessageFormat Working Group (MFWG) to
define a new format that could be made available in JS via Intl.MessageFormat.
- The WG moved under Unicode CLDR as a more appropriate host, since a
solution for JS should be good for everyone else as well.
- After many meetings over five years, we think we're done.
Standards develop very slowly.
Intl.MessageFormat was first proposed in 2013.
[Status]
Syntax
Hello, FOSDEM!
{ type: 'message',
declarations: [],
pattern: [ 'Hello, FOSDEM!' ] }
Placeholders
Hello, {$place}!
{ type: 'message', declarations: [], pattern: [
'Hello, ',
{ type: 'expression',
arg: { type: 'variable', name: 'place' } },
'!' ] }
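The data model for these two messages is plain data, so formatting a message that holds only text and placeholders amounts to walking its `pattern` array. The TypeScript sketch below assumes a reduced subset of the model (no `attributes`, functions, markup, or declarations), and `formatSimple` is a hypothetical helper, not part of any library. For an unknown variable it returns a `{$place}`-style fallback instead of throwing, loosely following how MF2 describes fallback output.

```typescript
// Reduced subset of the MF2 data model from the slides:
// text and variable placeholders only (no functions, markup, or selection).
interface VariableRef {
  type: "variable";
  name: string;
}

interface Expression {
  type: "expression";
  arg: VariableRef;
}

interface PatternMessage {
  type: "message";
  declarations: never[]; // always empty in this sketch
  pattern: (string | Expression)[];
}

// Hypothetical helper: concatenates text parts and substitutes variable values.
// A missing variable renders as "{$name}" rather than throwing.
function formatSimple(msg: PatternMessage, values: Record<string, unknown>): string {
  return msg.pattern
    .map((part) => {
      if (typeof part === "string") return part;
      const name = part.arg.name;
      return name in values ? String(values[name]) : `{$${name}}`;
    })
    .join("");
}

// Hello, {$place}!
const hello: PatternMessage = {
  type: "message",
  declarations: [],
  pattern: ["Hello, ", { type: "expression", arg: { type: "variable", name: "place" } }, "!"],
};

console.log(formatSimple(hello, { place: "FOSDEM" })); // Hello, FOSDEM!
console.log(formatSimple(hello, {})); // Hello, {$place}!
```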
Markup
Click {#link u:id=next}here{/link} to continue
interface Markup {
type: "markup";
kind: "open" | "standalone" | "close";
name: string;
options: Options;
attributes: Attributes;
}
Expressions
Hello, {$userName}!
Total: {$sum :number style=currency currency=USD}.
interface Expression {
type: "expression";
arg?: Literal | VariableRef;
function?: FunctionRef;
attributes: Attributes;
}
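An expression can pair an operand with a function and its options. As a sketch of how `{$sum :number style=currency currency=USD}` might be resolved in JavaScript, the hypothetical `formatExpression` below hands the options to `Intl.NumberFormat`. It assumes option values are literal strings (the MF2 data model also allows variable references there), and it handles only `:number` with its `style` and `currency` options.

```typescript
interface VariableRef {
  type: "variable";
  name: string;
}

// Simplified: options hold literal strings only.
interface FunctionRef {
  type: "function";
  name: string;
  options: Map<string, string>;
}

interface Expression {
  type: "expression";
  arg?: VariableRef;
  function?: FunctionRef;
}

// Hypothetical resolver: supports only `:number`, mapping its `style`
// and `currency` options onto Intl.NumberFormat.
function formatExpression(expr: Expression, values: Record<string, unknown>, locale: string): string {
  const value = expr.arg ? values[expr.arg.name] : undefined;
  if (!expr.function) return String(value); // e.g. {$userName}
  if (expr.function.name !== "number") {
    throw new Error(`Unsupported function :${expr.function.name}`);
  }
  const opts: Intl.NumberFormatOptions = {};
  const style = expr.function.options.get("style");
  if (style === "decimal" || style === "percent" || style === "currency") {
    opts.style = style;
  }
  const currency = expr.function.options.get("currency");
  if (currency !== undefined) opts.currency = currency;
  return new Intl.NumberFormat(locale, opts).format(Number(value));
}

// Total: {$sum :number style=currency currency=USD}.
const total: Expression = {
  type: "expression",
  arg: { type: "variable", name: "sum" },
  function: {
    type: "function",
    name: "number",
    options: new Map([
      ["style", "currency"],
      ["currency", "USD"],
    ]),
  },
};

console.log(`Total: ${formatExpression(total, { sum: 1234.5 }, "en-US")}.`);
// Total: $1,234.50.
```

Because the locale is passed in at format time, the same message data renders with locale-appropriate separators and currency placement, which is the point of keeping formatting functions out of the translated text.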
Functions
Today is {$date :date style=long}
interface FunctionExpression {
type: "expression";
arg?: never;
function: FunctionRef;
attributes: Attributes;
}
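Functions are referred to by name, which suggests resolving them through a registry at format time. The sketch below assumes such a registry; the `MessageFunction` type, the `registry` map and the `callFunction` helper are all hypothetical. Only `:date` is implemented, with its `style` option mapped onto the `dateStyle` option of `Intl.DateTimeFormat`, and the time zone is pinned to UTC so the output does not depend on the host.

```typescript
// Hypothetical signature for a message function: locale, operand, options.
type MessageFunction = (locale: string, operand: unknown, options: Map<string, string>) => string;

type DateStyle = "full" | "long" | "medium" | "short";

function isDateStyle(s: string): s is DateStyle {
  return s === "full" || s === "long" || s === "medium" || s === "short";
}

// Hypothetical registry: only `:date` is implemented.
const registry = new Map<string, MessageFunction>([
  [
    "date",
    (locale, operand, options) => {
      const style = options.get("style") ?? "medium";
      if (!isDateStyle(style)) throw new RangeError(`Unsupported :date style: ${style}`);
      // UTC keeps the output independent of the host time zone.
      const fmt = new Intl.DateTimeFormat(locale, { dateStyle: style, timeZone: "UTC" });
      return fmt.format(operand as Date);
    },
  ],
]);

function callFunction(name: string, locale: string, operand: unknown, options: Map<string, string>): string {
  const fn = registry.get(name);
  if (!fn) throw new Error(`Unknown function :${name}`);
  return fn(locale, operand, options);
}

// Today is {$date :date style=long}
const today = new Date(Date.UTC(2025, 1, 1)); // months are 0-based: 1 = February
console.log(`Today is ${callFunction("date", "en-US", today, new Map([["style", "long"]]))}`);
// Today is February 1, 2025
```

A registry like this is one way to let applications add their own functions without changing the message syntax.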
Patterns
This is a pattern. It can include expressions like {$v} and
{#b}markup{/b}.
type Pattern = (string | Expression | Markup)[]
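Because a pattern is just an array of text, expressions and markup, the message itself does not decide what markup means. A minimal sketch, assuming a hypothetical parts-based output (`formatToParts`, `Part`) and a caller-side renderer (`toHtml`) that turns markup named `b` into a `<b>` element. Markup options and attributes are ignored here, and variable values are HTML-escaped as text.

```typescript
interface VariableRef { type: "variable"; name: string }
interface Expression { type: "expression"; arg: VariableRef }
interface Markup {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string; // options and attributes omitted in this sketch
}
type Pattern = (string | Expression | Markup)[];

// Hypothetical output parts: text or markup.
type Part =
  | { type: "text"; value: string }
  | { type: "markup"; kind: Markup["kind"]; name: string };

function formatToParts(pattern: Pattern, values: Record<string, unknown>): Part[] {
  return pattern.map((el): Part => {
    if (typeof el === "string") return { type: "text", value: el };
    if (el.type === "markup") return { type: "markup", kind: el.kind, name: el.name };
    return { type: "text", value: String(values[el.arg.name]) };
  });
}

const escapeHtml = (s: string): string =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

// The caller chooses to render markup as HTML elements of the same name.
function toHtml(parts: Part[]): string {
  return parts
    .map((p) => {
      if (p.type === "text") return escapeHtml(p.value);
      if (p.kind === "open") return `<${p.name}>`;
      if (p.kind === "close") return `</${p.name}>`;
      return `<${p.name}/>`;
    })
    .join("");
}

// This is a pattern. It can include expressions like {$v} and {#b}markup{/b}.
const pattern: Pattern = [
  "This is a pattern. It can include expressions like ",
  { type: "expression", arg: { type: "variable", name: "v" } },
  " and ",
  { type: "markup", kind: "open", name: "b" },
  "markup",
  { type: "markup", kind: "close", name: "b" },
  ".",
];

console.log(toHtml(formatToParts(pattern, { v: "<this>" })));
// This is a pattern. It can include expressions like &lt;this&gt; and <b>markup</b>.
```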
Matchers
.input {$count :number}
.match $count
one {{You have {$count} week.}}
* {{You have {$count} weeks.}}
interface SelectMessage {
type: "select";
declarations: Declaration[];
selectors: VariableRef[];
variants: Variant[];
}
interface Variant {
keys: (Literal | CatchallKey)[];
value: Pattern;
}
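Selection picks one variant by comparing each variant's keys with the selector value. A simplified sketch for a single `:number` selector, assuming `Intl.PluralRules` supplies the plural category: an exact numeric key (such as `1`) is preferred over a category key (such as `one`), which is preferred over the catch-all `*`. The `.input` declaration is not modeled, and `{$count}` is inserted with `String()` rather than being number-formatted; `selectVariant` and `formatPattern` are hypothetical names.

```typescript
interface Literal { type: "literal"; value: string }
interface CatchallKey { type: "*" }
interface Expression { type: "expression"; arg: { type: "variable"; name: string } }
type Pattern = (string | Expression)[];

interface Variant {
  keys: (Literal | CatchallKey)[];
  value: Pattern;
}

// Single-selector selection: exact numeric key > plural category key > catch-all.
function selectVariant(variants: Variant[], n: number, locale: string): Variant {
  const category = new Intl.PluralRules(locale).select(n);
  const key = (v: Variant) => v.keys[0]; // one selector means one key per variant
  const exact = variants.find((v) => {
    const k = key(v);
    return k.type === "literal" && Number(k.value) === n;
  });
  const plural = variants.find((v) => {
    const k = key(v);
    return k.type === "literal" && k.value === category;
  });
  const fallback = variants.find((v) => key(v).type === "*");
  const chosen = exact ?? plural ?? fallback;
  if (!chosen) throw new Error("No matching variant (a catch-all is required)");
  return chosen;
}

function formatPattern(pattern: Pattern, values: Record<string, unknown>): string {
  return pattern.map((el) => (typeof el === "string" ? el : String(values[el.arg.name]))).join("");
}

// one {{You have {$count} week.}}
// *   {{You have {$count} weeks.}}
const countRef: Expression = { type: "expression", arg: { type: "variable", name: "count" } };
const variants: Variant[] = [
  { keys: [{ type: "literal", value: "one" }], value: ["You have ", countRef, " week."] },
  { keys: [{ type: "*" }], value: ["You have ", countRef, " weeks."] },
];

for (const count of [1, 2]) {
  console.log(formatPattern(selectVariant(variants, count, "en").value, { count }));
}
// You have 1 week.
// You have 2 weeks.
```

Delegating the category to `Intl.PluralRules` keeps language-specific plural rules out of the message; a translator only writes the keys their language needs, plus the required catch-all.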
What next?
The world beyond a single message
- Syntax & data model for a single message is good, but how do we put
together multiple messages?
- We need a new resource file format.
- Also, a metadata language – think JavaDoc/JSDoc for localization
- @locale
- @param
- @allow-empty
- …
A world of interoperability
- The message data model is not meant to be an abstract thing, but a tool
to be used
- This makes it possible to compare and convert messages across all
formats
- npm: messageformat, @messageformat/fluent,
@messageformat/icu-messageformat-1
- python: moz.l10n
- A better translation memory?
We're providing you with building blocks
- Your favourite L10n framework probably doesn't support MF2 yet.
- The tools you need to adopt MF2 probably aren't there yet.
- This is by design: We are not presuming to solve all the problems at
once, and we need your help.
- A key thought: Translatable human messages are not really that
complex, and the MF2 data model can represent all of them.
- MF2 isn't going to replace your current framework; it's trying to make it
better, and make it less of a silo.
Supporting localization in HTML
- Let's make localization declarative, and so web-native that you don't
need JavaScript to make it work.
- Declare in HTML your MF2 resources with <link rel="messages">, and
use them as <span msg="msg-id"></span>.
- This does depend on a message resource spec, and on the JS
Intl.MessageFormat spec.
You should tell us if we're wrong
- The 2.0 version of the spec is currently a Final Candidate, and it'll be
finalized within a month or so.
- If you think we're wrong about some part of this, you should tell us
ASAP, or we'll likely be stuck with our mistakes for the next decade or
three.
- Unicode MessageFormat WG
github.com/unicode-org/message-format-wg
- Unicode Inflection WG
github.com/unicode-org/inflection
- Intl.MessageFormat Proposal
github.com/tc39/proposal-intl-messageformat
- Message resources
github.com/eemeli/message-resource-wg
- JS messageformat
github.com/messageformat/messageformat/tree/main/mf2/messageformat
- Python: moz.l10n
github.com/mozilla/moz-l10n
- C & Java: ICU
icu.unicode.org
