Pravin s glibc-unicode_and_cldr
1. Glibc Unicode and CLDR
Pravin Satpute
Senior Software Engineer
Globalization Team
Red Hat
2. Agenda
● What was the problem?
● Why so?
● How did we resolve the issue?
● Analysis
● Development
● Getting the patch into upstream
● Questions and Answers
4. What was the problem?
● Updating Glibc localedata from Unicode 5.1 to
Unicode 7.0
● /usr/share/i18n/locales/i18n (LC_CTYPE)
● /usr/share/i18n/charmaps/UTF-8.gz
8. How did we resolve it?
Analysis
● Gathered the ChangeLog and Git logs for all changes
made over time, including specific fixes.
● Found hints and information written in the localedata files.
● Read the comments on Bugzilla.
● Identified the Unicode source files that provide the raw
information for Glibc localedata:
● UnicodeData.txt
● DerivedCoreProperties.txt
● EastAsianWidth.txt
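To make the raw input concrete, here is a minimal sketch of reading UnicodeData.txt-style records, where each line holds 15 semicolon-separated fields (code point, name, general category, and so on, ending with the simple uppercase, lowercase, and titlecase mappings). The function name `parse_unicode_data` and the embedded sample are illustrative assumptions, not code from the Glibc scripts, which apply many more rules than shown here.

```python
# Sketch only: parse UnicodeData.txt-style records into a dictionary.
# The three sample lines follow the file's 15-field layout; a real run
# would read the full UnicodeData.txt instead.

SAMPLE = """\
0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
"""


def _hex_or_none(field):
    """Convert a hexadecimal field to int, or return None if it is empty."""
    return int(field, 16) if field else None


def parse_unicode_data(text):
    """Return {code_point: record} for every non-empty line in `text`."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        if len(fields) != 15:
            raise ValueError(f"expected 15 fields, got {len(fields)}: {line!r}")
        code_point = int(fields[0], 16)
        records[code_point] = {
            "name": fields[1],
            "category": fields[2],              # e.g. Lu, Ll, Nd
            "upper": _hex_or_none(fields[12]),  # simple uppercase mapping
            "lower": _hex_or_none(fields[13]),  # simple lowercase mapping
            "title": _hex_or_none(fields[14]),  # simple titlecase mapping
        }
    return records


if __name__ == "__main__":
    for cp, rec in sorted(parse_unicode_data(SAMPLE).items()):
        print(f"U+{cp:04X} {rec['category']} {rec['name']}")
```

The general category and case-mapping fields are the kind of raw information from which LC_CTYPE classes and case conversion tables can be derived.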
9. How did we resolve it?
Started with LC_CTYPE (i18n)
● Wrote the script gen-unicode-ctype.py to update the output
generated by gen-unicode-ctype.c.
● Backward compatibility
● The script compared the existing data with the newly
generated data and produced an easy-to-understand report.
● Later, Mike modified gen-unicode-ctype.py to deprecate
gen-unicode-ctype.c.
● Repeated the same process for the UTF-8 charmap and WIDTH.
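The comparison step above can be sketched as follows. This is only an illustration of the idea (diff the old and new character-class memberships and print a readable summary); the function names `compare_classes` and `format_report` and the tiny data set are hypothetical and do not reproduce the actual Glibc script.

```python
# Sketch only: compare character-class membership between existing
# locale data and newly generated data, then format a short report.
# Each input maps a class name (e.g. "alpha") to a set of code points.


def compare_classes(old, new):
    """Return {class_name: {"added": [...], "removed": [...]}}."""
    report = {}
    for name in sorted(set(old) | set(new)):
        old_members = old.get(name, set())
        new_members = new.get(name, set())
        report[name] = {
            "added": sorted(new_members - old_members),
            "removed": sorted(old_members - new_members),
        }
    return report


def format_report(report):
    """Render the comparison as plain text, one block per class."""
    lines = []
    for name, diff in report.items():
        lines.append(
            f"{name}: {len(diff['added'])} added, {len(diff['removed'])} removed"
        )
        for cp in diff["added"]:
            lines.append(f"  + U+{cp:04X}")
        for cp in diff["removed"]:
            lines.append(f"  - U+{cp:04X}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up memberships, for demonstration only.
    old_data = {"alpha": {0x41, 0x61}, "digit": {0x30}}
    new_data = {"alpha": {0x41, 0x61, 0x1E9E}, "digit": {0x30}}
    print(format_report(compare_classes(old_data, new_data)))
```

Listing removals separately from additions matters for backward compatibility: a code point that drops out of a class is far more likely to break existing behaviour than one that is newly added.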
11. Patch prepared, what next?
● Later, Mike Fabian stepped in.
● He reviewed the scripts and improved them.
● Glibc upstream improved them further.
● We proposed a System Wide Change for Fedora 22.
12. Patch prepared, what next?
● The patch was committed upstream in the last week of February.
● Collaborative work by me, Mike Fabian, and
Alexandre Oliva, supported by Carlos and Jens
Petersen.
● Users will get this update through Fedora 22 and
the latest releases of other distributions.