Mind Your lang
Presented by Adrian Roselli (@aardrian)
for London Web Standards
Slides from this talk will be available at
rosel.li/lws18
London skyline by Taras Kalapun, CC BY 2.0
• I’ve written some stuff,
• Member of W3C,
• Building for the web
since 1993,
• Learn more at
AdrianRoselli.com,
• Avoid on Twitter
@aardrian.
About Adrian Roselli
What Is lang?
What Is lang?
• Examples:
<html lang="en">
<html lang="en-gb">
<html lang="en-us">
<html lang="en-GB-x-hixie">
• Source:
BCP47: Tags for Identifying Languages,
https://tools.ietf.org/html/bcp47
We’ll come back to that last one.
Who Uses lang?
Who Uses lang?
• WHATWG Bug: “why do these examples of <html>
lack the lang attribute?”
This is where my research started.
“Why not? Realistically,
few people include it. It
just means the language
is unknown.”
Who Uses lang?
• Pulled January 2015 archive from
WebDevData.org (a W3C Community Group),
• Parsed 84,054 pages,
• Found that 39,433 pages use the lang
attribute on the <html> element,
• 47% use <html lang="…">.
12,762 use xml:lang, which is wrong.
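A quick sketch of the difference, not drawn from the parsed data set. In an HTML document the language belongs in the plain lang attribute. The HTML spec only permits xml:lang there alongside a matching lang, which is the polyglot case.

```html
<!-- Preferred in an HTML document -->
<html lang="en">

<!-- Not enough on its own in an HTML document -->
<html xml:lang="en">

<!-- Polyglot (XHTML-compatible) markup: both attributes, same value -->
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
```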
Who Uses lang?
• “why do these examples of <html> lack the
lang attribute?”
• WHATWG HTML bug (26942)
• Reported: 2014-09-30
• Resolved: 2016-04-18
• Git merge:
• Editorial: Add lang to most examples #1061
Spoiled the surprise, I know, but we aren’t here for a bug.
Why Would You Use lang?
Why Would You Use lang?
• HTML 5 Specification
• HTML Validation
• Internationalization (i18n)
• WCAG 2.0 A, AA
• Numbers
• Dates
• Hyphens
• Quotes
• Screen Readers
HTML 5 Specification
HTML 5 Specification
• The spec provides a warning,
• Notes that it must match the detected language of
the page,
• Identifies ways in which it is used,
• Added in April 2016
• add warning/advice about lang attribute use #218
https://github.com/w3c/html/issues/218
HTML 5 Specification
http://w3c.github.io/html/dom.html#lang-warning
HTML Validation
HTML Validation
• The W3C HTML validator compares the
following attributes on the page with the
detected page language:
• dir
• lang
• If there is a mismatch, the validator will
provide a warning,
• If there is no dir or lang, the validator will
provide a warning.
It will know if you lie.
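A minimal sketch of a page that would be a candidate for the mismatch warning. The French filler text is invented for illustration, and language detection generally needs a reasonable amount of text, so a page this short may not trigger anything.

```html
<!DOCTYPE html>
<!-- Declared language: English -->
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Bienvenue</title>
</head>
<body>
  <!-- Actual content: French, so declared and detected languages disagree -->
  <p>Bonjour et bienvenue sur notre site. Nous espérons que vous trouverez
     ici tout ce que vous cherchez.</p>
</body>
</html>
```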
HTML Validation
https://www.w3.org/blog/International/2016/07/13/w3c-html5-validator-enhanced-with-language-detection-functionality/
Internationalization (i18n)
Internationalization (i18n)
• Spelling and grammar checkers:
• spellcheck attribute (at caniuse.com)
• CSS:
• ::first-letter (at caniuse.com)
• Hanging punctuation
• Translation tools (particularly when looking at
parts of a page).
https://www.w3.org/International/questions/qa-lang-why
Internationalization (i18n)
• Font selection for CJK (for political reasons).
https://medium.com/behancetech/localization-gotchas-for-asian-languages-cjk-e52a57c0fde1
Internationalization (i18n)
Localization Gotchas for Asian Languages (CJK) by Andrew Landry
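One way to steer font choice yourself is the CSS :lang() selector. This is a sketch only: the font family names are placeholders for whatever fonts you ship or expect users to have, and the sample character is just one commonly cited glyph that is drawn differently in Japanese and Chinese typefaces.

```html
<style>
  /* Font names below are assumptions; substitute your own stack */
  :lang(ja)      { font-family: "Noto Sans JP", sans-serif; }
  :lang(ko)      { font-family: "Noto Sans KR", sans-serif; }
  :lang(zh-Hans) { font-family: "Noto Sans SC", sans-serif; }
  :lang(zh-Hant) { font-family: "Noto Sans TC", sans-serif; }
</style>

<!-- Same Unicode code point, different language, different font -->
<p lang="ja">直</p>
<p lang="zh-Hans">直</p>
```

Note that :lang(zh-Hans) matches lang="zh-Hans" and longer tags such as lang="zh-Hans-CN", but not lang="zh-CN", so the selectors need to line up with the tags you actually use.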
WCAG 2.0 A, AA
WCAG 2.0 A, AA
• Guideline 3.1 Readable: Make text content
readable and understandable.
• 3.1.1 Language of Page (Level A)
• H57: Using language attributes on the html
element
• 3.1.2 Language of Parts (Level AA)
• H58: Using language attributes to identify changes
in the human language
https://www.w3.org/TR/2008/REC-WCAG20-20081211/#meaning-doc-lang-id
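A small sketch covering both success criteria. The link targets (/fr/, /de/) are hypothetical paths, and the language switcher is just one common case where 3.1.2 applies.

```html
<!DOCTYPE html>
<!-- 3.1.1 Language of Page: primary language on the html element -->
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Language of page and parts</title>
</head>
<body>
  <!-- 3.1.2 Language of Parts: mark the phrase that changes language -->
  <p>She greeted us with <span lang="de">Wie geht es Ihnen?</span> at the door.</p>

  <!-- Language switcher: each language name is written in that language -->
  <ul>
    <li><a href="/fr/" lang="fr" hreflang="fr">Français</a></li>
    <li><a href="/de/" lang="de" hreflang="de">Deutsch</a></li>
  </ul>
</body>
</html>
```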
Numbers
Numbers
• A browser can adjust decimal characters in
number fields,
• Some use comma, some use period,
• Yes, this is for Latin scripts.
• Do not worry about browser support unless
you are mixing within a page.
• In that case, Firefox is the way to go.
If left blank, the browser should go with locale settings.
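A sketch of two number fields with different languages. The ids and labels are made up, and nb (Norwegian Bokmål) is my choice of tag. Whether the displayed separator actually changes depends on the browser.

```html
<!-- English: a browser that honors lang here would show 1.5 -->
<label for="price-en">Price (English)</label>
<input id="price-en" type="number" lang="en" step="0.01" value="1.5">

<!-- Norwegian: a browser that honors lang here would show 1,5 -->
<label for="price-nb">Pris (norsk)</label>
<input id="price-nb" type="number" lang="nb" step="0.01" value="1.5">
```

The value attribute always uses a period regardless of lang. Only the displayed and typed form is localized.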
Numbers
http://codepen.io/aardrian/pen/rOGYNL
Numbers
http://codepen.io/aardrian/pen/rOGYNL
Dates
Dates
• Not so much.
http://s.codepen.io/aardrian/debug/ZpgNWJ
Hyphens
Hyphens
• For browsers that support hyphens, you will
enjoy the benefit just by using the attribute.
• This assumes you use the following CSS:
• hyphens: auto;
• -ms-hyphens: auto; (ugh)
• -webkit-hyphens: auto; (also ugh)
• Browser support:
• http://caniuse.com/#search=hyphens
If left blank, the browser should go with locale settings.
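Putting the CSS and the attribute together, as a minimal sketch. The class name and width are arbitrary, picked so that long words have to break.

```html
<style>
  .column {
    width: 8em; /* narrow on purpose */
    -webkit-hyphens: auto;
    -ms-hyphens: auto;
    hyphens: auto;
  }
</style>

<!-- With lang set, a supporting browser can choose a hyphenation dictionary -->
<p class="column" lang="en">
  Internationalization and accessibility considerations frequently overlap.
</p>
```

Removing lang="en" from the paragraph (and from any ancestor) is a quick way to see the difference in a supporting browser.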
Hyphens
http://codepen.io/aardrian/pen/zKVLvO
Hyphens
http://codepen.io/aardrian/pen/zKVLvO
Quotes
Quotes
• Let the browser choose the quote marks
based on the language.
• This assumes you use the following HTML:
• <q>…</q>
Obviously you can override this with CSS, but that would be silly.
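A sketch with the same element under three languages. The sample phrases are mine, and the exact marks depend on the browser's built-in styles.

```html
<!-- Typically rendered with “ ” -->
<p lang="en"><q>Hello there</q></p>

<!-- Typically rendered with « » -->
<p lang="fr"><q>Bonjour</q></p>

<!-- Typically rendered with „ “ -->
<p lang="de"><q>Guten Tag</q></p>
```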
Quotes
http://s.codepen.io/aardrian/debug/zKgbVv
Quotes
http://s.codepen.io/aardrian/debug/zKgbVv
Screen Readers
Screen Readers
• VoiceOver uses it to auto-switch voices.
• VoiceOver can speak using a different accent.
• JAWS uses it to load the correct phonetic engine /
phonologic dictionary.
• NVDA uses it in the same way as VoiceOver and JAWS.
• For HTML in an ePub or Apple iBooks document, it affects
how VoiceOver will read the book.
• Leaving out the lang attribute may require the user to
manually switch to the correct language for proper
pronunciation.
The gist is that things can sound funny if done wrong.
Screen Readers
http://s.codepen.io/aardrian/debug/eBOrZY
NVDA:
Screen Readers
http://s.codepen.io/aardrian/debug/eBOrZY
JAWS:
Fun Facts
Fun Facts
• WHATWG HTML 5
<html class=split lang=en-US-x-hixie>
• W3C HTML 5.0
<html lang="en-US-x-Hixie">
• W3C HTML 5.1
<html lang="en">
You can confirm this by viewing the source of each.
Fun Facts
“Private-use subtags do not appear in the
subtag registry, and are chosen and maintained
by private agreement amongst parties.”
“Because these subtags are only meaningful
within private agreements and cannot be used
interoperably across the Web, they should be
used with great care, and avoided whenever
possible.”
http://www.w3.org/International/articles/language-tags/Overview.en.php#extension
Fun Facts
• There is a normative spec:
• Hixie English
• Version: 1.0-pre43
• Language Tag: en-GB-x-Hixie
• “This is a normative reference to Hixie English.
Hixie English is a variant of the language
spoken by the majority of the residents of the
United Kingdom (England) and the United
States of America.”
http://ian.hixie.ch/bible/english
Questions?
Mind Your lang
Presented by Adrian Roselli (@aardrian)
for London Web Standards
Slides from this talk will be available at
rosel.li/lws18
London skyline by Taras Kalapun, CC BY 2.0

Editor's Notes

  • #2 The most exciting talk you have ever seen about a single HTML attribute. Maybe.
  • #4 Specifically, what is the attribute? Where does it live?
  • #5 The first example sets the language of the page as English. The second sets the language as British English. The third sets the language as American English. The fourth is… we’ll come back to that. “Case distinctions are ignored in extensions (as with any language subtag) and normalized subtags of this type are expected to be in lowercase.”
  • #6 This question might not seem relevant, but it helps explain how I got here.
  • #7 I stumbled across this issue when trying to suss out why en-GB-x-Hixie was a thing. But it was the response to the issue that bothered me. It made an assertion with no support.
  • #8 I set out to find data to see if that assertion was true. Nearly half used the lang attribute. I consider this different than “few people”. I also found that nearly 13,000 use xml:lang, which is only valid for XML or HTML5 polyglot.
  • #9 The good news is that the bug was resolved. I am not here to re-litigate the bug. It did get me an acknowledgment in the WHATWG spec. So yeah. But now you have context for the following slides.
  • #10 In my opinion, this was the more important question.
  • #11 This is where it gets exciting. I have collected 9 reasons.
  • #13 For context, that is about the same time WHATWG closed its lang bug. The W3C spec was learning from WHATWG’s mistake? Either way, clarity was needed.
  • #14 The language of HTML documents is indicated using a lang attribute (on the html element itself, to indicate the primary language of the document, and on individual elements, to indicate a change in language). It provides an explicit indication to user agents about the language of content in order to enable language specific behavior. For example, use of an appropriate language dictionary; selection of an appropriate font or glyphs for characters shared between different languages; or in the case of screen readers and similar assistive technologies with voice output, pronunciation of content using the correct voice / language library. Incorrect or absent lang attributes can produce unexpected results in other circumstances, as they are also used to determine quotation marks for q elements, styling such as hyphenation, case conversion, line-breaking, and spell-checking in some editors, etc. Setting the lang attribute to a language which does not match the language of the document or document parts will result in some users being unable to understand the content.
  • #15 It stands to reason it would come into play here.
  • #16 Heuristics!
  • #17 This warning is helpful. It explains how it came to that conclusion. E.g., it sees that the page content does not match the declared language.
  • #18 Localization or internationalization. In the UK you likely worry about it more than we do in the U.S. Though with our diversity we need to be better at in-country localization.
  • #19 https://caniuse.com/#search=spellcheck: “Browsers have different behavior in how they deal with spellchecking in combination with the lang attribute. Generally spelling is based on the browser's language, not the language of the document.” http://caniuse.com/#search=%3A%3Afirst-letter: “The spec says that both letters of digraphs which are always capitalized together (such as "IJ" in Dutch) should be matched by ::first-letter, but no browser has ever implemented this.”
  • #20 Mandarin, Cantonese, Japanese, and Korean. Yes, Korean fonts also contain Chinese characters, in addition to Hangul. These 4 languages contain the same Chinese characters. Many characters are drawn differently in each language. Each language’s version of the character shares the same Unicode value.
  • #21 Hopefully this graphic is more obvious. You can use the :lang() pseudo-class selector to choose the right font when the browser cannot. There are potential political ramifications for using the wrong character.
  • #22 Accessibility!
  • #23 A single A Success Criterion is to define the language of the page. A double A SC is to define the language of any parts of the page that deviate from that. For many sites, 3.1.2 may not be an issue. However, think of language switchers, which are a language change in-page to present the other language name.
  • #24 Which are not words.
  • #25 In a vacuum, a browser will default to the current system’s language setting. But this is where you can help make the experience better for people who are not on their regular system. Travelers using computers in foreign countries come to mind for me pretty easily.
  • #26 This is using Firefox. The first two fields are English. The second two are Norwegian. Norwegians use a comma as a decimal delimiter.
  • #27 Here you can see me increasing the value in each field. The first is in .01 steps, then in whole numbers. For Norwegian it is the same. In each case, it can be confusing for non-native users.
  • #28 Ok, here we go.
  • #29 Yeah, no. I did make a test page, so you can play around later.
  • #30 This has more support, though.
  • #31 https://caniuse.com/#search=hyphens: Chrome < 55 and Android 4.0 Browser support "-webkit-hyphens: none", but not the "auto" property. It is advisable to set the @lang attribute on the HTML element to enable hyphenation support and improve accessibility. In case you wanted to disable hyphens.
  • #32 A sample using Flexbox for layout. 6 columns of text at just too narrow a width.
  • #33 Once those words are allowed to hyphenate, we have a proper six columns for that width. All I am doing is adding and removing the lang attribute.
  • #34 Specifically quote marks. Not inspirational messages.
  • #35 Free, built-in localization.
  • #36 I have an example of dialog. This may be familiar to some of you.
  • #37 All I am doing is changing the lang value to demonstrate how the browser switches the quote marks. This was done in Firefox, works in WebKit browsers. Did not work in Edge this morning.
  • #39 There is a lot happening under the hood. Stuff you need not manage other than using the right attribute/value.
  • #40 NVDA actively changes the pronunciation. Worst or best German text track ever for club music?
  • #41 JAWS prepends the block of words with the chosen language. Does not change pronunciation for my English-only system.
  • #43 Bringing it back to where we started. This is how things used to be. But what was with that -x-Hixie thing?
  • #44 The -x-[word] is for private-use subtags. Note, “avoided whenever possible.” But wait… it gets better…
  • #45 There is a normative spec. This means you too can make up your own language and have it considered acceptable by some standards. Also, don’t. Don’t be WHATWG.
  • #47 The most exciting talk you have ever seen about a single HTML attribute. Maybe.