Mind Your lang
Presented by Adrian Roselli (@aardrian)
For role=drinks San Diego
Slides from this talk will be available at
rosel.li/roledrinks17
“Calm San Diego Night” by Justin Brown, CC BY-NC-SA 2.0
What Is lang?
What Is lang?
• Examples:
<html lang="en">
<html lang="en-ca">
<html lang="en-us">
<html lang="en-GB-x-hixie">
• Source:
BCP47: Tags for Identifying Languages,
https://tools.ietf.org/html/bcp47
We’ll come back to that last one.
Who Uses lang?
Who Uses lang?
• WHATWG Bug: “why do these examples of <html>
lack the lang attribute?”
This is where my research started.
“Why not? Realistically,
few people include it. It
just means the language
is unknown.”
Who Uses lang?
• “why do these examples of <html> lack the
lang attribute?”
• WHATWG HTML bug (26942)
• Reported: 2014-09-30
• Resolved: 2016-04-18
• Git merge:
• Editorial: Add lang to most examples #1061
Spoiled the surprise, I know, but we aren’t here for a bug.
Who Uses lang?
• Pulled January 2015 archive from
WebDevData.org (a W3C Community Group),
• Parsed 84,054 pages,
• Found that 39,433 pages use the lang
attribute on the <html> element,
• 47% use <html lang="…">.
12,762 use xml:lang, which is wrong.
Why Would You Use lang?
Why Would You Use lang?
• HTML 5 Specification
• HTML Validation
• Internationalization (i18)
• WCAG 2.0 A, AA
• Numbers
• Dates
• Hyphens
• Quotes
• Screen Readers
HTML 5 Specification
HTML 5 Specification
• The spec now provides a warning,
• Notes that it must match detected language of
the page,
• Identified ways which it is used,
• Added in April 2016
• add warning/advice about lang attribute use #218
https://github.com/w3c/html/issues/218
HTML 5 Specification
http://w3c.github.io/html/dom.html#lang-warning
HTML Validation
HTML Validation
• The W3C HTML validator compares the
following attributes on the page with the
detected page language:
• dir
• lang
• If there is a mismatch, the validator will
provide a warning,
• If there is no dir or lang, the validator will
provide a warning.
It will know if you lie.
HTML Validation
https://www.w3.org/blog/International/2016/07/13/w3c-html5-validator-enhanced-with-language-detection-functionality/
Internationalization (i18n)
Internationalization (i18n)
• Spelling and grammar checkers:
• spellcheck attribute (at caniuse.com)
• CSS:
• ::first-letter (at caniuse.com)
• Hanging punctuation
• Translation tools (particularly when looking at
parts of a page).
https://www.w3.org/International/questions/qa-lang-why
Internationalization (i18n)
• Font selection for CJK (for political reasons).
https://medium.com/behancetech/localization-gotchas-for-asian-languages-cjk-e52a57c0fde1
WCAG 2.0 A, AA
WCAG 2.0 A, AA
• Guideline 3.1 Readable: Make text content
readable and understandable.
• 3.1.1 Language of Page (Level A)
• H57: Using language attributes on the html
element
• 3.1.2 Language of Parts (Level AA)
• H58: Using language attributes to identify changes
in the human language
https://www.w3.org/TR/2008/REC-WCAG20-20081211/#meaning-doc-lang-id
Numbers
Numbers
• A browser can adjust decimal characters in
number fields,
• Some use comma, some use period,
• Yes, this is for Latin scripts.
• Do not worry about browser support unless
you are mixing within a page.
• In that case, Firefox is the way to go.
If left blank, the browser should go with locale settings.
Numbers
http://codepen.io/aardrian/pen/rOGYNL
Numbers
http://codepen.io/aardrian/pen/rOGYNL
Dates
Dates
• Not so much.
http://s.codepen.io/aardrian/debug/ZpgNWJ
Hyphens
Hyphens
• For browsers that support hyphens, you will
enjoy the benefit just by using the attribute.
• This assumes you use the following CSS:
• hyphens: auto;
• -ms-hyphens: auto; (ugh)
• -webkit-hyphens: auto; (also ugh)
• Browser support:
• http://caniuse.com/#search=hyphens
If left blank, the browser should go with locale settings.
Hyphens
http://codepen.io/aardrian/pen/zKVLvO
Hyphens
http://codepen.io/aardrian/pen/zKVLvO
Quotes
Quotes
• Let the browser choose the quote marks
based on the language.
• This assumes you use the following HTML:
• <q>…</q>
Obviously you can override this with CSS, but that would be silly.
Quotes
http://s.codepen.io/aardrian/debug/zKgbVv
Quotes
http://s.codepen.io/aardrian/debug/zKgbVv
Screen Readers
Screen Readers
• VoiceOver uses it to auto-switch voices.
• VoiceOver can speak using a different accent.
• JAWS uses it to load the correct phonetic engine /
phonologic dictionary.
• NVDA uses it in the same way as VoiceOver and JAWS.
• For HTML in ePub or Apple iBooks document, it affects
how VoiceOver will read the book.
• Leaving out the lang attribute may require the user to
manually switch to the correct language for proper
pronunciation.
This gist is that things can sound funny if done wrong.
Screen Readers
http://s.codepen.io/aardrian/debug/eBOrZY
NVDA:
Screen Readers
http://s.codepen.io/aardrian/debug/eBOrZY
JAWS:
Fun Facts
Fun Facts
• WHATWG HTML 5
<html class=split lang=en-US-x-hixie>
• W3C HTML 5.0
<html lang="en-US-x-Hixie">
• W3C HTML 5.1
<html lang="en">
You can confirm this by viewing the source of each.
Fun Facts
“Private-use subtags do not appear in the
subtag registry, and are chosen and maintained
by private agreement amongst parties.”
“Because these subtags are only meaningful
within private agreements and cannot be used
interoperably across the Web, they should be
used with great care, and avoided whenever
possible.”
http://www.w3.org/International/articles/language-tags/Overview.en.php#extension
Fun Facts
• There is a normative spec:
• Hixie English
• Version: 1.0-pre43
• Language Tag: en-GB-x-Hixie
• “This is a normative reference to Hixie English.
Hixie English is a variant of the language
spoken by the majority of the residents of the
United Kingdom (England) and the United
States of America.”
http://ian.hixie.ch/bible/english
Drink!
Mind Your lang
Presented by Adrian Roselli (@aardrian)
For role=drinks San Diego
Slides from this talk will be available at
rosel.li/roledrinks17
“Calm San Diego Night” by Justin Brown, CC BY-NC-SA 2.0

Mind your lang (for role=drinks at CSUN 2017)

  • 1.
    Mind Your lang Presentedby Adrian Roselli (@aardrian) For role=drinks San Diego Slides from this talk will be available at rosel.li/roledrinks17 “Calm San Diego Night” by Justin Brown, CC BY-NC-SA 2.0
  • 2.
  • 3.
    What Is lang? •Examples: <html lang="en"> <html lang="en-ca"> <html lang="en-us"> <html lang="en-GB-x-hixie"> • Source: BCP47: Tags for Identifying Languages, https://tools.ietf.org/html/bcp47 We’ll come back to that last one.
  • 4.
  • 5.
    Who Uses lang? •WHATWG Bug: “why do these examples of <html> lack the lang attribute?” This is where my research started. “Why not? Realistically, few people include it. It just means the language is unknown.”
  • 6.
    Who Uses lang? •“why do these examples of <html> lack the lang attribute?” • WHATWG HTML bug (26942) • Reported: 2014-09-30 • Resolved: 2016-04-18 • Git merge: • Editorial: Add lang to most examples #1061 Spoiled the surprise, I know, but we aren’t here for a bug.
  • 7.
    Who Uses lang? •Pulled January 2015 archive from WebDevData.org (a W3C Community Group), • Parsed 84,054 pages, • Found that 39,433 pages use the lang attribute on the <html> element, • 47% use <html lang="…">. 12,762 use xml:lang, which is wrong.
  • 8.
    Why Would YouUse lang?
  • 9.
    Why Would YouUse lang? • HTML 5 Specification • HTML Validation • Internationalization (i18) • WCAG 2.0 A, AA • Numbers • Dates • Hyphens • Quotes • Screen Readers
  • 10.
  • 11.
    HTML 5 Specification •The spec now provides a warning, • Notes that it must match detected language of the page, • Identified ways which it is used, • Added in April 2016 • add warning/advice about lang attribute use #218 https://github.com/w3c/html/issues/218
  • 12.
  • 13.
  • 14.
    HTML Validation • TheW3C HTML validator compares the following attributes on the page with the detected page language: • dir • lang • If there is a mismatch, the validator will provide a warning, • If there is no dir or lang, the validator will provide a warning. It will know if you lie.
  • 15.
  • 16.
  • 17.
    Internationalization (i18n) • Spellingand grammar checkers: • spellcheck attribute (at caniuse.com) • CSS: • ::first-letter (at caniuse.com) • Hanging punctuation • Translation tools (particularly when looking at parts of a page). https://www.w3.org/International/questions/qa-lang-why
  • 18.
    Internationalization (i18n) • Fontselection for CJK (for political reasons). https://medium.com/behancetech/localization-gotchas-for-asian-languages-cjk-e52a57c0fde1
  • 19.
  • 20.
    WCAG 2.0 A,AA • Guideline 3.1 Readable: Make text content readable and understandable. • 3.1.1 Language of Page (Level A) • H57: Using language attributes on the html element • 3.1.2 Language of Parts (Level AA) • H58: Using language attributes to identify changes in the human language https://www.w3.org/TR/2008/REC-WCAG20-20081211/#meaning-doc-lang-id
  • 21.
  • 22.
    Numbers • A browsercan adjust decimal characters in number fields, • Some use comma, some use period, • Yes, this is for Latin scripts. • Do not worry about browser support unless you are mixing within a page. • In that case, Firefox is the way to go. If left blank, the browser should go with locale settings.
  • 23.
  • 24.
  • 25.
  • 26.
    Dates • Not somuch. http://s.codepen.io/aardrian/debug/ZpgNWJ
  • 27.
  • 28.
    Hyphens • For browsersthat support hyphens, you will enjoy the benefit just by using the attribute. • This assumes you use the following CSS: • hyphens: auto; • -ms-hyphens: auto; (ugh) • -webkit-hyphens: auto; (also ugh) • Browser support: • http://caniuse.com/#search=hyphens If left blank, the browser should go with locale settings.
  • 29.
  • 30.
  • 31.
  • 32.
    Quotes • Let thebrowser choose the quote marks based on the language. • This assumes you use the following HTML: • <q>…</q> Obviously you can override this with CSS, but that would be silly.
  • 33.
  • 34.
  • 35.
  • 36.
    Screen Readers • VoiceOveruses it to auto-switch voices. • VoiceOver can speak using a different accent. • JAWS uses it to load the correct phonetic engine / phonologic dictionary. • NVDA uses it in the same way as VoiceOver and JAWS. • For HTML in ePub or Apple iBooks document, it affects how VoiceOver will read the book. • Leaving out the lang attribute may require the user to manually switch to the correct language for proper pronunciation. This gist is that things can sound funny if done wrong.
  • 37.
  • 38.
  • 39.
  • 40.
    Fun Facts • WHATWGHTML 5 <html class=split lang=en-US-x-hixie> • W3C HTML 5.0 <html lang="en-US-x-Hixie"> • W3C HTML 5.1 <html lang="en"> You can confirm this by viewing the source of each.
  • 41.
    Fun Facts “Private-use subtagsdo not appear in the subtag registry, and are chosen and maintained by private agreement amongst parties.” “Because these subtags are only meaningful within private agreements and cannot be used interoperably across the Web, they should be used with great care, and avoided whenever possible.” http://www.w3.org/International/articles/language-tags/Overview.en.php#extension
  • 42.
    Fun Facts • Thereis a normative spec: • Hixie English • Version: 1.0-pre43 • Language Tag: en-GB-x-Hixie • “This is a normative reference to Hixie English. Hixie English is a variant of the language spoken by the majority of the residents of the United Kingdom (England) and the United States of America.” http://ian.hixie.ch/bible/english
  • 43.
  • 44.
    Mind Your lang Presentedby Adrian Roselli (@aardrian) For role=drinks San Diego Slides from this talk will be available at rosel.li/roledrinks17 “Calm San Diego Night” by Justin Brown, CC BY-NC-SA 2.0