October 9, 2018
Jim DeFabia
@drdephobia
Preparing an Open Source Documentation Repository for Translations
2018 HPCC Systems® Community Day
Why Internationalize?
Out of the world’s
approximately 7.5
billion inhabitants, 1.5
billion speak English
— that’s [only] 20% of
the Earth’s population.
Source: Babbel.com and ASIST
Why Internationalize?
[Chart: Portal Visitors by country. 55% English speaking; other labeled values: 11%, 6%, 4%, 3%, 3%, 3%, 2%. Countries shown: United States, India, Brazil, Philippines, China, France, Germany, Peru, Canada, Ireland, Japan, Russia, South Korea, Netherlands, Australia, Italy, Vietnam, Spain.]
Our HPCC Systems®
Portal visitors are more
English-centric.
But…
If we support more
languages that could
change.
Source: HPCC Systems®
Why Internationalize?
Should we lose half of our potential audience?
* No actual camels were harmed in the creation of this slide
How to Internationalize?
The Babel fish is small yellow and leech-like, and probably the oddest thing in the
Universe. It feeds on brainwave energy received … and then excretes into the mind
of its carrier a telepathic matrix formed by combining the unconscious thought
frequencies with nerve signals picked up from the speech centres of the brain
which has supplied them.
The practical upshot of this is that if you stick a Babel fish in your ear you can
instantly understand anything said to you in any form of language. The speech
patterns you actually hear decode the brainwave matrix which has been fed into
your mind by your Babel fish.
How to Internationalize?
In 2013, Comrise, an HPCC Systems partner, gave us a wonderful gift: Chinese translations of several of our manuals!
The ECL Programmer's Guide (version 4.2.0.3) is still available on the HPCC Systems® Portal.
The other books were incorporated into our online training.
How to Internationalize?
Thanks, Comrise!
How to Internationalize?
This one-off translation was nice to have, but:
• Static (version 4.2)
• Not easily maintainable
• Has become outdated
So what SHOULD we do?
How to Internationalize?
Use the Source, Luke!
• Our documentation source is already in XML (text) format
• Easy to compare differences
  • git diff
  • Other tools
• Minor differences handled in house
• Major differences (e.g., adding a new 200-page book) would be sent to a translation company
How to Internationalize?
Since HPCC Systems® went Open Source, all of our documentation source files
are in DocBook XML.
Keep it Simple!
• Translate sources from a checkpoint
• Later only translate the differences (delta)
By translating ONLY the delta,
translation costs are dramatically reduced!
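The delta idea can be sketched in a few lines of Python. This is only an illustration, not our actual tooling: the function name `changed_lines` and the sample paragraphs are made up, and a real workflow would more likely start from `git diff` output between the checkpoint and the current commit.

```python
import difflib


def changed_lines(checkpoint: str, current: str) -> list[str]:
    """Return the lines of `current` that are new or modified since `checkpoint`."""
    old = checkpoint.splitlines()
    new = current.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    delta = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "replace" and "insert" are the only opcodes that contribute new text.
        if tag in ("replace", "insert"):
            delta.extend(new[j1:j2])
    return delta


# Hypothetical DocBook fragments: the checkpoint that was already translated,
# and the current English source.
checkpoint = (
    "<para>Install the ECL IDE.</para>\n"
    "<para>Run the tutorial.</para>"
)
current = (
    "<para>Install the ECL IDE.</para>\n"
    "<para>Run the updated tutorial.</para>\n"
    "<para>Check the results.</para>"
)

delta = changed_lines(checkpoint, current)
for line in delta:
    print(line)
```

Only the two lines in `delta` would need to go to a translator; the unchanged first paragraph keeps its existing translation.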
Open Source to the rescue!
Git Diff
Before After
Git Diff
PDF DocBook
How to Internationalize
First Steps:
Where are we gonna put it all?
We reorganized folders using IETF language tags to support each language:
• EN-US
• PT-BR
• ZH-CN
• Etc.
We chose to use a two-part IETF code:
• A primary code that identifies the language (e.g., "EN")
• A sub-code that specifies the national variety (e.g., "GB" or "US")
This allows us to support variants of languages, if the need ever arises.
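As a rough illustration of the two-part code, the sketch below splits a tag into its language and national-variety parts and maps it to a folder. The names `LangTag`, `parse_lang_tag`, and `docs_folder`, and the `docs` root, are hypothetical; the real repository layout may differ.

```python
from typing import NamedTuple


class LangTag(NamedTuple):
    language: str  # primary code, e.g. "EN"
    region: str    # national variety, e.g. "US"


def parse_lang_tag(tag: str) -> LangTag:
    """Split a two-part tag such as 'PT-BR' (or 'PT_BR') into its parts."""
    parts = tag.replace("_", "-").split("-")
    if len(parts) != 2 or not all(p.isalpha() for p in parts):
        raise ValueError(f"expected LANGUAGE-REGION, got {tag!r}")
    language, region = parts
    return LangTag(language.upper(), region.upper())


def docs_folder(root: str, tag: LangTag) -> str:
    """Build a per-language folder path (assumed layout: <root>/<LANG>-<REGION>)."""
    return f"{root}/{tag.language}-{tag.region}"


tag = parse_lang_tag("pt-br")
print(docs_folder("docs", tag))
```

Keeping the region as its own field is what would let EN-GB and EN-US live side by side later.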
Hello World in Australian English (EN-AU)
OUTPUT('G\'day, Mate!');
Naming Conventions
Name that file…
DOCBOOK_TO_PDF( ${FO_XSL} ECLR-includer.xml "ECLLanguageReference_${DOC_LANG}" "ECLR_mods")
This produces a filename of:
ECLLanguageReference_EN_US-7.0.0-1.pdf
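The naming pattern can be mimicked in Python to show how the pieces combine. This is a sketch only: `pdf_name` is a hypothetical helper, and the assumption that the build appends `-<version>.pdf` after the language code is inferred from the example filename above, not from the macro's source.

```python
def pdf_name(base: str, doc_lang: str, version: str) -> str:
    """Compose <base>_<DOC_LANG>-<version>.pdf, following the example filename."""
    return f"{base}_{doc_lang}-{version}.pdf"


# Values taken from the example above; "7.0.0-1" is the version string shown there.
name = pdf_name("ECLLanguageReference", "EN_US", "7.0.0-1")
print(name)
```

Because the language code is part of the name, the English and Portuguese builds of the same book never overwrite each other.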
Naming Convention
A rose by any other name…
Common Elements
• Some Images
• Logos
• Warning
• Tip
• Icons
• Version.xml (used locally)
Not So Common Elements
Creative Commons License
This document is licensed under the Creative Commons License CC BY-ND 3.0 applicable to the jurisdiction of the principal location of the user, as available; otherwise, the CC BY-ND 3.0 Unported
https://creativecommons.org/licenses/by-nd/3.0/
Language-specific Images
ECL Watch in Portuguese
Do Not Translate Tags
• Tools, tools, tools
NLP++ to the rescue
<informaltable colsep="1" frame="all" rowsep="1">
<tgroup cols="3">
<colspec colwidth="147.60pt" />
<colspec colwidth="147.60pt" />
<colspec colwidth="147.60pt" />
<thead>
<row>
<entry align="left"><!-- DNT-Start -->Field Name<!-- DNT-End --></entry>
<entry align="left">Type</entry>
<entry align="left">Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><!-- DNT-Start -->FirstName<!-- DNT-End --></entry>
<entry>15 Character String</entry>
<entry>First Name</entry>
</row>
<row>
<entry><!-- DNT-Start -->LastName<!-- DNT-End --></entry>
<entry>25 Character String</entry>
<entry>Last name</entry>
</row>
.
.
.
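To show how the DNT comments might be consumed downstream, here is a minimal Python sketch that swaps each protected span for a numbered placeholder before translation and restores it afterward. This is an assumption about one possible workflow, not a description of the NLP++ analyzer that added the tags; `protect` and `restore` are hypothetical names.

```python
import re

# Matches one protected span, e.g. <!-- DNT-Start -->FirstName<!-- DNT-End -->
DNT_SPAN = re.compile(r"<!-- DNT-Start -->(.*?)<!-- DNT-End -->", re.DOTALL)


def protect(xml: str) -> tuple[str, list[str]]:
    """Replace each DNT span with a numbered placeholder; return text and stash."""
    protected: list[str] = []

    def stash(match: re.Match) -> str:
        protected.append(match.group(1))
        return f"<!-- DNT-{len(protected) - 1} -->"

    return DNT_SPAN.sub(stash, xml), protected


def restore(xml: str, protected: list[str]) -> str:
    """Put the original DNT spans back in place of the numbered placeholders."""
    return re.sub(
        r"<!-- DNT-(\d+) -->",
        lambda m: f"<!-- DNT-Start -->{protected[int(m.group(1))]}<!-- DNT-End -->",
        xml,
    )


row = (
    "<entry><!-- DNT-Start -->FirstName<!-- DNT-End --></entry>"
    "<entry>First Name</entry>"
)
masked, kept = protect(row)
print(masked)  # only "First Name" is left exposed for translation
print(restore(masked, kept) == row)
```

The field name never reaches the translator, so it cannot come back as "PrimeiroNome".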
Do Not Translate
Beneficial Side Effects
We found bugs in our source files
• Special Characters had been
introduced during initial
import/conversion
• -- em dashes
• ... Ellipses
• Smart Quotes aren’t so smart
• “Smart”
vs
• "Dumb"
• In total, we found 1,450 errors
introduced by autocorrect!
(Thank You, AutoCorrect)
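A check along these lines can be sketched in Python. The character list and the plain-ASCII replacements are assumptions based on the bullets above (smart quotes, em dashes, ellipses); `find_suspects` and `normalize` are hypothetical helpers, not the tool we actually ran.

```python
# Autocorrect-style characters and the plain-text form we assume was intended.
SUSPECT = {
    "\u201c": '"',    # left smart double quote
    "\u201d": '"',    # right smart double quote
    "\u2018": "'",    # left smart single quote
    "\u2019": "'",    # right smart single quote
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis character
}


def find_suspects(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character) for every suspect character, 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPECT:
                hits.append((line_no, col, ch))
    return hits


def normalize(text: str) -> str:
    """Replace every suspect character with its plain-text equivalent."""
    return text.translate(str.maketrans(SUSPECT))


sample = "OUTPUT(\u2018Hello\u2019);\nx \u2014 y\u2026"
hits = find_suspects(sample)
print(len(hits), "suspect characters found")
print(normalize(sample))
```

In code samples this matters more than in prose: a smart quote inside an ECL string literal is a syntax error, not a typographic nicety.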
Beneficial Side Effects
• We learned that we need more code reuse
• Included content only needs translation once
• Easier to maintain in any language
• More Efficient
Side Effects
• Automated Builds and Independent Builds
• Include PT-BR in automated build process
• Other languages can follow
• Independent Doc Builds
• English
• PT-BR
• All
Translation
English
<para>This tutorial assumes:</para>
<itemizedlist>
<listitem>
<para>
You have a running HPCC. This can be
a VM Edition or a single or
multinode HPCC platform
</para>
</listitem>
</itemizedlist>
<para>You have the ECL IDE
<footnote><para>
The ECL IDE (Integrated
Development Environment) is the tool
used to create queries into your
data and ECL files with which to
build your queries.
</para>
</footnote>
installed and configured</para>
Portuguese
<para>Este tutorial presume que:</para>
<itemizedlist>
<listitem>
<para>
Você tem um HPCC em execução. Ele pode
ser a VM Edition ou uma plataforma HPCC
com um ou mais nós
</para>
</listitem>
</itemizedlist>
<para>Você tem o ECL IDE
<footnote><para>
O ECL IDE (Ambiente de desenvolvimento
integrado) é uma ferramenta usada para
criar consultas em seus dados e arquivos
ECL com os quais suas consultas serão
compiladas.
</para>
</footnote>
instalado e configurado</para>
What’s Next?
Today Brazil,
Tomorrow the WORLD!
To Do List
• Translate to more languages
• Add more books
• Build translated CHM files
• Screen Shots from translated ECL Watch
• Further engage the open source community
It’s a small, small, small, small world.
Shameless Plug: The HPCC Systems® Cookbook
I want to contribute to documentation but…
What if my idea
isn’t good enough?
What if my idea isn’t good enough?
Someone once said in a meeting:
"Hey, let's make a movie about a tornado full of sharks!"
I want to contribute to documentation but…
• I don't know DocBook XML
• I don't know Git, GitHub, or your procedures for pull requests
Well, let me present…
The HPCC Systems® Cookbook by HPCC Systems®
The HPCC Systems® Cookbook: a collection of recipes and tips
• ECL How To Section
  • How to create a phonetic search
  • How to use superfiles/superkeys and consolidate them periodically
  • Modify a Jobname
  • Specify a Workunit Scope
• Tools, Tips, and Techniques Section
  • How to use Git within the IDE
• System Admin How To Section
  • Create a jailed SFTP site
Written by the best chefs in town—YOU!
I want to contribute to documentation but…
• I have an idea, but I don't have time to flesh it all out
• I'm not good at writing
Here’s WIKI!
There are many ways to contribute
Write directly in the Wiki
Write it, but submit for review / editing
Send in an idea
Let us know about a forum question you found interesting
Submit a code example that a colleague wrote (get permission first)
Why should I contribute?
Fame and Fortune
Work with professional editors
Help the community
CONTEST!!!
♪ ♫ Goodbye, Farewell, Auf Wiedersehen, Adieu ♪ ♫
Questions?
james.defabia@lexisnexisrisk.com
https://twitter.com/DrDePhobia
https://www.linkedin.com/in/james-defabia/
https://github.com/JamesDeFabia


Editor's Notes

  • #11 If we had the technology from HHGTTG, we could just use Babel Fish. Since we don’t, we have to use efficient, but expensive, translation companies.