(ここの文字サイズ大きくする、マウスをレーザーポインタにする)
Hi everyone. Thank you for coming my session.
//Can you hear me? OK
//Can you see the session title?
あにょん はせよ
なぬん しみじゅかわ いむにだ
こんにちは
I’m Takayuki Shimizukawa came from Japan.
This is the second time of speaker at PyCon in Korea.
This is a capture image from PyCon APAC site.
(クリック)
Today I’ll give a talk relates Languages.
Title is: Easy contributable document internationalization process with Sphinx.
This is Agenda of this session.
1. Sphinx introduction & Setup for i18n.
2. A easy way and Non easy ways to translate3. Sphinx i18n feature4. Automated translation process with several services
5. Tips
Before proceeding to the subject, I want to ask you.
How many people do you use Open Source Software?
-> 10, 20, ... Thanks.
A-> Obviously, most people are using OSS.
B-> Oh, It's surprising. Half of the people don't use OSS.
How many people may have contributed to the OSS?
-> 1, 2, ... Thanks,
A-> Well,... it's a very small number.
B-> Oh, a lot of people have contributed. It's very good news.
IMO, using OSS is the first contribution.
Also we can say that introducing the OSS to other people is another contribution.
So, almost (a lot of / a half of) people who have raised your hand to the first question is already contributing to the OSS.
Many of the OSS provides the document in English.
How do you think, Does the translation into familiar language contribute to other developers? Yes?
10, 20,... Thanks.
A-> most people think it's a contribution, I agree it completely.
B-> Looks not too many. I guess everyone can read English, right?
In Japan, many programmers aren't familiar to English.
In such situations, translated documents helps me when I introduce some library to other developers.
Because of this, I occasionally translate documents.
I think, by your translation of a OSS document, more people will be able to use the OSS.
(クリック)
If more than one person can join such translation work easily, it would be great.
In this session, I'll introduce how to make a mechanism which is easy to join as a translator.
In order to make the mechanism, we'll use such tools and services in this session.
Tools; sphinx, docutils, sphinx-intl, transifex-client
Services; Transifex, Drone.io.
First subject, I'll introduce the sphinx introduction and setup Sphinx for i18n.
What is Sphinx?
Sphinx is a documentation generator.
Sphinx generates doc as several output formats from reStructuredText markup that is an extensible.
(ポインタでinputとoutputを指す)
This man, Georg Brandl is the father of Sphinx.
(クリック)
Until 2007, python official document was written by LaTeX.
But, it's too hard to maintenance.
Georg was trying to change such situation.
(クリック)
So then, he created Sphinx in 2007.
Sphinx provides usability and maintainability of the Python official document.
Sphinx before and after.
Before
There was no standard ways to write documents. One of example is a Python official document. It was written by LaTeX and several some python scripts jungle.
And, Sometime, we need converting markups into other formats
Since sphinx has been released,
* We can generate more multiple output format from single source.
* Integrated html themes make read docs easier.
* API references can be integrated into narrative docs.
* Automated doc building and hosting by ReadTheDocs service.
Nowadays, sphinx has been used by these libraries and tools.
Python libraries/tools: Python, Sphinx, Flask, Jinja2, Django, Pyramid, SQLAlchemy, Numpy, SciPy, scikit-learn, pandas, fabric, ansible, awscli, …
And Non python library/tools also using Sphinx for them docs: Chef, CakePHP(2.x), MathJax, Selenium, Varnish
Nowadays, sphinx has been used by these libraries and tools.
Python libraries/tools: Python, Sphinx, Flask, Jinja2, Django, Pyramid, SQLAlchemy, Numpy, SciPy, scikit-learn, pandas, fabric, ansible, awscli, …
And Non python library/tools also using Sphinx for them docs: Chef, CakePHP(2.x), MathJax, Selenium, Varnish
Sphinx extend the i18n mechanism to provide translation feature.
In this session, let's take a look the i18n feature.
How to install Sphinx with i18n tools.
You need to install sphinx to build a sphinx document.
Also you need to install sphinx-intl and transifex-client for translation.
You should use newest packages.
At first, you should get source code whether from github or other place to translate it.
In this case, the source code already has a sphinx doc directory.
So, type "make html" in doc directory to generate html output.
You can see the output in _build/html directory.
Now you can see the directories/files structure, like this.
(ポインタ)
* Target library for doc that is cloned from existent repository.
* Document build output that was generated by "make html"
* Document source that was generated by "sphinx-quickstart"
(クリック)
This is a conf.py file in the doc directory.
You should add these 2 lines to the conf.py
* Line-3rd: set a language code you want to use for output doc.
* Line-4th: set a directory relative path from conf.py location. The locale directory will have translation catalog files.
Now setup is over.
Well, it is the main subjects from here.
Easy contributable internationalization process with Sphinx.
internationalization
What is internationalization? It's usually called I18N that I’ve already mentioned.
The internationalization process is sometimes called translation or localization enablement without rewriting the source code from a technical point of view.
"without rewriting the source code" is important.
Easy contributable.
What is easy contributable?
Some software users or document readers have wanted to contribute by translation.
But, if the preparation of the translation takes long time or requires many kind of skill, they are frustrated and give up.
"Easy contributable" is the STATE that will be accepted the contributions immediately.
So, for the comparison, I'll introduce the three cases of difficult to contribute.
First one.
The manual are provided only in the HTML files.
Of course, you can translate it.
But, usually the html files are frequently updated.
How do you follow the original changes to update your translations?
This is the first case. I have no idea how to follow the original changes.
2nd example.
When the HTML manual are generated from reST files and docstrings in the source files,
You can rewrite original files to translate it.
But this approach has three problems:No1. You must be careful to maintain reStructuredText structure.No2. It's hard to divide translation tasks for a number of volunteer translators.No3. It's hard to follow the upstream document source that is frequently updated.
last example.
OK, we are engineers. We can use git!
(上を読む)
There are many many steps to complete our translations.
Additionally, in this case, the repository owner should allow write access permission for each translators.
This is also not a easy way.
Well then, what is a easy way?
My opinion is this.Left side is Not a easy way.
Right side is A easy way.
We need a easy way.
The combination of the Sphinx i18n feature and some of the services give you a easy way.
That's what is the goal of this session
OK, let's move forward.
3rd Subject. Sphinx i18n feature.
I'll explain the detail of the feature.
Sphinx i18n feature is composed of two functions.
First one is the generation pot files from reST files.
pot file is a famous translation catalog format for gettext that is used for i18n, you know.
Second one is the generation translated html files with using po files that is translated version of catalogs.
It's the human work to prepare the translated po files from the pot files.
Like this, Converting the POT files into PO files is the job of humans.
(クリック)In this flow, Author generate pot file at first part, and then, Author ask translators to translate them.
(クリック)Translators receive the pot files from Author in some way, then translate them.
(クリック)Once it has been finished, translators send back the po files to Author.
Let's look a little more detail from first part.
Author can use "make gettext" command to generate pot files.
"make gettext" command extracts text from reST files and python sources that referenced by autodoc.
It means, you can translate them into any language without rewriting the original reST files and python source files.
As you see, pot translation catalog makes easy to translate from one language to another languages."msgid" line have a original sentence.
"msgstr" line will have translated one.
Translated PO files should have as a right side structure.
"locale" directory has “ko" and it has "LC_MESSAGES" directory, and it has translated PO files.
PO files are copy of POT files with changed file's extension.
If you want to translate it into other languages, like 'ja', you can create other directories under "locale" directory.
When original POT files are updated by changing of original document, you should update PO files too.
If you had done it manually, it is very hard.
You can automated it by sphinx-intl.
At first, sphinx-intl copy pot files and rename them.
When the document is changed, translators can use sphinx-intl to update PO differences.
By same command.
As you see, translators can finish the boring tasks by one command.
sphinx-intl copy pot files and rename them on behalf of you.
sphinx-intl also update differences for each po files when the document is changed.
It's easy to use.
As a consequence of introducing the sphinx-intl tool, translators can concentrate for translation.
This is a translation work.
Translators can use their favorite editors or helpful tools/services easily that provide translation support features like a translation memory, recommending similar translation, glossary and auto-translation.
Once the translation is complete, finally let's run the "make html" command.
Finally author and translators can run "make html" and can get translated output w/o editing reST and python code.
This is a translated HTML sample.
Internally, Sphinx parse reST file and replace them into translated one by using po file.
Entire process to translate sphinx docs is this.
If you translate some sphinx document without using the i18n feature, you need to rewrite original document source files.
The Sphinx i18n feature provides a easy way to translate a document as you seen.
So far, we've found the mechanism of i18n of Sphinx.
In 4th Subject, Automated translation process with several services, we will continue to automate this mechanism.
Right now, we have seen an overall picture of the process.
Actually, I have a thing that was not pictured here.
(クリック)
Yes, these commands such as "make blah blah" requires your hands to invoke.
In other words, the translator must learn such steps, yet.
Additionally, it would be difficult to translate in parallel.
It's still not a easy way.
OK, Let's make it to be automation and parallelization.
And then, the translator doesn't need to know the mechanism.
And, make the translation work possible to parallelize.
Let's take a look at from the parallelization of the translation work.
Once you get po files, you can use helpful tools/services easily that provide translation support features like a translation memory, recommending similar translation, glossary and auto-translation.
In particular, the use of the online translation service is important for parallelization.
I'd like to introduce the Transifex as a online translation service.
Transifex provides the following functions: Upload pot and download po files by API, glossary feature, translation memory, recommending translations, automatic translation by using Google / Microsoft API.
Because you can do translation on the Web screen, you can translate them with other translators in parallel.
Translation on Transifex web console.
1. You can select one untranslated message
2. And Translate it by your self, or by machine translation
3. If the message contains long module or method name, you can click the icon to copy original to translation box w/o typing.
4. Sometimes, you are suggested translation messages from other translator, or Translation Memory. Translation Memory (called TM) pick up suggestions from similar translation in the project or its historical changes. For example, When Doc Author fix a typo, translation will be discarded. However, TM suggests previous translation that matches 99% with current original message.
5. Once you finish the translation, you can save it.
6. Optionally, reviewer can review it.
We have made translation parallelization.
Next.
Let's automate the central part.
The procedure you want to automate is made up of 6 steps.
At first, we will review the 6 steps.
First step is, Clone repository to be translated.
Step 2. make gettext to generate translation catalog.
Step 3. Upload pot translation catalog source.
Step 4. Download translated po files.
Step 5. "make html" command to generate translated document.
Step 6. Upload html to publish it
That's all.
Next, we will replace these all steps on the command line.
This is the command line that include the steps.
Line1: install sphinx and relates tools
Line2: clone repository
Line4: create a transifex account file by using sphinx-intl that reference environment variables (USERNAME and PASSWORD).
Line5: create a transifex config file by using sphinx-intl that also reference a environment variable (PROJECT_NAME)
Line6: generate pot files by make gettext
Line7: update resources information in transifex config file that also reference a environment variable (POT_DIR)
Line8: push pot files to transifex
Line9: pull translated po files from transifex
Line10: make html to generate translated html with using translated catalogs.
Now we replaced 5 steps on the command line, except HTML uploading part.
I'll explain the part later.
As you seen above, You can automate the process by this script.
Rest of work is preparing a environment to run the script automatically.
As one of the environment, I'll introduce the drone.io service.
The drone.io is a continuous integration service.
Drone.io integrates seamlessly with Github, Bitbucket and Google Code to make setup fast and simple.
Drone.io also integrates with Amazon, Heroku, Google AppEngine and more. For custom deployments you can use SSH and shell scripts.
You can use the service for public project without charge.
In my case, I usually use Github and S3 integrations.
Process of that is,
(クリック)
1. Push notification from GitHub
2. That invoke the script you've seen
3. Then, deploy html files onto S3 storage. (You need a publish html setting of S3)
If you set WebHook notification URL to transifex hook setting, updating translations also invoke the drone.io script.
By using drone.io service, automation can be achieved.
Upper-Left process is replaced with Right-Down one.
1 WebHook from Github invokes the script.
The script execute commands 2, 3, 4 and 5.
Finally drone.io deployment feature will deploys HTML to AWS S3 storage.
As this, we got the automated mechanism by using drone.io environment.
Maybe you can achieve this with using Travis-CI or CirculeCI or some other CI services.
In summary, let's look at the results of the automation at each point of view as author and translators.
At first, a point of view from the doc author.
When author push to the repository, translation source of Transifex will be updated, and author can check the translated html in the site.
Doc Author doesn't require annoying procedure.
The second is, a point of view from doc translators.
They can translate the document in parallel with;
No git, No file, No conflict.
Translation source are updated Automatically
They can get a translated HTML output w/o hand-build.
It means that the translation contributors can focus on the translation.
By the mechanism of automation, both of doc author and translators will provide the maximum effect with a minimum of effort.
In order to make the mechanism, we'll use such tools and services in this session.
Tools; sphinx, docutils, sphinx-intl, transifex-client
Services; Transifex, Drone.io.
I'd like to introduce some of the tips.
How to handle URLs.
By current implementation of Sphinx, hyperlink target in reST files doesn’t appear in POT files.
It means you can’t change the URL in the hyperlink target.
If you want to change URLs into translation version, you can use Embedded URLs notation (like above example).
How to change language w/o rewriting conf.py
Because of you are not a owner of documentation, you can’t change the conf.py in the document.
In such case, you can use SPHINXOPTS make option.
It also works with other targets not only ’make html’.
linkcheck target is useful if your translation contains URLs. It will find typos in translation text.
These 2 examples are using earlier techniques.