Language Weaver
1. Language Weaver Components
Components (diagram): Web Service or C++ API; Specific Translation Engine; Language Weaver Decoder; Domain Dictionary; Language Model; Customizer
Server
• Processor: 2 GHz, Pentium 4
• RAM: 2 GB
• Hard Disk Space: 100 MB
• Operating System: Microsoft Windows 2000 Advanced Server SP3 or Microsoft Windows 2003
Language Weaver should be installed on a dedicated machine.
2. Specific Translation Engine
Improves translation quality using specific data
The specific translation engine is built by processing pre-translated documents.
LW software “learns” from what it sees in pre-translated documents, so the first step in making the
translation output more accurate and appropriate for your organization is to give the software your
organization’s translated documents. This information can come from an existing translation system,
knowledge bases, human resources documents, technical manuals, etc. Language Weaver will need
digital versions of both the original document and its translation; the more data we have, the better
the translation output.
Then the processed documents are “aligned” with the translated versions using a
Language Weaver alignment tool
Once the documents are gathered, each document and its respective translation need to be
aligned at the sentence or segment level. To help automate this process, users can use Language
Weaver’s alignment tool or a third-party alignment application. This alignment process prepares
the parallel corpus shown in the graphic on the previous page.
If needed, the process can be repeated to improve accuracy and relevance.
Specific translation engines may already exist within the Agency.
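As a rough illustration of what a sentence-aligned parallel corpus looks like, the sketch below pairs sentences one-to-one by position. It is a toy under stated assumptions, not Language Weaver's alignment tool: the function names `split_sentences` and `align_one_to_one`, the naive punctuation-based splitter, and the rule that rejects documents with differing sentence counts are all hypothetical.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def align_one_to_one(source_text, target_text):
    """Pair sentences by position; refuse documents whose sentence counts differ."""
    src = split_sentences(source_text)
    tgt = split_sentences(target_text)
    if len(src) != len(tgt):
        raise ValueError(
            f"cannot align 1:1: {len(src)} source vs {len(tgt)} target sentences"
        )
    return list(zip(src, tgt))


# Each tuple is one aligned segment of the parallel corpus.
pairs = align_one_to_one(
    "The pump is off. Open the valve.",
    "La bomba está apagada. Abra la válvula.",
)
for source_sentence, target_sentence in pairs:
    print(source_sentence, "|", target_sentence)
```

A real alignment tool also has to handle sentences that are merged, split, or dropped in translation, which is why the sketch fails loudly instead of guessing when the counts differ.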
3. Web Service or C++ API
Two interfaces exist for sending and receiving translations
1) Web Service – suggested as a loosely coupled interface
• Language Weaver supplies a WSDL for interfacing systems to use as the contract on how
to use Language Weaver.
• Technology agnostic – Web Service communication allows any technology able to adhere
to the WSDL methods and Simple Object Access Protocol (SOAP) to leverage Language
Weaver.
2) C++ API – Local or Remote, Windows-only interface
• Two licenses (local or remote) depending on need
• The local license is used for applications or processes running on the same machine as Language
Weaver (LW). The remote license communicates with LW via port-to-port communications.
• Recommended for systems or products embedding LW.
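To make the Web Service option more concrete, the sketch below builds a SOAP 1.1 request envelope with the Python standard library. Only the SOAP envelope namespace is standard; the service namespace `urn:example:translation`, the operation name `Translate`, and its three parameters are assumptions for illustration. The actual operation names and message shapes are defined by the WSDL that Language Weaver supplies.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 namespace
SERVICE_NS = "urn:example:translation"  # hypothetical; the real one comes from the WSDL


def build_translate_request(text, source_lang, target_lang):
    """Return a SOAP 1.1 envelope (as a string) for a hypothetical Translate operation."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # Operation and parameter names below are assumptions, not the vendor's contract.
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}Translate")
    ET.SubElement(operation, f"{{{SERVICE_NS}}}sourceLanguage").text = source_lang
    ET.SubElement(operation, f"{{{SERVICE_NS}}}targetLanguage").text = target_lang
    ET.SubElement(operation, f"{{{SERVICE_NS}}}text").text = text
    return ET.tostring(envelope, encoding="unicode")


request_xml = build_translate_request("Hello, world", "en", "es")
print(request_xml)
```

Because the contract is the WSDL and the wire format is SOAP, any client technology that can produce such an envelope and send it over HTTP can use the service, which is the sense in which this interface is loosely coupled.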
Input Formats
• Both interfaces allow translation of input of any size:
  • Documents
  • Snippets of text
  • A single document
• Supported input formats:
  • Plain text, HTML, TMX, XLIFF, PDF, ODF (OpenOffice/MS Office documents)
4. Other Components
Improve translation fidelity using specific lists and data
Decoder – main processing unit within Language Weaver
• The commander of the translation process
• Leverages other components (Language Model, Dictionary, etc.) and uses statistical
analysis to produce translations
Domain Dictionary
• A set of domain specific words (Nouns, labels, adjectives)
• Bilingual entries that are matched at run-time against the incoming text.
• Users can create multiple specialized dictionaries to meet ongoing needs.
Customizer
• A set of domain specific words (Nouns, labels, adjectives)
• Allows users to customize Language Weaver’s baseline translation software (on a small
scale) to a specific subject domain.
• Users can create multiple specialized dictionaries to meet ongoing needs.
• Customization process is done in-house so that the system can be continuously updated
and sensitive data stays secure
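The Domain Dictionary bullets describe bilingual entries matched at run-time against incoming text. The sketch below shows one simple way such matching could work; it is not Language Weaver's implementation. The function name `find_dictionary_matches`, the case-insensitive whole-word rule, and the longest-entry-first policy are all assumptions.

```python
import re


def find_dictionary_matches(text, dictionary):
    """Return (source_term, target_term) pairs for entries found in the text.

    Longer entries are tried first so that a multi-word term such as
    "relief valve" takes precedence over the shorter entry "valve".
    """
    matches = []
    remaining = text.lower()
    for term in sorted(dictionary, key=len, reverse=True):
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        if re.search(pattern, remaining):
            matches.append((term, dictionary[term]))
            # Blank out the matched span so shorter entries cannot reuse it.
            remaining = re.sub(pattern, " ", remaining)
    return matches


# A small, made-up domain dictionary (English to Spanish).
maintenance_terms = {
    "pump": "bomba",
    "relief valve": "válvula de alivio",
    "valve": "válvula",
}
hits = find_dictionary_matches("Check the relief valve and the pump.", maintenance_terms)
print(hits)
```

Keeping each specialized dictionary as its own mapping fits the point that users can maintain several dictionaries and choose among them as needs change.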
5. Suggested Architectures
A Translation Sub-System
Diagram: Main System; Content to translate; Translation sub-system; Language Weaver
Translation sub-system steps:
• Pre-process: determine document type, identify language pair, perform any cleansing, assign priority
• Version check: check the translation version to ensure the most recent translation engine is being used, or ensure that a specific translation engine is being applied
• Apply metadata (Translation Document Object): submitted by, language, topics, dictionary updates, status (needs human review, success…)
• Translate: Language Weaver
Provides:
• Creates a central point for translation logic and application of specific meta-data
• Translated versions stored and displayed when requested
• Populates central corpus
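The sub-system steps above (pre-process, version check, apply metadata, translate) can be sketched as a single function. Everything here is hypothetical: the `TranslationDocument` fields loosely follow the metadata named on the slide, `run_subsystem` is an invented name, and `translate` is a placeholder callable standing in for a call to Language Weaver.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationDocument:
    """Hypothetical record holding a document and its translation metadata."""
    text: str
    submitted_by: str
    language_pair: str = ""
    engine_version: str = ""
    topics: list = field(default_factory=list)
    status: str = "pending"
    translation: str = ""


def run_subsystem(doc, language_pair, current_engine_version, translate):
    """Pre-process, version-check, translate, and record metadata on the document."""
    # Pre-process: minimal cleansing (collapse whitespace) and record the language pair.
    doc.text = " ".join(doc.text.split())
    doc.language_pair = language_pair
    # Version check: keep an existing translation made by the current engine.
    if doc.translation and doc.engine_version == current_engine_version:
        return doc
    # Translate, then apply metadata describing how the translation was produced.
    doc.translation = translate(doc.text, language_pair)
    doc.engine_version = current_engine_version
    doc.status = "success" if doc.translation else "needs human review"
    return doc


def placeholder_translate(text, language_pair):
    """Stand-in for the real translation call; just tags the text."""
    return f"[{language_pair}] {text}"


document = TranslationDocument(text="  Open   the valve. ", submitted_by="analyst")
run_subsystem(document, "en-es", "engine-v2", placeholder_translate)
print(document.status, "|", document.translation)
```

Passing the translation call in as a parameter keeps the translation logic in one place while leaving the main system unaware of which engine version is applied.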
6. Suggested Architectures
Automated Batch Process
Diagram: Content to translate; Queue; Translation; Post Translation Processing Sub-system; Extraction; Entity Store
Steps applied to content to translate:
• Determine if document is a duplicate
• Identify language pair
• Perform any cleansing
• Assign priority
Post Translation Processing Sub-system:
• Alert specific user
• Index translation
• Add to corpus
Extraction:
• Keywords, Hot words
• People, Places, Things
• Dates
• Topics
Allows for separate translation factory to run independently of main system
• Creates a central point for translation logic
• Translated versions stored and displayed when requested
• Populates central corpus and facilitates post-processing of translated content
Negatives
• Development of a sub-system requires additional resources and time
• Not needed IF:
• Translation throughput is low (0-10 documents/hour)
• Latency in user seeing translation is acceptable
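The queueing side of the batch architecture (duplicate detection and priority assignment) can be sketched with the standard library. This is an illustrative assumption, not part of Language Weaver: the class name `TranslationQueue`, the use of a SHA-256 digest to detect duplicates, and the convention that a lower number means higher priority are all hypothetical.

```python
import hashlib
import heapq
import itertools


class TranslationQueue:
    """Priority queue that rejects documents it has already seen."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._order = itertools.count()  # tie-breaker: first in, first out

    def submit(self, text, language_pair, priority):
        """Queue a document; return False if it is a duplicate."""
        digest = hashlib.sha256(f"{language_pair}\n{text}".encode("utf-8")).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        heapq.heappush(self._heap, (priority, next(self._order), language_pair, text))
        return True

    def next_job(self):
        """Return (language_pair, text) for the most urgent job, or None if empty."""
        if not self._heap:
            return None
        _priority, _order, language_pair, text = heapq.heappop(self._heap)
        return language_pair, text


queue = TranslationQueue()
queue.submit("Routine report", "en-es", priority=5)
queue.submit("Urgent notice", "en-es", priority=1)
was_queued = queue.submit("Routine report", "en-es", priority=5)  # duplicate
print(was_queued, queue.next_job())
```

A worker process would pull jobs with `next_job`, send them for translation, and hand the results to the post-translation steps, so the translation factory can run independently of the main system.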
7. Suggested Architectures
Real-Time Batch Process
Diagram: Content to translate; Language Weaver Translation Web Service; Store Translation
Simplest implementation
• Creates a central point for translation logic
• Translated versions stored and displayed when requested
• Populates central corpus
Negatives
• Latency before the user sees the translated document may be in the neighborhood of 10 sec.
• May have scaling issues
This is suggested as the iteration 1 goal since it can be
expanded to any other architecture.
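A minimal sketch of this simplest architecture, under stated assumptions: content is translated on first request and the stored translation is returned on later requests. The class name `TranslationStore`, the in-memory dictionary used as the store, and the `translate` callable (a stand-in for the Language Weaver web service call) are all hypothetical.

```python
class TranslationStore:
    """Translate on first request, then serve the stored translation."""

    def __init__(self, translate):
        self._translate = translate  # placeholder for the web service call
        self._store = {}             # (document id, language pair) -> translation

    def get(self, doc_id, text, language_pair):
        key = (doc_id, language_pair)
        if key not in self._store:
            # The first request pays the translation latency.
            self._store[key] = self._translate(text, language_pair)
        return self._store[key]


requests_made = []


def placeholder_translate(text, language_pair):
    """Stand-in for the real service; records each call and tags the text."""
    requests_made.append(text)
    return f"[{language_pair}] {text}"


store = TranslationStore(placeholder_translate)
first = store.get("doc-1", "Open the valve.", "en-es")
second = store.get("doc-1", "Open the valve.", "en-es")  # served from the store
print(first, len(requests_made))
```

Because callers only depend on `get`, the direct call inside it could later be replaced by a queue or a full sub-system without changing them, which is one way to read the suggestion that this design can be expanded to the other architectures.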