
Language Weaver


My analysis of Language Weaver and how it can be integrated into an analyst workflow


Published in: Business, Technology

Transcript

  • 1. Language Weaver Components
    Components (from the diagram): Web Service or C++ API; Specific Translation Engine; Language Weaver Decoder; Domain Dictionary; Language Model; Customizer
    Server:
      • Processor: 2 GHz, Pentium 4
      • RAM: 2 GB
      • Hard Disk Space: 100 MB
      • Operating System: Microsoft Windows 2000 Advanced Server SP3 or Microsoft Windows 2003
    Language Weaver should be installed on a dedicated machine.
  • 2. Specific Translation Engine
    Improves translation quality using specific data.
      • The specific translation engine is built by processing pre-translated documents. LW software “learns” from what it sees in pre-translated documents, so the first step in making the translation output more accurate and appropriate for your organization is to give the software your organization’s translated documents. This information can come from an existing translation system, knowledge bases, human resources documents, technical manuals, etc. Language Weaver will need digital versions of both the original document and its translation; the more data we have, the better the translation output.
      • The processed documents are then “aligned” with the translated versions using a Language Weaver alignment tool. Once the documents are gathered, each document and its respective translation needs to be aligned at the sentence or segment level. To help automate this process, users can use Language Weaver’s alignment tool or a third-party alignment application. This alignment process prepares the parallel corpus shown in the graphic on the previous page. If needed, the process can be repeated to improve accuracy and relevance.
      • Specific translation engines may already exist within the Agency.
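The sentence-level alignment step can be pictured with a small sketch. This is not Language Weaver's alignment tool. It assumes both documents have already been split into the same number of sentences in the same order, pairs them one-to-one, and drops pairs whose lengths differ implausibly. The function name `align_sentences` and the `max_ratio` threshold are hypothetical.

```python
from typing import List, Tuple


def align_sentences(source: List[str], target: List[str],
                    max_ratio: float = 3.0) -> List[Tuple[str, str]]:
    """Pair sentences one-to-one; skip pairs with implausible length ratios."""
    if len(source) != len(target):
        raise ValueError("naive 1:1 alignment needs equal sentence counts")
    pairs = []
    for src, tgt in zip(source, target):
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # an empty side cannot form a usable training pair
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio <= max_ratio:
            pairs.append((src, tgt))
    return pairs


# Two short documents, already split into sentences (assumed input shape).
corpus = align_sentences(
    ["Bonjour.", "Le rapport est prêt."],
    ["Hello.", "The report is ready."],
)
```

Real alignment tools also handle sentences that were merged or split during translation, which this sketch deliberately does not attempt.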
  • 3. Web Service or C++ API
    Two interfaces exist for sending and receiving translations.
    1) Web Service: suggested as a loosely coupled interface
      • Language Weaver supplies a WSDL for interfacing systems to use as the contract on how to use Language Weaver.
      • Technology agnostic: Web Service communication allows any technology able to adhere to the WSDL methods and Simple Object Access Protocol (SOAP) to leverage Language Weaver.
    2) C++ API: local or remote, Windows-only interface
      • Two licenses (local or remote) depending on need
      • The local license is used for applications or processes running on the same machine as Language Weaver (LW). The remote license communicates with LW via port-to-port communications.
      • Recommended for systems or products embedding LW.
    Input Formats
      • Both interfaces allow translation of any size:
          • Documents
          • Snippets of text
          • A single document
      • Supported input formats:
          • Plain text, HTML, TMX, XLIFF, PDF, ODF (OpenOffice/MS Office documents)
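To make the Web Service option concrete, here is a rough sketch of building a SOAP 1.1 request in Python. The operation name `TranslateText`, its parameter names, and the `urn:example:translation` namespace are all hypothetical placeholders; the real names would come from the WSDL that Language Weaver supplies. Only the SOAP envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:translation"  # hypothetical; the real value comes from the WSDL


def build_translate_request(text: str, source_lang: str, target_lang: str) -> bytes:
    """Build a SOAP 1.1 envelope for a hypothetical TranslateText operation."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # Assumed operation and parameter names, for illustration only.
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}TranslateText")
    ET.SubElement(operation, f"{{{SERVICE_NS}}}sourceLanguage").text = source_lang
    ET.SubElement(operation, f"{{{SERVICE_NS}}}targetLanguage").text = target_lang
    ET.SubElement(operation, f"{{{SERVICE_NS}}}text").text = text
    return ET.tostring(envelope, encoding="utf-8")


request_xml = build_translate_request("Bonjour le monde", "fr", "en")
```

Because the contract lives in the WSDL, a client in any language that can emit this kind of envelope over HTTP could play the same role, which is the "technology agnostic" point made on the slide.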
  • 4. Other Components
    Improve translation fidelity using specific lists and data.
    Decoder: main processing unit within Language Weaver
      • The commander of the translation process
      • Leverages other components (Language Model, Dictionary, etc.) and uses statistical analysis to produce translations
    Domain Dictionary
      • A set of domain-specific words (nouns, labels, adjectives)
      • Bilingual entries that are matched at run-time against the incoming text.
      • Users can create multiple specialized dictionaries to meet ongoing needs.
    Customizer
      • A set of domain-specific words (nouns, labels, adjectives)
      • Allows users to customize Language Weaver’s baseline translation software (on a small scale) to a specific subject domain.
      • Users can create multiple specialized dictionaries to meet ongoing needs.
      • Customization process is done in-house so that the system can be continuously updated and sensitive data stays secure
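The idea of bilingual dictionary entries matched at run-time against incoming text can be sketched as follows. This is only an illustration of the concept, not how the Language Weaver decoder consults its dictionaries; the function name and the sample French-English entries are made up for the example.

```python
import re
from typing import Dict, List, Tuple


def find_dictionary_matches(text: str, dictionary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (source term, target term) for every dictionary entry found in text."""
    matches = []
    # Check longer terms first so multi-word entries are reported before shorter ones.
    for term in sorted(dictionary, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            matches.append((term, dictionary[term]))
    return matches


# A hypothetical specialized dictionary for one subject domain.
aviation_fr_en = {"train d'atterrissage": "landing gear", "volet": "flap"}
hits = find_dictionary_matches("Le train d'atterrissage est sorti.", aviation_fr_en)
```

Keeping each domain in its own mapping mirrors the slide's point that users can maintain multiple specialized dictionaries.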
  • 5. Suggested Architectures: A Translation Sub-System
    Flow (from the diagram): Main System → Content to translate → Translation sub-system → Translation Document Object
    Translation sub-system steps:
      • Pre-process document: determine document type, identify language pair, perform any cleansing, assign priority
      • Version check: check the translation version to ensure the most recent translation engine is being used, or ensure that a specific translation engine is being applied
      • Translate (Language Weaver)
      • Apply metadata: submitted by, topics, dictionary updates, status (needs human review, success….)
    Translation Sub-system Provides:
      • Creates a central point for translation logic and application of specific meta-data
      • Translated versions stored and displayed when requested
      • Populates central corpus
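The sub-system stages on this slide can be sketched as a single function that produces a document object. Everything here is an assumption made for illustration: the `TranslationDocument` fields, the `run_subsystem` name, the whitespace-only cleansing, and the rule for setting the status. The real engine call is replaced by a caller-supplied function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class TranslationDocument:
    """Hypothetical stand-in for the slide's translation document object."""
    source_text: str
    language_pair: str
    engine_version: str
    translated_text: str = ""
    metadata: Dict[str, str] = field(default_factory=dict)


def run_subsystem(text: str, language_pair: str, submitted_by: str,
                  current_engine: str,
                  translate: Callable[[str, str], str]) -> TranslationDocument:
    # Pre-process: a very simple cleansing step that collapses whitespace.
    cleaned = " ".join(text.split())
    # Version check: record which engine version produced this translation.
    doc = TranslationDocument(cleaned, language_pair, current_engine)
    # Translate: delegate to whatever engine client the caller provides.
    doc.translated_text = translate(cleaned, language_pair)
    # Apply metadata: who submitted it and whether a human should review it.
    doc.metadata["submitted_by"] = submitted_by
    doc.metadata["status"] = "success" if doc.translated_text else "needs human review"
    return doc


# A fake translator (upper-casing) stands in for the real engine call.
result = run_subsystem("  Bonjour   le monde ", "fr-en", "analyst1", "engine-v2",
                       lambda text, pair: text.upper())
```

Routing every request through one function like this is what gives the architecture its central point for translation logic and metadata.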
  • 6. Suggested Architectures: Automated Batch Process
    Flow (from the diagram): Content to translate → Queue → Translation → Post Translation Processing Sub-system → Entity Extraction → Store Translation
    Queue:
      • Determine if document is a duplicate
      • Identify language pair
      • Perform any cleansing
      • Assign priority
    Post Translation Processing Sub-system:
      • Alert specific user
      • Index translation
      • Add to corpus
    Entity Extraction:
      • Keywords, hot words
      • People, places, things
      • Dates
      • Topics
    Allows for a separate translation factory to run independently of the main system
      • Creates a central point for translation logic
      • Translated versions stored and displayed when requested
      • Populates central corpus and facilitates post-processing of translated content
    Negatives
      • Development of a sub-system requires additional resources and time
      • Not needed IF:
          • Translation throughput is slow (0-10 documents/hour)
          • Latency in user seeing translation is acceptable
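The queue stage of the batch process (duplicate detection plus priority assignment) might look something like the sketch below. The class and method names are hypothetical, duplicates are detected by hashing the exact text, and a lower number is assumed to mean higher priority.

```python
import hashlib
import heapq
from typing import List, Optional, Set, Tuple


class TranslationQueue:
    """Priority queue that drops duplicate documents (lower number = higher priority)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seen: Set[str] = set()
        self._counter = 0  # tie-breaker keeps insertion order within a priority

    def submit(self, text: str, priority: int) -> bool:
        """Queue a document; return False if identical text was already submitted."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        heapq.heappush(self._heap, (priority, self._counter, text))
        self._counter += 1
        return True

    def next_document(self) -> Optional[str]:
        """Pop the highest-priority document, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = TranslationQueue()
queue.submit("routine report", priority=5)
queue.submit("urgent cable", priority=1)
duplicate_accepted = queue.submit("routine report", priority=1)
first = queue.next_document()
```

A worker process could drain this queue independently of the main system, which is the "translation factory" idea described on the slide.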
  • 7. Suggested Architectures: Real-Time Batch Process
    Flow (from the diagram): Content to translate → Language Weaver Translation Web Service → Store Translation
    Simplest implementation
      • Creates a central point for translation logic
      • Translated versions stored and displayed when requested
      • Populates central corpus
    Negatives
      • The user may wait in the neighborhood of 10 sec. before seeing the translated document.
      • May have scaling issues
    This is suggested as the iteration 1 goal since it can be expanded to any other architecture.
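The "translate, store, and display when requested" behavior of this simplest architecture can be sketched as a store that only calls the translation service on a miss. The `TranslationStore` class is hypothetical, the store is an in-memory dictionary rather than a real database, and the translator passed in is a stand-in for the web service call.

```python
from typing import Callable, Dict, Tuple


class TranslationStore:
    """Translate on first request, then serve the stored version."""

    def __init__(self, translate: Callable[[str, str], str]) -> None:
        self._translate = translate
        self._store: Dict[Tuple[str, str], str] = {}
        self.service_calls = 0

    def get(self, text: str, language_pair: str) -> str:
        key = (text, language_pair)
        if key not in self._store:
            # Only a miss reaches the (slow) translation service.
            self.service_calls += 1
            self._store[key] = self._translate(text, language_pair)
        return self._store[key]


# A fake translator stands in for the web service.
store = TranslationStore(lambda text, pair: f"[{pair}] {text}")
first = store.get("Bonjour", "fr-en")
second = store.get("Bonjour", "fr-en")
```

Only the first request for a document pays the translation latency mentioned above; later requests are served from the store.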
