LMC Whitepaper
 


Lingsoft Language Management Central
White paper, 2012-10-25
Lingsoft, www.lingsoft.fi

Introduction

Written and spoken language are central parts of human communication, be it in everyday life or in business. The ability to express yourself clearly and effectively is thus a very important and valued skill.

In an increasingly globalised and interconnected world, the need for multilingual communication becomes ever more pressing. And while multilingualism opens doors to new opportunities and has the potential of expanding your field of operations significantly, it also introduces new challenges in communication.

In business, there is a clear need for correct, accurate and consistent language. A document riddled with misspellings or grammatical errors is difficult to read, and may give the audience an unfavourable impression of your business. Furthermore, a document with inconsistent terminology, that is, using multiple different terms to express the same thing, makes it all the more complicated to understand what is actually being said.

In a world where effective and unambiguous communication is crucial, you cannot afford not to take language seriously.

What Is Lingsoft Language Management?

In a nutshell, language management encompasses every tool, resource, methodology and workflow that aims to handle or process various aspects of language needs.
This includes:

  ● Spell checking and grammar checking of documents
  ● Stylistic checking and handling of brand language
  ● Terminology management and terminology lookup
  ● Speech recognition
  ● Translations
  ● Automated information extraction
  ● Handling of search terms in search engines

Lingsoft Language Management is a comprehensive and complete service for all of your language needs. With Lingsoft LM you can focus on very specific areas of language, such as spell checking or terminology management, or combine several areas into a holistic language management solution.

Lingsoft Language Management Central (LS LMC)

All Lingsoft LM services are offered through our centralised system, Lingsoft Language Management Central (LS LMC). Hosted in the cloud, LS LMC operates on the Software-as-a-Service (SaaS) model.

Thanks to its modular architecture, LS LMC can be embedded into virtually any business application that benefits from its services. Providing all of our services through a centralised system allows for easy integration of multiple services, and for combining them to cover all of your language management needs.
Who needs LS LMC?

Thanks to its large feature set and versatility, LS LMC is a crucial component in a great variety of different industries, businesses and organisations. Examples include:

  ● The newspaper and magazine publishing industry
  ● Television media
  ● The healthcare sector
  ● Educational institutions
  ● Public administration
  ● The IT sector
Example: Publishing industry

Technical Overview

LS LMC is a .NET server-based solution, hosted in Lingsoft’s network, which exposes interfaces for integration over the web. All interfaces are available as either Web Services/SOAP or REST/JSON. This allows for easy integration in any number of applications and platforms.
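As a rough illustration of what integrating over a REST/JSON interface can look like from a client, the following Python sketch builds a JSON POST request without sending it. The URL, the payload field names (`text`, `language`) and the function name are assumptions made for this example only; the white paper does not specify them, and the actual interface is described in Lingsoft's own API documentation.

```python
import json
import urllib.request

# Hypothetical endpoint: not taken from the white paper or any Lingsoft documentation.
HYPOTHETICAL_URL = "https://lmc.example.invalid/rest/analysis"


def build_analysis_request(text: str, language: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a text-analysis call.

    The payload fields "text" and "language" are illustrative assumptions.
    """
    payload = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        HYPOTHETICAL_URL,
        data=payload,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


request = build_analysis_request("kädessäni", "fi")
```

Keeping request construction separate from sending makes the client easy to test without network access; a real integration would pass the request to `urllib.request.urlopen` and decode the JSON response.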
All of Lingsoft’s language tools are offered via different language services, where each language service targets a specific area of language management (for more information, see Language Management Services below). Comprehensive integration and API documentation is available for all services.

Located in a state-of-the-art data center, LS LMC is robust, stable, fast and reliable.

Applications and components

Besides the possibility of integrating LS LMC in your own application or solution, the following applications and components, which are available from Lingsoft, utilise LS LMC:

Lingsoft LMCorner

LMCorner is an add-in for MS Word 2010 that makes all of the LM services you have ordered available directly in MS Word.
CKEditor Plug-in

For integration of LM services in web applications, Lingsoft maintains a plug-in for the popular CKEditor web text editor, which makes it possible to use LS LMC for proofing and terminology handling. CKEditor is easy to integrate in just about any web application, and integrations already exist for the Joomla! and Drupal content management systems.

Lucene Search Expansion Component

For use in search systems, Lingsoft maintains a component for the popular Lucene information retrieval library, which provides search expansion (see below) using the services of LS LMC.

Language Management Services

Lingsoft LMC currently offers the following services:

Morphological Analysis and Dependency Parsing
Since its foundation, Lingsoft has been a leading developer of language analyser tools, particularly morphological and sentence analysers. The now standard two-level morphology (TWOL), constraint grammar (CG) and finite-state transducer (FST) technologies pioneered by Lingsoft form the core of Lingsoft’s analyser technology.

Morphological analysis essentially allows you to analyse word forms and get their part of speech, information on the inflected forms that were input, and their dictionary forms, or baseforms. This is crucial for processing inflecting languages such as Finnish, where a single word may have thousands of different forms. An example from Finnish would be inputting kädessäni ‘in my hand’ to find käsi ‘hand’, while an English example would be to input left to get left (adverb), left (adjective) and leave (verb). Lingsoft’s morphological tools can also find the borders in compound words; for example, käsikirja ‘handbook’ can return käsi ‘hand’ and kirja ‘book’.

As can be seen, words with entirely different meanings and parts of speech may overlap, so a single word form can return several different baseforms. If a word returns multiple baseforms, sentence disambiguation can identify the desired baseform based on the sentence structure; an English example would be left in he left for school, which would return leave (verb).

It is also possible to generate specific word forms as needed. For Finnish, requesting the inessive singular form with possessive suffix of käsi ‘hand’ would return kädessäni, while for English, requesting the past participle of leave would return left.

In search systems, the ability to extend your searches using morphological data can vastly expand the number of results returned by a search engine; for more information, see Search Systems below.

Available languages (and domains)
Analysis: Finnish (General, Medical, EU), Swedish, Norwegian Bokmål, Norwegian Nynorsk, Danish, German, English, Swahili, Russian
Disambiguation: Finnish, Swedish, Norwegian Bokmål, Danish, English
Generation: Finnish, Swedish, Norwegian Bokmål, Norwegian Nynorsk, Danish, German

Lingsoft also offers customisation services, where tailor-made morphological services are designed and created for your specific language domain and needs, taking advantage of Lingsoft’s long-standing expertise in language technology and linguistics.

Besides morphological and sentence-level analysis, Lingsoft also provides sentence dependency analysis, which charts the roles and relationships between words in a sentence; a very simple example would be the dog eats food, where the dog is the subject, eats is the predicate, and food is the direct object. There are a number of applications for this, chief among them information extraction, utilised for example in business intelligence systems. Based on statistical learning methods, the dependency parser can be trained for any language, and for any language domain.

Applied Use: Search Systems

LS LMC can be easily integrated in your search system solution. Using LS LMC, when searching for a word or term, it is possible to automatically also search for inflected forms of the word, and vice versa, using a process called baseform reduction:

  ● example from English: left returns left (adverb), left (adjective) and leave (verb)
  ● example from Finnish: kädessäni ‘in my hand’ finds käsi ‘hand’
  ● example from Swedish: sprungit ‘to have run’ finds springa ‘to run’

It is also possible to split compound words into their constituent parts:

  ● English: handbook → hand + book
  ● Finnish: käsikirja → käsi + kirja
  ● Swedish: handbok → hand + bok

The two features above are crucial to search expansion, where search queries can be radically simplified and the results pool greatly expanded.
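Baseform reduction and compound splitting, as used for search expansion, can be sketched in a few lines of Python. This is a minimal illustration, not Lingsoft's implementation: the two lookup tables are hard-coded from the examples in the text and stand in for a real morphological analyser, and all function names are hypothetical.

```python
from collections import defaultdict

# Toy lexicons built from the examples in the text. A real system would ask a
# morphological analyser for these analyses instead of using fixed tables.
BASEFORMS = {
    "left": ["left", "leave"],
    "kädessäni": ["käsi"],
    "sprungit": ["springa"],
}
COMPOUNDS = {
    "handbook": ["hand", "book"],
    "käsikirja": ["käsi", "kirja"],
    "handbok": ["hand", "bok"],
}


def reduce_to_baseforms(word: str) -> list[str]:
    """Return all candidate baseforms; unknown words map to themselves."""
    key = word.lower()
    return BASEFORMS.get(key, [key])


def split_compound(word: str) -> list[str]:
    """Return the parts of a known compound; other words are returned whole."""
    key = word.lower()
    return COMPOUNDS.get(key, [key])


def index_terms(word: str) -> set[str]:
    """Keys used for a word: its baseforms plus their compound parts."""
    keys = set()
    for base in reduce_to_baseforms(word):
        keys.add(base)
        keys.update(split_compound(base))
    return keys


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Index every word of every document under its reduced keys."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.split():
            for key in index_terms(word):
                index[key].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Reduce the query terms the same way and collect matching documents."""
    hits = set()
    for word in query.split():
        for key in index_terms(word):
            hits |= index.get(key, set())
    return hits


documents = {"doc1": "kirja on kädessäni", "doc2": "käsikirja"}
index = build_index(documents)
```

With this toy data, searching for either `käsi` or `kädessäni` finds both documents, because the inflected form and the compound are indexed under the shared baseform. Applying compound splitting on the query side as well is a design choice that widens recall; a production system might restrict it to the indexing side.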
For heavily inflected languages and for languages with many compound words, these two features are absolutely crucial. In Finnish, where a single word may have thousands of different forms, it would be impossible to manually keep track of every form a single keyword may take. Instead, using search expansion via LS LMC, it becomes a simple two-step process:

  1. When indexing the documents, all keywords are first reduced to their baseform. This way, all keywords are entered in the same form.
  2. When searching, the search terms are reduced to their baseform, ensuring that they match up with the indexed keywords.

Thus, regardless of which form of the word was in the original document, it is always possible to find the document.

Search expansion with LS LMC is easy to integrate in your search system solution, and Lingsoft provides a Lucene component that can be plugged into any Lucene-compatible solution.

Proofing

Lingsoft provides proofing solutions for a large number of languages and scenarios. Besides standard checking tools for spelling, grammar and hyphenation, the proofing service also contains configurable stylistic checking rules. The proofing service can be combined with the terminology service (see below) for terminology checking, providing a comprehensive proofing solution.

Available languages (and domains)

Spelling: Finnish (General, Medical, EU), Swedish, Norwegian Bokmål, Norwegian Nynorsk, Danish, German, English, Swahili
Grammar: Finnish (General, Medical, EU, IT), Swedish (General, Finland-Swedish), Norwegian Bokmål, Danish
Hyphenation: Finnish, Swedish, Norwegian Bokmål, Norwegian Nynorsk, Danish, German, Swahili

Lingsoft also offers customisation services, where tailor-made proofing services are designed and created for your specific language domain and needs, taking advantage of Lingsoft’s long-standing expertise in language technology and linguistics.

Speech recognition

Lingsoft is a leading provider of speech recognition technology in Finland. Its technology has been deployed in a multitude of applications and environments, such as dictation systems, audio closed captioning and customer support dialogue systems. Lingsoft’s primary speech recognition language is Finnish.

Terminology Management

LS LMC offers a fully fledged terminology management system, where the entire term management workflow is supported. This entails:

  ● Administration and editing of customer terms and customer termbases
  ● Term search
  ● Term proposal
  ● Finding accepted and rejected terms in your text

Several third-party tools are supported, among them Interverbum TermWeb and Trados MultiTerm. Multilingual termbases are supported. Throughout the term management workflow, the standard TBX XML format is used to ensure compatibility and interoperability with a large number of applications and systems.
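To give a feel for why an XML exchange format helps interoperability, here is a small Python sketch that reads a multilingual term entry from a simplified TBX-style fragment using only the standard library. The sample data is invented for illustration, and the fragment is deliberately minimal: it follows the general `termEntry` / `langSet` / `term` layout of TBX, but omits the header and metadata that a real termbase export would contain.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented, simplified sample; real TBX files carry a header and more metadata.
SAMPLE_TBX = """<martif type="TBX" xml:lang="en">
  <text>
    <body>
      <termEntry id="c1">
        <langSet xml:lang="en"><tig><term>handbook</term></tig></langSet>
        <langSet xml:lang="fi"><tig><term>käsikirja</term></tig></langSet>
        <langSet xml:lang="sv"><tig><term>handbok</term></tig></langSet>
      </termEntry>
    </body>
  </text>
</martif>"""


def read_terms(tbx_xml: str) -> dict[str, dict[str, list[str]]]:
    """Map each entry id to its terms, grouped by language code."""
    root = ET.fromstring(tbx_xml)
    entries = {}
    for entry in root.iter("termEntry"):
        by_language = {}
        for lang_set in entry.iter("langSet"):
            by_language[lang_set.get(XML_LANG)] = [
                term.text for term in lang_set.iter("term")
            ]
        entries[entry.get("id")] = by_language
    return entries


terms = read_terms(SAMPLE_TBX)
```

Because each concept groups its terms per language, any tool that understands this layout can look up the Finnish or Swedish equivalent of an English term without knowing which system produced the file.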
The Terminology Management service can easily be combined with other services. For example, accepted/rejected terms can be combined with the proofing service to provide comprehensive brand language proofing. Key terminology can be utilized when creating search engine indexes, and terminologies can also be used to create fully customized spell checkers.

LS LMC Terminology Management is offered through a set of tightly coupled interfaces:

Intelligent Term Search

A key strength of Lingsoft’s Terminology Management service is the use of morphological data to expand its term searches; while searching for or highlighting a specific term in the termbase, inflected forms can also be found and highlighted.

The Intelligent Term Search service uses Lingsoft’s language tools for search expansion, and allows for a number of different search targets:

  ● custom termbase
  ● static termbase (for example MOT dictionaries)
  ● public terminologies (for example WordNet)
  ● Lingsoft language tools functioning as terminologies (for example synonym dictionaries)

Term Highlighting

Using term highlighting, it is possible to submit a portion of text in order to have the terms it contains recognised and identified as such. One can, for example, use the accepted/rejected distinction to find terms that are not to be used in a text and should be replaced. The service returns positional and other term-related information, and there is great flexibility in selecting target termbases.

Term Suggestion

If a term is unrecognized, the customer can propose adding it to the termbase. The Term Suggestion service makes it possible to select a target termbase for the suggestion, and to add basic information about the suggestion. The terminology administrator will see the suggestions as unprocessed terms; if the administrator approves the term, LS LMC will publish it for use throughout the platform.
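The highlighting and suggestion workflow described above can be illustrated with a small in-memory Python sketch. Everything here is an assumption made for the example: the `TermBase` class, its method names and the sample terms are hypothetical, and matching is done on exact word boundaries rather than with the morphological expansion that the actual service uses.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TermBase:
    """Hypothetical in-memory termbase for illustration only."""

    # Maps a lowercase term to its status: "accepted" or "rejected".
    terms: dict[str, str] = field(default_factory=dict)
    # Suggested terms that an administrator has not yet processed.
    pending: list[str] = field(default_factory=list)

    def highlight(self, text: str) -> list[tuple[int, int, str, str]]:
        """Return (start, end, matched text, status) for each known term found."""
        matches = []
        for term, status in self.terms.items():
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                matches.append((m.start(), m.end(), m.group(0), status))
        return sorted(matches)

    def suggest(self, term: str) -> None:
        """Queue an unknown term for review by the terminology administrator."""
        key = term.lower()
        if key not in self.terms and key not in self.pending:
            self.pending.append(key)

    def approve(self, term: str, status: str = "accepted") -> None:
        """Move a pending suggestion into the termbase with the given status."""
        key = term.lower()
        self.pending.remove(key)
        self.terms[key] = status


termbase = TermBase({"spell checker": "accepted", "spellchecker": "rejected"})
text = "Our spellchecker replaces the old spell checker."
found = termbase.highlight(text)
```

Returning character offsets together with the status lets a client such as an editor plug-in underline rejected terms in place and offer the accepted variant as a replacement.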
Applied Use: Brand Language Management

Consistent terminology is one of the key factors in successful communication. With the help of Language Management Central, terminology can be effectively managed and utilized in different situations.

Corporate terms and other supported terminologies are stored in the Language Management Central terminology repository. All language tools in Language Management Central that can consume external terms can be connected to the same common terminology. In this way, language tools are always aware of the supported terminology and can guide end-users in using it consistently and effectively.

End-users are exposed to the terminology in various ways: dictionary tools in addition to other glossaries and termbanks, proofreading and style checking tools, machine translation, information retrieval, term highlighting and so on. Standard texts that the user can add to a document are also included in the same termbank.

The user may also suggest changes to the corporate terms through LS LMC. This is a handy way of collecting candidates for terminology updates, as term collection is available in the same user interface in which normal tasks are performed.

Translations

LS LMC exposes interfaces to several popular machine translation services, allowing for a great variety of source and target languages.