TaaS Workshop 2014, Term Mining and Terminology Management in a Corporate Setting Perspective, Luigi Muzii, sQuid


The time spent looking for and not finding information costs an organization a total of $6 million a year, not including opportunity costs or the costs of reworking existing information that could not be located. Only 41% of localization-mature organizations have some terminology management policy in place, almost solely translation-oriented. Today we will talk about how terminology management works and demonstrate its power through controlled languages, ontologies, search engine applications, content and knowledge management applications, and e-learning systems.

  1. Wednesday, 4 June, 10:50 – 11:20. Term Mining and Terminology Management in a Corporate Setting Perspective. Luigi Muzii, sQuid. TaaS Workshop 2014, 4 June, Dublin (Ireland). The research within the project TaaS leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013), grant agreement no 296312.
  2. Term Mining and Terminology Management: A Corporate Setting Perspective
  3. Awareness
     "Globally active organizations whose core business is not communications-related (translation, localization, information management, etc.) are generally unaware of the benefits of performing terminology management." Kara Warburton, LISA, 2001
  4. Translation-oriented terminology
     Only 41% of localization-mature organizations have some terminology management policy in place, almost solely translation-oriented.
  5. Scope
     • Technical documentation
     • Controlled languages
     • Translation and localization
     • Translation automation
     • Content and Knowledge Management Systems
     • Knowledge organization
     • Taxonomies and ontologies
     • Learning Management Systems
     • Knowledge nugget (knowledge representation)
     • Self-contained reusable educational entities (Learning Object Metadata, IEEE 1484.12.1)
     • Marketing management
     • Customer service
     • SEM/SEO
     • Sentiment analysis
  6. Integrations (diagram labels): Documentation, CMS, Website, Marketing, Service & Support, LMS, CVS
  7. Costs (IDC, 2004)
     • Productivity of knowledge workers
     • 15% to 35% searching for information
     • Successfully completed 50% of the time or less
     • Only 21% found the information they needed 85% to 100% of the time
     • $6 million a year looking for and not finding information
     • 15% of time for duplicating existing information
     • Opportunity costs
     • Reworking existing information that could not be located
     • $12 million a year
  8. Terminology cost multiplier (Jörg Schütz/Rita Nübel)
     Stages: Product data, Documentation development, Authoring, Editing, Approval, Localization, Maintenance
     Multipliers: 0.1-0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0
  9. Costs/Benefits
     • Huge costs in the short term
     • $150 per terminological entry (J.D. Edwards, 2001)
     • The practical value does not match the technical value
  10. Accuracy
     "Fundamental accuracy of statement is the one sole morality of writing."
  11. Payback
     • Cost reduction
     • Authoring, localization, training, customer service
     • Overhead
     • Time reduction in the production cycle
     • Immediate 1% payback for larger businesses
     • Productivity increase
     • Time-to-market
     • Qualitative improvements
     • Branding
     • Safety
  12. Controlled languages
     "The most valuable of all talents is that of never using two words when one will do."
  13. Fatal errors
     • The Linate Airport disaster (Oct 8, 2001)
     • Deficiencies in the airport layout and procedures
     • Violations of ICAO regulations
     • Incorrect signs to runway
     • Incorrect, uncorrected readback
     • Non-standard phraseology
     • Irrelevant term (extension) leading to fatal misunderstanding
  14. Keywords advertising
     "Rem tene, verba sequentur" (Keep to the subject, words will follow). Marcus Porcius Cato (Cato the Censor)
  15. The long tail
     "Rerum enim copia verborum copiam gignit" (All this gives rise to a plethora of words). Cicero
  16. Term mining
     • Complex knowledge-intensive task
     • Different approach for different scope
     • Hard to grasp in a corporate setting perspective
     • Business intelligence
  17. Mining terms
     • Linguistic approach
         • Based on rules and dictionaries
         • Collocations
         • One language at a time
         • Issues
         • Loans
         • Synonyms, variants, abbreviations
         • Ellipses
         • Improper usage
         • Bitext
         • Knowledge bases
         • Knowledge discovery
     • Statistical approach
         • Language independent
         • Based on frequency
         • Repeated sequences of syntagmas
         • The frequency threshold must be specified
         • Frequency does not necessarily mean importance
         • Much “noise”
         • Monolingual corpus
         • Indices
         • Controlled languages
         • Keywords
         • TQA
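The statistical approach on slide 17 (repeated sequences filtered by a frequency threshold) can be illustrated with a minimal sketch. It is not the method used by any tool named in this talk; the function name `extract_candidates`, the tiny `STOPWORDS` set, the parameters `max_len` and `min_freq`, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a real linguistic resource).
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}

def extract_candidates(text, max_len=3, min_freq=2):
    """Return repeated word sequences as {candidate: frequency}.

    Purely frequency-based: no dictionaries, no grammar rules.
    Sentence boundaries are ignored, which is one source of noise.
    """
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip sequences that start or end with a stopword.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    # The frequency threshold must be chosen by the user.
    return {term: freq for term, freq in counts.items() if freq >= min_freq}

# Hypothetical sample text, invented for this example.
sample = (
    "The translation memory stores segments. "
    "A translation memory is reused in every project. "
    "The term base complements the translation memory."
)

for term, freq in sorted(extract_candidates(sample).items(),
                         key=lambda kv: (-kv[1], kv[0])):
    print(freq, term)
```

On this sample only "translation memory" and its two component words pass the threshold. The single words are the kind of noise the slide warns about, and "term base", which occurs once, is missed: frequency does not necessarily mean importance.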
  18. TaaS test drive
     • Building a Localization Kit
         • 13688 words, 142 repetitions
     • memoQ Term Extraction
         • Statistical analysis
         • 815 term entries from the English document
         • 647 term entries from translation memory
     • Tilde Wrapper System for CollTerm (TWSC)
         • Linguistic analysis enriched by statistical features
         • 3046 term entries
     • Kilgray Terminology extractor
         • Statistical analysis
         • 3218 term entries
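One way to put the test drive figures on a common scale is to express each tool's yield as term entries per 100 source words. The sketch below uses only the numbers from slide 18; the density metric and the helper name `per_hundred_words` are assumptions for illustration, not part of the talk. The 647 entries from the translation memory are left out because the size of that memory is not given.

```python
# Figures copied from the test drive slide.
SOURCE_WORDS = 13688

candidates = {
    "memoQ Term Extraction (English document)": 815,
    "Tilde Wrapper System for CollTerm (TWSC)": 3046,
    "Kilgray Terminology extractor": 3218,
}

def per_hundred_words(entries, words=SOURCE_WORDS):
    """Term entries per 100 source words, rounded to one decimal."""
    return round(100 * entries / words, 1)

for tool, entries in candidates.items():
    print(f"{tool}: {entries} entries, "
          f"{per_hundred_words(entries)} per 100 words")
```

The largest list is roughly four times the size of the smallest for the same document. The counts alone say nothing about how many entries are valid terms, so they should be read alongside the noise caveat on the previous slide.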
  19. Terminology management in the cloud
     • Pros
         • Zero TCO
         • Availability and deployability
         • Collaboration features
     • Cons
         • Limited scalability
         • Security issues
         • Integration costs
  20. ROI
     "The proof of performance, i.e. ROI considerations, of terminology management within the corporate setting is a challenge for future projects." Stefan Kremer, 2005
  21. Thank you