Searching for the Best Machine Translation Combination (Matīss)
Matīss Rikters is researching hybrid machine translation methods. He has used a count-based language model to select candidates from full translations, to combine translations of sentence chunks, and to combine translations of linguistically motivated chunks. He has also used a character-level neural language model for candidate selection. His methods achieved BLEU scores of up to 19.51. Future work includes completing experiments on English-Estonian, winning the WMT17 news translation task for English-Latvian, performing chunking on the target side, and experimenting with other language models for candidate selection.
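The core idea of candidate selection with a count-based language model can be sketched as follows: score each candidate translation (for example, outputs of several MT systems for the same sentence) with an n-gram model and keep the one with the lowest perplexity. This is a toy illustration only; the add-one-smoothed bigram model, the tiny training corpus, and the function names are assumptions for illustration, not the model or code used in the work described above.

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Toy count-based LM: collect history (unigram) and bigram counts."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens[:-1])                # counts of each history token
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab = {t for s in sentences for t in s.split()} | {"</s>"}
    return unigrams, bigrams, len(vocab)

def perplexity(sentence, lm):
    """Per-token perplexity under add-one (Laplace) smoothing."""
    unigrams, bigrams, vocab_size = lm
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = 0.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))

def select_candidate(candidates, lm):
    """Pick the candidate translation the LM finds most fluent."""
    return min(candidates, key=lambda c: perplexity(c, lm))

corpus = ["the cat sat on the mat", "the dog sat on the rug", "a cat sat on a mat"]
lm = train_bigram_lm(corpus)
candidates = ["the cat sat on the mat", "mat the on sat cat the"]
print(select_candidate(candidates, lm))  # the cat sat on the mat
```

Real systems would use a much larger model trained on target-language text with better smoothing; the selection step itself stays a simple minimum over candidate scores.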
2. Constantin Orasan (UoW) EXPERT Introduction (RIILP)
The document introduces the EXPERT ITN project, which aims to train young researchers to improve data-driven machine translation through empirical approaches. The project will support researchers during their training and research, with the goal of producing future leaders in the field. It describes the project's objectives: improving existing corpus-based translation tools by taking user needs into account, collecting data, incorporating linguistic processing, and developing hybrid approaches. The project consists of 12 individual research projects across 6 work packages and is led by an academic consortium with involvement from private-sector partners.
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face" (Fwdays)
In this talk I'll start by introducing the recent breakthroughs in NLP that resulted from the combination of Transfer Learning schemes and Transformer architectures. The second part of the talk will be dedicated to an introduction of the open-source tools released by Hugging Face, in particular our transformers, tokenizers, and NLP libraries as well as our distilled and pruned models.
Parallel data has become an extremely valuable resource, not only for building new statistical machine translation systems, but also for building other useful resources for translators, such as bilingual concordancers, translation memories or bilingual lexicons. One of the most important and under-exploited sources of bilingual information is the Internet: many strategies have been proposed to crawl specific websites, but defining methods for surfing the whole Web and harvesting bitexts is still an open problem. Recently, the free/open-source tool Bitextor has become one of the reference tools for this task: it has been one of the basic tools featured in European projects such as Panacea or Abu-MaTran, and it has been chosen as the reference tool for the shared task on document alignment of the 1st Conference on Machine Translation (WMT 2016). In this presentation we will describe this tool, explaining its advantages over other state-of-the-art tools and the strategies it uses to crawl large amounts of parallel data.
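One simple heuristic often used when harvesting bitexts from a crawled site is to pair documents whose URLs differ only by a language code. The sketch below shows that idea; it is an assumption for illustration and is not Bitextor's actual document-alignment pipeline, which the description above does not detail. The language list, URL pattern, and function names are hypothetical.

```python
import re

# Hypothetical convention: the language appears as a path segment, e.g. /en/ or /fr/
LANG_SEGMENT = re.compile(r"/(en|fr|es|de|lv|et)(?=/)")

def normalize(url):
    """Replace the first language path segment with a placeholder."""
    return LANG_SEGMENT.sub("/{lang}", url, count=1)

def pair_by_url(urls_a, urls_b):
    """Pair documents from two languages whose normalized URLs are identical."""
    index_b = {normalize(u): u for u in urls_b}
    pairs = []
    for u in urls_a:
        key = normalize(u)
        if key in index_b:
            pairs.append((u, index_b[key]))
    return pairs

english = ["https://example.org/en/news/1", "https://example.org/en/about"]
french = ["https://example.org/fr/news/1", "https://example.org/fr/contact"]
print(pair_by_url(english, french))
# [('https://example.org/en/news/1', 'https://example.org/fr/news/1')]
```

URL matching is cheap but only works on sites with regular URL schemes; content-based signals are needed for the rest of the Web.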
This document discusses the pros and cons of statistical machine translation (SMT) and neural machine translation (NMT) for translation service providers and their clients. While NMT produces more fluent translations, it is harder to control and needs extensive testing, whereas SMT is more predictable. MT can help increase translator productivity, but only if the right conditions are met, such as content suited to MT, sufficient volume, and supportive workflows. Overall, MT is best viewed as a tool that helps translators rather than replaces them, and human factors such as motivation and compensation are important for realizing any productivity gains.
The document discusses the development of KantanNeural™, a neural machine translation platform. It describes how the time needed to build neural machine translation engines has decreased from 4 weeks to 4 days, and potentially to 4 hours. The platform lets users build, improve, and deploy their own neural machine translation engines through an easy-to-use interface. The system also supports seamless switching between statistical and neural machine translation methods.
Using the Monolingual Dictionary as a Strategy for the... (igualadoholmes)
The document describes the importance of using a monolingual dictionary as an effective strategy for learning English. A dictionary lets you look up the meaning and part of speech of words, hear their pronunciation, and train your ear with online dictionaries. It recommends choosing a larger dictionary to find more words and examples, and notes that specialized dictionaries for British or American English are available.
Panelists: Yoshiyasu Yamakawa (Intel), JP Barraza (Systran), Konstantin Dranch (Memsource), David Koot (TAUS)
The focus of this session will be on predictions and risk management. What kinds of things can you predict, and how can you manage risks by analyzing your translation data or monitoring your productivity and quality? Tracking translation data across the different cycles of the translation process (translation, post-editing, review, proofreading) offers tremendous value when it comes to predicting future trends or making informed choices. What type of data can be valuable, and what kinds of predictions can we make using this data? How can we make more efficient use of already available data? How can we use this type of data to improve machine translation, automatic QA, error recognition, sampling or quality estimation? How can academia and industry work together towards a common goal?
MT best practices for price, speed AND quality, as well as Lexcelera’s machine translation case studies and services, including training, integration, post-editing, and hosted MT.
The Innovation Language and The Social Innovation Network (Stefan Ianta)
Introduction to the concepts and benefits of the Universal Innovation Language and how to implement it as the Semantic AI Internet / Digital Democracy.
This document discusses big data analytics tools for non-technical users. It introduces Tuktu, a platform that makes big data science accessible through a visual drag-and-drop interface. It also describes using deep learning models trained on linguistic resources to perform natural language tasks across languages with less effort. Finally, it presents CEMistry, a customer experience monitoring product that analyzes text, web, mobile, and backend data to build customer profiles.
AI for Translation Technologies and Multilingual Europe (Georg Rehm)
Georg Rehm. AI for Translation Technologies and Multilingual Europe. DG TRAD Conference - Translation Services in the Digital World: A Sneak Peek into the (near) Future. Luxembourg. October 16/17, 2017.
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2... (Europeana)
Here are a few approaches to address the context demand challenge for machine translation of cultural heritage content:
- Leverage knowledge graphs and ontologies to disambiguate terms based on conceptual relationships
- Train domain-specific models on large cultural heritage corpora to capture nuances of language use in different contexts
- Perform multi-task learning to optimize models for both translation accuracy and conceptual mapping between languages
- Allow users to provide feedback to iteratively improve disambiguation of ambiguous terms over time
- Develop specialized interfaces that surface contextual clues from objects to help machine translation
The goal is to mimic how humans understand intended meaning based on surrounding context clues. Combining linguistic and conceptual techniques can help machines do the same.
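The first approach in the list (using knowledge graphs or ontologies to disambiguate terms) can be illustrated with a simplified Lesk-style overlap: each sense of an ambiguous term is linked to related concepts, and the sense sharing the most concepts with the surrounding text wins. This is a minimal sketch under stated assumptions; the mini-ontology, the sense labels, and the function name are hypothetical and do not come from Europeana's data or tools.

```python
def disambiguate(term, context, sense_inventory):
    """Pick the sense whose related concepts overlap most with the context words."""
    context_words = set(context.lower().split())
    senses = sense_inventory[term]
    # max() returns the first sense on a tie
    return max(senses, key=lambda s: len(context_words & senses[s]))

# Hypothetical mini-ontology: each sense of "bow" links to related concepts
inventory = {
    "bow": {
        "weapon": {"arrow", "archery", "hunting", "string"},
        "violin_bow": {"violin", "string", "horsehair", "music"},
        "ship_bow": {"ship", "hull", "vessel", "deck"},
    }
}

print(disambiguate("bow", "Carved bow found with arrow heads from a hunting site", inventory))
# weapon
```

The chosen sense could then be passed to the MT system (for example as a preferred term) so that a museum object's description is translated with the intended meaning.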
Webinar: automotive and engineering content, 16.06.16 (kantanmt)
High-quality translations that are delivered quickly result from a seamless and efficient translation process, but getting to this stage requires a well-thought-out plan, rigorous content preprocessing techniques and, most importantly, clear and transparent communication between the automated translation vendor and the language service provider.
In this webinar, Christian Taube and Brian Coyle discuss how the Matrix and KantanMT partnership delivers a high-quality, scalable solution that increases translation productivity and supports engineering and automotive terminology standards. The webinar uses specific case study examples, including a discussion of which types of content to focus on and how to prepare and manage Translation Memory data. Discussion includes:
• Managing content for best results
• Preparing TM data
• Tools that generate high quality results
An MT Journey: Intuit and Welocalize, Localization World 2013 (Welocalize)
Insights into how Intuit, working with Welocalize, architected a machine translation (MT) program that met an aggressive launch schedule and now supports the entire enterprise. Presentation given at Localization World 2013 in Silicon Valley http://www.welocalize.com/welocalize-intuit-machine-translation-locworld/
Integration of speech recognition with computer assisted translation (Chamani Shiranthika)
This document discusses the integration of speech recognition with computer-assisted translation. It begins by introducing machine translation and computer-assisted translation, then describes how automatic speech recognition works and how it can be integrated with translation. Key approaches to integration include using word graphs from ASR and MT systems or rescoring ASR hypotheses with translation models. Neural machine translation models that use attention mechanisms are also discussed. The document concludes by noting areas for further development in reducing human effort in translation and increasing quality and effectiveness of speech-to-text and translation tools.
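Rescoring ASR hypotheses with a translation model can be sketched as a weighted (log-linear) combination of two scores. In dictation-based computer-assisted translation the source sentence is known, so a translation model can score how well each transcribed hypothesis fits it. The scores, the weight, and the function name below are made-up assumptions for illustration, not values or code from the document.

```python
def rescore(nbest, weight_asr=0.5):
    """Re-rank ASR hypotheses by a weighted sum of ASR and translation-model log scores."""
    def combined(h):
        return weight_asr * h["asr_logprob"] + (1 - weight_asr) * h["tm_logprob"]
    return sorted(nbest, key=combined, reverse=True)

# Hypothetical n-best list for a dictated translation of a known source sentence
nbest = [
    {"text": "the whether is nice", "asr_logprob": -4.0, "tm_logprob": -9.0},
    {"text": "the weather is nice", "asr_logprob": -4.5, "tm_logprob": -5.0},
]

print(rescore(nbest)[0]["text"])  # the weather is nice
```

On acoustic evidence alone the homophone error would win (-4.0 > -4.5); the translation-model score tips the decision (-4.75 > -6.5). In practice the weight would be tuned on held-out data.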
5 challenges of scaling l10n workflows: KantanMT/bmmt webinar (kantanmt)
In this joint presentation, Tony O’Dowd, Founder and Chief Architect of KantanMT, and Maxim Khalilov, Technical Lead of bmmt, deliver an overview of the MT technology currently available in the language technology market, the challenges of operating MT systems at scale and speed, and their opinions on the future trajectory of MT.
Each presentation is grounded in client examples showing how these clients have successfully integrated MT into their localization workflows.
Finally, both presenters finish with a 5-point checklist for successful MT deployment, from both the MT provider's and the LSP's point of view.
If you have any questions about this presentation or want to get in touch with either company please contact:
Louise Irwin, Marketing Specialist at KantanMT (louisei@kantanmt.com)
Peggy Lindner, Operations Manager at bmmt (peggy.lindner@bmmt.eu)
Delivered at the Machine Translation Summit during a special workshop on post-editing.
November 3rd 2015
Miami, Florida.
In this talk, we describe the latest advances in commercial and academic machine translation development that are improving acceptance of the technology and keeping its users happy.
1) The document discusses Pangeanic's new neural machine translation (NMT) technology, PangeaMT Neural and ActivaTM, and compares its performance to the company's previous statistical machine translation (SMT) system.
2) Experimental results on English-to-Japanese translation show that NMT outperforms SMT on BLEU, TER, and WER scores, especially for shorter sentences (0 to 25 words), producing translations that require less post-editing effort.
3) Additional testing of NMT on other language pairs like English to French and Russian also showed superior results compared to SMT, with translators rating 85-90% of NMT translations as good or very good quality versus only 50-60
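Of the metrics mentioned above, WER is the simplest to compute: the word-level edit distance (substitutions, deletions, insertions) between the MT output and a reference, divided by the number of reference words. The sketch below is a minimal illustration; the function name and example sentences are assumptions. TER differs in that it also allows block shifts of word sequences, which this sketch does not implement.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[len(ref)][len(hyp)] / len(ref)

# 1 substitution (sat -> sit) + 1 deletion (the) over 6 reference words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 0.3333333333333333
```

Lower is better, so "NMT outperforms SMT" on WER and TER means lower scores, while on BLEU it means higher scores.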
Managing Translation Memories for Engineering and Automotive Translation (Poulomi Choudhury)
High-quality translations that are delivered quickly result from a seamless and efficient translation process, but getting to this stage requires a well-thought-out plan, rigorous content preprocessing techniques and, most importantly, clear and transparent communication between the automated translation vendor and the language service provider.
In this webinar, Christian Taube and Brian Coyle discuss how the Matrix and KantanMT partnership delivers a high-quality, scalable solution that increases translation productivity and supports engineering and automotive terminology standards. The webinar uses specific case study examples, including a discussion of which types of content to focus on and how to prepare and manage Translation Memory data. Discussion includes:
• Managing content for best results
• Preparing TM data
• Tools that generate high quality results
This document discusses different types of machine translation, including statistical machine translation (SMT), rule-based machine translation (RBMT), and hybrid machine translation. It provides details on the SMT training and decoding processes, considerations for SMT and RBMT, common machine translation applications like Google Translate, Microsoft Translator, and SDL Language Weaver, and the types of files and content that can be machine translated.
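The decoding step of SMT is usually introduced with the textbook noisy-channel formulation shown below. The document itself does not give a formula; this is the standard formulation, where $f$ is the source sentence, $e$ a candidate target sentence, $P(f \mid e)$ the translation model, and $P(e)$ the language model. Phrase-based systems commonly generalize it to a weighted log-linear combination of feature functions $h_i$ with weights $\lambda_i$.

```latex
\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e)
\qquad\text{generalized to}\qquad
\hat{e} = \arg\max_{e} \sum_{i} \lambda_i \, h_i(e, f)
```

Training estimates the models (and tunes the weights); decoding searches the space of candidate translations for $\hat{e}$.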
- MT@EC is a machine translation system developed by the European Commission to provide automated translations for all 24 official EU languages.
- It was launched in 2013 to address the growing translation needs of the EU, which far exceed the translation capacity of the Commission.
- MT@EC is used both to disseminate information and to understand texts in other languages, and as a tool that helps human translators draft translations more efficiently.
- The system continues to be improved through customization pilots with public institutions and by incorporating translator feedback to enhance quality over time.
Machine translation and its role in crisis translation (khetam Al Sharou)
This presentation gives a brief introduction to Machine Translation and its role in crisis settings. Machine Translation (MT) is a technology whereby a computer produces translations from one language into another. In recent years, MT research has developed enormously, and fairly accurate translations can now be obtained for certain language pairs. This substantial quality improvement has led to its introduction into professional translation workflows, and its potential has been proven beyond the translation industry itself. In fact, in 2010 MT proved useful as an aid in the response efforts to the Haiti earthquake. In this presentation, we will briefly see what machine translation is and how an MT engine works, and then explain how it could be used in disaster preparation and relief operations.
The document describes the Tenth MT Marathon 2015 conference held in Prague, Czech Republic from September 7-12, 2015. It includes details about lectures on topics like MT evaluation, language modelling, and neural network models. It also lists papers presented on various MT topics as well as keynotes from organizations like Google. Labs and projects related to MT are mentioned, including tools like Moses, cdec, and Treex.
Language Resources for Multilingual Europe (Georg Rehm)
META-NET has received funding from the EU to support several language technology projects, including CRACKER, T4ME, CESAR, METANET4U, and META-NORD. It brings together over 60 research centers across 34 countries to build infrastructure for sharing language resources and tools. The goal is to improve the visibility, documentation, identification, availability, and interoperability of language resources in order to support both academic and commercial language technology research and development across Europe.
Learn the different approaches to machine translation and how to improve the ... (SDL)
SDL provides machine translation solutions to customers. They have a team of over 50 professionals across various locations that work on driving MT adoption, building custom engines, and conducting linguistic projects. SDL's approach involves evaluating data, training machine translation engines, testing outputs, and refining engines through an iterative process with a focus on maximizing quality. They provide customized solutions through domain-specific engines and language verticals to meet the needs of different customers and content types.
Dr. Dimitar Shterionov (KantanLabs) and Laura Casanellas (KantanMT Professional Services) presented very interesting results gleaned from a comparative ranking of Neural and Statistical MT systems. These systems were developed with KantanMT and ranked using the KantanLQR quality evaluation platform. As ranked by professional translators, Neural MT demonstrated clear quality improvements in fluency and adequacy compared to equivalent statistical outputs.
MT best practices for price, speed AND quality, as well as Lexcelera’s machine translation case studies and services including training, integration, post-editing and hosted MT
The Innovation Language and The Social Innovation NetworkStefan Ianta
Introduction to the concepts and benefits of the Universal Innovation Language and how to implement it as the Semantic AI Internet / Digital Democracy.
This document discusses big data analytics tools for non-technical users. It introduces Tuktu, a platform that makes big data science accessible through a visual drag-and-drop interface. It also describes using deep learning models trained on linguistic resources to perform natural language tasks across languages with less effort. Finally, it presents CEMistry, a customer experience monitoring product that analyzes text, web, mobile, and backend data to build customer profiles.
AI for Translation Technologies and Multilingual EuropeGeorg Rehm
Georg Rehm. AI for Translation Technologies and Multilingual Europe. DG TRAD Conference - Translation Services in the Digital World: A Sneak Peek into the (near) Future. Luxembourg. October 16/17, 2017.
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...Europeana
Here are a few approaches to address the context demand challenge for machine translation of cultural heritage content:
- Leverage knowledge graphs and ontologies to disambiguate terms based on conceptual relationships
- Train domain-specific models on large cultural heritage corpora to capture nuances of language use in different contexts
- Perform multi-task learning to optimize models for both translation accuracy and conceptual mapping between languages
- Allow users to provide feedback to iteratively improve disambiguation of ambiguous terms over time
- Develop specialized interfaces that surface contextual clues from objects to help machine translation
The goal is to mimic how humans understand intended meaning based on surrounding context clues. Combining linguistic and conceptual techniques can help machines do the same.
Webinar automotive and engineering content 16.06.16kantanmt
High quality translations that are delivered quickly are a result of a seamless and efficient translation process, but getting to this stage requires a well thought out plan, rigorous content preprocessing techniques and most importantly, clear and transparent communication between the automated translation vendor and language service provider.
In this webinar, Christian Taube and Brian Coyle discusses how the Matrix and KantanMT partnership delivers a high quality, scalable solution that increases translation productivity and supports engineering and automotive terminology standards. The webinar uses specific case study examples including a discussion on what types of content to focus on and preparing and managing Translation Memory data. Discussion includes:
• Managing content for best results
• Preparing TM data
• Tools that generate high quality results
An MT Journey Intuit and Welocalize Localization World 2013Welocalize
Insights how Intuit, working with Welocalize, architectures a machine translation (MT) program meeting an aggressive launch schedule that now supports the entire enterprise. Presentation given at Localization World 2013 in Silicon Valley http://www.welocalize.com/welocalize-intuit-machine-translation-locworld/
Integration of speech recognition with computer assisted translationChamani Shiranthika
This document discusses the integration of speech recognition with computer-assisted translation. It begins by introducing machine translation and computer-assisted translation, then describes how automatic speech recognition works and how it can be integrated with translation. Key approaches to integration include using word graphs from ASR and MT systems or rescoring ASR hypotheses with translation models. Neural machine translation models that use attention mechanisms are also discussed. The document concludes by noting areas for further development in reducing human effort in translation and increasing quality and effectiveness of speech-to-text and translation tools.
5 challenges of scaling l10n workflows KantanMT/bmmt webinarkantanmt
In this joint presentation, Tony O’Dowd, Founder and Chief Architect of KantanMT and Maxim Khalilov, Technical Lead of bmmt deliver an overview of the MT technology currently available in the language technology market, the challenges of operating MT systems at scale and speed, and their opinions on the future trajectory of MT.
Each presentation will be grounded with client examples, and how they’ve successfully integrated MT into their localization workflows.
Finally, both presenters will finish off with a 5 point checklist for successful MT deployment based on both the MT provider and LSP point of view.
If you have any questions about this presentation or want to get in touch with either company please contact:
Louise Irwin, Marketing Specialist at KantanMT (louisei@kantanmt.com)
Peggy Linder, Operations Manager at bmmt (peggy.lindner@bmmt.eu)
Delivered at Machine Translation Summit during a special workshop on post-editing.
November 3rd 2015
Miami, Florida.
In this talk, we describe the latest advances in the world of commercial and academic machine translation development that are having the effect of improving acceptance of the technology and keeping its users happy.
1) The document discusses Pangeanic's new neural machine translation (NMT) technology called PangeaMT Neural and ActivaTM and compares its performance to their previous statistical machine translation (SMT) system.
2) Experimental results on English to Japanese translation show that NMT outperforms SMT in BLEU, TER, and WER scores, especially for shorter sentences between 0-25 words, producing translations requiring less post-editing effort.
3) Additional testing of NMT on other language pairs like English to French and Russian also showed superior results compared to SMT, with translators rating 85-90% of NMT translations as good or very good quality versus only 50-60
Managing Translation Memories for Engineering and Automotive TranslationPoulomi Choudhury
High quality translations that are delivered quickly are a result of a seamless and efficient translation process, but getting to this stage requires a well thought out plan, rigorous content preprocessing techniques and most importantly, clear and transparent communication between the automated translation vendor and language service provider.
In this webinar, Christian Taube and Brian Coyle discusses how the Matrix and KantanMT partnership delivers a high quality, scalable solution that increases translation productivity and supports engineering and automotive terminology standards. The webinar uses specific case study examples including a discussion on what types of content to focus on and preparing and managing Translation Memory data. Discussion includes:
• Managing content for best results
• Preparing TM data
• Tools that generate high quality results
This document discusses different types of machine translation, including statistical machine translation (SMT), rule-based machine translation (RBMT), and hybrid machine translation. It provides details on the SMT training and decoding processes, considerations for SMT and RBMT, common machine translation applications like Google Translate, Microsoft Translator, and SDL Language Weaver, and the types of files and content that can be machine translated.
- MT@EC is a machine translation system developed by the European Commission to provide automated translations for all 24 official EU languages.
- It was launched in 2013 to address the growing translation needs of the EU, which far exceed the translation capacity of the Commission.
- MT@EC is used both for disseminating information to understand texts in other languages, and as a tool to aid human translators in drafting translations more efficiently.
- The system continues to be improved through customization pilots with public institutions and by incorporating translator feedback to enhance quality over time.
Machine translation and its role in crisis translationkhetam Al Sharou
This presentation gives a brief introduction to Machine Translation and its role in crisis settings. Machine Translation (MT) is a technology whereby a computer produces translations from one language into another. In recent years, research in MT has experienced a huge development and nowadays fairly accurate translations can be obtained for certain language pairs. This substantial quality improvement has led to its introduction into professional translation workflows and its potential has been proven beyond the translation industry itself. In fact, in 2010 MT was proven useful as an aid in the response efforts to the Haiti earthquake. In this presentation, we will briefly see what machine translation is and how an MT engine works to then explain how it could be used in disaster preparation and relief operations.
The document describes the Tenth MT Marathon 2015 conference held in Prague, Czech Republic from September 7-12, 2015. It includes details about lectures on topics like MT evaluation, language modelling, and neural network models. It also lists papers presented on various MT topics as well as keynotes from organizations like Google. Labs and projects related to MT are mentioned, including tools like Moses, cdec, and Treex.
Language Resources for Multilingual EuropeGeorg Rehm
META-NET has received funding from the EU to support several language technology projects, including CRACKER, T4ME, CESAR, METANET4U, and META-NORD. It brings together over 60 research centers across 34 countries to build infrastructure for sharing language resources and tools. The goal is to improve the visibility, documentation, identification, availability, and interoperability of language resources in order to support both academic and commercial language technology research and development across Europe.
Learn the different approaches to machine translation and how to improve the ... (SDL)
SDL provides machine translation solutions to customers. It has a team of over 50 professionals across various locations who work on driving MT adoption, building custom engines, and conducting linguistic projects. SDL's approach involves evaluating data, training machine translation engines, testing outputs, and refining engines through an iterative process with a focus on maximizing quality. It provides customized solutions through domain-specific engines and language verticals to meet the needs of different customers and content types.
Dr. Dimitar Shterionov (KantanLabs) and Laura Casanellas (KantanMT Professional Services) presented results from a comparative ranking of Neural and Statistical MT systems. These systems were developed with KantanMT and ranked using the KantanLQR quality evaluation platform. Professional translators ranked Neural MT output clearly higher in fluency and adequacy than the equivalent statistical MT output.
Tony (Chief Architect, KantanMT.com) opens the proceedings with a look at how MT technology has progressed over time. Having embraced Rule Based MT in the 1970s, the industry switched over to Statistical MT around 2002 and in 2016 faced a new paradigm, Neural MT. Each technology shift brought improvements in translation quality and fluency.
Summary: https://www.youtube.com/watch?v=19yyDa6mAsc
Full video: https://www.youtube.com/watch?v=EtbML0DTNHk
2017 will see the emergence of Machine Translation 2.0, and KantanNeural signals a giant step towards using cutting-edge technology to improve automated translation accuracy and increase productivity.
In this webinar, Tony provides an overview of KantanNeural and discusses how users can translate documents using NMT. He discusses how to evaluate the translation quality of the NMT engines with the new A/B testing feature on KantanLQR™. Dimitar briefly talks about the benefits of translating using Neural technology and the future development plans for NMT at KantanLabs.
YouTube: https://youtu.be/_2yIZxVqqmw
This webinar will discuss connecting machine translation systems to various CAT tools, the benefits of customized MT systems such as instant deployment and support, and how to involve reviewers in the MT process to improve quality. It will also cover topics such as how many users can access the MT system, what types of content are best for MT, options for customers without sufficient translation memories, and various pricing models.
ATC Summit 2016: The 7th Habit of 7 Habits of Effective MT Systems (kantanmt)
Translation quality management is key for Project Managers seeking to improve the translation process. Producing high-quality translations from the start of a project will reduce costs and improve speed to market.
When considering automated translation, we think of automatic metrics such as BLEU, F-Measure and TER, and how well they correlate with translation quality. However, reviewing translation output for MT engine retraining is still a very manual process involving multiple iterations of Excel documents. In this presentation, Brian will discuss how the process can be automated and the impact automation will have on reducing costs and increasing translation productivity.
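To make the metric discussion more concrete, here is a minimal sentence-level BLEU sketch, assuming whitespace tokenization, a single reference and no smoothing. The function names `ngrams` and `bleu` are hypothetical; published BLEU scores are normally computed at corpus level with a standard tool such as sacreBLEU, so treat this only as an illustration of the formula.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU with one reference and no smoothing (sketch).

    Geometric mean of clipped n-gram precisions (n = 1..max_n),
    multiplied by a brevity penalty.
    """
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        if matches == 0:
            return 0.0  # any zero precision makes the unsmoothed score 0
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```

An exact match scores 1.0, while any hypothesis shorter than four words scores 0.0 because it has no 4-grams; this is one reason sentence-level BLEU is usually smoothed in practice.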
Cross Border Selling: Breaking the Language Barrier with Automated Translation (kantanmt)
This document summarizes a webinar about using machine translation (MT) for cross-border e-commerce. It discusses how MT can help businesses sell across borders by overcoming language barriers. Specific benefits mentioned include faster translation to new markets, leveraging back catalogues to increase sales, improving communication to reduce costs, and providing auto-usable translations directly on websites. Case studies demonstrate productivity gains and cost savings when using MT for e-commerce catalogues and customer support.
Go global with this Winning Combination – Content strategy and Machine Transl... (kantanmt)
Reaching customers in new target locales requires an enterprise-wide content strategy that will circumvent language and cultural barriers, fit seamlessly into existing content workflows and not break the bank.
In this webinar Brian discusses how to develop a flawless content strategy by bringing the power of Custom Machine Translated content into the mix.
YouTube link: https://youtu.be/HG8-9vlKZkk
Cloud computing in its various forms can offer significant business advantages for companies large and small. However, companies considering moving their operations to the cloud face many options, and the choices can be confusing and challenging. Not least among their concerns are the security and safety of data, and indeed cloud computing poses both opportunity and risk in this regard. With the goal of drawing back the curtain on cloud security and helping companies make more informed choices about their cloud security posture, IC4 hosted a workshop on the challenges and opportunities of cloud security.
During this workshop, Dr Dimitar Shterionov, Machine Translation Researcher in KantanMT, presented a case study on the topic of cloud security and how it is implemented in a real-world business scenario.
New Ways to Engage Clients with Custom Machine Translation (kantanmt)
Brian Coyle, Chief Commercial Officer at KantanMT talks about the solid benefits of integrating a powerful Machine Translation tool in a localization project. He shares measurable and significant market facts and figures in order to discuss how Custom Machine Translation engines are a cut above broad-based MT systems. Brian goes on to discuss some of the main features that any scalable, powerful MT system must include in order to improve translation productivity for projects, and by extension to increase the returns for LSP clients.
Learning outcome:
• How Custom Machine Translation will help LSPs improve and enhance their service offerings
• The tangible quantitative benefits of integrating MT within the translation workflow
• What are the “must-have features” to look for when identifying a suitable MT system
• The webinar will empower Project Managers by providing them with information and industry insights that can be utilised to pitch projects to enterprise clients, and thereby bring bigger projects to the table
Improving your Bottom Line with Custom Machine Translation (kantanmt)
KantanMT’s Chief Commercial Officer, Brian Coyle provides in-depth insights into the growing Machine Translation (MT) industry, including the benefits of integrating Custom MT into existing workflows to generate exceptional localization cost savings by reducing translation time and increasing productivity.
This presentation is relevant to anyone selling products and services in global markets, or those who aim to enter newer markets fast, before their competitors.
You will learn:
• About the necessity of Machine Translation to be competitive
• Quantitative benefits of integrating MT within the Localization workflow
• What questions to ask your Localization Partner when choosing an MT system
How to Achieve Agile Localization for High-Volume Content with Machine Transl... (kantanmt)
This slide deck on achieving agile localization for high-volume content with the help of Machine Translation was presented by Tony O’Dowd, Founder and Chief Architect at KantanMT during the annual tcworld conference 2015, which was held in Stuttgart, Germany. It outlines the best practices for developing and implementing a dynamic and agile localization strategy that integrates Custom Machine Translation (CMT) into the localization workflow, with the final aim of developing a scalable localization strategy that makes it possible to create and publish high-volume multilingual content.
KantanMT Founder and Chief Architect, Tony O'Dowd and Technical Project Manager, Louise Faherty show you how to improve the translation productivity of your team, manage post-editing effort and translation project schedules better with powerful Machine Translation engines.
You will learn:
• How to deal with Translation challenges
• About the necessity of Machine Translation to be competitive
• How KantanMT.com can be integrated with existing Translation Management Systems
How to save 16 million euro for your start up business (kantanmt)
The document discusses KantanMT, a statistical machine translation platform. It provides an overview of KantanMT's capabilities including being cloud-based, scalable, and providing high-quality translations through fusion of translation memory, machine translation, and rules. The document then discusses KantanMT's journey and growth, leveraging the cloud to maximize performance and availability while minimizing costs. It highlights how the cloud provides abundant and elastic computing resources to power KantanMT's machine translation engines.
What is the Economic Case for Machine Translation? (kantanmt)
Machine Translation (MT) is a productivity tool in the production workflow with the potential to significantly boost a company’s economic performance. In today’s world, one of the greatest challenges an organisation faces is how to increase profits when revenue streams become saturated.
This presentation covers the economic arguments in favour of including Machine Translation in existing content production workflows.
For more information about KantanMT.com, or to sign up to the platform, contact us (sales@kantanmt.com).
Tips for Preparing Training Data for High Quality Machine Translation (kantanmt)
This document discusses tips for preparing high quality training data for machine translation systems. It covers:
- The key factors that influence training data quality are quantity, quality, and relevance to the domain. Balancing these is important.
- Suitable training data sources include translation memories, terminology databases, and client translated documents.
- Statistical machine translation systems use bilingual and monolingual data to form patterns and map source to target language. Additional data and rules can improve accuracy.
- Data preparation includes preprocessing, training the translation and language models, and postprocessing. Ensuring data is clean, normalized, and domain relevant improves results.
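The cleaning and normalization step described above can be sketched as a simple filter over sentence pairs. This is an illustrative sketch that assumes whitespace tokenization; the function names and the thresholds (`max_len=80`, `max_ratio=3.0`) are hypothetical choices, not KantanMT's actual pipeline.

```python
import re
import unicodedata


def normalize(text):
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_parallel_corpus(pairs, max_len=80, max_ratio=3.0):
    """Filter (source, target) sentence pairs before MT training.

    Drops pairs with an empty side, overlong sentences, implausible
    length ratios, and exact duplicates. Thresholds are illustrative.
    """
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        src, tgt = normalize(src), normalize(tgt)
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src == 0 or n_tgt == 0:
            continue  # one side is missing
        if n_src > max_len or n_tgt > max_len:
            continue  # very long segments are often misaligned
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # a large length mismatch suggests a bad alignment
        if (src, tgt) in seen:
            continue  # exact duplicate after normalization
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned
```

Normalizing before deduplicating means that pairs differing only in whitespace or Unicode composition are treated as duplicates.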
This document discusses building and measuring machine translation engines using KantanMT. It includes sections on building your first engine in 5 minutes, types of training data, factors to consider like quality, relevance and quantity of data. It also discusses automated measurements for MT like F-measure, BLEU score and TER, and how Kantan BuildAnalytics can provide comparative measurements between engines. The document provides an overview of key aspects of creating and evaluating MT systems with KantanMT.
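As an illustration of the TER-style measurements mentioned above, the sketch below computes word-level edits (insertions, deletions, substitutions) divided by reference length. It deliberately omits the block shifts that real TER allows, so it is closer to a normalized word error rate; tools such as tercom or sacreBLEU implement full TER. The function names are hypothetical, and lower scores are better.

```python
def word_edit_distance(hyp, ref):
    """Minimum word insertions, deletions and substitutions (Levenshtein)."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i in range(1, len(h) + 1):
        curr = [i] + [0] * len(r)
        for j in range(1, len(r) + 1):
            cost = 0 if h[i - 1] == r[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete a hypothesis word
                          curr[j - 1] + 1,     # insert a reference word
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return prev[len(r)]


def simplified_ter(hyp, ref):
    """Edits divided by reference length; omits TER's block shifts."""
    n_ref = len(ref.split())
    return word_edit_distance(hyp, ref) / max(n_ref, 1)
```

For example, `simplified_ter("the cat sat on a mat", "the cat sat on the mat")` needs one substitution over six reference words, giving about 0.167. Comparing two engines means scoring each engine's output against the same references.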
Breaking Language Barriers: Machine Translation for eCommerce (kantanmt)
73% of online shoppers prefer buying in their native language – for businesses selling online this means translating thousands of product descriptions into that locale’s target language.
Translating dynamic product description content needs a highly automated workflow that can be easily scaled to meet demand. Machine translation is the solution – data driven MT can mimic product description styles quickly and effectively to produce translations that are fit for their purpose.
Aimed at eCommerce localization professionals, this presentation will give invaluable tips on how to develop an MT workflow that can be used to reach new markets, and attract and keep loyal customers.
Tony O’Dowd presented at the Cloud On-Boarding Clinic in the Business School, Dublin City University, Ireland. The clinic, which was hosted by the Irish Centre for Cloud Computing and Commerce (IC4), aims to help companies leverage the benefits of using cloud computing in their businesses.
Tony's presentation, ‘Cloud Computing It’s a world of complexity!’ draws upon Tony’s experience successfully launching a cloud based company, and addresses four of the greatest challenges facing companies wishing to implement a cloud infrastructure across their businesses.
For more information about KantanMT, contact info@kantanmt.com.
How to set up a high tech business in the Cloud for 2,000 EUR (kantanmt)
All small business owners have big ideas on how they want to streamline their business and drive sales. However, to help achieve this they need business applications, which are often expensive, complex to install and configure, and challenging to manage and maintain. When there is a problem or software needs to be updated even contacting technical support for help can be a painful experience.
KantanMT is a successful cloud based statistical machine translation business and this presentation narrates a story of how to set up a new, high technology business from scratch using only cloud based business applications, and for less than €2,000!
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx (SitimaJohn)
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
A Comprehensive Guide to DeFi Development Services in 2024 (Intelisync)
DeFi represents a paradigm shift in the financial industry. Instead of relying on traditional, centralized institutions like banks, DeFi leverages blockchain technology to create a decentralized network of financial services. This means that financial transactions can occur directly between parties, without intermediaries, using smart contracts on platforms like Ethereum.
In 2024, we are witnessing an explosion of new DeFi projects and protocols, each pushing the boundaries of what’s possible in finance.
In summary, DeFi in 2024 is not just a trend; it’s a revolution that democratizes finance, enhances security and transparency, and fosters continuous innovation. As we proceed through this presentation, we'll explore the various components and services of DeFi in detail, shedding light on how they are transforming the financial landscape.
At Intelisync, we specialize in providing comprehensive DeFi development services tailored to meet the unique needs of our clients. From smart contract development to dApp creation and security audits, we ensure that your DeFi project is built with innovation, security, and scalability in mind. Trust Intelisync to guide you through the intricate landscape of decentralized finance and unlock the full potential of blockchain technology.
Ready to take your DeFi project to the next level? Partner with Intelisync for expert DeFi development services today!
Digital Marketing Trends in 2024 | Guide for Staying Ahead (Wask)
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
Skybuffer SAM4U tool for SAP license adoption (Tatiana Kojar)
Manage and optimize your license adoption and consumption with SAM4U, a free SAP software asset management tool for customers.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Letter and Document Automation for Bonterra Impact Management (fka Social Sol... (Jeffrey Haguewood)
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on automated letter generation for Bonterra Impact Management using Google Workspace or Microsoft 365.
Interested in deploying letter generation automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Introduction of Cybersecurity with OSS at Code Europe 2024 (Hiroshi SHIBATA)
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A... (Jeffrey Haguewood)
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
5th LF Energy Power Grid Model Meet-up Slides (DanBrown980551)
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
- Insightful presentations covering two practical applications of the Power Grid Model.
- An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
- An interactive brainstorming session to discuss and propose new feature requests.
- An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Building Production Ready Search Pipelines with Spark and Milvus (Zilliz)
Spark is a widely used ETL tool for processing, indexing and ingesting data into the serving stack for search. Milvus is a production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to the Milvus vector database for search serving.
Monitoring and Managing Anomaly Detection on OpenShift.pdf (Tosin Akinosho)
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf (flufftailshop)
When it comes to unit testing in the .NET ecosystem, developers have a wide range of options available. Among the most popular choices are NUnit, XUnit, and MSTest. These unit testing frameworks provide essential tools and features to help ensure the quality and reliability of code. However, understanding the differences between these frameworks is crucial for selecting the most suitable one for your projects.
TrustArc Webinar - 2024 Global Privacy Survey (TrustArc)
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Main news related to the CCS TSI 2023 (2023/1695) (Jakub Marek)
An English 🇬🇧 translation of the presentation accompanying the talk I gave on the main changes brought by CCS TSI 2023 at the biggest Czech conference on communications and signalling systems on railways, held at the Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). It was attended by around 500 participants and 200 online followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers (akankshawande)
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
GraphRAG for Life Science to increase LLM accuracy (Tomaz Bratanic)
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ... (alexjohnson7307)
Predictive maintenance is a proactive approach that anticipates equipment failures before they happen. At the forefront of this innovative strategy is Artificial Intelligence (AI), which brings unprecedented precision and efficiency. AI in predictive maintenance is transforming industries by reducing downtime, minimizing costs, and enhancing productivity.
KantanFest: Andy Way
1. The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
Andy Way
ADAPT Centre @ Dublin City University
Separating the Hype from the Reality:
Neural Machine Translation
www.adaptcentre.ie
8. Translators are quite well-disposed towards technology
“The public thinks that technology takes
care of translation. Good technology is all
about making good translators better”
Jost Zetzsche (@jeromobot)
24th June 2017
14. So what can we learn from the past?
Why did we start doing SMT (and Hybrid MT)?
We wrote a (I thought) good paper on EBMT, submitted it to ACL, and were rejected by all three reviewers. Why?
Because we hadn't compared our results with 'state-of-the-art' SMT!
16. Phrase-Based SMT then came along in earnest
• “Let the data decide”!
• But what have we spent the last 10 years doing?
Smuggling in Syntax, Semantics, and (lately) Discourse features to break through the glass ceiling.
18. SMT & Linguistics
SMT practitioners now know the value of linguistic information,
cf. Alex Fraser's keynote at EAMT-16:
agreement phenomena (gender, person, number, case),
verbal inflection,
compounding,
terminology,
lexical/structural ambiguity,
pronouns ...
19. What’s happened since?
Deep Learning came along and took off!
“Let the data decide”!
Recent (accepted) ACL 2016 paper on SMT:
“you haven't compared your results with 'state-of-the-art' NMT”!
21. What is the actual situation?
• Wins for NMT for numerous language pairs at IWSLT/WMT 2015 & WMT 2016
• Bentivogli et al. (2016 – arXiv; EMNLP)
– IWSLT 2015 English-German: NMT compared to 4 SMT systems
– Automatic Evaluation:
• NMT outperforms the SMT systems in every length bin, with statistically significant differences
– Human Evaluation:
• NMT makes at least 19% fewer morphology errors than SMT
• NMT makes at least 17% fewer lexical errors than SMT
• NMT translations require about 50% fewer shifts than SMT
• NMT reduces verb order errors by 70% with respect to the best SMT system
• NMT reduces noun order errors by 47% with respect to the best SMT system
• NMT also gains for prepositions (-18%), negation particles (-17%) and articles (-4%)
• NMT generates outputs that considerably lower the overall post-editing effort w.r.t. the best SMT system (-26%)
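Slide 21 reports statistically significant differences. The slide does not say which test Bentivogli et al. used; one common choice in MT evaluation is paired bootstrap resampling (Koehn, 2004). The sketch below assumes per-sentence quality scores for both systems are already available and compares mean scores on each resample, a simplification of recomputing corpus-level BLEU; `paired_bootstrap` is a hypothetical name.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_samples=1000, seed=0):
    """Estimate how often system A beats system B under resampling.

    scores_a and scores_b are per-sentence quality scores for the same
    test sentences (higher is better). Returns the fraction of bootstrap
    samples in which A's mean score strictly exceeds B's.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need non-empty, paired score lists")
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    n = len(scores_a)
    wins = 0
    for _ in range(n_samples):
        # Resample sentence indices with replacement, keeping pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return wins / n_samples
```

If A wins in at least 95% of the samples, A is commonly taken to be better at roughly the 95% confidence level. Resampling indices jointly, rather than each system's scores separately, is what makes the test "paired".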
22. Other Use-Cases
• NMT for E-Commerce
• NMT for Patents
• NMT for MOOCs
[Castilho et al. 2017, EAMT]
• Five other human evaluations of NMT/SMT at EAMT 2017 (inc. from )
23. NMT for E-Commerce
• Translate product listings
• Systems (Calixto et al. 2017, EACL):
• (1) a PBSMT baseline model built with the Moses SMT Toolkit
• (2) a text-only NMT model (NMTt)
• (3) a multi-modal NMT model (NMTm)
• English into German
• Data set: 24k parallel product listings + images
• Validation/test data: 480/444 tuples
• 18 German native speakers
• Ranking
• Translations from the 3 systems + product image
• Adequacy (Likert scale from 1 = all of it to 4 = none of it)
• Source + translation + product image
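The ranking task above yields, for each judgement, an ordering of the three systems. One simple way to summarize such judgements as a "preferred system" percentage is to count how often each system is ranked first. The slides do not state how Calixto et al. aggregated their rankings, so this is an assumption; `first_place_share` is a hypothetical name, and the judgements in the usage example are invented.

```python
from collections import Counter


def first_place_share(rankings):
    """Percentage of judgements in which each system was ranked best.

    Each ranking is a list of system names ordered best to worst,
    e.g. ["PBSMT", "NMTm", "NMTt"]. Ties are not modelled.
    """
    if not rankings:
        return {}
    firsts = Counter(ranking[0] for ranking in rankings)
    total = len(rankings)
    return {system: 100.0 * count / total for system, count in firsts.items()}
```

With four invented judgements in which PBSMT comes first twice and each NMT system once, this returns 50.0 for PBSMT and 25.0 for each NMT model.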
24. NMT for E-Commerce
• AEM (automatic evaluation metrics):
• PBSMT outperforms both NMT models (BLEU, METEOR and chrF3)
• NMTm performs as well as PBSMT (TER)
• Adequacy
• NMTm performs as well as PBSMT
• Ranking
• PBSMT: 56.3% preferred system
• NMTm: 24.8%
• NMTt: 18.8%
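chrF3, one of the metrics listed above, is a character n-gram F-score with beta = 3, so recall counts more than precision. Below is a minimal sentence-level sketch, assuming character n-grams up to length 6 with whitespace removed, and precision and recall averaged over n-gram orders before the F-score is taken. Standard implementations such as sacreBLEU's may differ in details, and the function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Character n-gram counts, ignoring whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=3.0):
    """Sentence-level chrF; beta=3 gives chrF3 (recall-weighted)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # a sentence is too short for this n-gram order
        matches = sum((hyp & ref).values())  # clipped n-gram overlap
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)
```

`chrf("abc", "abc")` returns 1.0. Because beta = 3 weights recall more heavily, a hypothesis that drops characters is penalized more than one that adds extra ones.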
25. NMT for Patents
• Compare the performance of mature patent MT engines used in production with a new NMT system
• Systems
• PBSMT (a combination of elements of phrase-based, syntactic, and rule-driven MT, along with automatic post-editing)
• NMT (baseline)
• English into Chinese
• Data set: ~1M sentence pairs of chemical abstracts, ~350K chemical titles, ~12M general patent, and ~2K glossaries.
• 2 reviewers
• Ranking
• Error analysis
• Punctuation, part of speech, omission, addition, wrong terminology, literal translation, and word form.
27. NMT for MOOCs
• Decide which system would provide better quality translations for the project domain
• Systems
• PBSMT (Moses)
• NMT (baseline)
• English into German, Greek, Portuguese and Russian
• Data set:
• OFD: ~24M (DE), ~31M (EL), ~32M (PT), ~22M (RU)
• In-domain: ~270K (DE), ~140K (EL), ~58K (PT), ~2M (RU)
• Ranking
• Post-editing
• Fluency and Adequacy (1-4 Likert scale)
• Error analysis: inflectional morphology, word order, omission, addition, and mistranslation
28. www.adaptcentre.ieNMT for MOOCs
• AEM:
• NMT outperforms SMT in terms of BLEU and METEOR
• More PE for SMT
• Fluency and Adequacy
• NMT is preferred for fluency across all languages
• Adequacy results are slightly less consistent
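BLEU combines clipped n-gram precisions (n = 1 to 4) through a geometric mean and multiplies the result by a brevity penalty. The sketch below is a plain corpus-level version with a single reference, whitespace tokenisation and no smoothing; it is illustrative only, and real evaluations normally rely on a standard toolkit such as sacreBLEU so that tokenisation and scores are comparable across papers.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Counter of word n-grams as tuples."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU in [0, 100]: one reference, uniform weights, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            matches[n - 1] += sum((h_ngrams & r_ngrams).values())  # clipped counts
            totals[n - 1] += sum(h_ngrams.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)

print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))  # 100.0
```

Counts are pooled over the whole test set before taking precisions, which is why corpus BLEU is not the average of sentence-level BLEU scores.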
29. NMT for MOOCs
• Post-editing
• Technical effort improved with NMT for DE, but only marginally for the other languages
• Temporal effort improved only marginally
• Ranking
• NMT is preferred across all languages (DE 80%, EL 56%, PT 61% and RU 63%)
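The slides do not say how technical and temporal effort were measured. One common approach, assumed here, counts the word-level edits needed to turn the MT output into its post-edited version, normalised by the post-edited length (an HTER-like ratio, but without TER's block shifts), and divides post-editing time by the number of words. The function names are hypothetical.

```python
def word_edit_distance(a, b):
    """Levenshtein distance over word tokens (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete wa
                            curr[j - 1] + 1,             # insert wb
                            prev[j - 1] + (wa != wb)))   # substitute or match
        prev = curr
    return prev[-1]

def technical_effort(mt_output, post_edited):
    """Edits per post-edited word (HTER-like ratio without shift operations)."""
    mt, pe = mt_output.split(), post_edited.split()
    return word_edit_distance(mt, pe) / max(len(pe), 1)

def temporal_effort(seconds, post_edited):
    """Post-editing seconds per post-edited word."""
    return seconds / max(len(post_edited.split()), 1)

print(technical_effort("the house are small", "the house is small"))  # 0.25
print(temporal_effort(12.0, "the house is small"))                    # 3.0
```

Lower values mean less effort for both measures, so an "improvement" for NMT corresponds to a lower ratio than for SMT on the same segments.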
33. Observations (from an old guy)
• MT is hard; it’s about as hard a problem as we’ve come up with.
• Adopting a new paradigm doesn’t, by itself, make the problems any easier.
• (Some) newcomers to the field will soon find that MT is too hard for them and will disappear …
• The same thing happened with SMT: people came into the field, published an ACL paper with their favourite statistical method and ran off to their next field.
• For them, MT was just another application, whereas some of us have been doing this for half our lives and more!
34. Concluding Remarks
• NMT results are really promising!
• But … human evaluations show that the results are not yet so clear-cut
• Especially where data is scarce, NMT hopelessly underperforms SMT
• The translation industry is eager for improved MT quality in order to minimise costs
• The hype around NMT must be treated cautiously; overselling a technology that still needs more research may cause more negativity about MT
39. Food for Thought?
• Imagine NMT really is better than SMT:
– for all domains
– for all language pairs
• Is the translation industry set up to provide this technology now?
• If not, what needs to happen? And by when? Who can help?
• Finally: training NMT engines typically takes weeks, rather than the days needed for SMT.
– What’s the impact on the climate of all these GPU servers running 24/7?
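The climate question can be framed as back-of-the-envelope arithmetic: number of GPUs × average power draw × hours of training. The sketch below assumes every GPU draws a constant average power while running 24/7; the GPU count, wattage and grid emission factor in the usage lines are placeholders, not figures from this talk.

```python
def training_energy_kwh(num_gpus: int, watts_per_gpu: float, days: float) -> float:
    """GPU energy in kWh for `days` of round-the-clock training at constant average draw."""
    hours = days * 24
    return num_gpus * watts_per_gpu * hours / 1000.0

def emissions_kg(energy_kwh: float, kg_co2_per_kwh: float) -> float:
    """CO2 estimate from a grid emission factor (kg CO2 per kWh) supplied by the user."""
    return energy_kwh * kg_co2_per_kwh

# Placeholder values for illustration only: 4 GPUs at 300 W for two weeks.
energy = training_energy_kwh(num_gpus=4, watts_per_gpu=300, days=14)
print(f"{energy:.1f} kWh")  # 403.2 kWh
# Hypothetical emission factor; substitute the figure for your own grid.
print(f"{emissions_kg(energy, kg_co2_per_kwh=0.4):.1f} kg CO2")
```

Energy scales linearly with training time, so an engine that trains for weeks instead of days multiplies the estimate by the same factor, before counting repeated training runs or CPU, cooling and storage overheads.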