Choosing the right Machine Translation engine for a project is hard: the performance of every engine varies across domains and language pairs, and changes often. With domain-adaptive NMT it is even harder, because engines also differ in their training corpus requirements and have complex cost-of-ownership models.
In this talk, we describe our approach to addressing those issues. Large-scale reference-based scoring filters the candidate MT engines and identifies a "decisive" set of segments for human LQA. The LQA procedure itself is tuned to choosing the best engine rather than merely accepting one. We also look at the learning gradient to estimate whether the same engine will remain suitable as more data becomes available.
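The filtering step described here can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure used in the talk: `token_f1` is a toy stand-in for a real reference-based metric, the `margin` cut-off is hypothetical, "decisive" is read as "the segments on which the shortlisted engines disagree most", and the engine names and sentences are made up.

```python
from collections import Counter
from statistics import mean


def token_f1(hypothesis: str, reference: str) -> float:
    """Toy reference-based score: token-level F1 (a stand-in for a real MT metric)."""
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def score_engines(outputs: dict, references: list) -> dict:
    """Score every engine on every segment against the reference translation."""
    return {
        engine: [token_f1(hyp, ref) for hyp, ref in zip(hyps, references)]
        for engine, hyps in outputs.items()
    }


def shortlist(scores: dict, margin: float = 0.05) -> list:
    """Keep engines whose mean score is within `margin` of the best (assumed rule)."""
    means = {engine: mean(seg_scores) for engine, seg_scores in scores.items()}
    best = max(means.values())
    return sorted(engine for engine, m in means.items() if best - m <= margin)


def decisive_segments(scores: dict, engines: list, top_k: int = 2) -> list:
    """Pick the segments where the shortlisted engines' scores differ the most."""
    n_segments = len(scores[engines[0]])
    spread = {
        i: max(scores[e][i] for e in engines) - min(scores[e][i] for e in engines)
        for i in range(n_segments)
    }
    return sorted(sorted(spread, key=spread.get, reverse=True)[:top_k])


# Made-up data: three hypothetical engines, three segments.
references = [
    "the patient takes two tablets daily",
    "store in a cool dry place",
    "do not exceed the dose",
]
outputs = {
    "engine_a": ["the patient takes two tablets daily", "store in a cool dry place", "do not exceed dose"],
    "engine_b": ["the patient takes two pills daily", "store in a cool dry place", "do not exceed the dose"],
    "engine_c": ["patient is two tablet", "keep cold", "no more dose"],
}

scores = score_engines(outputs, references)
finalists = shortlist(scores)                  # engines passed on to human review
decisive = decisive_segments(scores, finalists)  # segment indices sent to reviewers
print(finalists, decisive)
```

With this toy data, the weakest engine is filtered out automatically and the segment on which the two finalists score identically is skipped, so reviewers only see segments that can separate the finalists.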
GPT and other Text Transformers: Black Swans and Stochastic Parrots | Konstantin Savenkov
Over the last year, we have seen increasingly performant Text Transformer models, such as GPT-3 from OpenAI, Turing from Microsoft, and T5 from Google. They can transform text in very creative and unexpected ways, such as generating a summary of an article, explaining complex concepts in simple language, or synthesizing realistic datasets for AI training. Unlike more traditional Machine Learning models, they do not require vast training datasets and can start from just a few examples.
In this talk, we give a short overview of such models, share the first experimental results, and ask questions about the future of the content creation process. Are those models ready for prime time? What will happen to professional content creators? Will they be able to compete against such powerful models? Will we see GPT post-editing similar to MT post-editing? We share some answers based on extensive experiments and the first production projects that employ this new technology.
Dodging AI biases in future-proof Machine Translation solutions | Konstantin Savenkov
We all want to act locally while going global, and maintain an inclusive multilingual work environment for the international workforce. Every AI model has its linguistic, cultural, and geopolitical biases. Besides providing better linguistic quality for specific languages and domains, a particular Machine Translation system may not be fully compliant with local dialect, tone of voice, gender, and data locality rules. In this talk, we consider practical cases when those biases create obstacles in building a global presence and an inclusive multilingual work environment for an international company. We discuss how to dodge those biases by using multi-vendor international AI, and in some cases go further, by leveraging those biases to create more diverse and inclusive translations.
This presentation covers our approach to building multi-purpose MT deployments. We talk about different enterprise use-cases for MT and the requirements of those use-cases. Since those requirements often have nothing to do with the objective linguistic quality, sometimes you don't want to select a specific MT engine just to meet them. Therefore, we provide some examples of how it's possible to fulfill those requirements by building NLP on top of your favorite Machine Translation black box.
Talk at Stanford HAI Workshop on "Measurement in AI Policy: Opportunities and Challenges", October 30, 2019, Stanford, USA
When we procure Machine Translation vendors for the multi-vendor MT solutions we build for enterprises, we run a lot of MT evaluation projects. We evaluate commercial MT systems on public and private data to find the best system for a specific language pair and domain. These evaluations are quite different from what you see in WMT benchmarks, as we evaluate commercial systems, which are optimized for economic efficiency and real-time performance.
Talk by Konstantin Savenkov (Intento, Inc.) at Developer Week 2019 (Seattle: Cloud Edition).
There are already hundreds of AI functions available via different APIs. Pick Machine Translation, Sentiment Analysis, Image Tagging, or anything else: there is already a choice of 20-30 AI vendors. I will give a brief overview of what types of models are already available in the cloud, which of them enable customization, and what to look out for when selecting a model (or a set of models) for a specific project. I will also touch on the AI API developer experience, to give an idea of what a developer should be prepared for when choosing an API to work with.
https://devweeksea2019.sched.com/event/OGSp/pro-talk-cloud-ai-landscape
State of the Machine Translation by Intento (stock engines, Jun 2019) | Konstantin Savenkov
Evaluation of 25 major Cloud Machine Translation Services with Stock (pre-trained) models (Alibaba, Amazon, Baidu, CloutTranslate, DeepL, Google Translate, GTCom Yeecloud, IBM Watson v3, Microsoft Text Translator v3, ModernMT, Naver Papago, Niutrans, PROMT, SAP Translation Hub, SDL Language Cloud and BeGlobal, Systran SMT and PNMT, Sogou, Tencent, Tilde, Yandex, Youdao) for 48 language pairs: pricing, performance, quality, and language coverage. We also analyze how the MT landscape changed over the last year.
State of the Machine Translation by Intento (stock engines, Jan 2019) | Konstantin Savenkov
Evaluation of 23 major Cloud Machine Translation Services with Stock (pre-trained) models (Alibaba, Amazon, Baidu, DeepL, Google Translate, GTCom Yeecloud, IBM Watson v3, Microsoft Text Translator v3, ModernMT, Naver Papago, Niutrans, PROMT, SAP Translation Hub, SDL Language Cloud and BeGlobal, Systran SMT and PNMT, Sogou, Tencent, Yandex, Youdao) for 48 language pairs: pricing, performance, quality, and language coverage. We also analyze how the MT landscape changed over the last year.
State of the Domain-Adaptive Machine Translation by Intento (November 2018) | Konstantin Savenkov
In this report, we evaluated 6 modern domain-adaptive NMT engines on a biomedical dataset (English to German): ModernMT, Globalese, Google AutoML, IBM Custom NMT, Microsoft Custom Translate, and Tilde. We explored how they compare by performance (using reference-based scores, linguistic quality analysis, and automatic quality estimation), total cost of ownership, dataset size requirements, training time, and data protection policy, and how to start using this advanced technology.
EVALUATION IN USE: NAVIGATING THE MT ENGINE LANDSCAPE WITH THE INTENTO EVALUA... | Konstantin Savenkov
We discuss the importance of evaluating pre-built and customizable MT engines against different goals in Post-Edited Machine Translation (PEMT) and raw MT settings, as well as different approaches to those evaluations. We cover the main pitfalls on the path to choosing the right MT engine and possible workarounds. The primary focus is on reference-based assessments and how we run them at Intento.
School of Advanced Technologies for Translators
Friday 14 and Saturday 15 September 2018 - Milano (Italy)
https://satt2018.fbk.eu/
Improving the Demand Side of the AI Economy (API World 2018) | Konstantin Savenkov
Training AI in-house is often infeasible, as it requires a critical mass of talent and data and carries high R&D risks. For Cognitive AI, such as machine translation and speech recognition, hundreds of pre-trained and adaptive models are already available on the market via APIs from many vendors. Their performance varies case by case and changes often. Their prices differ by a factor of 100-200, so a wrong choice may be a complete miss.
In this talk, we argue that the only way to go is to evaluate and continuously optimize the AI vendor portfolio, and we introduce our vendor-agnostic demand-side API platform for AI.
Evaluation of 19 major Cloud Machine Translation Engines (Alibaba, Amazon, Baidu, DeepL, Google, GTCom, IBM SMT and NMT, Microsoft SMT and NMT, ModernMT, PROMT, SAP, SDL Language Cloud, Systran SMT and PNMT, Tencent, Yandex, Youdao) for 48 language pairs: pricing, performance, quality, and language coverage. We also analyse how the MT landscape changed over the last year.
In this survey, we compare features, language support, and pricing for 15 vendors of Sentiment Analysis.
We consider only hosted services with a public API: several algorithms on Algorithmia marketplace, Microsoft Text Analytics, Repustate, Google Cloud Natural Language, IBM Watson NLU, Meaning Cloud, TheSay PreCeive, AWS Comprehend, Aylien, Bozon NLP, Salesforce Einstein Language, Twinword.
Evaluation of 14 major Cloud Machine Translation Engines (Google, Microsoft, IBM, IBM NMT, SAP, Amazon, Yandex, SDL, Systran, Systran PNMT, Baidu, GTCom, PROMT, DeepL) for 48 language pairs: performance, quality, language coverage, API update frequency.
State of the Machine Translation by Intento (November 2017) | Konstantin Savenkov
Evaluation of 11 major Machine Translation providers (Google, Microsoft, IBM, SAP, Yandex, SDL, Systran, Baidu, GTCom, PROMT, DeepL) for the 35 most popular language pairs: performance, quality, language coverage, API update frequency.
We have evaluated intent prediction performance, false positives, learning rate, language coverage, response time and pricing for 7 NLU providers: Amazon Lex, Facebook’s wit.ai, IBM Watson Conversation, Google’s API.ai, Microsoft LUIS, Recast.ai, Snips.ai
A brief overview of the types of recommender systems (personalized/non-personalized, cold start, collaborative filtering, etc.) and of ways to apply them in B2C services to improve business metrics: lowering user acquisition cost, improving retention, and increasing margins.
In this talk I presented the ratio analysis approach to measuring agile performance that we use at Bookmate. It is invariant to story size and story point value. We measure productivity as the ratio of estimated committed-and-done effort to all resources available; this ratio then cascades down as a product of accuracy, ability to ship, ability to plan, and resource utilisation.
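One way such a cascade can work is as a telescoping product of ratios, sketched below. The abstract does not define the four factors, so the definitions here are assumptions made purely for illustration, as are the `SprintEffort` fields and the sample numbers.

```python
import math
from dataclasses import dataclass


@dataclass
class SprintEffort:
    """Hypothetical sprint measurements, all in the same unit (e.g. person-days)."""
    estimated_done: float    # estimated effort of committed stories that were done
    actual_done: float       # actual effort spent on those done stories
    actual_committed: float  # actual effort spent on all committed stories
    actual_total: float      # actual effort spent on all work, planned or not
    available: float         # all resources available in the sprint


def productivity(s: SprintEffort) -> float:
    """Top-level ratio: estimated committed-and-done effort over available resources."""
    return s.estimated_done / s.available


def decompose(s: SprintEffort) -> dict:
    """Assumed factor definitions; adjacent terms cancel, so the product telescopes."""
    return {
        "accuracy": s.estimated_done / s.actual_done,
        "ability_to_ship": s.actual_done / s.actual_committed,
        "ability_to_plan": s.actual_committed / s.actual_total,
        "resource_utilisation": s.actual_total / s.available,
    }


sprint = SprintEffort(
    estimated_done=30, actual_done=40, actual_committed=50,
    actual_total=80, available=100,
)
factors = decompose(sprint)

# The product of the four factors reproduces the top-level productivity ratio.
assert math.isclose(math.prod(factors.values()), productivity(sprint))
print(factors, productivity(sprint))
```

Because every factor is a ratio of efforts in the same unit, rescaling the unit leaves all of them unchanged, which is one reading of the invariance mentioned above.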
The presentation describes which business goals Recommender Systems can drive, how to estimate their economic impact, and how to determine when to start spending resources on RS.
8. 2018 in Machine Translation
Rise of Domain-Adaptive NMT*
[Timeline figure. Date markers: Sep 2017, Nov 2017, Apr 2018, May 2018, Jun 2018, Jul 2018, Oct 2018. Engines shown: Globalese Custom NMT, Lilt Adaptive NMT, IBM Custom NMT, Microsoft Custom Translate, Google AutoML Translation, SDL ETS 8.0, ModernMT Enterprise, Systran PNMT.]
* Neural Machine Translation with automated customisation using domain-specific corpora, also known as domain adaptation.
Intento