Spanish Language Technology Plan. David Pérez Fernández, Cabinet of the State Secretary for Telecommunications and the Information Society, Ministry of Industry, Energy and Tourism
In the past, software development followed a waterfall approach; today, localization must be embedded in an agile software development model. Most current i18n/l10n processes are not well suited to agile software development, and definitely not suitable for Continuous Integration and Continuous Delivery. What are the implications of this change? What must change to allow i18n and l10n in an agile software development model? What is the role of the "traditional" Language Service Provider in this new scenario? Is there any out-of-the-box localization technology capable of covering all the requirements that a large company has today to automate software and documentation localization? In this Roundtable, we will discuss the direction in which our business is heading. We will try to answer some key questions on the requirements that a new business model should meet from the translation services buyer's point of view.
1) The TKUN project aims to create a network for companies and institutions to share knowledge and data to facilitate effective multilingual communication through machine translation.
2) It focuses on improving MT quality through controlled language, terminology alignment from parallel texts, and post-editing.
3) Recent activities include collaborating with industries on controlled language for Japanese to English translation, standardizing post-editing guidelines, and partnering with Microsoft on a project to empower global PR from Japan using domain data and MT.
There are many online translation agencies that have focused on streamlining the buying experience and making it easy to procure translation. However, in these solutions the translators are faceless, often leaving much to be desired on the quality side. At TM-Town, we believe the next generation is a system that builds on what these companies have done but also gives the buyer a better selection process, to ensure that the best translator (the best specialist) for a particular job is chosen.
The document introduces MTradumàtica, a web-based SMT application created by Tradumàtica Research Group and Prompsit for translators and researchers. It aims to address gaps in current SMT tools by providing a user-friendly GUI, web-based access, and support for decentralized and private customization of MT engines. The application prototype was developed as part of ProjecTA, a publicly-funded project, and its goal is to empower translators to create and manage customized MT without extensive technical expertise. Feedback is sought to continue improving MTradumàtica.
This document outlines the key points of the LIDER Roadmap, which was developed through workshops with hundreds of stakeholders to outline a vision for linked data and content analytics over the next 3, 5, and 10 years. The roadmap addresses challenges and opportunities in several sectors, including global customer engagement, content publishing and delivery, marketing and customer relationship management, and the public sector/civil society. Cross-cutting issues like localization and translation, data privacy, licensing, and developing linguistic linked data ecosystems are also discussed.
The Innovation Language and The Social Innovation Network - Stefan Ianta
Introduction to the concepts and benefits of the Universal Innovation Language and how to implement it as the Semantic AI Internet / Digital Democracy.
Cloud services in Estonia, startups cloud meetup - Riho Kurg
The document summarizes a meeting of Estonian cloud service providers, software developers, hardware vendors, integrators and consultants, and law firms to discuss collaborating on growing Estonia's cloud industry. It describes the participants and their goals in working together, presents a SWOT analysis of Estonia's cloud infrastructure, cloud platform, and software industry, and proposes next steps: being more open with startups and improving cloud services in Estonia.
Laura Dent: Single-Source and Localization - Jack Molisani
The document discusses single-sourcing and localization to streamline global content. It defines key terms like single-sourcing, localization, and internationalization. It outlines the benefits of single-sourcing and localization like cost savings and quality control. The document also discusses challenges, provides an overview of the localization process, and offers tips and examples for writing content that localizes more easily. Tools for single-sourcing and localization are also mentioned.
This document discusses the localization lifecycle from content management systems (CCMS) to translation management systems (TMS). It covers the localization and translation processes, roles involved, core technologies like DITA and XML, commercial use cases, and considerations when working with localization service providers. The goal is to capture approved content across multiple releases and languages while preventing unnecessary re-translation and efficiently incorporating last-minute translations.
Today’s web and mobile app localization industry relies on numerous standards, libraries and file formats to facilitate the exchange between developers and translators. While some formats are somewhat sophisticated, others lack even the most basic features, like pluralization and contextualization. And most can’t offer support for more advanced localization features, like language cases.
The most common localization formats include Gettext PO, PHP Arrays, Android XML, YAML, .Net RESX, iOS Strings and many others. A typical developer today works with many frameworks - for instance a Laravel backend app (PHP Arrays) with Ember front end (i18n JS) and iOS mobile app (Strings). Since all standards have distinct syntax, in many cases translations cannot be shared across applications.
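To make the fragmentation concrete, here is the same plural-bearing string as it might appear in three of the formats named above (the key names and the German translations are illustrative, not taken from any real project):

```text
# Gettext PO -- plural-aware by design:
msgid "%d file"
msgid_plural "%d files"
msgstr[0] "%d Datei"
msgstr[1] "%d Dateien"

<!-- Android strings.xml -- plurals via a dedicated <plurals> element: -->
<plurals name="file_count">
    <item quantity="one">%d file</item>
    <item quantity="other">%d files</item>
</plurals>

// iOS Localizable.strings -- no plural forms at all
// (a separate .stringsdict file is needed for that):
"file_count" = "%d files";
```

Three syntaxes, three comment conventions, and three different levels of plural support for one and the same string, which is exactly why translations are so hard to share across applications.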
Translation Markup Language (TML) aims to solve both of these problems by introducing a powerful, extensible cross-platform syntax that supports pluralization, language contextualization, language cases, reusable decorators and much more. TML libraries are available for all major web and mobile platforms. TML allows translators to do in-context translation, translating right from within the apps. TML libraries also eliminate the need for developers to ever deal with resource files: all extraction and translation substitution are done in real time, and the resource files are used only as a transport between the apps and the Translation Exchange platform.
Translation Exchange stores all translations in Universal Translation Memory (UTM), a graph database which stores all translations with their context, tone, rank and other attributes for accurate matching. This allows translations to be shared across all apps in the Translation Exchange Network. The translation memories of each app are extracted from the UTM graph and are managed by their individual localization teams. During this presentation we will look at some of the features of TML and how it can be used to quickly translate a Ruby on Rails application into any number of languages using in-context translation tools. We will also look at how the data is stored and shared across applications using UTM.
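As a baseline for comparison with TML's plural handling, the classic Gettext model exposed in Python's standard library selects a plural form at runtime. This is a generic illustration, not TML's API; no compiled catalog is loaded, so the strings fall back to the English source:

```python
import gettext

# With no catalog loaded, NullTranslations falls back to the
# source-language strings, but the plural-selection API still works.
t = gettext.NullTranslations()

def file_count(n: int) -> str:
    # ngettext picks the singular or plural msgid based on n
    return t.ngettext("%d file", "%d files", n) % n

print(file_count(1))  # 1 file
print(file_count(3))  # 3 files
```

With a real `.mo` catalog loaded via `gettext.translation()`, the same call would return the target-language form chosen by that language's plural rule, which is the feature many of the simpler resource formats above cannot express at all.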
How do you measure translation quality and the effectiveness of localization activities, and what opportunities and benefits could sharing metrics bring in the future? The industry has been working on this topic and discussing it in detail for some time now, and rightly so, but where are we along the road, what progress are we making, and is it clear where we are going? The change from Waterfall to Agile development and the move from project-based to continuous localization mean that we cannot expect a static definition of quality, however comprehensive, to define translation or localization quality. Furthermore, the detailed measurement of cost versus benefit means we cannot make wholesale changes to QA processes without data on ROI; quality is really about ensuring products are saleable and profitable. Where do these facts leave us? This panel will try to place us on the QA map, discuss the TAUS DQF offering, and put forward some new ideas and thoughts on how we can work collaboratively in this area.
Session host: Paul Mangell (Alpha CRC)
Presenters and panelists are: Dragos Munteanu (SDL), Istvan Lengyel (Kilgray), Guylaine Tritton (Alpha CRC)
Yogesh Chaturvedi is seeking a leadership role as a Purchase Specialist where he can apply over 11 years of experience in sourcing electronics components and managing vendors. He has worked at Schneider Electric, Delta Power Solutions, Minda Corporation, and Auto Meters Alliance in procurement and supply chain roles. Chaturvedi has expertise in global sourcing, cost reduction, vendor evaluation, inventory management, and ERP systems such as Oracle and SAP. He holds a diploma in electronics, a bachelor's degree in arts, and an MBA.
Hosted by Jessica Roland (SDL) with panelists: Mimi Hills (VMWare), David Snider (LinkedIn) and Melissa Biggs (Informatica).
So you have your localization tools and data in a private, public or hybrid Cloud. Great! But how “Cloud” are you, really? Do you have your people working at the speed of Cloud? If you don’t, does it matter if you have all the Cloud infrastructure/systems in place? And which is more important, speed of Cloud or ease of Cloud? Our panel of expert leaders has seen the full-spectrum journey to the Cloud and will share their thoughts on what it takes to really leverage this oh-so-popular model…and why.
Move Our DITA Content to Another CCMS? Seriously? - IXIASOFT User Conference ... - IXIASOFT
Presented by Nancy Howe and gg Heath, Information Architects, Teradata Labs at the IXIASOFT User Conference 2016.
How do you move half a million highly intertwined objects from one CCMS to another, while producing new content at a higher rate than ever? In this presentation we'll share the challenges we discovered, the solutions we developed, and the processes we used to migrate DITA content from a proprietary CCMS to the IXIASOFT DITA CMS. Teradata is a long-time DITA user, supporting approximately 300 deliverables, thousands of pages of content with up to 90% reuse, intense filtering, and translation.
The long and winding road of migrating to IXIASOFT DITA-CMS included overcoming technical challenges like bursting massive conref containers into thousands of referable-content topics, completely changing our approach to versioning content, standardizing filtering, and implementing a new translation workflow.
Parkour: Lessons in Agility - July 2016 - patricia_gale
This is the text-based Powerpoint version of our visual Prezi presentation for the CIDM Ideas online conference. Use these notes to follow along during our presentation, which happens on Wednesday, July 20, 2016 at 2:00 p.m. EDT.
Translating your content into other languages is one of the greatest challenges in the technical communication process. We as technical communicators can meet these challenges with the combination of single-sourcing and localization. This presentation covers a combination of tools, best practices, tips and tricks for managing your localization projects.
Predicting the quality of an MT engine without an existing target reference is one of the trickiest parts of MT technology. It plays an essential role in making MT usable in real-life scenarios. Perspective by Gábor Bessenyei (CEO of MorphoLogic Localisation Ltd.).
Localization and DITA: What you Need to Know - LocWorld32 - IXIASOFT
The document discusses localization best practices when using DITA (Darwin Information Typing Architecture). It provides an overview of key DITA features like content reuse and separation of form and content. It also looks at current adoption of DITA, with over 650 companies using it worldwide across many sectors. Localization considerations with DITA are examined, including challenges around incomplete translation packages, content reuse with conrefs and conditions, and ensuring proper context for translation. Best practices are suggested for localization teams and LSPs (language service providers) working with DITA content.
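For readers unfamiliar with conrefs, the reuse mechanism the abstract refers to looks roughly like this (file names and IDs are illustrative):

```xml
<!-- warnings.dita: a library topic holding a reusable admonition -->
<topic id="warnings">
  <title>Shared warnings</title>
  <body>
    <note id="hot-surface" type="warning">Allow the unit to cool
      before servicing.</note>
  </body>
</topic>

<!-- Any other topic pulls the shared note in by reference,
     using the file#topic-id/element-id addressing scheme: -->
<note conref="warnings.dita#warnings/hot-surface"/>
```

The localization challenge follows directly from this: the referencing topic contains no translatable text of its own, so a translation package that omits the library topic, or delivers it without the referencing context, breaks both the reuse and the translator's view of the sentence.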
Game Localization, Indie devs edition by Silvia Fornós
The document discusses the game localization process. It outlines the main steps as internationalization (I18N), which makes a game localization-ready, followed by localization (L10N), which involves translation, testing, and product adaptation for local markets. The key roles in localization include project managers, reviewers/leads, translators, and testers. Common tools are glossaries, style guides, CAT tools, translation memories, and query logs. For indie games, the document recommends building an in-house localization team rather than using an agency, and using agile methodologies with localization in small batches. It also notes that official guidelines must be followed for console localization.
This document discusses the impact of the printing press on book production and distribution. It notes that before the printing press, books were luxurious items due to the high costs of manual transcription and illumination. The printing press lowered costs by reducing labor and using less expensive materials like paper. This made books more widely available and accelerated the dissemination of information. While artistic merit declined as printing replaced hand-crafted books, literacy and information sharing increased dramatically due to cheaper printed books.
How to write effective requirements in an Agile environment by Matteo Taddei - Bosnia Agile
Clear and effective requirements are a key component of any Product Development LifeCycle (PDLC). In Agile, requirements are collected using User Stories; in this talk we will cover what a User Story is and the major characteristics a User Story should have to be clear and effective for any team.
HP Enterprise has long held a strong position in the Application Development Management area. The HP Enterprise Software Linguistic QA (LQA) team will share their practice of deploying Quality Center to manage LQA activities. The solution integrates with the product teams' Quality Center instances, CAT tools (Passolo), and test automation tools, enabling the LQA team to prepare, execute, monitor, and analyze tests.
Enterprise Localization Directors speak about trends in their job.
Loïc Dufresne de Virel from Intel
Patrick McLoughlin from Eventbrite
Mika Pehkonen from F-Secure
Long tail of languages, staffing, integration, owning the localization budget.
The document discusses continuous globalization and localization workflows. It describes Lingoport's suite of tools which provide continuous measurement, automation and visibility for internationalization and localization. The tools accommodate continuous change, measure localization issues, and integrate localization into every sprint and release. Lingoport representatives were available to answer questions.
It’s no longer a question of whether technologies will reshape the language service industry; it’s just a matter of how quickly and how deeply. The future of the translation industry will be stimulated by the further development of economic globalization and the spread of internet usage, but clients' requirements will become ever more diverse in terms of quality, turnaround, volume, content, and cost. To deliver matching services, language service providers need to enable agile translation, machine translation, and scalable translation capacity. It is hard to deliver such diverse services without the help of technology and an innovative business model. New technologies such as workflow automation, online services, and crowdsourcing are becoming a competitive edge for LSPs, and new relationships with resources (translators) will be a must once the millennial generation dominates the labor market; the success of an LSP will be increasingly technology- and innovation-driven. This presentation will focus on how technology and the internet are reshaping the translation industry in China.
Jaap van der Meer, Director of TAUS, shares a compilation of the feedback on the Big Idea as well as a complete overview of new TAUS features and services and new partnerships.
The document discusses open data and its benefits. It outlines 5 levels or "stars" of open data, with 5 stars being the most open. Open government data can include transportation and financial data, helping cities and giving citizens visibility. A pilot open data project is proposed, starting with one UNDP dataset to understand features and stakeholder needs before a larger launch. The pilot would test an API or open data platform over 2-3 months to inform a full open data service.
Laura Dent: Single-Source and LocalizationJack Molisani
The document discusses single-sourcing and localization to streamline global content. It defines key terms like single-sourcing, localization, and internationalization. It outlines the benefits of single-sourcing and localization like cost savings and quality control. The document also discusses challenges, provides an overview of the localization process, and offers tips and examples for writing content that localizes more easily. Tools for single-sourcing and localization are also mentioned.
This document discusses the localization lifecycle from content management systems (CCMS) to translation management systems (TMS). It covers the localization and translation processes, roles involved, core technologies like DITA and XML, commercial use cases, and considerations when working with localization service providers. The goal is to capture approved content across multiple releases and languages while preventing unnecessary re-translation and efficiently incorporating last-minute translations.
Today’s web and mobile app localization industry relies on numerous standards, libraries and file formats to facilitate the exchange between developers and translators. While some formats are somewhat sophisticated, others lack even the most basic features, like pluralization and contextualization. And most can’t offer support for more advanced localization features, like language cases.
The most common localization formats include Gettext PO, PHP Arrays, Android XML, YAML, .Net RESX, iOS Strings and many others. A typical developer today works with many frameworks - for instance a Laravel backend app (PHP Arrays) with Ember front end (i18n JS) and iOS mobile app (Strings). Since all standards have distinct syntax, in many cases translations cannot be shared across applications.
Translation Markup Language (TML) aims to solve both these problems by introducing a powerful extensible cross-platform syntax that offers support for pluralization, language contextualization, language cases, reusable decorators and much more. TML libraries are available for all major web and mobile platforms. TML allows translators to do in-context translations - where they can translate right from within the apps. TML libraries also eliminate the need for developers to ever deal with the resource files, as all extractions and translation substitution is done realtime and the resource files are only used as a transport between the apps and the Translation Exchange platform.
Translation Exchange stores all translations in Universal Translation Memory (UTM), a graph database which stores all translations with their context, tone, rank and other attributes for accurate matching. This allows translations to be shared across all apps in the Translation Exchange Network. The translation memories of each app are extracted from the UTM graph and are managed by their individual localization teams. During this presentation we will look at some of the features of TML and how it can be used to quickly translate a Ruby on Rails application into any number of languages using in-context translation tools. We will also look at how the data is stored and shared across applications using UTM.
How do you measure translation quality and the effectiveness of localization activities and what will the opportunities and benefits be for sharing metrics in the future? The industry has been working on this topic, and discussing it in detail for some time now – and rightly so - but where are we ‘along the road’, and what progress are we making, and is it clear where we are going? The change from Waterfall to Agile development, the move from project based to continuous localization, mean that we cannot expect a static definition of quality, however comprehensive, to define translation or localization quality. Furthermore the detailed measurement of costs versus benefit means we cannot make wholesale changes to QA processes without having data on ROI – and quality is really about ensuring products are saleable, and profitable. Where do these facts leave us? This panel will try to place us on the QA map, discuss the TAUS DQF offering, and put forward some new ideas and thoughts on how we can work collaboratively in this area.
Session host: Paul Mangell (Alpha CRC)
Presenters and panelists are: Dragos Munteanu (SDL), Istvan Lengyel (Kilgray), Guylaine Tritton (Alpha CRC)
Yogesh Chaturvedi is seeking a leadership role as a Purchase Specialist where he can utilize over 11 years of experience in sourcing electronics components and managing vendors. He has worked at Schneider Electric, Delta Power Solutions, Minda Corporation, and Auto Meters Alliance in procurement and supply chain roles. Chaturvedi has expertise in global sourcing, cost reduction, vendor evaluation, inventory management, and ERP systems like Oracle and SAP. He holds a diploma in electronics, bachelor's degree in arts, and an MBA.
Hosted by Jessica Roland (SDL) with panelists: Mimi Hills (VMWare), David Snider (LinkedIn) and Melissa Biggs (Informatica).
So you have your localization tools and data in a private, public or hybrid Cloud. Great! But how “Cloud” are you, really? Do you have your people working at the speed of Cloud? If you don’t, does it matter if you have all the Cloud infrastructure/systems in place? And which is more important, speed of Cloud or ease of Cloud? Our panel of expert leaders has seen the full-spectrum journey to the Cloud and will share their thoughts on what it takes to really leverage this oh-so-popular model…and why.
Move Our DITA Content to Another CCMS? Seriously? - IXIASOFT User Conference ...IXIASOFT
Presented by Nancy Howe and gg Heath, Information Architects, Teradata Labs at the IXIASOFT User Conference 2016.
How do you move half a million highly intertwined objects from one CCMS to another, while producing new content at a higher rate than ever? In this presentation we'll share the challenges we discovered, the solutions we developed, and the processes we used to migrate DITA content from a proprietary CCMS to the IXIASOFT DITA CMS. Teradata is a long-time DITA user, supporting approximately 300 deliverables, thousands of pages of content with up to 90% reuse, intense filtering, and translation.
The long and winding road of migrating to IXIASOFT DITA-CMS included overcoming technical challenges like bursting massive conref containers into thousands of referable-content topics, completely changing our approach to versioning content, standardizing filtering, and implementing a new translation workflow.
Parkour: Lessons in Agility - July 2016patricia_gale
This is the text-based Powerpoint version of our visual Prezi presentation for the CIDM Ideas online conference. Use these notes to follow along during our presentation, which happens on Wednesday, July 20, 2016 at 2:00 p.m. EDT.
Translating your content into other languages is one of the greatest challenges in the technical communication process. We as technical communicators can meet these challenges with the combination of single-sourcing and localization. This presentation covers a combination of tools, best practices, tips and tricks for managing your localization projects.
Predicting the quality of an MT engine without existing target reference is one of the tricky part in MT technology. It plays an essential role in making MT usable in real life scenarios. Perspective by Gábor Bessenyei (CEO of MorphoLogic Localisation Ltd.).
Localization and DITA: What you Need to Know - LocWorld32IXIASOFT
The document discusses localization best practices when using DITA (Darwin Information Typing Architecture). It provides an overview of key DITA features like content reuse and separation of form and content. It also looks at current adoption of DITA, with over 650 companies using it worldwide across many sectors. Localization considerations with DITA are examined, including challenges around incomplete translation packages, content reuse with conrefs and conditions, and ensuring proper context for translation. Best practices are suggested for localization teams and LSPs (language service providers) working with DITA content.
Game Localization, Indie devs edition by Silvia FornósSilvia Fornós
The document discusses the game localization process. It outlines the main steps as internationalization (I18N) to make a game localization-ready, followed by localization (L11N) which involves translation, testing, and product adaptation for local markets. The key roles in localization include project managers, reviewers/leads, translators, and testers. Common tools are glossaries, style guides, CAT tools, translation memories, and query logs. For indie games, the document recommends building an in-house localization team rather than using an agency, and using agile methodologies with localization in small batches. It also notes official guidelines must be followed for console localization.
This document discusses the impact of the printing press on book production and distribution. It notes that before the printing press, books were luxurious items due to the high costs of manual transcription and illumination. The printing press lowered costs by reducing labor and using less expensive materials like paper. This made books more widely available and accelerated the dissemination of information. While artistic merit declined as printing replaced hand-crafted books, literacy and information sharing increased dramatically due to cheaper printed books.
How to write effective requirements in an Agile environment by Matteo TaddeiBosnia Agile
Clear and Effective Requirements are a key component of any Product Development LifeCycle (PDLC). In Agile requirements are collected using User Stories; in this talk we will cover what a User Story is, and the major characteristics a User Story should have to be clear and effective for any team.
HP Enterprise had been enjoying its prestigious position in Application Development Management area. HP Enterprise Software Linguistic QA (LQA) team is going to share their practice of deploying the Quality Center to manage LQA activities - this solution integrates with other Quality Center from product teams, CAT tools (Passolo), as well as test automation tools, enabling the LQA team to prepare, execute, monitor, and analyze the tests.
Enterprise Localization Directors speak about trends in their job.
Loïc Dufresne de Virel from Intel
Patrick McLoughlin from Eventbrite
Mika Pehkonen from F-Secure
Long tail of languages, staffing, integration, owning the localization budget.
The document discusses continuous globalization and localization workflows. It describes Lingoport's suite of tools which provide continuous measurement, automation and visibility for internationalization and localization. The tools accommodate continuous change, measure localization issues, and integrate localization into every sprint and release. Lingoport representatives were available to answer questions.
It is no longer a question of whether technology will reshape the language service industry; it is only a matter of how quickly and how deeply. The translation industry will continue to be stimulated by economic globalization and the spread of internet usage, but client requirements will become ever more diverse in terms of quality, turnaround, volume, content, and cost. To deliver matching services, language service providers need to enable agile translation, machine translation, and scalable translation capacity. It is hard to deliver such diverse services without the help of technology and an innovative business model. New technologies such as workflow automation, online services, and crowdsourcing are becoming a competitive edge for LSPs, and new relationships with resources (translators) will also be a must once the millennial generation dominates the labor market; the success of an LSP will be increasingly technology- and innovation-driven. This presentation will focus on how technology and the internet are reshaping the translation industry in China.
Jaap van der Meer, Director of TAUS, shares a compilation of the feedback on the Big Idea as well as a complete overview of new TAUS features and services and new partnerships.
The document discusses open data and its benefits. It outlines 5 levels or "stars" of open data, with 5 stars being the most open. Open government data can include transportation and financial data, helping cities and giving citizens visibility. A pilot open data project is proposed, starting with one UNDP dataset to understand features and stakeholder needs before a larger launch. The pilot would test an API or open data platform over 2-3 months to inform a full open data service.
MLi project is working to deliver the strategic vision and operational specifications needed for building a comprehensive European MultiLingual data & services Infrastructure, along with a multiannual plan for its development and deployment, and foster multi-stakeholders alliances ensuring its long term sustainability.
This document summarizes the evolution of the translation technology landscape from the 1980s to the present and potential future trends. It covers major developments in each decade including the rise of personal computing and tools like translation memory in the 1980s and 1990s. The 2000s saw the growth of the internet, web services, and content management systems. Recent years have seen increased integration, convergence of technologies like machine translation and speech-to-speech, and a shift to cloud-based and mobile offerings. The document predicts translation may increasingly involve cultural consulting and transcreation rather than direct translation as machine translation capabilities advance.
The translation industry has undergone a paradigm shift every decade since 1980, but none was as big as the one we are facing now. We are entering the Convergence era: automatic translation will be a utility embedded in every app, device, sign board and screen. Businesses will prosper by finding new customers in new markets. Governments and citizens will connect and communicate easily. Consumers will become world-wise, talking to everyone everywhere as if language barriers never existed. It will not be perfect, but it will open doors and break down barriers. And it will give a boost to the translation industry, which will be chartered to constantly improve the technology and fill the gaps in global communications. In this interactive opening session Jaap van der Meer zooms in on the choices we are facing and the decision factors that help us make planning for an uncertain future opportunistic and profitable.
One of the biggest challenges for translation teams today is that translation tends to be pushed to the very end of the product cycle and, if deadlines aren't met, can have an adverse impact on the total cost of a product marketing campaign due to delayed releases. Regardless of our role in the translation process, we need to understand how the documentation process and the translation process affect each other, where the bottlenecks in the workflow are, and how we can merge the two so that our customers can meet their goals.
"For-Information-Purposes-Only" translation meets the needs of a specific group of people in each country, among many others. Who are they? How can localization engineering technologies help? How can big data and cloud computing make a difference? This presentation will try to answer these questions.
TAUS Roundtables are one-day meetings for buyers and providers of language services and technologies, aimed at an open exchange about language business innovation and translation technology. These industry brainstorming sessions are organized in various cities around the world, supported by TAUS members and regional partners.
The format of TAUS Roundtable is open and informal. The goal is to learn from your peers in the industry. Participants may present their experience and perspective on a particular question and request others in the meeting to share their reflections.
This document discusses how big data can enable the travel and tourism industries. It defines big data as large datasets characterized by their volume, velocity, variety, and veracity. Big data comes from a variety of sources as people leave digital traces online and through mobile technologies. The benefits of big data for businesses include improved customer experience personalization, optimized marketing and products, predictive analytics, and risk management. The big data market is expected to double from 2014 to 2018. Future developments include improvements in data processing, centralized data repositories, and analytics solutions in the public cloud to reduce costs and security risks. Big data can deliver business insights, innovation, better customer relationships, and continuously improved experiences for the tourism industry.
Exploration, visualization and querying of linked open data sources, by Laura Po
Afternoon hands-on session at the second Keystone Training School, "Keyword search in Big Linked Data", held in Santiago de Compostela.
https://eventos.citius.usc.es/keystone.school/
Closing plenary: the future of public sector websites #BPCW11, by Headstar
Closing plenary: 'The future of public sector websites', at Building Perfect Council Websites 11, 14 July 2011 #BPCW11 Speakers: Paul Davidson and Ingrid Koehler
UNIT I Streaming Data & Architectures.pptx, by Rahul Borate
The document provides an introduction and overview of streaming data. It discusses sources of streaming data such as operational monitoring, web analytics, online advertising, social media, and mobile/IoT data. It explains that streaming data is different from other data types in that it is always flowing in, loosely structured, and can have high-cardinality dimensions. Real-time architectures for streaming data need to have high availability, low latency, and horizontal scalability.
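The windowing idea behind such real-time architectures can be sketched with a minimal tumbling-window aggregator; the event tuples and window size below are illustrative assumptions, not taken from the talk:

```python
from collections import Counter

def tumbling_window_counts(events, window_size):
    """Count keys per fixed-size time window over an ordered (timestamp, key) stream."""
    window, counts = None, Counter()
    for ts, key in events:
        w = ts // window_size  # which window this event falls into
        if window is not None and w != window:
            yield window, dict(counts)  # emit the finished window
            counts = Counter()
        window = w
        counts[key] += 1
    if window is not None:
        yield window, dict(counts)  # flush the last, partial window

# Illustrative event stream: (timestamp in seconds, event type)
events = [(0, "click"), (1, "view"), (5, "click"), (6, "view"), (7, "view")]
windows = list(tumbling_window_counts(events, window_size=5))
# → [(0, {'click': 1, 'view': 1}), (1, {'click': 1, 'view': 2})]
```

A real streaming system would add the high availability and horizontal scaling the abstract mentions, but the per-window aggregation logic is essentially this.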
The document discusses the Internet of Things ecosystem and how to unlock business value from connected devices. It defines IoT and provides projections on growth. It outlines the complex IoT ecosystem and stakeholders involved. It presents a business value framework focused on financial metrics, operating metrics, and relationships. Common value drivers of cost reduction and risk management are discussed. Strategies to unlock more value through revenue generation and innovation are suggested, including focusing on product/customer lifecycles. Overcoming security and privacy challenges is also addressed.
The data-driven economy promises the creation of enormous amounts of economic activity and growth opportunities. However, these projections rest to a large extent on the development of new services. Currently, the results in terms of service creation remain below the expectations of open data promoters. Indeed, most services created are not sustainable and/or do not use the variety of datasets available; to a wide extent they rely on a limited number of very popular datasets. To increase the reuse of data and the value extracted from it by services, our hypothesis is that service innovation approaches can help understand the mechanisms that drive the creation of services. We therefore propose a review of the current approaches to encouraging the creation of services based on data, an analysis of the creation of services from two open data platforms, in the UK and in Singapore, and a description of the roles that data can have in the design of services, based on a theoretical framework of service innovation.
Muriel Foulonneau 1, Slim Turki 1, Géraldine Vidou 1, Sébastien Martin 2
1 Public Research Centre Henri Tudor, Kirchberg, Luxembourg
2 Université Paris 8, Vincennes-Saint-Denis, France
muriel.foulonneau@tudor.lu
slim.turki@tudor.lu
geraldine.vidou@tudor.lu
Proceedings of 14th European Conference on eGovernment – ECEG 2014
12-13 June 2014
Brasov, Romania
Local Open Data: a perspective from local government in England 2014, by Gesche Schmid
The document discusses open data from the perspective of local government in England. It outlines four phases of working with open data: 1) publishing data, 2) standardizing data, 3) analyzing and using data, and 4) engaging users. The benefits of open data include innovation, improved services, and empowering citizens, businesses and communities. However, engagement with users has been limited due to lack of skills and understanding of what can be done with data. Efforts are needed to stimulate interest, find and analyze relevant data, and tell stories with data to empower communities.
Local Open Data: A perspective from local government in England, by Gesche Schmid (Opening-up.eu)
To help government and companies develop innovative services through the use of open data and to encourage smart use of social media.
HLEG thematic workshop on Measurement of Well Being and Development in Africa, by StatsCommunications
HLEG thematic workshop on Measurement of Well Being and Development in Africa, 12-14 November 2015, Durban, South Africa, More information at: www.oecd.org/statistics/measuring-economic-social-progress
The document introduces the Dynamic Quality Framework (DQF), which aims to standardize quality measurements across the translation industry. It describes DQF as inclusive, industry-shared, and data-informed. The framework integrates with common CAT tools and TMS through open APIs to collect translation and review data and provide interactive dashboards and reports for performance tracking and benchmarking at the project and organizational level.
The document discusses the evolution of machine translation (MT) technology over time from early conceptual ideas to modern neural machine translation (NMT) systems. It uses metaphors of a band changing their sound over time by adding new band members, such as an "MT guy", to represent how translation companies can adapt to new technologies. The presentation encourages translation businesses to thoughtfully integrate new tools like NMT by involving stakeholders and focusing on people in the process of transition.
The document summarizes the results of a machine translation evaluation that compared human and machine translations. Several human and machine translation systems were evaluated on a test set containing sentences translated between English and Chinese. The top performing systems were combinations of human and machine translations. There was criticism of claims of machine translation achieving "human parity" due to limitations in the test set only using sentences rather than documents, and evaluators not being qualified translators. Neural machine translation systems are argued to have advantages over statistical and rule-based systems by processing full sentences and storing additional context in hidden layers.
The document discusses how artificial intelligence and neural machine translation will change the role of human translation over time. While AI can handle the translation process at scale, humans will still be needed for local knowledge, problem solving, and tasks like optimizing processes, improving output quality, and ensuring quality. However, a fragmented technology landscape slows businesses down. The solution proposed is an integrated localization hub that connects content systems, translation technology, and translation services through a single API to address current issues where technical knowledge and system fragmentation are still barriers.
As content published on the Internet becomes more and more dominated by video, requirements on language translation have also changed. Specifically, video publishers and distributors have a significant interest in balancing translation time and accuracy. To this end, Pactera has invested in solutions that leverage machine translation to reduce the overall translation time and recruit human translators to improve accuracy in a Wikipedia-like fashion. At Pactera, we aim to help video content reach billions of people it could not reach before.
The document discusses innovation in machine translation and language technology. It notes that translation is becoming more data-driven and algorithmic, with machines learning from large amounts of data. It also mentions that translation may become invisible and automated like utilities such as electricity. The document then lists some concepts characterizing innovative contest candidates in game changer awards, such as advanced machine translation, artificial intelligence, and automated quality evaluation. Finally, it states that six contestants will each have six minutes to pitch their innovative ideas.
Review processes, as the last step in quality assurance workflows, are "notorious for causing delays and frustrations". The reason is normally a flawed process: many manual steps for the PMs, the lack of intuitive, layout-oriented collaboration software, plus the expectation that review will "fix a broken translation" at the last second rather than give strategic process input. globalReview shifts this paradigm: as an integrated, collaborative platform with full layout editing, it provides a positive review experience. At the same time, it pushes quality upstream by applying DQF principles: flexible content profiles define precise quality expectations; issue categories and scoring effectively gauge and also track translation quality over time; a sampling module allows for fast yet accurate quality evaluation. Put together, this allows the customer to raise the process from painful review to strategic quality management and gain valuable business intelligence.
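The issue-category scoring described above can be illustrated with a minimal penalty model. The severity weights and the 1000-word normalization below are hypothetical stand-ins, not the official DQF values:

```python
# Hypothetical severity weights; real DQF/MQM profiles define their own.
PENALTIES = {"minor": 1, "major": 5, "critical": 10}

def penalty_score(issues, word_count):
    """Weighted penalty points normalized per 1000 words; lower is better."""
    total = sum(PENALTIES[severity] * count for severity, count in issues.items())
    return total / word_count * 1000

# Example: 4 minor issues and 1 major issue found in a 1500-word sample
score = penalty_score({"minor": 4, "major": 1}, word_count=1500)
# 4*1 + 1*5 = 9 points → 6.0 penalty points per 1000 words
```

Tracking such a score per content profile over time is what turns review from a last-second fix into the quality management the abstract describes.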
A global P2P trading platform for TMs will be introduced. The Tmxmall TM marketplace is the core, with client TM software and CAT tools as the input and output respectively. CAT tool users are able to search the TMs of client users without requiring those users to upload their TMs to the cloud.
The presentation will introduce the NLP technologies used in Shiyibao and the main product features, covering the following points:
Automatic grading of translations, based on a translation quality evaluation algorithm;
Automatic comments, based on rule matching;
Sorting translations by their similarity or by specific fragments, to dramatically improve the efficiency of reviewing and commenting on translations.
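The similarity-based sorting in the last point can be sketched with a simple character-level ranker; `difflib` here is a stand-in for whatever similarity measure Shiyibao actually uses, and the sentences are made-up examples:

```python
from difflib import SequenceMatcher

def sort_by_similarity(reference, candidates):
    """Rank candidate translations by surface similarity to a reference, best first."""
    similarity = lambda c: SequenceMatcher(None, reference, c).ratio()
    return sorted(candidates, key=similarity, reverse=True)

reference = "the cat sat on the mat"
candidates = [
    "the dog ran away",
    "the cat sat on the mat today",
    "a cat sat on a mat",
]
ranked = sort_by_similarity(reference, candidates)
# near-duplicates cluster at the top, so a reviewer can comment on them in one pass
```

Grouping near-identical student translations this way is what makes batch reviewing and commenting faster.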
In today’s digital economy, content is becoming smaller, more fragmented, and in need of on-demand translation in minutes and around the clock. Traditional localization models are no longer sufficient in meeting these always-on, agile, fast, and small translation requirements of the digital age. This is why mobile translation services like Stepes that are able to deliver quality, speed, and scalability are poised to see tremendous growth. During this 6-minute presentation, Stepes will demonstrate live its instant human translation service for micro content. Powered by human translators from around the world, Stepes is the world’s first mobile translation ecosystem delivering quality translation services using a networking model similar to Uber and Lyft.
This document discusses TruTran's open machine translation platform and trends in machine translation engine development. It notes that neural network technology allows each company to have its own customized, trained neural machine translation engine. The open-source nature of neural network toolkits means that machine translation will be "generalized", i.e. available to more users. However, enterprises currently lack professionals skilled in natural language processing, and training data can be difficult to process. TruTran's platform aims to address these issues by allowing users to easily upload custom training data and corpora, select a domain to train an engine, and have the engine trained within six days on the platform's resources. This gives each company its own commercial-grade machine translation engine at low cost and with their
Kirk Zhang, the COO of Wiitrans, presented on their semantic matching and translator resource management tools which aim to deliver high quality translations by matching content to appropriate translators based on their individual translator profiles and histories. The tools analyze translator-specific language assets, glossaries, and translation memories to best match work to translators and simplify the translation process.
The document describes a computer-aided translation and interpretation training system called CATS. It provides course management, multi-lingual resource centers, and translation management platforms to support online translation and interpretation courses. CATS allows instructors to upload multimedia content and documents, create translation cases and assignments, and evaluate student work. It aims to improve over traditional methods of collecting assignments through email by offering an integrated online platform for pre-class, in-class, and post-class activities.
The document announces a Translation Technology Showcase event hosted by TAUS on February 28, 2017 in Shenzhen. The event will feature presentations from various translation technology companies on topics like multichannel translation for the digital economy, using free and open source tools, leveraging large translation memories, and neural machine translation. The agenda lists out the scheduled presentations and their times. The document also mentions that TAUS recently published an updated Translation Technology Landscape Report covering trends in the industry and profiles of over 80 companies.
Most LSPs have still not converted their translated bilingual documents into TMs. Even LSPs that have established TMs are confronted with disordered TM management and low efficiency. This report will share how to quickly build TMs with the Tmxmall Cloud-Based Smart Aligner, and how to manage large-scale TMs with a private cloud-based TM to achieve pre-translation against large-scale TMs and team cooperation. Besides, the report will introduce the Tmxmall TM marketplace, which is expected to promote TM sharing. Finally, we will share the experience of LSPs with alignment and private cloud-based TM management for reducing translation costs and increasing profits.
SDL is the leader in global content management and language translation solutions. With more than 20 years of experience, SDL helps companies build relevant online experiences that deliver transformative business results on a global scale. The translation industry continues to grow, and freelancers, LSPs, and corporate clients all see increased demand as more and more content is created, so we have to address them all. As a market-leading translation productivity tool, SDL Trados Studio is trusted by over 200,000 translation professionals to boost productivity, control quality, and aid collaboration. SDL has launched Trados Studio 2017. This presentation will introduce SDL Trados Studio 2017 and highlight SDL's new productivity booster, upLIFT, which has been well received by global clients.
This document discusses Lingosail's translation technology products and services, including machine translation, corpus construction, and translation services. It outlines how Lingosail's machine translation post-editing (MTPE) solutions can provide easier entry into translation for clients, higher translation efficiency, and more scalable management of translation workflows. The document also describes Lingosail's patent post-editing training course for translators, which attracted hundreds of participants last year and resulted in trainees increasing their translation speed and quality after training.
This document discusses how to introduce machine translation (MT) into a company to improve localization processes. It outlines challenges with the current process of 30 localization loops involving 40 translators across different locations with no quality or cost control. Introducing MT for display text localization could speed up availability, lower costs by 25%, and reduce unnecessary translation loops by 50%. A short-term goal is to use MT for development phases with a final quality loop involving human translation and post-editing. Long-term preparation is needed to expand MT use while addressing risks, quality guidelines, and system environments.
This document discusses integrating XTM Cloud and TAUS DQF to enable higher quality translation projects. Key steps include creating accounts in both systems, configuring LQA parameters and issues in XTM, creating translation projects in XTM with LQA steps, performing translations and LQA reviews in XTM, and then viewing productivity and quality results in the TAUS DQF system. The integration is meant to provide benefits like higher productivity, improved quality, and better data to evaluate machine translation systems.
Quality standards in the industry have come a long way. They have evolved over the years, but their focus on quality definitions based on errors and metrics has remained the accepted wisdom. Expectations of end users are changing. Every piece of content has a job to do, and that job is often to touch the heart of users rather than just the mind, not merely to deliver information that is accurate and whose quality is measurable. A new "quality evaluation paradigm" is emerging. This calls for a new profile for translators, one that is different from what has been typical for the past few decades. This presentation will look at this trend in more detail, considering how to test these new types of translators fast and effectively. What matters in this emerging quality model, and what does it possibly mean for DQF?
CAKE: Sharing Slices of Confidential Data on Blockchain, by Claudio Di Ciccio
Presented at the CAiSE 2024 Forum, Intelligent Information Systems, June 6th, Limassol, Cyprus.
Synopsis: Cooperative information systems typically involve various entities in a collaborative process within a distributed environment. Blockchain technology offers a mechanism for automating such processes, even when only partial trust exists among participants. The data stored on the blockchain is replicated across all nodes in the network, ensuring accessibility to all participants. While this aspect facilitates traceability, integrity, and persistence, it poses challenges for adopting public blockchains in enterprise settings due to confidentiality issues. In this paper, we present a software tool named Control Access via Key Encryption (CAKE), designed to ensure data confidentiality in scenarios involving public blockchains. After outlining its core components and functionalities, we showcase the application of CAKE in the context of a real-world cyber-security project within the logistics domain.
Paper: https://doi.org/10.1007/978-3-031-61000-4_16
Unlock the Future of Search with MongoDB Atlas: Vector Search Unleashed, by Malak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
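In Atlas, a vector search query is expressed as a `$vectorSearch` aggregation stage. The sketch below just builds that stage as a plain dict; the index name, field path, and query vector are made-up placeholders, not values from the presentation:

```python
def build_vector_search_stage(index, query_vector, path="embedding",
                              limit=5, num_candidates=100):
    """Build a MongoDB Atlas $vectorSearch aggregation stage."""
    return {
        "$vectorSearch": {
            "index": index,                    # name of the Atlas vector search index
            "path": path,                      # document field holding the embeddings
            "queryVector": query_vector,       # embedding of the search query
            "numCandidates": num_candidates,   # ANN candidates to consider
            "limit": limit,                    # number of results to return
        }
    }

# Hypothetical usage; with pymongo you would run: collection.aggregate([stage])
stage = build_vector_search_stage("movies_index", [0.12, -0.07, 0.33])
```

`numCandidates` trades recall for latency: the larger the candidate pool the approximate search scans, the closer results get to an exact nearest-neighbor search.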
OpenID AuthZEN Interop Read Out - Authorization, by David Brossard
During Identiverse 2024 and EIC 2024, members of the OpenID AuthZEN WG got together and demoed their authorization endpoints conforming to the AuthZEN API
Climate Impact of Software Testing at Nordic Testing Days, by Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint: a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack, by shyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
TrustArc Webinar - 2024 Global Privacy Survey, by TrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
UiPath Test Automation using UiPath Test Suite series, part 6, by DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, as a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
GraphRAG for Life Science to increase LLM accuracy, by Tomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
Infrastructure Challenges in Scaling RAG with Custom AI models, by Zilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
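The retrieval half of a RAG pipeline reduces to a few lines: embed the query, rank documents by cosine similarity, and stuff the top hits into a prompt. The toy three-dimensional vectors and document texts below are stand-ins for real text embeddings, not anything from the talk:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, docs, k=2):
    """Return the k documents whose embeddings are closest to the query."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return ranked[:k]

# Toy corpus: in a real system these vectors come from an embedding model.
docs = [
    {"text": "RAG combines retrieval with generation.", "vec": [1.0, 0.1, 0.0]},
    {"text": "BentoML packages and serves ML models.",  "vec": [0.0, 1.0, 0.2]},
    {"text": "Vector databases store embeddings.",      "vec": [0.9, 0.2, 0.1]},
]
hits = retrieve([1.0, 0.0, 0.0], docs, k=2)
prompt = "Answer using:\n" + "\n".join(d["text"] for d in hits)
```

The production challenges the abstract raises, such as retrieval performance, response synthesis, and evaluation, are about doing each of these steps at scale with real embedding and language models.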
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence, by IndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
HCL Notes and Domino License Cost Reduction in the World of DLAU, by panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar, with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your costs through an optimized configuration and keep them low going forward.
These topics will be covered:
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
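The idea of finding deactivation candidates can be illustrated with a small sketch. The CSV layout, column names, and the 365-day idle threshold below are assumptions for illustration only; real data would come from Domino server logs or the DLAU tool, whose actual output format may differ.

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical export of user activity; column names are assumptions.
EXPORT = """\
user,last_login
Ann Chovey,2024-11-02
Old Account,2021-03-15
Former Employee,2019-07-01
"""

def deactivation_candidates(csv_text, today, max_idle_days=365):
    """Return users whose last login is older than the idle threshold."""
    cutoff = today - timedelta(days=max_idle_days)
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["user"] for row in reader
            if date.fromisoformat(row["last_login"]) < cutoff]

# Accounts idle for over a year become candidates for a deactivation review.
print(deactivation_candidates(EXPORT, date(2025, 1, 1)))
```

Note that this only produces review candidates; shared mailboxes or functional accounts flagged this way may still be needed and should be converted rather than deleted, as the webinar discusses.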
Full-RAG: A modern architecture for hyper-personalization (Zilliz)
Mike Del Balso, CEO and co-founder of Tecton, presents "Full RAG," a novel approach to AI recommendation systems that aims to push beyond the limitations of traditional models through deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. The talk outlines Full RAG's potential to significantly enhance personalization, addresses engineering challenges such as data management and model training, and introduces data enrichment with reranking as a key solution. Attendees will gain insights into the importance of hyper-personalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing the complex data integrations needed to deploy cutting-edge AI solutions.
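The retrieve-then-rerank pattern behind data enrichment can be sketched as follows. The 0.7/0.3 scoring weights and the "freshness" feature are illustrative assumptions, not Tecton's actual implementation; the point is that a cheap first-stage retrieval is re-scored with real-time context.

```python
def dot(a, b):
    """Plain dot product over equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def retrieve(query_vec, items, k=4):
    """First stage: cheap similarity retrieval over candidate items."""
    ranked = sorted(items, key=lambda it: dot(query_vec, it["vec"]), reverse=True)
    return ranked[:k]

def rerank(candidates, context, k=2):
    """Second stage: enrich candidates with real-time context and re-score.
    The 0.7/0.3 weighting is an arbitrary illustrative choice."""
    def score(item):
        similarity = dot(context["user_vec"], item["vec"])
        freshness = 1.0 / (1 + context["age_hours"][item["id"]])
        return 0.7 * similarity + 0.3 * freshness
    return sorted(candidates, key=score, reverse=True)[:k]

items = [
    {"id": "a", "vec": [1.0, 0.0]},
    {"id": "b", "vec": [0.9, 0.1]},
    {"id": "c", "vec": [0.0, 1.0]},
]
context = {"user_vec": [1.0, 0.0], "age_hours": {"a": 48, "b": 1, "c": 2}}
top = rerank(retrieve([1.0, 0.0], items), context)
print([it["id"] for it in top])  # the fresher item "b" outranks the stale "a"
```

In this toy run, item "b" overtakes the slightly more similar but two-day-old item "a" once freshness enters the score, which is the kind of real-time signal the talk argues traditional recommenders miss.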
AI-Powered Food Delivery: Transforming App Development in Saudi Arabia (Techgropse Pvt. Ltd.)
In this blog post, we'll delve into the intersection of AI and app development in Saudi Arabia, focusing on the food delivery sector. We'll explore how AI is revolutionizing the way Saudi consumers order food, how restaurants manage their operations, and how delivery partners navigate the bustling streets of cities like Riyadh, Jeddah, and Dammam. Through real-world case studies, we'll showcase how leading Saudi food delivery apps are leveraging AI to redefine convenience, personalization, and efficiency.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Choosing The Best AWS Service For Your Website + API
TAUS 2.0 and the Game Changers in Localization (Jaap van der Meer, director of TAUS)
1. TAUS Translation Data Landscape Report
Authors: Andrew Joscelyne & Anna Samiotou
Reviewer: Jaap van der Meer
2. The report…
• was published in December 2015
• was written by TAUS in consultation with the EU project LT Observatory, supervised by LT Innovate
• drew its insights from industry surveys and interviews with a broad range of stakeholders
3. The report attempts to answer:
• Who are the producers and consumers of translation data? How are they changing?
• Is there a viable “market” for translation data, beyond the current informal sharing or web-scraping model?
• What can we do to overcome the legal and technical issues and concerns regarding translation data sharing?
• How could translation data sharing as a natural practice integrate with the European Digital Single Market program?
• Which models of translation data circulation work best? For how long? What could disrupt them?
4. Methods to obtain translation data
• Leveraging public and open resources
• Creating one’s own resources by human, semi-automatic or automatic means
• Scraping the web by crawling: parallel text collections to be used mainly by MT systems
• Sharing or exchanging data
• Paying for data: stakeholders will pay for translation data when it is known to be uniquely valuable in terms of relevance and impact to the task at hand, is affordable, and no other solution exists
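A minimal sketch of the "creating one's own resources" route: pairing line-aligned source and target files into (source, target) segments for MT training. The file-per-language layout is an assumption; data from real web crawling would additionally need sentence alignment, deduplication, and cleaning.

```python
def pair_segments(source_lines, target_lines, min_len=1):
    """Pair line-aligned source/target text into (src, tgt) training segments,
    skipping lines that are empty on either side. Real pipelines would also
    deduplicate, filter by length ratio, and verify language identity."""
    pairs = []
    for src, tgt in zip(source_lines, target_lines):
        src, tgt = src.strip(), tgt.strip()
        if len(src) >= min_len and len(tgt) >= min_len:
            pairs.append((src, tgt))
    return pairs

en = ["Hello world.", "", "The report was published in 2015."]
es = ["Hola mundo.", "", "El informe se publicó en 2015."]
print(pair_segments(en, es))
```

The filtering step matters: blank or one-sided lines silently poison MT training data, which is one reason curation and validation tools recur in the marketplace scenarios below.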
6. Scenarios for a translation data marketplace
• Datasets: buy data, sell data, exchange data, bid for data, order data, offer specific in-domain translation data.
• Datasets & tools: a commercial service for translation data together with multilingual enablers and tools that can fingerprint the data and curate, benchmark and validate its quality and relevance to the task at hand.
• Trained domain MT engines: deliver in-domain translation engines.
• Plug & play model: the model currently used for accessing a service in one go.
9. How about a translation data marketplace?
Drivers: a highly globalized market – translation data available at a reasonable price – benchmarking allowed prior to purchase
Inhibitors: using other people’s resources can be a blind guess – current lack of tools – imbalance between high- and low-resource languages
Challenges: enhance language coverage – address the high risk of local markets being edged out by global players and by plug & play technologies
11. Critical determinants of the way ahead
• We are at the beginning of the translation data age.
• Content will be king and queen.
• Innovation will be vital: many different competing solutions will emerge for streamlining the value chain between raw data and specific translation requirements.
• The term “translation data” has two meanings:
 – we need the data to drive translation automation;
 – we also vitally need data about translation: good data about global data usage.
Editor's Notes
These facts suggest that globally there is at present little role for any kind of independent translation data marketplace/data hub or data sharing platform.