The document provides information about the University of Wolverhampton's Research Group in Computational Linguistics and its Statistical Cybermetrics Research Group. It discusses the groups' expertise in various areas of natural language processing and information retrieval. Key personnel are mentioned, including Ruslan Mitkov, Constantin Orasan, and Mike Thelwall. Ongoing and past projects funded by sources such as the EC and NBME are summarized.
This document outlines the structure and deliverables for the EXPERT project. It consists of 8 work packages related to management, user perspectives, data collection, language technology, learning from translators, hybrid approaches, training, and dissemination. Each work package has 2 deliverables and deadlines for completion. The project involves early stage researchers who will receive training, complete secondments, and participate in workshops and a winter school.
The document discusses developing a publication strategy. It emphasizes that a strategy is a plan to achieve goals with limited resources and allows for flexibility. It notes that publishing is important for academic success and discusses elements of a strategy such as understanding publication types, venues, planning objectives, and adapting the plan. The document provides tips for choosing publication venues and journals, considering factors like reputation, impact, and relevance to one's research area.
1. Grant Proposal Writing & Research Policy - Maren Pannemann (UvA) - RIILP
This document discusses grant proposal writing and research policy. It provides an overview of various research funding opportunities at the EU, international, and national levels. Some key funding sources discussed include Marie Skłodowska-Curie grants, ERC grants, and NWO grants in the Netherlands. The document offers best practices for grant writing, including structuring the proposal, formulating clear objectives, and emphasizing the scientific problem and how the proposed research will address it. It also discusses developing a competitive CV and gaining early career achievements to strengthen funding applications.
ResEval: Resource-oriented Research Impact Evaluation platform - Muhammad Imran
This document proposes a new open and resource-oriented platform for research impact evaluation. It discusses problems with existing solutions like limited data sources and predefined metrics. The proposed solution features a common platform to access various scientific resources, support for personalized metrics, natural language queries, and evaluation of individuals and groups. The architecture defines three layers and prototypes have been implemented for individual/contribution evaluation and group comparison. Future work includes improving the language module and adding more prototype options.
This study investigated the information seeking behavior of 14 final year undergraduate students at the Faculty of Computer Science and Information Technology, University of Malaya. The objectives were to understand how students choose research topics, what information sources and channels they use and prefer, their use of libraries and librarians, use of the Internet, search strategies, and thoughts on ethics. Most students relied heavily on the Internet, past projects, and lecturers for information. They evaluated sources by comparing with other materials and getting input from lecturers. While students were aware of intellectual property issues, many admitted to using pirated software due to the expense of legitimate versions.
The document summarizes information about European Research Council grants, including Starting Independent Researcher Grants and Advanced Investigator Grants. It describes the goals of the grants, eligibility requirements, funding amounts, application deadlines and restrictions. The ERC aims to support excellent researchers and their investigator-driven projects across all fields. Success rates for UK applicants to ERC grants are provided at the end.
This document provides an overview of the C-SAP OER pilot project which aimed to explore open sharing of teaching materials from academic partners in sociology, politics, anthropology, and criminology. The project sought to examine tacit assumptions around resource creation and sharing. Partners contributed approximately 60 credits of materials to repositories like JORUM and MERLOT. The project developed mapping and review tools to help facilitate understanding and reuse of resources by revealing tacit elements normally left unstated. Case studies examined partners' experiences with the process of opening up materials.
Learning and Text Analysis for Ontology Engineering - butest
This document calls for papers and participation in a workshop on learning and text analysis for ontology engineering to be held in conjunction with the ECAI 2002 conference in Lyon, France. The workshop aims to bring together researchers from linguistics, natural language processing, knowledge representation, and machine learning to discuss issues around building, maintaining, and reusing ontologies and terminological resources. Topics of interest include using texts and linguistic/terminological resources as knowledge sources for building ontologies, applying machine learning and NLP tools to ontology engineering, and learning ontologies from sources like the web. The deadline for paper submissions is March 15th and for motivation abstracts is May 24th. The workshop will include paper presentations, discussions, and
The document provides an overview of resources available for searching library databases and the UCO library catalog. It discusses searching for articles in periodical databases which require login credentials, and includes ERIC as an example which contains education-related journal articles and documents. It also outlines effective search techniques for databases, such as using keywords, Boolean operators, truncation, and nesting terms. The document concludes by mentioning the Professional Development Collection database and clarifying that the library catalog searches for physical materials and not articles.
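The search techniques listed above (truncation and Boolean operators) can be sketched in code. This is a toy stand-in for what a database's own search engine does internally; the term `educat*` and the sample text are invented for illustration.

```python
import re

def matches(term, text):
    """Truncation: a trailing * is a wildcard, so 'educat*' matches
    'educate', 'education', 'educator', and so on."""
    pattern = term.replace("*", r"\w*")
    return re.search(rf"\b{pattern}\b", text, re.IGNORECASE) is not None

def boolean_and(terms, text):
    """Boolean AND: every term must match somewhere in the text."""
    return all(matches(t, text) for t in terms)

def boolean_or(terms, text):
    """Boolean OR: at least one term must match."""
    return any(matches(t, text) for t in terms)
```

Nesting terms, as the document mentions, amounts to combining these operators, e.g. `boolean_and(["polic*"], text) and boolean_or(["educat*", "school*"], text)`.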
This document outlines the course Research Methodology 4IC501. The course aims to develop students' research skills and understanding of the research process. It will help students identify research problems and solutions, understand literature reviews and various data analysis techniques, and communicate research effectively. The course covers topics like formulating research problems, experimental design, data collection/analysis methods, writing papers, and intellectual property rights. Assessment includes assignments, exams, and a research paper. The goal is for students to gain skills in applying research methods, constructing problems, analyzing data, and creating original work.
Closing the Gap: Data Models for Documentary Linguistics - Baden Hughes
This document discusses challenges in managing linguistic data electronically and proposes formal data encoding models. It notes the increasing amounts of linguistic data from fieldwork and the problems caused by disparate encoding formats. Recently developed models address lexicons, interlinear texts, paradigms, syntactic trees, and annotation standards. These new models enable new types of data exploration and manipulation while reducing barriers to use. They may also affect linguistic analysis itself, making some kinds of analysis easier and revealing new possibilities and challenges.
This document, by Rachel Heyes, Lecturer at The Manchester College, provides tips for carrying out research. It recommends taking a methodical approach and making organized notes. Sources to consider include textbooks, libraries, the internet, advertising agencies, films, TV, and audience surveys. When using the internet, focus on reliable sources. Textbooks and libraries provide specialist materials, while advertising agencies may have campaign information. Surveys can provide qualitative data but may be difficult to conduct. Primary research includes your own analysis, while secondary research uses other people's work. References should be listed and cited properly.
This document outlines the typical sections and structure of a research proposal flow chart. It includes sections for an introduction explaining the research topic and questions, a literature review on previous work in the area, a methodology section detailing how the research will be conducted, preliminary data if available, limitations of the proposed research, and a conclusion restating the importance and contributions of the work. The goal is to clearly present the rationale, approach, and significance of the proposed research project.
This document provides guidelines for a case study assignment on the role of legislation in urban planning for a Master's course. Students are asked to research a case study from international literature on how legislation impacted urban planning in another location. They must write a 5-page report in a specific format, including an introduction with background on the case, body of analysis, and conclusion with recommendations. The report should follow sections with headings and citations should be in APA style. Students have 3 weeks to complete the individual assignment, which is due on November 18, 2015.
Dr Louise Byrne, Research Executive Agency (European Commission) MSCA Present... - IrishHumanitiesAlliance
The Marie Skłodowska-Curie Actions (MSCA) are European Union funded programmes that support researcher training, mobility, and career development. The MSCA offer prestigious career opportunities with competitive salaries, full social security, and chances to work with top researchers across Europe and the world. Funding is available for researchers at all career levels in all domains through individual fellowships, innovative training networks, and other programs. Over 10,600 projects have been funded with over 50,000 researchers from 141 countries participating in the 2007-2013 period.
The document discusses the European Reference Index for the Humanities (ERIH), which aims to provide a benchmarking tool for comparing humanities research excellence across Europe. It outlines ERIH's objectives, processes, coverage of disciplines and journals. Key points include that ERIH uses peer review to identify high quality journals, has published initial journal lists in 15 disciplines, and is working to update these lists based on feedback. It also discusses open questions around measuring the influence of open access publications in the humanities.
This document summarizes the TDT39 Empirical Research Methodology course. It is intended for students interested in research in real-world settings and will teach methods for exploring how and why information systems are designed, implemented, and used. The main deliverable is a research plan for the student's master's thesis. The plan will include the research purpose, contributions, method, participants, and paradigm. Students will meet with the instructor to discuss their plan and present it for feedback. The instructor is available for questions but students should consult their thesis supervisor for project-specific questions.
9. Ethics - Juan Jose Arevalillo Doval (Hermes) - RIILP
This document discusses ethics in the translation industry. It provides definitions of ethics from Webster's and Oxford dictionaries and lists key ethical values like integrity, transparency, and responsibility. It also outlines professional values for translators such as competence, confidentiality, and avoiding practices that undermine the profession. The document discusses issues in the industry like non-paid internships and accepting unrealistic translation projects. It provides examples of codes of conduct and outlines models for project outsourcing in the translation field.
The document discusses terminology in the translation industry. It outlines several benefits of using terminology, including higher translation quality, shorter turnaround times, and stronger brand identity. Higher quality is achieved through consistent translations and automated quality assessment. Turnaround times are shortened by avoiding time spent searching for terms. Brand identity is strengthened when customers use consistent terminology to affirm their product uniqueness. However, the document notes that in reality, most customers do not invest in terminology management and language service providers have limited time and resources to dedicate to it.
8. Qun Liu (DCU) Hybrid Solutions for Translation - RIILP
The document provides an overview of hybrid machine translation approaches. It discusses selective machine translation which selects the best translation from multiple systems. Pipelined machine translation uses one system for pre-processing or post-processing of another system. Statistical post-editing uses statistical machine translation as a post-editor for rule-based machine translation outputs to improve the translation quality.
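The selective approach described above can be sketched as follows. This is a toy illustration, not any system from the talk: the scoring function is a simple language-model-style heuristic over bigram counts, standing in for whatever quality estimator a real selective MT system would use.

```python
import math

def score(candidate, ngram_counts, n=2):
    """Toy fluency score: sum of log-frequencies of word n-grams."""
    words = candidate.split()
    total = 0.0
    for i in range(len(words) - n + 1):
        gram = tuple(words[i:i + n])
        total += math.log(1 + ngram_counts.get(gram, 0))
    return total

def select_best(candidates, ngram_counts):
    """Selective MT: return the candidate translation the scorer prefers."""
    return max(candidates, key=lambda c: score(c, ngram_counts))
```

For example, given outputs from two hypothetical systems, `select_best(["the cat sat", "cat the sat"], counts)` picks whichever reads more fluently under the counts.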
17. Anne Schumann (USAAR) Terminology and Ontologies 2 - RIILP
This document discusses current research topics in terminology and ontologies. It covers trends like term variation, culture-specific semantic differences, definitions, contexts, and knowledge-rich contexts. It also discusses term extraction and mapping. Key areas of research include improving techniques for specialised domains, identifying term variants, providing richer semantic descriptions, and supporting terminological workflows and users.
16. Anne Schumann (USAAR) Terminology and Ontologies 1 - RIILP
This document provides an overview of terminology and ontologies. It discusses why terminology is important, including for expert communication, knowledge transfer, and management. Terms are defined as linguistic symbols that represent concepts, with the relationship between terms and concepts being one-to-one in terminology. Conceptual relations between concepts are also discussed, including hierarchical relations like "is-a" that define a concept's location within a concept system. The document emphasizes that terminology work should be concept-oriented, structuring concepts into organized concept systems.
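The concept-oriented view described above can be made concrete with a small sketch: a toy "is-a" hierarchy in which each concept points to its parent, and terms map one-to-one onto concepts. The example concepts and terms are invented for illustration.

```python
IS_A = {                 # child concept -> parent concept ("is-a" relation)
    "sports car": "car",
    "car": "vehicle",
    "truck": "vehicle",
}

TERM_FOR = {             # concept -> preferred term (one-to-one in terminology)
    "vehicle": "vehicle",
    "car": "automobile",
    "sports car": "sports car",
    "truck": "lorry",
}

def ancestors(concept):
    """Walk up the is-a hierarchy to locate a concept within the system."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain
```

Here `ancestors("sports car")` yields `["car", "vehicle"]`, placing the concept within the organized concept system the document describes.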
2. Constantin Orasan (UoW) EXPERT Introduction - RIILP
The document introduces the EXPERT ITN project, which aims to train young researchers on improving data-driven machine translation through empirical approaches. The project will support researchers during their training and research, with the goal of producing future leaders in the field. It describes the objectives to improve existing corpus-based translation tools by considering user needs, collecting data, incorporating linguistic processing, and developing hybrid approaches. The project consists of 12 individual research projects across 6 work packages and is led by an academic consortium with involvement from private sector partners.
14. Michael Oakes (UoW) Natural Language Processing for Translation - RIILP
This document discusses information retrieval and describes its three main phases: 1) asking a question to define an information need, 2) constructing an answer by matching queries to documents, and 3) assessing the relevance of the retrieved answers. It also covers several important information retrieval concepts like keywords, indexing documents, stemming words, calculating TF-IDF weights, and evaluating system performance using recall and precision.
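Two of the concepts above, TF-IDF weighting and recall/precision, can be sketched briefly. The toy corpus in the test is invented for illustration; real systems add smoothing, normalization, and ranked evaluation.

```python
import math

def tf_idf(term, doc, corpus):
    """TF-IDF: term frequency in the document, scaled by the inverse
    document frequency of the term across the corpus."""
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

def recall_precision(retrieved, relevant):
    """Recall = relevant items found / all relevant items;
    precision = relevant items found / all items retrieved."""
    hits = len(set(retrieved) & set(relevant))
    return hits / len(relevant), hits / len(retrieved)
```

A term appearing in every document gets IDF zero, capturing the intuition that ubiquitous words are poor keywords for indexing.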
9. Manuel Herranz (Pangeanic) Hybrid Solutions for Translation - RIILP
This document discusses PangeaMT, a machine translation system, and experiences with hybridization. It provides a brief history of PangeaMT, describing its use of open-source Moses and its capabilities. It outlines features for experts, including domain adaptation, engine creation and training. The document also discusses experiences with hybridization for linguistically distant language pairs, including challenges of word order differences and tokenization. It compares approaches using Toshiba and MeCab for Japanese reordering, finding MeCab produced higher accuracy. Future work is noted on morphology-rich languages like Russian and distant language reordering.
This document discusses statistical machine translation decoding. It begins with an overview of decoding objectives and challenges, such as ambiguity in possible translations. It then describes decoding phrase-based models using a linear model and dynamic programming approach, with approximations like beam search. Grammar-based decoding is also covered, including synchronous context-free grammar parsing and translation. Key challenges like search complexity and language model integration are addressed.
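The beam-search approximation mentioned above can be illustrated with a highly simplified monotone (no reordering) phrase-based decoder. The phrase table and log-probabilities are invented; real decoders also integrate a language model and reordering, which this sketch omits.

```python
PHRASES = {  # source phrase -> [(target phrase, log-probability), ...]
    ("das",): [("the", -0.1), ("that", -0.7)],
    ("haus",): [("house", -0.2)],
    ("das", "haus"): [("the house", -0.15)],
}

def decode(source, beam_size=3):
    """Beam search over partial hypotheses, grouped into stacks by the
    number of source words covered so far."""
    beams = {0: [([], 0.0)]}   # words covered -> [(output words, score)]
    for covered in range(len(source)):
        for output, score in beams.get(covered, []):
            # Extend each hypothesis by translating the next source phrase.
            for span in range(1, len(source) - covered + 1):
                phrase = tuple(source[covered:covered + span])
                for target, logp in PHRASES.get(phrase, []):
                    hyps = beams.setdefault(covered + span, [])
                    hyps.append((output + [target], score + logp))
        for k in beams:        # prune each stack to the beam size
            beams[k] = sorted(beams[k], key=lambda h: -h[1])[:beam_size]
    best = max(beams[len(source)], key=lambda h: h[1])
    return " ".join(best[0])
```

With a small beam, hypotheses outside the top few are discarded at each stack, which is exactly the search-error risk the approximation trades for speed.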
11. Manuel Leiva & Juanjo Arevalillo (Hermes) Evaluation of Machine Translation - RIILP
The document discusses a company's evaluation of their machine translation systems. They had hoped automated metrics would correlate with productivity gains reported by post-editors, but found no correlation. Reasons for variability included different translation environments, engines, clients, post-editors, and word volumes. While some metrics indicated better translation quality, other factors like automatic terminology tools impacted productivity more. The company now combines automated metrics with time/productivity data and qualitative reviews to evaluate their machine translation performance.
5. Manuel Arcedillo & Juanjo Arevalillo (Hermes) Translation Memories - RIILP
This document discusses translation memory (TM) tools and features. It provides an overview of the history and evolution of TM tools, including their move to the cloud. It describes key TM features like leveraging previous translations, fuzzy matching, and analysis capabilities. It also explains that while TM tools all provide similar basic functions, they analyze data and display matches differently, which can result in varying word count metrics. Weighted word counts aim to standardize metrics by assigning different values to matches based on their degree of fuzziness.
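Fuzzy matching and weighted word counts, as described above, can be sketched briefly. The similarity metric here (difflib's ratio) and the match bands and weights are stand-ins for illustration; as the document notes, each TM tool uses its own algorithm and discount scheme, which is why their metrics differ.

```python
import difflib

def fuzzy_score(segment, tm_entry):
    """Similarity in percent between a new segment and a stored TM entry."""
    return round(100 * difflib.SequenceMatcher(None, segment, tm_entry).ratio())

WEIGHTS = [(100, 0.1), (95, 0.3), (85, 0.6), (0, 1.0)]  # (min score, weight)

def weighted_words(segment, best_score):
    """Discount the raw word count according to the best fuzzy match band,
    so near-matches count for fewer billable words than new text."""
    count = len(segment.split())
    for floor, weight in WEIGHTS:
        if best_score >= floor:
            return count * weight
    return count
```

A 4-word segment with an exact (100%) match would count as 0.4 weighted words under these example bands, while a no-match segment counts in full.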
4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran... - RIILP
This document provides an overview of example-based machine translation (EBMT). It discusses the core steps of EBMT including matching, alignment, and recombination. It also describes different varieties of EBMT such as character-based, word-based, pattern-based, syntax-based, and marker-based matching. Finally, it discusses approaches to EBMT including pure/runtime EBMT and compiled EBMT.
The document provides an overview of resources available for searching library databases and the UCO library catalog. It discusses searching for articles in periodical databases which require login credentials, and includes ERIC as an example which contains education-related journal articles and documents. It also outlines effective search techniques for databases, such as using keywords, Boolean operators, truncation, and nesting terms. The document concludes by mentioning the Professional Development Collection database and clarifying that the library catalog searches for physical materials and not articles.
This document outlines the course Research Methodology 4IC501. The course aims to develop students' research skills and understanding of the research process. It will help students identify research problems and solutions, understand literature reviews and various data analysis techniques, and communicate research effectively. The course covers topics like formulating research problems, experimental design, data collection/analysis methods, writing papers, and intellectual property rights. Assessment includes assignments, exams, and a research paper. The goal is for students to gain skills in applying research methods, constructing problems, analyzing data, and creating original work.
Closing the Gap: Data Models for Documentary LinguisticsBaden Hughes
This document discusses challenges in managing linguistic data electronically and proposes formal data encoding models. It notes the increasing amounts of linguistic data from fieldwork and issues with disparate encoding formats. Recently developed models address lexicons, interlinear texts, paradigms, syntactic trees, and annotation standards. These new models enable new types of data exploration and manipulation while reducing barriers to use. They may affect linguistic analysis by making some types easier and discovering new possibilities and challenges.
This document provides tips for researching Rachel Heyes Lecturer at The Manchester College. It recommends taking a methodical approach and making organized notes. Sources to consider include textbooks, libraries, the internet, advertising agencies, films, TV, and audience surveys. When using the internet, focus on reliable sources. Textbooks and libraries provide specialist materials, while advertising agencies may have campaign information. Surveys can provide qualitative data but may be difficult. Primary research includes your own analysis, while secondary research uses other people's work. References should be listed and cited properly.
This document outlines the typical sections and structure of a research proposal flow chart. It includes sections for an introduction explaining the research topic and questions, a literature review on previous work in the area, a methodology section detailing how the research will be conducted, preliminary data if available, limitations of the proposed research, and a conclusion restating the importance and contributions of the work. The goal is to clearly present the rationale, approach, and significance of the proposed research project.
This document provides guidelines for a case study assignment on the role of legislation in urban planning for a Master's course. Students are asked to research a case study from international literature on how legislation impacted urban planning in another location. They must write a 5-page report in a specific format, including an introduction with background on the case, body of analysis, and conclusion with recommendations. The report should follow sections with headings and citations should be in APA style. Students have 3 weeks to complete the individual assignment, which is due on November 18, 2015.
Dr Louise Byrne, Research Executive Agency (European Commission) MSCA Present...IrishHumanitiesAlliance
The Marie Skłodowska-Curie Actions (MSCA) are European Union funded programmes that support researcher training, mobility, and career development. The MSCA offer prestigious career opportunities with competitive salaries, full social security, and chances to work with top researchers across Europe and the world. Funding is available for researchers at all career levels in all domains through individual fellowships, innovative training networks, and other programs. Over 10,600 projects have been funded with over 50,000 researchers from 141 countries participating in the 2007-2013 period.
The document discusses the European Reference Index for the Humanities (ERIH), which aims to provide a benchmarking tool for comparing humanities research excellence across Europe. It outlines ERIH's objectives, processes, coverage of disciplines and journals. Key points include that ERIH uses peer review to identify high quality journals, has published initial journal lists in 15 disciplines, and is working to update these lists based on feedback. It also discusses open questions around measuring the influence of open access publications in the humanities.
For more course tutorials visit
www.newtonhelp.com
Technical Paper: Classes and Class Hierarchies in C++
Due Week 10 and worth 125 points
C++ is a general-purpose programming language designed as an improvement to the C programming language. In short, the language is a super set of C. The most important feature of C++ is the concept of a
This document summarizes the TDT39 Empirical Research Methodology course. It is intended for students interested in research in real-world settings and will teach methods for exploring how and why information systems are designed, implemented, and used. The main deliverable is a research plan for the student's master's thesis. The plan will include the research purpose, contributions, method, participants, and paradigm. Students will meet with the instructor to discuss their plan and present it for feedback. The instructor is available for questions but students should consult their thesis supervisor for project-specific questions.
9. Ethics - Juan Jose Arevalillo Doval (Hermes)RIILP
This document discusses ethics in the translation industry. It provides definitions of ethics from Webster's and Oxford dictionaries and lists key ethical values like integrity, transparency, and responsibility. It also outlines professional values for translators such as competence, confidentiality, and avoiding practices that undermine the profession. The document discusses issues in the industry like non-paid internships and accepting unrealistic translation projects. It provides examples of codes of conduct and outlines models for project outsourcing in the translation field.
The document discusses terminology in the translation industry. It outlines several benefits of using terminology, including higher translation quality, shorter turnaround times, and stronger brand identity. Higher quality is achieved through consistent translations and automated quality assessment. Turnaround times are shortened by avoiding time spent searching for terms. Brand identity is strengthened when customers use consistent terminology to affirm their product uniqueness. However, the document notes that in reality, most customers do not invest in terminology management and language service providers have limited time and resources to dedicate to it.
8. Qun Liu (DCU) Hybrid Solutions for TranslationRIILP
The document provides an overview of hybrid machine translation approaches. It discusses selective machine translation which selects the best translation from multiple systems. Pipelined machine translation uses one system for pre-processing or post-processing of another system. Statistical post-editing uses statistical machine translation as a post-editor for rule-based machine translation outputs to improve the translation quality.
17. Anne Schuman (USAAR) Terminology and Ontologies 2RIILP
This document discusses current research topics in terminology and ontologies. It covers trends like term variation, culture-specific semantic differences, definitions, contexts, and knowledge-rich contexts. It also discusses term extraction and mapping. Key areas of research include improving techniques for specialised domains, identifying term variants, providing richer semantic descriptions, and supporting terminological workflows and users.
16. Anne Schumann (USAAR) Terminology and Ontologies 1 - RIILP
This document provides an overview of terminology and ontologies. It discusses why terminology is important, including for expert communication, knowledge transfer, and management. Terms are defined as linguistic symbols that represent concepts, with the relationship between terms and concepts being one-to-one in terminology. Conceptual relations between concepts are also discussed, including hierarchical relations like "is-a" that define a concept's location within a concept system. The document emphasizes that terminology work should be concept-oriented, structuring concepts into organized concept systems.
2. Constantin Orasan (UoW) EXPERT Introduction - RIILP
The document introduces the EXPERT ITN project, which aims to train young researchers on improving data-driven machine translation through empirical approaches. The project will support researchers during their training and research, with the goal of producing future leaders in the field. It describes the objectives to improve existing corpus-based translation tools by considering user needs, collecting data, incorporating linguistic processing, and developing hybrid approaches. The project consists of 12 individual research projects across 6 work packages and is led by an academic consortium with involvement from private sector partners.
14. Michael Oakes (UoW) Natural Language Processing for Translation - RIILP
This document discusses information retrieval and describes its three main phases: 1) asking a question to define an information need, 2) constructing an answer by matching queries to documents, and 3) assessing the relevance of the retrieved answers. It also covers several important information retrieval concepts like keywords, indexing documents, stemming words, calculating TF-IDF weights, and evaluating system performance using recall and precision.
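Two of the concepts mentioned, TF-IDF weighting and recall/precision evaluation, can be illustrated with a small pure-Python sketch over a hypothetical three-document collection:

```python
import math

# Toy document collection (hypothetical data).
docs = {
    "d1": "machine translation of text",
    "d2": "statistical machine learning",
    "d3": "translation memory tools",
}

def tf_idf(term, doc_id):
    """TF-IDF weight: term frequency in the document times the log of the
    inverse document frequency across the collection."""
    tokens = docs[doc_id].split()
    tf = tokens.count(term)
    df = sum(1 for text in docs.values() if term in text.split())
    if df == 0:
        return 0.0
    return tf * math.log(len(docs) / df)

def precision_recall(retrieved, relevant):
    """Set-based IR evaluation: precision over what was retrieved,
    recall over what was relevant."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# "machine" occurs in 2 of 3 docs, so it gets a lower weight than
# the rarer "memory", which occurs in only one.
print(tf_idf("machine", "d1"), tf_idf("memory", "d3"))
print(precision_recall(["d1", "d2"], ["d1", "d3"]))
```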
9. Manuel Herranz (Pangeanic) Hybrid Solutions for Translation - RIILP
This document discusses PangeaMT, a machine translation system, and experiences with hybridization. It provides a brief history of PangeaMT, describing its use of open-source Moses and capabilities. It outlines features for experts, including domain adaptation, engine creation and training. The document also discusses experiences with hybridization for linguistically distant language pairs, including challenges of word order differences and tokenization. It compares approaches using Toshiba and Mecab for Japanese reordering, finding Mecab produced higher accuracy. Future work is noted on morphology-rich languages like Russian and distant language reordering.
This document discusses statistical machine translation decoding. It begins with an overview of decoding objectives and challenges, such as ambiguity in possible translations. It then describes decoding phrase-based models using a linear model and dynamic programming approach, with approximations like beam search. Grammar-based decoding is also covered, including synchronous context-free grammar parsing and translation. Key challenges like search complexity and language model integration are addressed.
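The phrase-based decoding idea can be illustrated with a deliberately simplified monotone beam search over a hypothetical phrase table. Real decoders also handle reordering, language-model scores, and future-cost estimates; here each hypothesis is just a (translation, log-probability) pair in a stack indexed by the number of source words covered.

```python
import math

# Hypothetical phrase table: source phrase -> [(target phrase, log-prob)].
PHRASES = {
    ("la",): [("the", math.log(0.9))],
    ("casa",): [("house", math.log(0.8)), ("home", math.log(0.2))],
    ("la", "casa"): [("the house", math.log(0.7))],
}

def decode(source, beam_size=2):
    """Monotone phrase-based beam search: stacks[i] holds hypotheses
    covering the first i source words; only the best beam_size per
    stack are expanded."""
    stacks = [[] for _ in range(len(source) + 1)]
    stacks[0].append(("", 0.0))
    for i in range(len(source)):
        stacks[i].sort(key=lambda h: -h[1])
        for text, score in stacks[i][:beam_size]:
            for j in range(i + 1, len(source) + 1):
                phrase = tuple(source[i:j])
                for target, lp in PHRASES.get(phrase, []):
                    new_text = (text + " " + target).strip()
                    stacks[j].append((new_text, score + lp))
    return max(stacks[-1], key=lambda h: h[1])

# Composing "the" + "house" scores log(0.9 * 0.8), which beats the
# single phrase pair at log(0.7).
print(decode(["la", "casa"]))
```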
11. Manuel Leiva & Juanjo Arevalillo (Hermes) Evaluation of Machine Translation - RIILP
The document discusses a company's evaluation of its machine translation systems. The company had hoped automated metrics would correlate with the productivity gains reported by post-editors, but found no correlation. Reasons for the variability included different translation environments, engines, clients, post-editors, and word volumes. While some metrics indicated better translation quality, other factors like automatic terminology tools impacted productivity more. The company now combines automated metrics with time/productivity data and qualitative reviews to evaluate its machine translation performance.
5. Manuel Arcedillo & Juanjo Arevalillo (Hermes) Translation Memories - RIILP
This document discusses translation memory (TM) tools and features. It provides an overview of the history and evolution of TM tools, including their move to the cloud. It describes key TM features like leveraging previous translations, fuzzy matching, and analysis capabilities. It also explains that while TM tools all provide similar basic functions, they analyze data and display matches differently, which can result in varying word count metrics. Weighted word counts aim to standardize metrics by assigning different values to matches based on their degree of fuzziness.
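Fuzzy matching and weighted word counts can be sketched minimally as follows, using Python's `difflib` similarity ratio as a stand-in for a TM tool's matching algorithm. The weighting bands are hypothetical, since, as the summary notes, each tool computes matches and counts differently.

```python
import difflib

def fuzzy_score(segment, tm_source):
    """Similarity between a new segment and a TM source segment, in percent."""
    ratio = difflib.SequenceMatcher(None, segment, tm_source).ratio()
    return round(ratio * 100)

# Hypothetical weighting bands: exact matches count for little,
# low fuzzies count as new words.
BANDS = [(100, 0.1), (95, 0.3), (85, 0.5), (75, 0.7), (0, 1.0)]

def weighted_words(segment, best_match_score):
    """Weighted word count: raw word count scaled by the band weight."""
    words = len(segment.split())
    for threshold, weight in BANDS:
        if best_match_score >= threshold:
            return words * weight
    return float(words)

seg = "Click the Save button to store your changes"
tm = "Click the Save button to keep your changes"
score = fuzzy_score(seg, tm)
print(score, weighted_words(seg, score))
```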
4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Tran... - RIILP
This document provides an overview of example-based machine translation (EBMT). It discusses the core steps of EBMT including matching, alignment, and recombination. It also describes different varieties of EBMT such as character-based, word-based, pattern-based, syntax-based, and marker-based matching. Finally, it discusses approaches to EBMT including pure/runtime EBMT and compiled EBMT.
10. Lucia Specia (USFD) Evaluation of Machine Translation - RIILP
This document discusses various methods for evaluating translation quality, including manual metrics, task-based metrics, and reference-based automatic metrics. It notes that evaluating translation quality is difficult because the definition of quality depends on factors like the end user and intended purpose. Methods discussed include n-point scales for adequacy and fluency, ranking translations, and counting errors. Issues with subjective judgments, reliability, and defining what makes a translation "best" are also covered.
This document provides an overview of a tutorial on statistical machine translation given by Dr. Khalil Sima'an. The tutorial is divided into two parts, with Part I covering data and models, including word-based models, alignment, symmetrization, and phrase-based models. Part II, given by Trevor Cohn, will cover decoding and efficiency. The tutorial will examine the statistical approach to machine translation using parallel corpora and will discuss generative source-channel frameworks and challenges in estimating translation probabilities from sparse data. It will also explore how current models induce structure in translation data using alignments between source and target language structures.
This document discusses human translation workflow and contains three sections. Section I provides an overview of human translation workflow. Section II discusses professional translation, including market studies, emerging trends, and the translation workflow. Section III focuses on corpus-based translation, outlining guidelines for corpus creation, using corpora for translation training, and concordancing tools.
13. Constantin Orasan (UoW) Natural Language Processing for Translation - RIILP
This document discusses how natural language processing (NLP) techniques can help improve machine translation (MT). It describes some of the linguistic challenges in MT, such as ambiguity at the lexical, syntactic, semantic and pragmatic levels. It then discusses how various NLP tasks, such as tokenization, word sense disambiguation, and handling of named entities could enhance MT systems. Several studies that have successfully integrated NLP techniques like word sense disambiguation into statistical machine translation systems are also summarized.
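Word sense disambiguation, one of the NLP tasks mentioned, can be illustrated with a simplified Lesk algorithm: choose the sense whose dictionary gloss shares the most words with the surrounding context. The two-sense inventory below is invented for the example; real systems use WordNet-scale inventories and supervised or neural models.

```python
# Hypothetical mini sense inventory for the ambiguous word "bank".
SENSES = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river": "the sloping land alongside a river or stream",
}

def lesk(context, senses=SENSES):
    """Simplified Lesk: pick the sense whose gloss overlaps most
    with the words surrounding the ambiguous term."""
    context_words = set(context.lower().split())
    def overlap(item):
        gloss_words = set(item[1].split())
        return len(context_words & gloss_words)
    return max(senses.items(), key=overlap)[0]

print(lesk("she sat by the river and watched the water"))  # -> bank/river
print(lesk("he opened an account to deposit money"))       # -> bank/finance
```

In an MT pipeline, the selected sense would then constrain the choice of target-language translation for the ambiguous word.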
Sustainability in OER for less used languages - LangOER
Sustainability in OER for less used languages
An initiative of the LangOER network
Open Education Week, Friday, March 14, 2014
Authors: Linda Bradley, Simon Horrocks, Jüri Lõssenko, Anne-Christin Tannhäuser, Sylvi Vigmo, Katerina Zourou
This document provides an introduction to the chapter on design methodologies for CALL courseware projects. It synthesizes responses from contributors on how they conducted needs analyses and determined their didactic approaches. Key points discussed include the importance of valid needs analysis, pedagogical priorities over technology, and the relationship between content and media. The introduction also reflects on long-standing issues in CALL design such as linear vs. hypermedia learning and the drive for "design neutrality".
How can OER enhance the position of less used languages on a global scale? - LangOER
Presentation by Gard Titlestad, Secretary General, International Council for Open and Distance Education (ICDE), at the workshop "The OCW Consortium global conference", Ljubljana, 25 April 2014
How can OER enhance the position of less used languages on a global scale?
The event will deliver input to the assessment of the situation for open educational resources around the globe, with particular reference to less used languages.
The session will focus on:
· What is the situation when it comes to OER and less used languages?
· What issues arise from that situation – and how could they be met?
· How can OER enhance the position of less used languages on a global scale?
· What policies are favourable to the uptake of quality OER and quality open educational practices in less used language communities?
The workshop will provide input to a working policy paper on OER and challenges and opportunities for less used languages in a global, European, Nordic and national perspective.
Challenges for OER in non-English-speaking countries - icdeslides
This presentation was for a panel discussion on “Challenges for OER in non-English-speaking countries”, organised by the UNESCO Institute for Information Technologies in Education, which held the special session as a satellite event of the 2nd OER World Congress.
Chances and Challenges in Comparing Cross-Language Retrieval Tools - Giovanna Roda
The document discusses the CLEF-IP track, which evaluates cross-language intellectual property retrieval tools. It notes that CLEF-IP is organized by the IRF and first ran in 2009. The 2009 track involved finding prior art for patents, with 15 academic participants submitting 48 experiments. The track produced experimental data that could help improve systems and fostered collaboration.
The document outlines plans for the School of Digital Technologies at Tallinn University. It discusses the scope and focus areas of applied informatics, including digital safety, language technology, data analysis, smart houses, and ICT curriculum development. It provides details on specific projects and research in these areas, led by staff members. It also proposes the development of a software laboratory to support interdisciplinary project work and software development. In summary, the document presents an overview of the research, teaching, and development activities of the School of Digital Technologies across various domains of applied informatics.
Annotated Bibliography Of Language Documentation - Sarah Marie
This document provides a summary of key works related to language documentation. It begins by defining language documentation and discussing its goals of creating organized language corpora. It then summarizes several reference works on language documentation theory and practice. It also summarizes anthologies and collections of papers on language documentation, as well as conference proceedings. Finally, it discusses journals and theoretical aspects of language documentation, such as defining its scope, data collection and analysis, and metadata standards.
I held this presentation at the first PKP Scholarly Publishing Conference in Vancouver Canada, on July 12th 2007. Check out the general conference blog if you want to know more about the event:
http://scholarlypublishing.blogspot.com/
You may also be interested in things marked with the "open-access" tag in my own blog:
http://corpblawg.ynada.com/
EMMA presentation - Alfons Juan - Language technologies for Education: recent... - EUmoocs
During the 2nd Internet of Education Conference 2015 that took place on 18 September 2015, Sarajevo, Alfons Juan presented the recent results by the MLLP group, with these slides, which include considerable data about the EMMA project.
To know more about the EMMA project go to: http://platform.europeanmoocs.eu/
Eurogene is an e-learning system in the domain of genetics that provides free multimedia learning resources in nine languages for statistical, medical and molecular genetics and delivers them to students and professionals. The Eurogene content includes presentations, reviewed research articles, images, videos and learning packages submitted by world-leading geneticists.
An essential part of the Eurogene system is a multilingual search engine that allows users to search for content in one language while retrieving results in other languages. This is complemented by the use of a machine translation system fine-tuned for genetic terminology. The search engine uses a query language similar to PubMed's.
Eurogene also aims at providing intelligent ways of navigation through the e-Learning system. As new learning resources are being continuously submitted to the system, it is not possible to maintain links between them manually. Eurogene automatically links resources that are semantically similar using natural language processing.
This document outlines the concepts and history of Content and Language Integrated Learning (CLIL). It defines CLIL as teaching subjects through a foreign language. CLIL began in international schools in the 1990s and spread across Europe. It aims to integrate language learning into mainstream education to promote multilingualism. The document discusses key terms, advantages of CLIL, challenges, and examples of CLIL programs in Spain.
Eurocall2015 enhancing teaching and learning of less used languages through o... - LindaBradley35
This document summarizes the LangOER network project which aims to enhance teaching and learning of less used languages through open educational resources (OER) and practices (OEP). The network involves 9 partners across Europe. It addresses how OER can benefit less used languages and foster linguistic diversity. It conducted research, teacher training, and engaged stakeholders of regional languages. The training course for teachers exceeded expectations by providing useful resources, feedback, and inspiration for using and contributing OER to support language learning.
Enhancing teaching and learning of less used languages through Open Education... - Web2Learn
Presentation of LangOER project at the EUROCALL 2015 conference, Padova, Italy, 26-29 August. Joint presentation by Linda Bradley, Gosia Kurek and Katerina Zourou
TPCK: Use of ICT to teach/improve competence in listening to English - paula hodgson
The document discusses using ICT to improve competence in listening to English as a second/foreign language. It outlines the technological, pedagogical and content knowledge required and provides examples of online resources that can be used for listening practice, including podcasts, videos, and interactive exercises. The intended learning outcomes are to develop skills in designing listening tasks and identifying global listening resources using blended learning approaches.
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2... - Europeana
Here are a few approaches to address the context demand challenge for machine translation of cultural heritage content:
- Leverage knowledge graphs and ontologies to disambiguate terms based on conceptual relationships
- Train domain-specific models on large cultural heritage corpora to capture nuances of language use in different contexts
- Perform multi-task learning to optimize models for both translation accuracy and conceptual mapping between languages
- Allow users to provide feedback to iteratively improve disambiguation of ambiguous terms over time
- Develop specialized interfaces that surface contextual clues from objects to help machine translation
The goal is to mimic how humans understand intended meaning based on surrounding context clues. Combining linguistic and conceptual techniques can help machines do the same.
The GRIAL research group was established at the University of Salamanca to conduct interdisciplinary research in fields related to human-computer interaction and e-learning. The group has numerous national and international research projects, teaches various university courses, and provides consulting services related to e-learning/technology solutions. Current projects include the MIH project to develop multilingual teaching tools on history and geography, and the ELVIN project to create an online social network for language learning in public administration.
Similar to 1. EXPERT Winter School Partner Introductions
Gabriela Gonzalez attended an expert project showcase in Rome, Italy in May 2016 where she participated in roundtable discussions on the relationship between academia, industry, and translators. She noted that while improvements are needed for translators, the main issue is whether translator needs align with industry interests. Gonzalez advocated for greater collaboration between translators, software developers, and researchers to create more user-friendly translation tools. She concluded by expressing her hope that the industry would adopt research findings and that she could be more involved in sharing experiences to improve quality assurance processes.
Pangeanic is an MT company founded in Valencia, Spain with offices in Tokyo, London, and Shanghai. Pangeanic's PangeaMT system was the first commercial application of the open-source Moses platform. It has been further developed and customized for the localization industry. Pangeanic has worked with clients such as Sony Europe to provide MT services and experiences. The company's system includes features such as monolingual training, integration with Apertium, and automated data cleaning. Pangeanic advocates for empowering translators and users in controlling MT systems and sees MT as a business opportunity to transform how translation services are provided and create new revenue streams.
Carla Parra Escartin - ER2 Hermes Traducciones - RIILP
This document discusses a study on the productivity of translators when post-editing machine translation (MT) output compared to translating from scratch. The study was conducted with 10 in-house translators post-editing the output of an MT system that had been customized over three years. It found that all but one translator were faster at post-editing MT output than at translating from scratch. Automatic evaluation metrics like BLEU, TER and a fuzzy match score were found to correlate with productivity gains from MT, and thresholds for productivity gains were proposed based on these metrics.
Hermes Traducciones is the 15th largest translation company in Southern Europe and 154th globally. It is certified under quality standards ISO 9001 and EN 15038. The company has 25-30 permanent employees, over 150 freelance translators in its database, and translation teams in Portugal and Brazil. Hermes provides a wide range of translation and localization services, especially in technical fields like engineering and software. It also collaborates with universities on research projects evaluating machine translation and its potential to increase translation productivity and savings.
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic - RIILP
This document describes improving hybrid translation tools using a full-text search engine approach. It discusses using natural language processing techniques and a translation memory database indexed with ElasticSearch to improve fuzzy matching. The goal is to maximize reuse of existing human translations by handling linguistic features like string transformations, part-of-speech tagging, and tokenization.
KantanMT.com is a statistical machine translation platform that is cloud-based and highly scalable. It provides automated translations at high speed and quality by fusing translation memory, machine translation, and rules. The document then discusses KantanMT's vision, some of its key features and statistics, locations it operates from including the INVENT Concept Space and School of Computing, how it obtained funding from the Commercialization Fund, and its journey from starting as a prototype to becoming widely adopted with billions of words translated.
This document describes CATaLog, a translation tool that provides:
- Incremental machine translation, automatic post-editing, and translation memory capabilities to enhance translations over time.
- Color-coded matching of source segments to translated segments to reduce cognitive load on translators.
- Online project management, translation, and review capabilities without requiring local installation.
This document discusses optimizing machine translation systems for user benefit. It outlines several ways to measure translation quality and utility, including editing time and effort. Current approaches include post-processing machine translation, learning from translator feedback, and using quality estimation to guide humans. The document advocates formalizing the task purpose and taking advantage of user context to explicitly train systems to maximize user benefit, such as optimizing interactive prediction for translation or post-editing tasks. The vision is for task-based optimization to be applied beyond machine translation to any user-agent interaction scenario.
The document summarizes the results of a survey investigating the needs and preferences of translators regarding translation technologies. The survey looked at translators' usage of computer-assisted translation (CAT) tools, machine translation, terminology management tools, and corpora. It found that while CAT tools are widely used, features such as machine translation and terminology management, which were rated both most useful and most disliked, require further improvement to be truly useful. Respondents emphasized needing tools that are simple to use and that integrate multiple resources like translation memories and corpora. The survey revealed both opportunities to better meet translators' needs and their varying attitudes towards the role of technology in translation work.
This document discusses quality estimation of machine translation using the QuEst++ framework. It summarizes that QuEst++ can predict the quality of unseen machine translated text using only the source and target texts without references, extracting features to build models that estimate metrics like post-editing effort and time from limited labeled training data. The framework extracts features at the word, sentence and document level from the source and target texts and information from the machine translation system, then trains models using those features to predict quality scores for new translations.
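The feature-based idea behind quality estimation can be sketched as follows. The surface features are in the spirit of QuEst's baseline features, but the exact feature set and the weights below are illustrative stand-ins, not the trained QuEst++ model.

```python
def qe_features(source, target):
    """A few QuEst-style surface features (illustrative subset only)."""
    src, tgt = source.split(), target.split()
    return {
        "src_len": len(src),
        "tgt_len": len(tgt),
        "len_ratio": len(tgt) / max(len(src), 1),
        "avg_tok_len": sum(map(len, tgt)) / max(len(tgt), 1),
    }

# Hypothetical weights standing in for a trained regression model.
WEIGHTS = {"src_len": 0.02, "tgt_len": 0.02, "len_ratio": -0.5, "avg_tok_len": 0.05}
BIAS = 1.0

def predict_effort(source, target):
    """Predicted post-editing effort score (higher = more editing needed)."""
    feats = qe_features(source, target)
    return BIAS + sum(WEIGHTS[name] * value for name, value in feats.items())

print(predict_effort("the house is red", "la casa es roja"))
```

In the real framework the weights come from a model trained on human post-editing effort or time labels, and the feature set is much richer, including language-model and MT-system information.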
The document discusses evaluating terminology tools through their features. It first introduces how terminology is important for translation and natural language processing. It then explores the features of Terminology Extraction Tools and Terminology Management Tools. These include functions like term extraction, context extraction, and glossary management. The document evaluates several specific tools to compare their feature sets. It concludes by emphasizing the importance of identifying user needs and systematically testing tools to select the most appropriate one.
This document discusses combining translation memory (TM) and statistical machine translation (SMT). It summarizes that TM works best for repetitive text but SMT is more reliable when there are no close matches. It then reviews the speaker's previous work on combining TM and SMT during decoding and before decoding, and presents results showing BLEU score improvements on several language pairs.
The document discusses the differences between how ontologies are used in scientific research versus industry. In scientific research, ontologies focus on creating and extending existing generic ontologies and validating ontology induction methods, using ontologies to improve natural language processing technologies. In industry, ontologies are used as value-adding knowledge bases for various purposes like matching product reviews to categories, terminology standardization in machine translation, and matching resumes to jobs. The document argues that bridging the gap between scientific and industry usage of ontologies requires more domain-specific data and discoveries, true application focus, and open data flow.
This document discusses Acclaro's quality management program. It introduces their services and clients, then describes their quality program which uses a customized memoQ feature to track errors. It discusses two client cases, including a technical software company and an online media company. The quality assurance model and customization options are demonstrated. Benefits include quantitative measurement and issue identification. Challenges include scalability and technical bugs. The goal is more integrated quality reporting and useful statistics.
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015 - RIILP
The document discusses collecting and cleaning multilingual data. It describes estimating the amount of parallel data that exists in Common Crawl, testing different crawlers, and developing a machine learning approach to classify translation units as either true translations or errors. Key points include estimating that Common Crawl contains around 1 billion parallel pages, crawlers tested had low recall, and the best performing model for classifying translation units was an SVM classifier with an F1-score of 0.81.
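Evaluating such a translation-unit classifier with an F1-score can be sketched as below. The length-ratio rule is only a toy stand-in for the study's SVM, and the data and labels are invented; the point is the evaluation mechanics.

```python
def f1_score(gold, predicted, positive="error"):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == p == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def classify(src, tgt):
    """Toy stand-in classifier: flag a translation unit as an error when
    the two sides have wildly different lengths (the study used an SVM
    with richer features)."""
    ratio = len(tgt) / max(len(src), 1)
    return "error" if ratio < 0.5 or ratio > 2.0 else "translation"

gold = ["translation", "error", "translation", "error"]
pairs = [("good morning", "buenos dias"),
         ("good morning", "x"),
         ("thank you very much", "muchas gracias"),
         ("a", "this is far too long to match")]
pred = [classify(s, t) for s, t in pairs]
print(pred, f1_score(gold, pred))
```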
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015 - RIILP
This document summarizes the results of a survey on machine translation (MT) usage among professional translators. Some key findings include:
- 36% of respondents currently use MT, while 38% do not use it and do not plan to. Most saw potential benefits from high-quality MT.
- MT is used equally for resource-rich and resource-poor languages. Technical domains like ICT saw higher MT usage.
- Higher computer competence and IT training were associated with greater MT use. Translators working with agencies also used MT more.
- While MT can provide benefits, respondents noted it cannot replace humans and may threaten jobs or lower wages. Better quality is needed.
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015 - RIILP
The document describes a statistical automatic post-editing (APE) system that aims to improve machine translation output with minimal human effort. The system uses hierarchical phrase-based statistical machine translation trained on machine translation output and reference human translations. The system first cleans and preprocesses data, generates improved word alignments, and then performs hierarchical phrase-based SMT to output post-edits. Evaluation shows the APE system outperforms the baseline machine translation according to both automatic metrics and human evaluation, requiring less post-editing effort.
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015 - RIILP
This document summarizes a study that investigates using distributional similarity measures (DSMs) to assess the relatedness between documents in comparable corpora. The study uses three DSMs - number of common entities, Spearman's rank correlation coefficient, and Chi-square - on four subcorpora from the INTELITERM corpus. The results show the subcorpora generally contain highly related documents, though the smaller Spanish translated corpus shows more inconsistency. Future work could involve expanding experiments to other languages and DSMs, and using the approach to filter unrelated documents.
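One of the measures used, Spearman's rank correlation coefficient, is easy to implement directly. The sketch below compares the term-frequency rankings of two hypothetical documents; for simplicity it assumes no tied frequencies, so the classic formula applies unmodified.

```python
def ranks(values):
    """Rank positions (1 = largest value), assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Spearman's rho via the classic formula 1 - 6*sum(d^2) / (n*(n^2-1))."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical frequencies of five shared terms in two documents.
doc_a = [10, 7, 5, 3, 1]
doc_b = [9, 8, 4, 2, 1]
print(spearman(doc_a, doc_b))  # identical rankings -> 1.0
```

A rho near 1 suggests the two documents rank their shared terminology similarly, which is the intuition behind using it as a relatedness measure for comparable corpora.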
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfTechgropse Pvt.Ltd.
In this blog post, we'll delve into the intersection of AI and app development in Saudi Arabia, focusing on the food delivery sector. We'll explore how AI is revolutionizing the way Saudi consumers order food, how restaurants manage their operations, and how delivery partners navigate the bustling streets of cities like Riyadh, Jeddah, and Dammam. Through real-world case studies, we'll showcase how leading Saudi food delivery apps are leveraging AI to redefine convenience, personalization, and efficiency.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
4. Mission Statement
RIILP produces internationally leading research, offers first-class research supervision and teaching in the interdisciplinary areas of information and language processing, and delivers cutting-edge practical (including commercial) applications for the benefit of society, based on its research output.
5. Structure and context
Research Group in Computational Linguistics
Statistical Cybermetrics Research Group
Benchmark: the very best national and international expertise in every area
Both groups enjoy a considerable national and international reputation
External income generation > £4,000,000 over the last five years
6. Statistical Cybermetrics Research Group
Statistical Cybermetrics entered in the Unit of Assessment “Library and Information Management”: in the national context, Wolverhampton ranked joint second with four other universities.
According to league tables (The Guardian, The Times and Research Fortnight), research in Library and Information Management at the University of Wolverhampton is one of the six best in the UK.
Head of SCRG: Prof. Mike Thelwall
Rated 3rd most successful UK library and information science researcher of all time (Jan. 2007)
8. RAE’2008 results
Computational Linguistics entered in the Unit of Assessment “Linguistics”: Wolverhampton ranked joint third with two other universities, in a large company of old, research-intensive universities.
According to league tables (The Guardian, The Times and Research Fortnight), research in Linguistics at the University of Wolverhampton is one of the six best in the UK.
Due to Computational Linguistics, in Linguistics we are ahead of Oxford, Cambridge, UCL, Lancaster, Manchester, Reading...
9. Research Group in Computational Linguistics: People
Founded in 1997 by Ruslan Mitkov
Currently:
1 full-time Professor
2 part-time Professors
2 Readers
1 Senior Lecturer
7 research fellows and research associates
12 PhD students
4 administrators
Project research assistants, Masters students, Visiting Professors, Honorary Research Fellows and guest researchers
10. Research Group in Computational Linguistics: key personnel
Ruslan Mitkov
Publications: more than 200 publications in areas including:
anaphora resolution (>2,000 citations, >40 keynote speeches)
generation of multiple-choice tests (>15 keynote speeches)
Key books:
Mitkov R. 2002. Anaphora Resolution. Longman.
Mitkov R. (Ed.). 2003, 2005. The Oxford Handbook of Computational Linguistics. Oxford University Press.
Current editorial distinctions:
Executive Editor of the Journal of Natural Language Engineering (Cambridge University Press)
Editor of the Oxford Handbook of Computational Linguistics (Oxford University Press)
Editor-in-Chief of John Benjamins’ book series in Natural Language Processing (NLP)
Editor Consultant of Oxford University Press publications in Computational Linguistics
Chair or member of a number of Programme Committees and Editorial Boards
11. Research Group in Computational Linguistics: key personnel
Dr. Constantin Orasan (Deputy Head of the Group)
Reader in Computational Linguistics; 60+ publications; PI on a number of projects; leading figure in summarisation; extensively involved in Master programme teaching
Dr. Michael Oakes
Reader in Computational Linguistics; leading figure in areas such as information retrieval, authorship identification, and statistical methods for linguistics and translation
12. Research Group in Computational Linguistics: key personnel
Prof. Patrick Hanks:
Professor of Lexicography; leading figure in (computational) lexicography and corpus-based methods for dictionary compilation
Dr. Le An Ha:
Lecturer in industrial Natural Language Processing; project manager of the US NBME-funded project; involved in NLP applications to e-learning and in industrial projects
Richard Evans:
Research Fellow; involved in NLP applications to healthcare; led the FIRST project proposal process
13. Delivering cutting-edge research in
Coreference/anaphora resolution
Automatic generation of multiple-choice tests
Text summarisation
Question answering
Temporal processing
Named entity recognition
Lexical knowledge acquisition
Discourse processing
Information extraction
Computational lexicography
Text simplification
Plagiarism detection
Evaluation of NLP
Topics related to translation:
Term extraction
Machine Translation
Multilingual NLP
Translation Memory
Translation Universals
Comparable corpora compilation for translators
Statistical methods for translation
Generation of test items
14. Some recent highlights
RAE’2008 feedback: research output internationally leading, internationally excellent and internationally recognised
World’s best performing system in temporal processing (Georgiana Puscasu)
World’s best cross-lingual information retrieval system with English as the target language (Iustin Dornescu, Constantin Orasan, Georgiana Puscasu)
World’s best GikiP system (a competition dealing with geographical questions on Wikipedia) (Iustin Dornescu)
Best anaphora resolution system (Iustin Dornescu)
Oxford University Press statement that the Oxford Handbook of Computational Linguistics has been the most successful OUP Handbook ever.
15. Project in focus/success story: Rapid Item Generation
Two projects funded by the US National Board of Medical Examiners (NBME) on the generation of test questions for the medical domain
Pioneering computer-aided approach
First stage successfully passed real user testing
Developed for English, with the possibility of extension to other languages
Since then, the NBME have gone on to request an annual rolling contract of around £100,000 for us to continue working on items for them.
They are currently trialling a second project with us which, if successful, will bring in an additional £45,000 p.a.
16. Recent EC-funded projects
QALL-ME (Question Answering Learning technologies in a multiLingual and Multimodal Environment)
Funding body: European Commission FP6 ICT
Total EC contribution: €2,400,000. WLV share: €700,000.
Ran from October 2006 to September 2009
TELL-ME (Towards English Language Learning for MEdical professionals)
Funding body: Lifelong Learning Programme, Leonardo da Vinci
Total EC contribution: €370,401. WLV share: €95,375.
Runs from January 2012 to December 2013
FIRST (A Flexible Interactive Reading Support Tool)
Funding body: EC FP7
Total EC contribution: €2,008,754. WLV share: €487,440.
Runs from October 2011 to September 2014
17. Other ongoing projects
DVC
Funding body: AHRC
Total contribution: £605,586. WLV share: £605,586.
Runs from October 2012 to September 2015
NBME projects
Funding body: NBME
Total contribution: > £1,000,000. WLV share: > £1,000,000.
Runs from January 2004
18. Strategic topics
Language technology for medical applications (including language disorders)
E-learning
Translation Technology
Bridging the gap between academia and industry
Impact on society
20. University of Málaga (Spain)
Research Group in Lexicography and Translation (Lexytrad, HUM-106)
21. Index
1. Aims and activities of UMA
2. Research Group HUM-106
3. Expertise - HUM 106
4. Key staff involved in TELL-ME
22. Aims and activities of UMA
The Universidad de Málaga (UMA): over 36,000 students and over 2,500 teaching staff.
Well-established history in regional, national and European project management: 73 international projects (at present, 23 ongoing European projects).
National and international patents for the results of its research.
UMA has been an International Campus of Excellence (Andalucía TECH) since 2010.
Watch http://www.youtube.com/watch?v=_nXoV8oiGvo
23. Research Group HUM 106 (I)
The research group Lexicography and Translation (HUM-106) at UMA is an international leader in the fields of corpus-based Translation Studies, E-Learning and Translation Technologies.
Directed by Prof. Gloria Corpas since 1997.
The group comprises 14 researchers and is a recognised leader in the areas of E-Learning, Linguistics, Corpus Compilation, Multilingual Lexicography, Terminology, Translation Training and Translation Studies, including Revision, Quality Control, Translation Technologies and User-centred Translation Evaluation.
24. Research Group HUM 106 (II)
The group works with a number of languages, including Spanish, German, Italian, French and English.
The research group HUM-106 was rated as one of the top-performing units within Arts and Humanities in the 2010 assessment exercise by the Andalusian regional government (97 points out of 100).
Further information at http://www.uma.es/hum106
25. Expertise - HUM 106 (I)
International R&D Projects
2004-2006
- Standard Linguistico Europeo per il Settore del Turismo (SLEST) [Linguistic standard for the tourism industry]. Funding sources: European Commission (2004-2006); Lifelong Learning Programme (LLP).
2004-2007
- HESPERIA. Repertorio analítico de lexicografía bilingüe: diccionarios italiano-español y español-italiano [HESPERIA: Analytical index of bilingual lexicography: Italian/Spanish – Spanish/Italian dictionaries].
Funding source: Italian Ministry of University and Scientific Research (MIUR).
26. Expertise - HUM 106 (II)
2005-2008
- ACTUAL: Lingüística contrastiva [ACTUAL: Contrastive Linguistics]. Funding source: Italian Ministry of University and Scientific Research (MIUR).
2008-2010
- CHINESECOM: Competences in Elementary Chinese as a means to improve the competitiveness of European Union companies. Funding source: Lifelong Learning Programme (LLP) - Key Activity 2 - Multilateral project.
2012-2013
- TELL-ME (Towards European Language Learning for MEdical professionals). Funding source: Lifelong Learning Programme (LLP) - Key Activity 2 - Multilateral project.
27. Expertise - HUM 106 (III)
National R&D Projects
1999-2002
- Diseño de un tipologizador textual para la traducción automática de textos jurídicos (español → inglés/alemán/italiano/árabe) [A textual typologiser for the machine translation of legal texts (Spanish → English/German/Italian/Arabic)].
Funding source: Spanish Ministry of Education: Research & Development National Programme.
2003-2006
- TURICOR: Compilación de un corpus de contratos turísticos (alemán, español, inglés, italiano) para la generación textual multilingüe y la traducción jurídica [TURICOR: A multilingual corpus of tourism contracts (German, Spanish, English, Italian) for automatic text generation and legal translation].
Funding source: Spanish Ministry of Science and Technology.
28. Expertise - HUM 106 (IV)
2008-2011
- Espacio único de sistemas de información ontológica y tesauros sobre el medio ambiente: Ecoturismo [A unified space of ontological information systems and environmental thesauri: Ecotourism].
Funding source: Spanish Ministry of Education: Research & Development National Programme.
2012-2015
- INTELITERM: Sistema inteligente de gestión terminológica para traductores [INTELITERM: An intelligent terminology management system for translators].
Funding source: Spanish Ministry of Education: Research & Development National Programme.
29. Expertise - HUM 106 (V)
Regional R&D Projects
2006-2009
- La contratación turística electrónica multilingüe como mediación intercultural: aspectos legales, traductológicos y terminológicos [Multilingual tourism e-contracts as intercultural mediation: legal, translational and terminological aspects].
R&D Project for Excellence. Andalusian Ministry of Education, Science and Technology.
2008-2012
- Nuevo diccionario de aprendizaje (learners' dictionary) del español como lengua extranjera de difusión on-line [New online learners’ dictionary of Spanish as a Foreign Language].
R&D Project for Excellence. Andalusian Ministry of Education, Science and Technology.
30. Expertise - HUM 106 (VI)
Others
2 coordinated research activities
5 networks
More than 20 e-learning and innovation projects
More than 20 doctoral dissertations
More than 40 M.A. dissertations
For further information see http://www.uma.es/hum106/investigacion_en.html
31. Key staff involved in EXPERT (I)
1. Prof. Gloria CORPAS (gcorpas@uma.es)
- Professor in Translation and Interpreting at UMA.
- Prof. G. Corpas is no. 2 in the Spanish national ranking of Translation and Interpreting (http://hindexscholar.com).
- She acts as a Ministry advisor on the Bologna Process via the Spanish Agency ANECA.
- She has been actively involved in the development of UNE-EN 15038:2006 as the AEN/CTN 174 and CEN/BTTF 138 Spanish delegate, and is the Spanish expert for the future ISO standard (ISO TC37/SC2-WG6 "Translation and Interpreting").
- Her publications also deal with didactic innovation, the design of virtual university knowledge communities for Translation Studies, virtual collaborative environments, e-learning platforms and the virtual teaching of subjects specialising in scientific and technical translation.
- She holds one patent (ReCor), and she received the Euralex Verbatim Award in 1995 and, with Dr. M. Seghiri, the Spanish Translation Technologies Observatory Award in 2007.
32. Key staff involved in EXPERT (II)
2. Dr. Jorge LEIVA (leiva@uma.es)
- Senior Lecturer in Translation and Interpreting at UMA and professional translator.
- His research fields include specialised translation and phraseology.
- From September 2008 to March 2009 he held a research grant at Harvard University (Massachusetts, USA).
- He has also been a member of a variety of research projects focusing on specialised translation, text corpora and e-learning.
- Awarded the University’s 2005 Best Ph.D. Student Prize.
33. Key staff involved in EXPERT (III)
3. Dr. Miriam SEGHIRI (seghiri@uma.es)
- Senior Lecturer in Translation and Interpreting at UMA.
- She has also worked at Dickinson College (PA, USA), the University of Murcia and the University of Cordoba.
- She has participated in several European, national and regional R&D projects.
- She has been awarded several research grants at Dickinson College (PA, USA) and the Università di Perugia (Italy).
- Her research fields range from specialised translation to corpus linguistics and ICTs, the outcomes of which have been made public in national and international academic conferences and publications.
- She holds one patent (ReCor), and she received, with Dr. G. Corpas, the 2007 Spanish Translation Technologies Observatory Award.
- Awarded the University’s 2006 Best Ph.D. Student Prize.
34. Key staff involved in EXPERT (V)
5. ESRs
ESR1: Anna Zaretskaya, from Russia. Investigation of translators’ requirements from translation technologies (supervised by Miriam Seghiri at UMA and co-supervised by Elia Yuste from Pangeanic). Visa permit: pending.
ESR3: Hernani Costa, from Portugal. Collection and preparation of multilingual data for multiple corpus-based approaches to translation (supervised by Dr. Gloria Corpas at UMA and co-supervised by Marco Trombetti from Translated and ER1). ESR3 signed his contract on 2 September 2013.
6. ERs
ER1 will work on the investigation of automatic methods for the collection and preparation of multilingual data (supervised by Marco Trombetti at Translated and co-supervised by Jorge Leiva from UMA).
36.
- An academia-industry research consortium dedicated to delivering disruptive innovations in digital media and intelligent content, such as multilingual content analysis
- Led by Trinity College Dublin and co-hosted by Dublin City University
- Sponsored by both Science Foundation Ireland and industry partners including Symantec, DNP, Microsoft, Intel, Xanadu, WeLocalize and Alchemy
37. CNGL Research Themes
Tuning Text Analytics
Event & Opinion Extraction
Content-Aware Multilingual Search
Contextualisation
Modality-Independent Intelligent Machine Translation
Social Localisation
Intelligent Post-Editing
38. CNGL @ Dublin City University
Professor Josef van Genabith: NLP, MT
Professor Qun Liu: MT, NLP
Dr. Gareth Jones: IR, Multi-Modal
Dr. Sharon O'Brien: Translation Technology
Dr. Jennifer Foster: NLP
40+ staff and PhD students
41. 15th company in Southern Europe and 154th in the world according to Common Sense Advisory’s 2013 listing
42. hermestr@hermestrans.com
www.hermestrans.com
Madrid Office:
Cólquide, 6 - portal 2, 3.º - I
Edificio Prisma
28230 Las Rozas (Madrid, Spain)
Phone: (+34) 91 640 7640
Fax: (+34) 91 637 8023
Malaga Office:
Parque Tecnológico de Andalucía
Av. Juan López Peñalver, 17 - 3.ª - 6
Edificio Centro de Empresas
29590 Campanillas (Malaga, Spain)
Phone: (+34) 952 020525
Fax: (+34) 952 020529
43. COMMITMENT TO QUALITY:
Cooperation with official agencies
• Company present in the Spanish Technical Committee #174 at AENOR for quality translation services, with the support of the European Committee for Standardisation (CEN), the Spanish Standardisation Association (AENOR) and the European Union of Translation Companies Association (EUATC).
• Juan José Arevalillo, Hermes Traducciones Managing Director, is the current Chairman of the Spanish Technical Committee #174 at AENOR for translation and related services.
44. SGR PERFORMANCE MANAGEMENT SYSTEM
PRODUCTIVITY AND QUALITY CONTROL
• Daily monitoring of the quality and productivity of our team in order to guarantee improved control over our translations
• Review, revision and editing of our translations by a second or third specialist other than the original translator
• Use of proprietary templates for revising, reviewing and editing our translations in compliance with the EN 15038 quality standard
• Use of the LISA QA Model standard for localisation review and the SAE J2450 standard for automotive translation review
45. PLUNET-BASED TRANSLATION PROJECT MANAGEMENT
• End-to-end translation project management through a Plunet platform
• Compliant with our double quality certification requirements
46. HERMES DIFFERENCES
• Founded in 1991 by former employees of the Localisation Group of Digital Equipment Corporation (currently Hewlett-Packard).
• Specialising in software and website localisation, as well as technical translation.
• 70% of our production is done by our own in-house resources.
• Translation services in 30 language pairs.
• Ongoing training of our staff.
• End-to-end solutions for our customers.
• Internal department of applied technology, including MT.
[Image: statue of the god Hermes at the Louvre Museum]
47. HERMES EXPERIENCE
• 28 years of localisation experience (22 as a company and 6 at Digital Equipment Corporation, currently Hewlett-Packard).
• Over 60,000 localisation projects in 22 years, including multilingual projects.
• Comprehensive expertise and know-how in computer-assisted translation and localisation-specific applications: SDL Trados product family, SDL Studio 2011, memoQ, Déjà Vu, IBM Translation Manager, Star Transit, WordFast, Catalyst, Passolo, across, Idiom WorldServer, Microsoft Helium, Microsoft Localisation Studio and many others.
• Comprehensive expertise and know-how in quality control programs: HelpQA, HTML HelpQA, ApSIC Xbench, MS Help Workshop, MS HTML Help Workshop and others.
• Comprehensive know-how in DTP, text processing and imaging applications: Adobe FrameMaker, Microsoft Word, Adobe InDesign, Adobe PageMaker, PaintShop Pro, Adobe Illustrator, Adobe Photoshop, etc.
• Proprietary terminology database covering more than 1,000,000 entries across different languages and domains.
• 35 million managed words per year and an average of 6,000 translations per year.
• Centralised Plunet-based translation project management system.
52. Pangeanic
• Pangeanic took the initial versions of Moses in 2009 as an in-house project to help meet translation production needs. It was the first company in the world to transition Moses from an academic to a commercial environment, as reported in EuroMatrixPlus.
• The small in-house project grew into a full platform that overcame many of its limitations, with a full set of new features, offering the translation community machine translation for the masses.
• The platform now includes full re-training features, glossary upload, a full TMX / training material management system, the ability to create engines on the fly, and the possibility to hybridise it with pre- and post-processing modules.
• Our presentation will describe the tool we have made available for the project.
54. Who is Translated?
Web-based Language Service Provider
Since 1999, providing human translation in 80 languages to over 35,000 customers, thanks to 70,000 professional translators.
Tech Company
Focus on technology to automate processes and make translation more efficient.
55. Workflow Automation
Fully automated translation management system that connects customers and translators.
Automate all repetitive tasks and focus only on what brings value to our customers.
56. Content Reuse
MyMemory
Largest translation memory server (6 billion words)
Integrated in most computer-assisted translation tools
100% free
Leverage existing linguistic content to make translators more productive.
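MyMemory exposes its translation memory through a public REST endpoint, which is how CAT tools integrate it. As a minimal sketch (the endpoint URL, `q`/`langpair` parameters and `responseData` field reflect the public MyMemory API as commonly documented; verify against the current API reference before relying on them), a lookup can be built and its JSON response parsed like this:

```python
# Minimal sketch of a MyMemory lookup: build the query URL for a segment
# and extract the top match from the JSON response. No network call is
# made here; a canned response illustrates the assumed payload shape.
import json
from urllib.parse import urlencode

BASE_URL = "https://api.mymemory.translated.net/get"

def build_query_url(text, source_lang, target_lang):
    """Build a MyMemory lookup URL for a segment and a language pair."""
    params = urlencode({"q": text, "langpair": f"{source_lang}|{target_lang}"})
    return f"{BASE_URL}?{params}"

def best_match(response_json):
    """Extract the top translated text from a MyMemory JSON response."""
    data = json.loads(response_json)
    return data["responseData"]["translatedText"]

# Example with a canned response (assumed payload shape):
sample = '{"responseData": {"translatedText": "Ciao mondo", "match": 1.0}}'
print(build_query_url("hello world", "en", "it"))
print(best_match(sample))  # → Ciao mondo
```

In a CAT-tool integration the URL would be fetched per segment and the returned match offered to the translator alongside local translation-memory hits.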
57. Translation Environment
MateCat
Deep integration of MT: MT technology that learns from the users in real time
Collaborative environment: online translation with multiple users
Fast and easy to use: virtually no learning curve
Increased privacy protection: clients’ documents are not sent out to translators
60. USAAR: institution
Students: 18,500 (16% international)
• Dept. of Applied Linguistics, Translation and Interpreting
• Dept. of Computational Linguistics & Phonetics
• German Research Centre for Artificial Intelligence
• Cluster of Excellence on Multimodal Computing and Interaction
• Max Planck Institute for Computer Science
• Max Planck Institute for Software Systems
61. USAAR: WP4
WP4: Language technology, domain ontologies and terminologies
Dr. Paul Schmidt (Chair of Machine Translation): in charge of scientific and technical/technological aspects
Prof. Elke Teich (Chair of English Linguistics and Translation Science): in charge of administrative, legal and financial aspects
José Manuel Martínez (research assistant): administration
62. USAAR: ESRs
Santanu Pal – ESR2: Investigation of an ideal translation workflow for hybrid translation approaches. From India. B.Tech in Computer Science & Engineering; certification course in Linguistics; M.Tech in Computer Technology. Thesis: “Improved Alignment in Statistical Machine Translation”.
Liling Tan – ESR5: Use of terminologies and ontologies to improve corpus-based approaches to translation. From Singapore. BA in Linguistics; MA in Computational Linguistics. Thesis: “Examining Crosslingual Word Sense Disambiguation”.
64. University of Sheffield
Natural Language Processing Group
• Since 1993
• Areas: language resources and architectures (GATE), information access (Q&A, summarisation), foundational topics
• Collaboration with Machine Learning and Speech groups
• Newly created MT lab
Academics doing research on MT:
• Lucia Specia
• Trevor Cohn
• Rob Gaizauskas
Other MT people:
• 3 post-docs, 2 ESRs/PhD students, 5 PhD students
65. Projects and areas of interest (I)
• Modist (EPSRC): Modeling Discourse in Statistical Translation
• Barista (EPSRC): Non-Parametric Models of Phrase-based Machine Translation
• Expert (EU): EXPloiting Empirical appRoaches to Translation
• QTLaunchpad (EU): Preparation and Launch of a Large-Scale Action for Quality Translation Technology
66. Projects and areas of interest (II)
• SlaTr (Google): A Joint Model of Spoken Language Translation
• QuEst (PASCAL2 Harvest): Open-source tool for MT Quality Estimation
• TaaS (EU): Terminology as a Service
• ACCURAT (EU): Analysis and Evaluation of Comparable Corpora for Under-Resourced Areas of Machine Translation