The document discusses the history and state of the art of machine translation, including key figures and their views. It outlines different approaches to machine translation such as rule-based MT, statistical MT, example-based MT and their evolution over time. The document also describes different methods that have been used to evaluate machine translation systems, such as the BLEU metric.
Slides from the talk/presentation at "The Way Forward: the Future of Democracy in the Balkans" conference.
Regional Alumni (OSF/Chevening) Conference, March 24-27, 2011.
The podcast will soon be available on iTunes.
Not/Networking: open access in developed countries and countries in transition. Session by Dr Danica Radovanovic.
Slides from the afternoon session, January 17th 2009, ScienceOnline09 conference, RTP, USA.
After a short intro [I know some of you are expecting *words* on the slides, well...], there was an interactive discussion.
Using Innoslate for Model-Based Systems Engineering, by Elizabeth Steiner
Dr. Steve Dam will walk you through the process of using Innoslate’s modeling and simulation capabilities while applying an MBSE methodology.
At its core, Innoslate is a full model-based systems engineering tool. Within Innoslate, system models are formalized and capable of simulation to derive cost, schedule, and performance data.
The webinar will cover:
Functional modeling
Functional modeling is at the heart of how Innoslate derives new requirements and ensures logical accuracy.
Physical modeling
The physical model in Innoslate can be synthesized with eight different diagrams, including the Asset Diagram, Layer Diagram, Block Definition Diagram, and Internal Block Diagram.
Executing a model
Innoslate includes a ‘Discrete Event Simulator’ to verify a functional diagram’s logic, calculate cost, compute time, and quantify performance.
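To make the idea concrete, the core of any discrete event simulator is a time-ordered event queue. The sketch below illustrates the general technique only, not Innoslate's engine; the action names and durations are invented.

```python
import heapq

# Minimal discrete-event loop, purely illustrative (not Innoslate's simulator).
# Each event is (start_time, sequence_no, action_name, duration); the heap
# guarantees events are processed in time order, as a DES requires.
events = []
seq = 0

def schedule(start, name, duration):
    global seq
    heapq.heappush(events, (start, seq, name, duration))
    seq += 1

# Hypothetical actions from a functional model: two in sequence, one in parallel.
schedule(0.0, "Initialize system", 2.0)
schedule(2.0, "Process input", 3.0)
schedule(2.0, "Log telemetry", 1.0)

makespan = 0.0
while events:
    start, _, name, duration = heapq.heappop(events)
    makespan = max(makespan, start + duration)
    print(f"t={start:4.1f}  start {name!r}  (duration {duration})")

print(f"All actions complete at t={makespan}")  # total simulated time
```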
Relating Requirements to Diagrams
Requirements traceability ensures that the lifecycle and origin of each requirement are fully tracked. Innoslate includes relationship matrices that represent traceability relationships between entities in a tabular view.
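Conceptually, such a matrix is just a cross-tabulation of two entity sets. A minimal sketch of the idea, with invented requirement and function names unrelated to Innoslate's actual data model:

```python
# A relationship matrix as a cross-tabulation: rows are requirements,
# columns are functions, a cell marks a "traced to" relationship.
# All entity names here are invented for illustration.
requirements = ["R1 Max weight", "R2 Battery life", "R3 Data rate"]
functions    = ["F1 Store energy", "F2 Transmit data"]
traces = {("R2 Battery life", "F1 Store energy"),
          ("R3 Data rate", "F2 Transmit data")}

print(" " * 18 + "".join(f"{f:<18}" for f in functions))
for r in requirements:
    cells = "".join(f"{'X' if (r, f) in traces else '.':<18}" for f in functions)
    print(f"{r:<18}{cells}")
```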
Requirements Generation
After modeling the system, an engineer will often derive textual requirements from the models by hand. Innoslate includes an automatic facility that generates requirements documents in a standard format (as outlined in “The Engineering Design of Systems: Models and Methods”).
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date overview, by Lifeng (Aaron) Han
Starting from the 1950s, Machine Translation (MT) has been approached through a range of scientific solutions, from rule-based methods, example-based methods, and statistical models (SMT), to hybrid models and, in very recent years, neural models (NMT).
While NMT has achieved a huge quality improvement over conventional methodologies, by taking advantage of the huge amounts of parallel corpora available from the internet and of recently developed supercomputing power at an acceptable cost, it still struggles to achieve real human parity in many domains and in most language pairs, if not all of them.
Along the long road of MT research and development, quality evaluation metrics have played very important roles in MT advancement and evolution.
In this tutorial, we overview traditional human judgement criteria, automatic evaluation metrics, and unsupervised quality estimation models, as well as the meta-evaluation of the evaluation methods themselves. We also cover very recent work in the MT evaluation (MTE) field that takes advantage of large pre-trained language models to customise automatic metrics for exactly the language pairs and domains being deployed. In addition, we introduce statistical confidence estimation of the sample size needed for human evaluation in realistic practice.
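As a taster for the automatic-metric material, here is a compact sketch of the classic BLEU idea, modified n-gram precision combined with a brevity penalty; it is a simplified single-reference illustration of the metric family, not code from the tutorial.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU: modified n-gram precision x brevity
    penalty. Single reference, no smoothing -- illustrative only."""
    hyp, ref = hypothesis.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_ngrams.values())
        if total == 0:
            return 0.0
        clipped = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        if clipped == 0:
            return 0.0  # unsmoothed: any empty n-gram precision zeroes the score
        log_prec += math.log(clipped / total) / max_n
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_prec)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # identical -> 1.0
print(bleu("the cat sat on a mat", "the cat sat on the mat"))    # partial -> ~0.54
```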
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation with Reinforced Factors, by Lifeng (Aaron) Han
Presentation slides (PPT) from MT SUMMIT 2013.
Language-independent Model for Machine Translation Evaluation with Reinforced Factors
International Association for Machine Translation, 2013
Authors: Aaron Li-Feng Han, Derek Wong, Lidia S. Chao, Yervant Ho, Yi Lu, Anson Xing, Samuel Zeng
Proceedings of the 14th biennial International Conference of Machine Translation Summit (MT Summit 2013). Nice, France. 2 - 6 September 2013. Open tool https://github.com/aaronlifenghan/aaron-project-hlepor (Machine Translation Archive)
Natural Language Understanding of Systems Engineering Artifacts, by Ákos Horváth
This paper examines, in close relation, two fields of growing importance: model-based systems engineering (MBSE) and natural language processing (NLP). System models provide a structured description of engineering data whose inherent semantics often remain hard to explore. Natural language understanding (i.e., the machine analysis of texts produced by humans), an important field of NLP, focuses on semantic text comprehension but cannot directly account for structured information sources.
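As a toy illustration of applying NLU to an engineering artifact (not the paper's method; the requirement sentence and element mapping are invented), an off-the-shelf parser can propose candidate model elements from requirement text:

```python
import spacy

# Toy illustration only: extract candidate model elements from a requirement
# sentence with an off-the-shelf parser.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

requirement = "The battery shall supply power to the flight computer for 90 minutes."
doc = nlp(requirement)

# Noun chunks are candidate components; main verbs are candidate functions.
components = [chunk.text for chunk in doc.noun_chunks]
functions = [tok.lemma_ for tok in doc if tok.pos_ == "VERB"]

print("candidate components:", components)  # e.g. ['The battery', 'power', ...]
print("candidate functions: ", functions)   # e.g. ['supply']
```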
Using Machine Learning to aid Journalism at the New York Times, by Vivian S. Zhang
This talk was presented to NYC Open Data Meetup Group on Nov 11, 2014.
Speaker:
Daeil Kim is currently a data scientist at the Times and is finishing up his Ph.D. at Brown University on work related to developing scalable inference algorithms for Bayesian nonparametric models. His work at the Times spans a variety of problems related to the company's business interests and audience development, as well as developing tools to aid journalism.
Topic:
This talk will focus mostly on how machine learning can help with problems that crop up in journalism. We'll begin by talking about using popular supervised learning algorithms such as regularized logistic regression to assist a journalist's work in uncovering insights for a story about the recall of Takata airbags in cars. Afterwards, we'll look at using topic modeling to deal with large document dumps generated from FOIA (Freedom of Information Act) requests, and at Refinery, a simple web-based tool that eases such tasks. Finally, if there is time, we will go over how topic models have been extended to assist in designing an efficient recommendation engine for text-based content.
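For a flavor of the first technique, here is a minimal regularized logistic regression over text using scikit-learn; the complaint snippets and labels are invented stand-ins, not the Times' actual data or pipeline.

```python
# Minimal sketch: regularized logistic regression over text with scikit-learn.
# The snippets and labels below are invented stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "airbag ruptured and metal fragments injured the driver",
    "inflator exploded during low speed crash",
    "radio display froze after software update",
    "paint peeled on the hood after two years",
]
labels = [1, 1, 0, 0]  # 1 = airbag-related complaint, 0 = unrelated

# L2 regularization is scikit-learn's default; C controls its strength
# (smaller C = stronger regularization).
model = make_pipeline(TfidfVectorizer(), LogisticRegression(C=1.0))
model.fit(docs, labels)

print(model.predict(["airbag inflator sprayed shrapnel"]))  # expect [1]
```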
Keynote presentation for the International Semantic Web Conference in Athens, Greece, on November 9, 2023. The talk addresses the generative AI explosion, its potential impacts on the Semantic Web and Knowledge Graph communities, and how it may, in fact, spark a research renaissance.
Abstract:
We are living in an age of rapidly advancing technology. History may view this period as one in which generative artificial intelligence reshaped the landscape and narrative of many technology-based fields of research and application. Times of disruption often present both opportunities and challenges. We will discuss some areas that may be ripe for consideration in the field of Semantic Web research and semantically-enabled applications.
Semantic Web research has historically focused on representation and reasoning and on enabling interoperability of data and vocabularies. At the core are ontologies, along with ontology-enabled (or ontology-compatible) knowledge stores such as knowledge graphs. Ontologies are often manually constructed using a process that (1) identifies existing best-practice ontologies (and vocabularies) and (2) generates a plan for how to leverage these ontologies by aligning and augmenting them as needed to address requirements. While semi-automated techniques may help, a significant portion of the work is typically best done by humans with domain and ontology expertise. This is an opportune time to rethink how the field generates, evolves, maintains, and evaluates ontologies. We consider how hybrid approaches, i.e., those that leverage generative AI components along with more traditional knowledge representation and reasoning approaches, can create improved processes. The effort to build a robust ontology that meets a use case can be large. Ontologies are not static, however; they need to evolve along with knowledge evolution and expanded usage. There is potential for hybrid approaches to help identify gaps in ontologies and/or refine content. Further, ontologies need to be documented with term definitions and their provenance. Opportunities exist to consider semi-automated techniques for some types of documentation, provenance, and decision-rationale capture for annotating ontologies.
The area of human-AI collaboration for population and verification presents a wide range of areas for research collaboration and impact. Ontologies need to be populated with class and relationship content, and knowledge graphs and other knowledge stores need to be populated with instance data in order to be used for question answering and reasoning. Population of large knowledge graphs can be time consuming. Generative AI holds the promise of creating candidate knowledge graphs that are compatible with the ontology schema. The knowledge graph should contain provenance information identifying how the content was populated and from what source, and its correctness and currency should be checked. A human-AI assistant approach is presented.
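To ground the population-and-provenance point, the sketch below builds a tiny knowledge graph whose instance triples carry provenance annotations, using rdflib; the namespace, entities, and modeling choices are illustrative assumptions, not the speaker's approach.

```python
# Tiny sketch: instance triples plus provenance, using rdflib.
# Namespace, entities, and modeling choices are illustrative only.
from rdflib import Graph, Literal, Namespace, RDF, URIRef
from rdflib.namespace import PROV

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)
g.bind("prov", PROV)

# Instance data, as an ontology-conformant population step might emit it.
g.add((EX.Athens, RDF.type, EX.City))
g.add((EX.ISWC2023, EX.locatedIn, EX.Athens))

# Provenance: record the source and time of the populated content, so its
# correctness and currency can later be checked.
g.add((EX.ISWC2023, PROV.wasDerivedFrom, URIRef("http://example.org/source/webpage42")))
g.add((EX.ISWC2023, PROV.generatedAtTime, Literal("2023-11-09")))

print(g.serialize(format="turtle"))
```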
The pre-Roman «tribes» that did NOT exist: várdulos, carietes, autrigones, ..., by Joseba Abaitua
#Treviño was never a tripoint, nor did three tribes meet at #Trifinium. Presentation for the III Congreso de la Cátedra Luis Michelena (Vitoria-Gasteiz, 8-11 October 2012), based on
Joseba Abaitua Odriozola and Mikel Unzueta Portilla (2013). El topónimo Treviño y la prevalencia de errores en historiografía lingüística [The toponym Treviño and the prevalence of errors in linguistic historiography]. III Congreso de la Cátedra Luis Michelena. Ricardo Gómez, Joaquín Gorrochategui, Joseba A. Lakarra & Céline Mounole (eds.). ISBN: 978-84-9860-911-0, pp. 3-25. There is also a video of the presentation that offers a partial version of the work.
http://blogs.tophistoria.com/trifinium/geografia-y-lexicografia-de-trifinium/
Only 20% of titles are on bookshelves, and these are separated from workplaces. With over 1.5 million scientific papers written annually, libraries of the future may have to design new ways of accessing scholarly documentation. In this doctoral course we will discuss how.
A hybrid and cooperative machine translation system, by Joseba Abaitua
For very distant language pairs, machine-translated texts must be revised manually. But many texts have already been translated by people and do not need to be translated by machines. How do we optimize this process?
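One common way to exploit existing human translations (sketched below under invented data; this is not the hybrid system the talk describes) is a translation-memory lookup: before invoking MT, check whether a close human-translated match already exists and reuse it.

```python
# Sketch of a translation-memory lookup: reuse a human translation when a
# close match exists, fall back to MT otherwise. Segment pairs are invented.
from difflib import SequenceMatcher

memory = {  # previously human-translated segments (source -> target)
    "The meeting starts at nine.": "La reunión empieza a las nueve.",
    "Please close the door.": "Por favor, cierra la puerta.",
}

def translate(sentence, threshold=0.85):
    best_src, best_score = None, 0.0
    for src in memory:
        score = SequenceMatcher(None, sentence, src).ratio()
        if score > best_score:
            best_src, best_score = src, score
    if best_score >= threshold:
        return memory[best_src], f"TM reuse (similarity {best_score:.2f})"
    return None, "no close match: send to MT, then revise manually"

print(translate("The meeting starts at nine."))  # exact TM hit
print(translate("The cat chased the mouse."))    # falls through to MT
```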
Status quaestionis. Options: KMBT (interlingua), http://www.isi.edu/natural-language/projects/IL-Annot/ (IL-Annot, Interlingua Annotation). Designing an interlingua -- a neutral representation of text meaning that sits between languages and can be used to facilitate machine translation and other multilingual applications -- has been a dream of computational linguists for several decades. Despite progress on several fronts, a number of key phenomena stubbornly resist standardization. They include: case roles (can one list the most basic roles associated with each action and each object?), aspect (for example, what does the continuous form in English mean?), discourse connectives (what is the general principle underlying "but" and "however", for example?), etc. This project is a collaboration with people from six universities around the country.
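To illustrate what an interlingua annotation might encode, here is a hypothetical structure (not IL-Annot's actual scheme) representing a clause as a language-neutral predicate with explicit case roles and aspect:

```python
# Hypothetical interlingua-style representation (not IL-Annot's scheme):
# a clause as a language-neutral predicate with case roles and aspect.
from dataclasses import dataclass, field

@dataclass
class Clause:
    predicate: str                              # language-neutral concept id
    roles: dict = field(default_factory=dict)   # case role -> concept id
    aspect: str = "simple"                      # e.g. "progressive" for English continuous

# "The chef is slicing the onion."
clause = Clause(
    predicate="CUT",
    roles={"agent": "CHEF", "patient": "ONION"},
    aspect="progressive",
)

# The same structure could be realized in any target language, which is
# exactly why standardizing the role inventory is hard.
print(clause)
```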