The document summarizes the AAT-Taiwan project, which aims to develop a multilingual digital archive of cultural heritage terms for Taiwan. It provides an overview of the project background; its current status, including terms translated into Chinese and linked images; and its framework, including localization processes, expert review, and new concept development. Issues addressed include equivalence mapping between terms and translation challenges.
Requirements Engineering: focus on Natural Language Processing, Lecture 2, by alessio_ferrari
In this lecture, we give a practical guide to detecting ambiguities in natural language requirements using GATE and Python. A brief guide to Python is also included.
The previous lecture introduces the problem of ambiguity in requirements engineering. Find it here: https://www.slideshare.net/alessio_ferrari/requirements-engineering-focus-on-natural-language-processing-lecture-1
Invited Talk at Summer School on Semantic Web, Bertinoro, 2015
Abstract:
Two decades ago, the discussion was about how to build seamless digital workflows
in which data would not switch between paper, fax, phone,
and digital media, because each transcription from one medium to another
was laborious and cost-inefficient. Thus, the issue was avoiding *medium discontinuities*.
Today, we have all-digital data workflows, but we still have plenty of *semantic discontinuities*.
In this talk, I first want to describe the reasons for these discontinuities, including the autonomy of
data providers, the need for agility and flexibility, and decentralized organizations in
the world-wide data spaces.
Then I want to describe several semantic discontinuities and some efforts to
ameliorate them by:
1. Semantic programming (Horizontal workflow paradigm)
2. Core ontologies (Vertical workflow paradigm)
3. Semantic data production and consumption (Sticky semantics)
Deep Natural Language Processing for Search and Recommender Systems, by Huiji Gao
Tutorial for KDD 2019:
Search and recommender systems process rich natural language text data such as user queries and documents. Achieving high-quality search and recommendation results requires processing and understanding such information effectively and efficiently, an area where natural language processing (NLP) technologies are widely deployed. In recent years, deep learning models have proven successful at improving various NLP tasks, indicating their great potential for improving search and recommender systems.
In this tutorial, we summarize current efforts in applying deep learning for NLP to search and recommender systems. We first give an overview of search and recommender systems with NLP, then introduce the basic concepts of deep learning for NLP, covering state-of-the-art technologies in both language understanding and language generation. After that, we share our hands-on experience with LinkedIn applications. Finally, we highlight several important future trends.
Chingju Cheng(城菁汝), Sophy Chen(陳淑君)
Program Office (計畫辦公室)
Research and Development of Digital Archives and e-Learning Technologies Project (Division 1: 數位技術研發與整合計畫)
International Collaboration and Promotion of Taiwan e-Learning & Digital Archives Program (Division 8: 海外推展暨國際合作計畫)
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ..., by Lifeng (Aaron) Han
Presentation PPT in MT SUMMIT 2013.
Language-independent Model for Machine Translation Evaluation with Reinforced Factors
International Association for Machine Translation, 2013
Authors: Aaron Li-Feng Han, Derek Wong, Lidia S. Chao, Yervant Ho, Yi Lu, Anson Xing, Samuel Zeng
Proceedings of the 14th biennial International Conference of Machine Translation Summit (MT Summit 2013), Nice, France, 2–6 September 2013. Open-source tool: https://github.com/aaronlifenghan/aaron-project-hlepor (Machine Translation Archive)
Presentation by Mariarita Cafulli, Head of the Consiglio Nazionale dei Dottori Commercialisti e degli Esperti Contabili Translation Department, at the World Congress of Accountants in Rome, Italy, November 12, 2014
Natural language processing for requirements engineering: ICSE 2021 Technical..., by alessio_ferrari
These are the slides for the technical briefing at ICSE 2021, given by Alessio Ferrari, Liping Zhao, and Waad Alhoshan.
It covers RE tasks to which NLP is applied, an overview of a recent systematic mapping study on the topic, and a hands-on tutorial on using transfer learning for requirements classification.
Please find the links to the Colab notebooks here:
https://colab.research.google.com/drive/158H-lEJE1pc-xHc1ISBAKGDHMt_eg4Gn?usp=sharing
https://colab.research.google.com/drive/1B_5ow3rvS0Qz1y-KyJtlMNnmgmx9w3kJ?usp=sharing
https://colab.research.google.com/drive/1Xrm0gNaa41YwlM5g2CRYYXcRvpbDnTRT?usp=sharing
CUHK intern PPT. Machine Translation Evaluation: Methods and Tools, by Lifeng (Aaron) Han
Abstract of Aaron Han’s Presentation
The main topic of this presentation will be the evaluation of machine translation. With the rapid development of machine translation (MT), MT evaluation has become increasingly important for telling whether systems are making progress. Traditional human judgments are very time-consuming and expensive. On the other hand, existing automatic MT evaluation metrics have several weaknesses:
– they perform well on certain language pairs but poorly on others, which we call the language-bias problem;
– they consider either no linguistic information (resulting in low correlation with human judgments) or too many linguistic features (making them difficult to replicate), which we call the extremism problem;
– they use incomplete factors (e.g., precision only).
To address these problems, he has developed several automatic evaluation metrics that:
– use tunable parameters to address the language-bias problem;
– use concise linguistic features to address the linguistic extremism problem;
– use augmented factors.
Experiments on ACL-WMT corpora show that the proposed metrics yield higher correlation with human judgments. The proposed metrics have been published at top international conferences, e.g., COLING and MT SUMMIT. Evaluation is closely related to similarity measurement, so this work can be extended to other areas, such as information retrieval, question answering, and search.
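The abstract contrasts precision-only metrics with tunable, augmented factors. As a minimal sketch (this is not the published hLEPOR metric, whose factors are defined in the paper and the linked tool), the function below scores a candidate against a reference using a weighted harmonic mean of unigram recall and precision. The hypothetical weights `alpha` and `beta` play the role of tunable parameters, and whitespace tokenization is an assumption.

```python
from collections import Counter


def weighted_pr_score(candidate: str, reference: str,
                      alpha: float = 1.0, beta: float = 1.0) -> float:
    """Weighted harmonic mean of unigram recall R and precision P:
    (alpha + beta) / (alpha / R + beta / P).

    alpha weights recall and beta weights precision; both are
    hypothetical tunable parameters for this sketch.
    """
    cand = candidate.split()   # naive whitespace tokenization (assumption)
    ref = reference.split()
    if not cand or not ref:
        return 0.0
    # Clipped matches: a token counts at most as often as it appears in both.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return (alpha + beta) / (alpha / recall + beta / precision)


# A truncated output has perfect precision but low recall.
print(weighted_pr_score("the cat", "the cat sat on the mat"))  # 0.5
print(weighted_pr_score("the cat", "the cat sat on the mat", alpha=9.0))
```

A precision-only score would give the truncated output 1.0. Weighting recall more heavily (`alpha=9.0`) lowers its score to about 0.36, while weighting precision (`beta=9.0`) raises it to about 0.83. This is the kind of language- or task-specific tuning the abstract alludes to.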
A brief introduction to some of his other research will also be given, covering Chinese named entity recognition, word segmentation, and multilingual treebanks, which have been published in the Springer LNCS and LNAI series. Suggestions and comments are much appreciated, and opportunities for further cooperation would be very welcome.
How to expand your NLP solution to new languages using transfer learning, by Lena Shakurova
Expanding NLP models to new languages typically involves annotating new datasets, which is expensive in time and resources. To reduce costs, one can use cross-lingual embeddings, which enable knowledge transfer from languages with sufficient training data to low-resource languages. In this talk, you will hear about the challenges of learning cross-lingual embeddings for multilingual resume parsing.
Similar to Teldap4 getty multilingual vocab workshop2010 (20)
1. Overview of AAT-Taiwan Project
Sophy Chen, Shi Lin*, Allison Peng, Diane Wu, and DT Lee
Research Center for IT Innovation; *Computing Centre, Academia Sinica, Taiwan
TELDAP AAT-Taiwan team
Multilingual Vocabulary Project Workshop, Getty Research Institute, Los Angeles, California, 8/23–8/26/2010
10. Progress Update (last updated: 2010/8/20)
34,000 terms in AAT
25,580 terms translated to Chinese (8,420 not yet translated)
9,942 terms proofread (15,638 translated but not yet proofread)
13,168 terms currently in AAT-Taiwan
452 terms verified by experts
77 linked to Wikipedia
17 terms contributed to AAT
1,185 images linked to AAT-Taiwan
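Two figures on the progress chart, 8,420 and 15,638, have no labels. Reading them as remainders is our interpretation, and this short Python check shows the arithmetic supports it. The percentages are derived here and are not reported on the slide.

```python
# Figures from the progress update (last updated 2010/8/20).
AAT_TOTAL = 34_000
TRANSLATED = 25_580
PROOFREAD = 9_942

not_translated = AAT_TOTAL - TRANSLATED            # matches the unlabeled 8,420
translated_not_proofread = TRANSLATED - PROOFREAD  # matches the unlabeled 15,638


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(f"{not_translated:,} terms not yet translated")
print(f"{translated_not_proofread:,} translated terms not yet proofread")
print(f"{pct(TRANSLATED, AAT_TOTAL)}% of AAT translated")          # 75.2%
print(f"{pct(PROOFREAD, TRANSLATED)}% of translations proofread")  # 38.9%
```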
11. CURRENT STATUS: Front-end
Multilingual search: Chinese (內頗深), English (pipa), French (famille rose), Spanish (Alençon lace), Italian (vermiglio, archi acuti)
External links: link back to AAT, the TELDAP Union Catalogue, and Muse Fusion
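The front end lets users search in several languages and reach the same concept. One way to picture this is an inverted index from case-folded terms to concept IDs. The sketch below assumes a hypothetical in-memory structure in which each concept stores its terms per language. The concept IDs are invented and are not AAT IDs, and the Chinese labels 琵琶 and 粉彩 are our own illustrative choices.

```python
from collections import defaultdict

# Hypothetical concept records: made-up concept ID -> {language: [terms]}.
CONCEPTS = {
    "concept-1": {"en": ["pipa"], "zh": ["琵琶"]},
    "concept-2": {"en": ["famille rose"], "fr": ["famille rose"], "zh": ["粉彩"]},
}


def build_index(concepts: dict) -> dict[str, set[str]]:
    """Inverted index: case-folded term (any language) -> concept IDs."""
    index = defaultdict(set)
    for concept_id, labels in concepts.items():
        for terms in labels.values():
            for term in terms:
                index[term.casefold()].add(concept_id)
    return index


def search(index: dict, concepts: dict, query: str) -> dict:
    """Return all language labels for every concept matching the query."""
    # dict.get does not insert missing keys, even on a defaultdict.
    ids = sorted(index.get(query.strip().casefold(), set()))
    return {cid: concepts[cid] for cid in ids}


INDEX = build_index(CONCEPTS)
print(search(INDEX, CONCEPTS, "Famille Rose"))
```

A query in any one language returns the concept with all of its labels. This is what allows a Chinese query to surface the English AAT term and the reverse.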
12. CURRENT STATUS: Back-end
Fields: Getty note, translation, TELDAP note, preferred and non-preferred terms, source, images
Example: Repoussoir (內頗深)
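The back-end fields can be pictured as one record per term. The Python dataclass below is a sketch that uses our own field names, not the project's actual schema. Serializing with `ensure_ascii=False` writes Chinese characters to the output as readable Unicode rather than as `\uXXXX` escapes.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TermRecord:
    """One AAT-Taiwan term page (field names are illustrative)."""
    english_preferred: str
    chinese_preferred: str
    non_preferred: list[str] = field(default_factory=list)
    getty_note: str = ""        # original AAT scope note in English
    note_translation: str = ""  # Chinese translation of the Getty note
    teldap_note: str = ""       # additional note from TELDAP
    sources: list[str] = field(default_factory=list)
    images: list[str] = field(default_factory=list)  # image URLs or IDs


def to_json(record: TermRecord) -> str:
    # ensure_ascii=False writes 內頗深 as-is instead of \u escapes.
    return json.dumps(asdict(record), ensure_ascii=False, indent=2)


record = TermRecord(english_preferred="Repoussoir", chinese_preferred="內頗深")
print(to_json(record))
```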
14. Framework (SOP)
– New concepts: Chinese characters in Unicode, plus Pinyin or Wade-Giles
– Chinese records and terms: rendered in Chinese characters (Unicode)
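The SOP requires Chinese terms to be written in Unicode Chinese characters. The Python sketch below shows one way to enforce this at data entry. It reflects our assumption about tooling, not the team's documented procedure. The text is normalized to NFC, and then each character's Unicode name is checked. Pinyin and Wade-Giles fields would need separate checks, which are not shown here.

```python
import unicodedata


def is_cjk_ideographs(text: str) -> bool:
    """True if text is non-empty and every character is a CJK unified ideograph."""
    return bool(text) and all(
        unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")
        for ch in text
    )


def normalize_chinese_term(text: str) -> str:
    """NFC-normalize a Chinese term and reject anything that is not all ideographs.

    NFC also folds most CJK compatibility ideographs into their unified forms.
    """
    term = unicodedata.normalize("NFC", text.strip())
    if not is_cjk_ideographs(term):
        raise ValueError(f"not a pure Chinese-character term: {text!r}")
    return term


print(normalize_chinese_term(" 篆書 "))  # 篆書
```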
15. Framework (Task-oriented): GETTY AAT → TELDAP AAT-Taiwan
1 在地化 | Localization: Equivalence Mapping (M), Translation (T), System (S)
2 專家團隊 | Expert Group: New Concept (N), Examination (E)
資料貢獻 | Contribution (C):
– Original AAT terms: Unicode, Pinyin, Wade-Giles
– New concepts from TELDAP: Unicode, Pinyin, Wade-Giles, English scope note
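The task codes on this slide (M, T, S, N, E, C) can be read as stages of a pipeline. The Python sketch below encodes two orderings that are our assumption from the diagram, not a documented SOP. In this reading, existing AAT terms pass through localization, then examination, then contribution. New TELDAP concepts start at New Concept instead.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    M = "Equivalence Mapping"
    T = "Translation"
    S = "System"
    N = "New Concept"
    E = "Examination"
    C = "Contribution"


# Assumed orderings, read off the framework diagram (not a documented SOP).
EXISTING_TERM_FLOW = [Stage.M, Stage.T, Stage.S, Stage.E, Stage.C]
NEW_CONCEPT_FLOW = [Stage.N, Stage.E, Stage.C]


def next_stage(flow: list[Stage], current: Stage) -> Optional[Stage]:
    """Return the stage after `current` in `flow`, or None when the flow is done."""
    i = flow.index(current)  # raises ValueError if the stage is not in this flow
    return flow[i + 1] if i + 1 < len(flow) else None


print(next_stage(EXISTING_TERM_FLOW, Stage.S).value)  # Examination
```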
18. Training: Unified terms and file format for the AAT-Taiwan team and translators
19. Training: AAT-Taiwan translation forum for questions and discussions
20. Training: Discussion and question types. We need to find solutions to the technical questions asked by the translators.
21. Translators: Determine whether the translators are proficient enough to meet our requirements; if not, their employment is terminated.
22. Translators: Examine the fluency and accuracy of the translations of terms and scope notes. Frequently encountered problem types are as follows:
23. Proofreaders: Each proofreader is assigned a workload of 200–250 terms per month; the AAT-Taiwan team continuously oversees their performance and gives guidance when needed. Proofreaders are asked to highlight any problems they encounter in the translation sheets, so that AAT-Taiwan staff can address them later in the review process. The proofread content is then ready for expert review.
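To illustrate the 200–250 terms-per-month workload, here is a hypothetical Python helper. It is not the team's tooling. It splits a term list into monthly batches capped at 250 terms and hands them to proofreaders in round-robin order.

```python
def monthly_batches(term_ids: list[str], proofreaders: list[str],
                    max_per_month: int = 250) -> list[tuple[int, str, list[str]]]:
    """Split terms into (month, proofreader, batch) assignments, round-robin.

    max_per_month defaults to the upper end of the 200-250 range.
    """
    if not proofreaders:
        raise ValueError("need at least one proofreader")
    if max_per_month <= 0:
        raise ValueError("max_per_month must be positive")
    assignments = []
    for k, start in enumerate(range(0, len(term_ids), max_per_month)):
        batch = term_ids[start:start + max_per_month]
        month = k // len(proofreaders) + 1  # months are 1-based
        proofreader = proofreaders[k % len(proofreaders)]
        assignments.append((month, proofreader, batch))
    return assignments


terms = [f"term-{i}" for i in range(1100)]  # hypothetical term IDs
for month, who, batch in monthly_batches(terms, ["A", "B"]):
    print(month, who, len(batch))
```

Fixed-size batches make each month's output easy to check against the quota before the next batch is released. The last batch simply holds the remainder.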
24. 1 LOCALIZATION: Equivalence Mapping (M). Consult references to identify the English term, then locate the term in AAT.
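The two-step mapping (reference lookup, then AAT lookup) can be sketched in Python as follows. Everything here is hypothetical. The glossary stands in for the consulted references, the AAT slice uses invented IDs, and matching is by exact case-folded string.

```python
# Hypothetical stand-ins: a reference glossary (Chinese -> English candidates)
# and a tiny slice of AAT terms keyed by made-up IDs.
REFERENCE_GLOSSARY = {
    "琵琶": ["pipa", "Chinese lute"],
    "粉彩": ["famille rose"],
}
AAT_SLICE = {
    "aat-example-1": ["pipa"],
    "aat-example-2": ["famille rose"],
}


def map_to_aat(chinese_term: str, glossary: dict[str, list[str]],
               aat_terms: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (English candidate, AAT ID) pairs for candidates found in AAT."""
    # Index every AAT term, case-folded, to its record ID.
    lookup = {t.casefold(): aat_id
              for aat_id, terms in aat_terms.items() for t in terms}
    matches = []
    for english in glossary.get(chinese_term, []):
        aat_id = lookup.get(english.casefold())
        if aat_id is not None:
            matches.append((english, aat_id))
    return matches


print(map_to_aat("琵琶", REFERENCE_GLOSSARY, AAT_SLICE))  # [('pipa', 'aat-example-1')]
```

Candidates without an AAT match ("Chinese lute" here) are skipped. An empty result would be a signal for manual review.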
26. 1 LOCALIZATION: System (S). Corresponding images; Chinese characters in Unicode.
27. At least 3 authoritative references; external links.
28. 2 EXPERT GROUP: Examination (E)
After proofreading is completed, the experts receive the following files for review:
1) proofread translation files
2) content checklist
3) guidelines for examination
30. The expert makes corrections directly on the translation sheet and fills in the checklist accordingly.
32. 2 EXPERT GROUP: New Concept (N)
After a new scope note is written, the expert receives the following files for review:
1) scope note worksheets
2) content checklist
3) guidelines for examination
33. Add New Concepts: Chinese scripts (故宮後設資料需求書 | Metadata of National Palace Museum)
These terms are mapped to a broader term in AAT. The New Concept Team creates a new record with the required fields, including:
1) source
2) note
3) related term
4) preferred and non-preferred terms
5) authoritative references
6) relevant images
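Before a new-concept record goes to expert review, it can be checked against the required fields listed on this slide and against the "at least 3 authoritative references" rule from slide 27. The Python sketch below uses dictionary keys chosen for illustration. It is not the project's schema, and the sample values are placeholders.

```python
REQUIRED_FIELDS = (
    "source", "note", "related_term", "preferred_term",
    "non_preferred_terms", "authoritative_references", "images",
)
MIN_REFERENCES = 3  # "At least 3 authoritative references" (slide 27)


def check_new_concept(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    refs = record.get("authoritative_references") or []
    if refs and len(refs) < MIN_REFERENCES:
        problems.append(
            f"only {len(refs)} authoritative reference(s); need {MIN_REFERENCES}"
        )
    return problems


record = {
    "source": "故宮後設資料需求書",
    "note": "(scope note text)",
    "related_term": "cursive script",
    "preferred_term": "行草",
    "non_preferred_terms": ["xingcao"],
    "authoritative_references": ["ref 1", "ref 2"],  # placeholders
    "images": ["image-001"],                          # placeholder ID
}
print(check_new_concept(record))
```

Returning a list of problems instead of raising on the first one lets the team fix every gap in a single pass.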
35. Example: Chinese Scripts
scripts (writing)
– <scripts by form>
– <Latin and Greek alphabet scripts>
– <Arabic scripts>
– <中國書體> <Chinese scripts>
   – 篆書 seal script: (1) 甲骨文 oracle bone script, (2) 金文 bronze inscriptions, (3) 大篆 great seal script, (4) 小篆 small seal script; also 鳥蟲書 bird-worm seal script
   – 隸書 clerical script: (1) 秦隸 Qin clerical script, (2) 漢隸 Han clerical script
   – 楷書 standard script
   – 行書 running script: (1) 行楷 running-standard script, (2) 行草 running-cursive script
   – 草書 cursive script: (1) 章草 draft cursive script, (2) 今草 modern cursive script, (3) 狂草 wild cursive script
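AAT places each concept under a broader term. For the Chinese scripts example on this slide, storing only the child-to-broader-term links is enough to recover a term's full hierarchy path. The Python sketch below shows this over a subset of the slide's terms; the dictionary representation is our assumption.

```python
# Child -> broader-term links for part of the Chinese scripts example.
BROADER = {
    "<Chinese scripts>": "scripts (writing)",
    "seal script": "<Chinese scripts>",
    "clerical script": "<Chinese scripts>",
    "running script": "<Chinese scripts>",
    "cursive script": "<Chinese scripts>",
    "small seal script": "seal script",
    "Han clerical script": "clerical script",
    "running-cursive script": "running script",
    "modern cursive script": "cursive script",
}


def broader_path(term: str) -> list[str]:
    """Return the hierarchy path from the top term down to `term`."""
    path = [term]
    while path[-1] in BROADER:
        parent = BROADER[path[-1]]
        if parent in path:  # guard against a cycle in malformed data
            raise ValueError(f"cycle detected at {parent!r}")
        path.append(parent)
    return path[::-1]


print(" > ".join(broader_path("modern cursive script")))
```

Storing one parent per term keeps each record small. Breadcrumbs such as the one printed above can then be rebuilt on demand for term pages.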
36. Example: Chinese Scripts. Revised English translation of the scope note.
37. After the expert confirms the credibility and precision of the content, his or her name is displayed on the term page as the content expert, with a link back to the expert's homepage.