It has been 29 years since I became involved in NLP research, almost as long as NLP research itself has existed in Thailand, especially for Thai language processing. Following the timeline, the slides show the development of Thai NLP in terms of algorithms and language resources.
9. POS Tagset
• 14 categories (N, PRON, V, AUX, DET, ADV, CLAS, CONJ, PREP, INT, PREF, END, NEG, PUNC) and 47 sub-categories
• VACT, VSTA, VATT
• Transitive, Intransitive
• AUX
• Word order
• S vs NP
• No diff in some cases
No. POS Description Example
1 NPRP Proper noun วินโดวส์ 95, โคโรน่า, โค้ก, พระอาทิตย์
2 NCNM Cardinal number หนึ่ง, สอง, สาม, 1, 2, 3
3 NONM Ordinal number ที่หนึ่ง, ที่สอง, ที่สาม, ที่1, ที่2, ที่3
4 NLBL Label noun 1, 2, 3, 4, ก, ข, a, b
5 NCMN Common noun หนังสือ, อาหาร, อาคาร, คน
6 NTTL Title noun ดร., พลเอก
7 PPRS Personal pronoun คุณ, เขา, ฉัน
8 PDMN Demonstrative pronoun นี่, นั่น, ที่นั่น, ที่นี่
9 PNTR Interrogative pronoun ใคร, อะไร, อย่างไร
10 PREL Relative pronoun ที่, ซึ่ง, อัน, ผู้
11 VACT Active verb ทำงาน, ร้องเพลง, กิน
12 VSTA Stative verb เห็น, รู้, คือ
13 VATT Attributive verb อ้วน, ดี, สวย
14 XVBM Pre-verb auxiliary, before negator “ไม่” เกิด, เกือบ, กำลัง
15 XVAM Pre-verb auxiliary, after negator “ไม่” ค่อย, น่า, ได้
16 XVMM Pre-verb, before or after negator “ไม่” ควร, เคย, ต้อง
17 XVBB Pre-verb auxiliary, in imperative mood กรุณา, จง, เชิญ, อย่า, ห้าม
18 XVAE Post-verb auxiliary ไป, มา, ขึ้น
19 DDAN Definite determiner, after noun without classifier in between นี่, นั่น, โน่น, ทั้งหมด
20 DDAC Definite determiner, allowing classifier in between นี้, นั้น, โน้น, นู้น
21 DDBQ Definite determiner, between noun and classifier or preceding quantitative expression ทั้ง, อีก, เพียง
22 DDAQ Definite determiner, following quantitative expression พอดี, ถ้วน
23 DIAC Indefinite determiner, following noun; allowing classifier in between ไหน, อื่น, ต่างๆ
24 DIBQ Indefinite determiner, between noun and classifier or preceding quantitative expression บาง, ประมาณ, เกือบ
25 DIAQ Indefinite determiner, following quantitative expression กว่า, เศษ
26 DCNM Determiner, cardinal number expression หนึ่งคน, เสือ 2 ตัว
27 DONM Determiner, ordinal number expression ที่หนึ่ง, ที่สอง, ที่สุดท้าย
28 ADVN Adverb with normal form เก่ง, เร็ว, ช้า, สม่ำเสมอ
29 ADVI Adverb with iterative form เร็วๆ, เสมอๆ, ช้าๆ
30 ADVP Adverb with prefixed form โดยเร็ว
31 ADVS Sentential adverb โดยปกติ, ธรรมดา
32 CNIT Unit classifier ตัว, คน, เล่ม
33 CLTV Collective classifier คู่, กลุ่ม, ฝูง, เชิง, ทาง, ด้าน, แบบ, รุ่น
34 CMTR Measurement classifier กิโลกรัม, แก้ว, ชั่วโมง
35 CFQC Frequency classifier ครั้ง, เที่ยว
36 CVBL Verbal classifier ม้วน, มัด
37 JCRG Coordinating conjunction และ, หรือ, แต่
38 JCMP Comparative conjunction กว่า, เหมือนกับ, เท่ากับ
39 JSBR Subordinating conjunction เพราะว่า, เนื่องจาก, ที่, แม้ว่า, ถ้า
40 RPRE Preposition จาก, ละ, ของ, ใต้, บน
41 INT Interjection โอ้ย,โอ้, เออ, เอ๋, อ๋อ
42 FIXN Nominal prefix การทำงาน, ความสนุกสนาน
43 FIXV Adverbial prefix อย่างเร็ว
44 EAFF Ending for affirmative sentence จ๊ะ, จ้ะ, ค่ะ, ครับ, นะ, น่า, เถอะ
45 EITT Ending for interrogative sentence หรือ, เหรอ, ไหม, มั้ย
46 NEG Negator ไม่, มิได้, ไม่ได้, มิ
47 PUNC Punctuation (, ), “, ,, ;
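The relation between the 14 categories and the 47 sub-category tags can be written down as a small lookup table. The sketch below is a minimal Python illustration; the assignment of each tag to a category is an assumption inferred from the tag names and descriptions in the table, not a mapping stated on the slide, and the names `ORCHID_TAGSET`, `TAG_TO_CATEGORY` and `category_of` are hypothetical.

```python
# Grouping of the 47 sub-category tags under the 14 categories.
# Assumption: the grouping is inferred from the tag names and descriptions
# in the table above; it is not given explicitly on the slide.
ORCHID_TAGSET = {
    "N": ["NPRP", "NCNM", "NONM", "NLBL", "NCMN", "NTTL"],
    "PRON": ["PPRS", "PDMN", "PNTR", "PREL"],
    "V": ["VACT", "VSTA", "VATT"],
    "AUX": ["XVBM", "XVAM", "XVMM", "XVBB", "XVAE"],
    "DET": ["DDAN", "DDAC", "DDBQ", "DDAQ", "DIAC", "DIBQ", "DIAQ",
            "DCNM", "DONM"],
    "ADV": ["ADVN", "ADVI", "ADVP", "ADVS"],
    "CLAS": ["CNIT", "CLTV", "CMTR", "CFQC", "CVBL"],
    "CONJ": ["JCRG", "JCMP", "JSBR"],
    "PREP": ["RPRE"],
    "INT": ["INT"],
    "PREF": ["FIXN", "FIXV"],
    "END": ["EAFF", "EITT"],
    "NEG": ["NEG"],
    "PUNC": ["PUNC"],
}

# Reverse index: sub-category tag -> top-level category.
TAG_TO_CATEGORY = {
    tag: category
    for category, tags in ORCHID_TAGSET.items()
    for tag in tags
}


def category_of(tag):
    """Return the top-level category of a sub-category tag."""
    return TAG_TO_CATEGORY[tag]
```

For example, `category_of("CFQC")` looks up the frequency classifier tag and returns its category under this assumed grouping.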
28 August 2017, ISAI-NLP 2017, Hua Hin, Thailand
Virach Sornlertlamvanich, Naoto Takahashi and Hitoshi Isahara.
Building a Thai Part-Of-Speech Tagged Corpus (ORCHID).
The Journal of the Acoustical Society of Japan (E), Vol.20, No.3,
pp 189-140, May 1999.
10. Multi-lingual Machine Translation Project (MMT)
1987-1992 (+2)
• 6-year project (1987-1992)
• Interlingual approach MMT for CIJMT
• R&D
− Analysis
− Generation
− Dictionary
− Interlingua
− Integration system
• Collaboration
− Thailand (NECTEC, CU, KU, KMUTT, KMITL)
− Japan (NEC, Fujitsu, Hitachi, OKI, Sharp, Mitsubishi, Toshiba)
− China, Indonesia, Malaysia
• 1969 Computerized Alphabetization of Thai
• 1974 Thai Transliteration System
• 1981 ARIANE Project
− English-Thai MT
− Ministry of University Affairs and Grenoble Univ.
• 1986 Establishment of NECTEC
• 1986 TIS620-2529
− Thai Standard Character Code for Computer by TISI
• 1987-92 (+2) NECTEC-CICC MMT Project
• 1992-present Establishment of LINKS at NECTEC
− AI R&D Center at KMITT
− NAiST at KU
− KIND at SIIT
− RDI at NECTEC
− SLS at CU, ….
15. LEXiTRON
• LEXiTRON version 1.1
• Corpus-based dictionary
• Dictionary for writing
• Launched in 1995
• CD-ROM for Windows 3.1 Thai Edition
• Thai 11,000 entries
• English 9,000 entries
• 6 types of dictionaries
− General word entry
− Thai usage dictionary (sample sentence)
− Synonym-Antonym
− Thai-English (equivalent)
− Word class
Virach Sornlertlamvanich, Apichit Pittayaratsophon and Kriangchai Chansaenwilai.
Thai Dictionary Data Base Manipulation using Multi-indexed Double Array Trie.
The 5th Annual Conference, NECTEC, Bangkok. pp. 197-206, 1993. (in Thai)
17. ORCHID POS Tagged Corpus
%TTitle: การประชุมทางวิชาการ ครั้งที่ 1
%ETitle: [1st Annual Conference]
%TAuthor:
%EAuthor:
%TInbook: การประชุมทางวิชาการ ครั้งที่ 1, โครงการวิจัยและพัฒนาอิเล็กทรอนิกส์และคอมพิวเตอร์, ปีงบประมาณ 2531, เล่ม 1
%EInbook: The 1st Annual Conference, Electronics and Computer Research and Development Project, Fiscal Year 1988, Book 1
%TPublisher: ศูนย์เทคโนโลยีอิเล็กทรอนิกส์และคอมพิวเตอร์แห่งชาติ, กระทรวงวิทยาศาสตร์ เทคโนโลยีและการพลังงาน
%EPublisher: National Electronics and Computer Technology Center, Ministry of Science, Technology and Energy
%Page:
%Year: 1989
%File:
#P1
#1
การประชุมทางวิชาการ ครั้งที่ 1//
การ/FIXN
ประชุม/VACT
ทาง/NCMN
วิชาการ/NCMN
<space>/PUNC
ครั้ง/CFQC
ที่ 1/DONM//
#2
โครงการวิจัยและพัฒนาอิเล็กทรอนิกส์และคอมพิวเตอร์//
โครงการวิจัยและพัฒนา/NCMN
อิเล็กทรอนิกส์/NCMN
และ/JCRG
คอมพิวเตอร์/NCMN//
…
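The sample above suggests a simple line-oriented layout: `%` lines carry document metadata, `#P` marks a paragraph, `#` followed by a number marks a sentence, the raw sentence text ends with `//`, and each following line is a `word/TAG` pair, with `//` closing the sentence. A minimal Python sketch of a reader for that layout follows. It is inferred only from this sample, so the real corpus may contain cases it does not handle, and the function name `parse_orchid` is hypothetical.

```python
import re


def parse_orchid(text):
    """Parse ORCHID-style lines into a list of sentences.

    Each sentence is a dict with 'id', 'raw' (the untagged text) and
    'tokens' (a list of (word, tag) pairs).
    Assumption: the layout is inferred from the sample on the slide.
    """
    sentences = []
    current = None
    expecting_raw = False
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, metadata lines and paragraph markers.
        if not line or line.startswith("%") or line.startswith("#P"):
            continue
        marker = re.match(r"#(\d+)(.*)", line)
        if marker:
            current = {"id": int(marker.group(1)), "raw": "", "tokens": []}
            sentences.append(current)
            expecting_raw = True
            # Tolerate a marker fused with the raw text on the same line.
            line = marker.group(2)
            if not line:
                continue
        if current is None:
            continue
        ends_unit = line.endswith("//")
        body = line[:-2] if ends_unit else line
        if expecting_raw:
            current["raw"] += body
            if ends_unit:
                expecting_raw = False
        else:
            # Split on the last slash so words containing spaces survive.
            word, _, tag = body.rpartition("/")
            current["tokens"].append((word, tag))
    return sentences


SAMPLE = """%Year: 1989
#P1
#1
การประชุมทางวิชาการ ครั้งที่ 1//
การ/FIXN
ประชุม/VACT
ทาง/NCMN
วิชาการ/NCMN
<space>/PUNC
ครั้ง/CFQC
ที่ 1/DONM//
"""

parsed = parse_orchid(SAMPLE)
```

Splitting on the last `/` keeps multi-part tokens such as `ที่ 1` intact, which matters because the sample shows that a token may contain a space.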
• ORCHID Corpus (1997) supported by CRL Japan
• Source: NECTEC Technical Report
• Size: 160 documents; 5.75 MB; 400K words
• Tag: XML tagged paragraph, sentence, word, part-of-speech
• Availability: for research
• Difficulties
• Hard to find consensus in the sentence boundary, word boundary, and POS tag
Virach Sornlertlamvanich, Thatsanee Charoenporn and Hitoshi Isahara.
ORCHID: Thai Part-Of-Speech Tagged Corpus. Technical Report Orchid
TR-NECTEC-1997-001, NECTEC, Thailand, pp. 5-19, Dec 1997.
20. Term Candidate Extraction for Dictionary-less Search Engine
• Virach Sornlertlamvanich et al. (COLING 2000):
- Automatic Corpus-Based Thai Word Extraction with the C4.5 Learning Algorithm
- C4.5-trained decision tree for determining potential word boundaries from MI, entropy, and some linguistic information
- Capable of discovering new words in a document without assistance from a static dictionary
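Entropy is named above as one of the features given to the decision tree. As a rough illustration, the Python sketch below computes the Shannon entropy of the characters adjacent to a candidate string in a raw corpus. It assumes that "entropy" here refers to the distribution of neighbouring characters, which the slide does not spell out, and the function name `boundary_entropy` is hypothetical.

```python
import math
from collections import Counter


def boundary_entropy(corpus, candidate, side="right"):
    """Shannon entropy (bits) of the character next to `candidate`.

    side="right" looks at the character that follows each occurrence,
    side="left" at the character that precedes it.
    Assumption: this is one plausible reading of the entropy feature.
    """
    neighbours = Counter()
    start = corpus.find(candidate)
    while start != -1:
        pos = start + len(candidate) if side == "right" else start - 1
        if 0 <= pos < len(corpus):
            neighbours[corpus[pos]] += 1
        # Step by one character so overlapping occurrences are counted.
        start = corpus.find(candidate, start + 1)
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in neighbours.values())
```

A string followed by many different characters gets a high value, which hints at a word boundary after it; a string that is always followed by the same character gets zero.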
Virach Sornlertlamvanich, Tanapong Potipiti and Thatsanee Charoenporn.
Automatic Corpus-based Thai Word Extraction with the C4.5 Learning Algorithm.
Proceedings of the 18th International Conference on Computational Linguistics (COLING2000),
Saarbrucken, Germany, pp 802-807, July-August 2000.
21. Attributes(1) : Left and Right Mutual Information
High mutual information implies that xyz co-occurs more than expected by chance. If xyz is a word, its MIL and MIR must be high.
Example: …efunction… and ...function...
MIL(xyz) = p(xyz) / (p(x) p(yz)), MIR(xyz) = p(xyz) / (p(xy) p(z))
where
x is the leftmost character of string xyz
y is the middle substring of xyz
z is the rightmost character of string xyz
p( ) is the probability function.
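A minimal Python sketch of the two attributes follows. It assumes the ratio form MIL(xyz) = p(xyz) / (p(x) p(yz)) and MIR(xyz) = p(xyz) / (p(xy) p(z)), which is consistent with the definitions of x, y, z and p( ) above, and it estimates p( ) as the relative frequency of a substring among all substrings of the same length. Both the estimator and the function names are assumptions made for illustration.

```python
def substring_prob(corpus, s):
    """Relative frequency of `s` among all substrings of the same length."""
    positions = len(corpus) - len(s) + 1
    if positions <= 0:
        return 0.0
    count = sum(1 for i in range(positions) if corpus.startswith(s, i))
    return count / positions


def left_right_mi(corpus, s):
    """Return (MIL, MIR) for the string `s`, viewed as xyz.

    x is the leftmost character, z the rightmost character,
    and y the (possibly empty) middle substring.
    """
    if len(s) < 2:
        raise ValueError("the string needs at least two characters")
    p_xyz = substring_prob(corpus, s)
    x, yz = s[0], s[1:]
    xy, z = s[:-1], s[-1]
    left_denominator = substring_prob(corpus, x) * substring_prob(corpus, yz)
    right_denominator = substring_prob(corpus, xy) * substring_prob(corpus, z)
    mil = p_xyz / left_denominator if left_denominator else 0.0
    mir = p_xyz / right_denominator if right_denominator else 0.0
    return mil, mir
```

Under this reading, a value well above 1 means the string occurs more often than its parts would predict if they were independent.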
26. The Names
• LEXiTRON: Lexicon + Electron
• ParSit: Parse it
• ORCHID: Orchid = Ran (蘭)
• Sansarn logo: Frog = return of happiness; in Japanese, kaeru (frog) sounds like "fuku kaeru" (fortune returns), so good luck comes back
• LinuxTLE, OfficeTLE: TLE = Ta-Le (Sea series Linux distro), Thai Language Extension
• Vaja: Speech
• Smart-Q, EZKey
28. Collaboration Project
Projects (chart columns: years 03 04 05 06 07 08 09 10)
• Asian E-Learning Network (AEN), CICC
• Language Observatory Project (LOP), NUT
• Intercultural Collaboration Experiments (ICE), KU
• Asian Language Resource Network (ALRN), NUT
• Asian Language Resources (ALR), NEDO
• World Network on Linguistics Diversity (REDILI), UNESCO
• Open Standards Promotion, NECTEC, UNDP-APDIP
• Asian applied nlp for linguistics Diversity and language resource Development (ADD)
• KuiSci: STKC Research Community for MOST
• KuiPoll: Educational Community (BUU, NECTEC)
• KuiHerb: Collective Herbal Information (SIL, PSU, NECTEC)
• AsianWordNet: WordNet for Asian languages development and sharing
• XPLOG: Experience Log for Local Wisdom Collection
• NLP tools and corpora web services
29. TCL’s Computational Lexicon: Representativity
Constraint based
Logical Constraints
Attribute: Value description
Is-a (ISA): a conceptual class of a given word X
Equal (EQU): a word having the same meaning as a given word X
Not-equal (NEQ): a word having the opposite meaning of a given word X
Part-of (POF): a conceptual class specifying a part of a given word X
Whole-of (WOF): a conceptual class referring to the whole of which a given word X is a part
Semantic Constraints
Attribute: Value description
Agent (AGT): an entity initiating the action
Object (OBJ): an entity affected by the action
Instrument (INS): an entity used in the action
Location (LOC): a position or place where an event occurs
Time (TIM): a point or period of time when an event occurs
36. Asian WordNet
• Asian WordNet: http://www.asianwordnet.org/
• Visualization of Asian WordNet
• Function
• Cross language visualization
• 3 modes of visualization
• Progress (May 3, 2010)
• Burmese (19949 senses, 11006 unique words)
• Indonesian (26175 senses, 24398 unique words)
• Japanese (58447 senses, 64678 unique words)
• Korean (42274 senses, 26009 unique words)
• Lao (38890 senses, 44032 unique words)
• Mongolian (1624 senses, 1574 unique words)
• Nepali (41 senses, 42 unique words)
• Sinhala (268 senses, 119 unique words)
• Sundanese (69 senses, 52 unique words)
• Thai (71139 senses, 69998 unique words)
• Collaboration
• TCL
• ADD members