A	29-Year	Journey	of	Thai	NLP	
MT-ED-OSS-IR-DM-DT
Virach	Sornlertlamvanich
Sirindhorn International	Institute	of	Technology	(SIIT),	Thammasat University
virach@gmail.com
28	August	2017,	ISAI-NLP	2017,	Hua	Hin,	Thailand
[Figure: RESEARCH timeline, 1988-2015, divided into five periods ① ② ③ ④ ⑤ (labeled 5-10-7-5-2)]
Organizations shown: NEC/CICC; LINKS, NECTEC; RDI, NECTEC; TCL, NICT; IMA, NECTEC; TPA/SIIT; TITECH (PGLR)
Topics shown: Machine Translation; MT, NLP; NLP, Speech, Image, e-Learning, OSS; NLP, AWN, IR, OSS; Mobile Application, Digitized Thailand; NLP, AI, Data Mining, Big Data, SNS, Deep Learning
SNLP
When	an	engineer	developed	a	
grammar	for	the	Thai	language
Font,	Encoding,	Input	method,	POS,	Dictionary,	Verb	pattern,	Grammar,	MT
① NEC/CICC	1988-1992
Thai	Non-Logical	Order
• Non-logical	order	in	the	representation	of	consonant-vowel	
sequences.	Vowels	that	occur	to	the	left	side	of	their	consonant	are	
represented	in	visual	order	before	the	consonant	in	a	string,	even	
though	they	are	pronounced	afterward.	(Left-positioned	vowel	signs)
• Difficulty	in	Collation	(Sorting),	Grapheme	to	phoneme
Text โปรแกรม
Encoding U+0E42 U+0E1B U+0E23 U+0E41 U+0E01 U+0E23 U+0E21
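The storage order shown above can be checked directly. The following is a minimal sketch: it lists the code points of โปรแกรม and builds a simplified sort key that moves each left-positioned vowel behind the consonant that follows it. The names `LEADING_VOWELS`, `code_points` and `collation_key` are illustrative assumptions, and the swap is only a first approximation of Thai collation, not a complete collation algorithm.

```python
# Left-positioned (pre-posed) Thai vowel signs, U+0E40..U+0E44.
LEADING_VOWELS = set("เแโใไ")


def code_points(text):
    """Return the stored (visual-order) code points of a string."""
    return [f"U+{ord(ch):04X}" for ch in text]


def collation_key(text):
    """Swap each leading vowel with the character after it.

    Simplifying assumption: the character after a leading vowel is
    always the consonant it belongs to.
    """
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in LEADING_VOWELS:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2
        else:
            i += 1
    return "".join(chars)


words = ["เกม", "กา", "ขา"]
naive_order = sorted(words)                        # plain code-point order
consonant_first = sorted(words, key=collation_key)  # consonant-first order
```

With plain code-point sorting, เกม lands after ขา because เ has a higher code point than any consonant; with the key, it sorts among the words beginning with ก.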
Zero-Width	Character	in	Thai
ที่อยู่ Base line
Consonant
Vowel sign (lower)
Vowel sign (upper)
Tone mark
Text ท ที ท่ ที่
Encoding U+0E17 U+0E35 U+0E48 U+0E17 U+0E35 U+0E48
Store Display
X-TIS620
“อยู่” อ ย ยู ย่
CD C2 D9 E8
อ ย อู่
CD B0 C2 EA
TIS X-TIS
EA = B0 (base) + 38 ( อู ) + 02 ( อ่ )
[Figure: bit-pattern table of X-TIS component codes for the combining marks อ็ อ่ อ้ อ๊ อ๋ อ์ อํ อั อิ อี อึ อื อุ อู อฺ]
“|อ|ยู่|”
Advantages
- More	than	1,000	code-points	
prepared	for	kerning	and	
rendering
- Internal	encoding	for	terminal	
text	wrapping
- Cursor	positioning
- Base	concept	for	TCC	
(Thai	Character	Cluster:- the	
smallest	unit	of	character	
cluster	according	to	the	spelling	
rules)	
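The byte values on this slide can be reproduced with a short sketch. The first part uses Python's built-in `tis-620` codec to obtain the stored bytes of อยู่. The second part only repeats the slide's arithmetic EA = B0 + 38 + 02; `compose_cell` and its parameter names are hypothetical, and how X-TIS assigns offsets to other marks is not shown in the source.

```python
def tis620_hex(text):
    """Return the TIS-620 bytes of a string as space-separated hex."""
    return " ".join(f"{b:02X}" for b in text.encode("tis-620"))


def compose_cell(base, lower_offset, upper_offset):
    """Add the offsets of stacked marks to a base code (hypothetical helper).

    Mirrors the slide's example only: base B0, lower vowel 38, tone mark 02.
    """
    return base + lower_offset + upper_offset


stored = tis620_hex("\u0e2d\u0e22\u0e39\u0e48")  # อยู่ in storage order
display_code = compose_cell(0xB0, 0x38, 0x02)     # one combined display code
```

Storing one byte per character while displaying one precomposed code per stack is the design choice behind the advantages listed above.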
TCC	(Thai	Character	Cluster)
• The	smallest	unit	of	character	cluster	according	to	the	spelling	rules.
• To	cluster	Thai	text	into	undividable	units.	Character	cluster	is	defined	
to	be	the	smallest	recognizable	unit.	The	character	string	is	clustered	
for	the	sake	of	avoiding	the	processing	of	invalid	Thai	character	units.
Examples of TCC
Pre-position: เ, แ, ไ, ใ, โ ⊕ C+
Post-position: C+ ⊕ ะ, า
Upper/Lower: ที่, มี, กุ, รู, …
Sound killer: ร์, ดิ์, ตร์, ทธิ์, ถุ์
Compound: เสร็จ, เหลือ, หน่วย
Leading char: หล่น, หนัง, หวะ, ไหล่
Diphthong: ครัว, อ้วน
Character: เ - ป - อ้ - า - ห - ม - า - ย
Cluster (TCC): เป้า - หมา - ย
Word: เป้าหมาย or เป้า - หมาย
Virach	Sornlertlamvanich	and	Tanaka	Hozumi.	The	Automatic	Extraction	of	Open	Compounds	from	Text	Corpora.	
Proceedings	of	the	16th	International	Conference	on	Computational	Linguistics	(COLING-96),	pp.	1143-1146,	Aug	1996.
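The clustering idea can be sketched with a single regular expression. This is a deliberately reduced rule set, assuming only four ingredients: an optional pre-positioned vowel, a consonant (or leading ห plus a sonorant), any stacked upper/lower marks, and an optional post-positioned vowel. The names `TCC` and `tcc_segment` are illustrative, and the compound, sound-killer and diphthong patterns listed above are not covered.

```python
import re

LEAD = "[เแโใไ]?"                                # pre-positioned vowels
CONS = "(?:ห[งญนมยรลว]|[ก-ฮ])"                    # leading ห + sonorant, or one consonant
MARKS = "[\u0e31\u0e34-\u0e3a\u0e47-\u0e4e]*"    # upper/lower vowels, tone marks, thanthakhat
TRAIL = "[ะาำ]?"                                  # post-positioned vowels

# Fall back to a single character when no cluster rule applies.
TCC = re.compile(LEAD + CONS + MARKS + TRAIL + "|.", re.DOTALL)


def tcc_segment(text):
    """Split text into simplified Thai character clusters."""
    return [m.group(0) for m in TCC.finditer(text)]


clusters = tcc_segment("เป้าหมาย")
```

Because a cluster never splits a mark from its base, later steps such as word segmentation only need to consider cluster boundaries as candidate break points.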
Implementation	(1991-)
•X-TIS	620	for	tterm	in	UNIX
•X	bitmap	fonts
•X	Consortium:	Thai	in	X11R6
•Thai	in	UNIX/Linux	applications
• Xfig
• Mule/GNU	Emacs:	SWATH,	LEXiTRON
• Xemacs:	X-TIS
• Mozilla:	LibInThai
• LaTeX:	Babel,	Omega
• National	fonts:	Kinnari,	Garuda,	Norasi
Free	developers
POS	Tagset
• 14	categories	(N,	PRON,	V,	AUX,	
DET,	ADV,	CLAS,	CONJ,	PREP,	INT,	
PREF,	END,	NEG,	PUNC)	and	47	
sub-categories
• VACT,	VSTA,	VATT
• Transitive,	Intransitive
• AUX
• Word	order
• S	vs	NP
• No	diff	in	some	cases
No. POS Description Example
1 NPRP Proper noun วินโดวส์ 95, โคโรน่า, โค้ก, พระอาทิตย์
2 NCNM Cardinal number หนึ่ง, สอง, สาม, 1, 2, 3
3 NONM Ordinal number ที่หนึ่ง, ที่สอง, ที่สาม, ที่1, ที่2, ที่3
4 NLBL Label noun 1, 2, 3, 4, ก, ข, a, b
5 NCMN Common noun หนังสือ, อาหาร, อาคาร, คน
6 NTTL Title noun ดร., พลเอก
7 PPRS Personal pronoun คุณ, เขา, ฉัน
8 PDMN Demonstrative pronoun นี่, นั่น, ที่นั่น, ที่นี่
9 PNTR Interrogative pronoun ใคร, อะไร, อย่างไร
10 PREL Relative pronoun ที่, ซึ่ง, อัน, ผู้
11 VACT Active verb ทำงาน, ร้องเพลง, กิน
12 VSTA Stative verb เห็น, รู้, คือ
13 VATT Attributive verb อ้วน, ดี, สวย
14 XVBM Pre-verb auxiliary, before negator “ไม่” เกิด, เกือบ, กำลัง
15 XVAM Pre-verb auxiliary, after negator “ไม่” ค่อย, น่า, ได้
16 XVMM Pre-verb, before or after negator “ไม่” ควร, เคย, ต้อง
17 XVBB Pre-verb auxiliary, in imperative mood กรุณา, จง, เชิญ, อย่า, ห้าม
18 XVAE Post-verb auxiliary ไป, มา, ขึ้น
19 DDAN Definite determiner, after noun without classifier in between นี่, นั่น, โน่น, ทั้งหมด
20 DDAC Definite determiner, allowing classifier in between นี้, นั้น, โน้น, นู้น
21 DDBQ Definite determiner, between noun and classifier or preceding quantitative expression ทั้ง, อีก, เพียง
22 DDAQ Definite determiner, following quantitative expression พอดี, ถ้วน
23 DIAC Indefinite determiner, following noun; allowing classifier in between ไหน, อื่น, ต่างๆ
24 DIBQ Indefinite determiner, between noun and classifier or preceding quantitative expression บาง, ประมาณ, เกือบ
25 DIAQ Indefinite determiner, following quantitative expression กว่า, เศษ
26 DCNM Determiner, cardinal number expression หนึ่งคน, เสือ 2 ตัว
27 DONM Determiner, ordinal number expression ที่หนึ่ง, ที่สอง, ที่สุดท้าย
28 ADVN Adverb with normal form เก่ง, เร็ว, ช้า, สม่ำเสมอ
29 ADVI Adverb with iterative form เร็วๆ, เสมอๆ, ช้าๆ
30 ADVP Adverb with prefixed form โดยเร็ว
31 ADVS Sentential adverb โดยปกติ, ธรรมดา
32 CNIT Unit classifier ตัว, คน, เล่ม
33 CLTV Collective classifier คู่, กลุ่ม, ฝูง, เชิง, ทาง, ด้าน, แบบ, รุ่น
34 CMTR Measurement classifier กิโลกรัม, แก้ว, ชั่วโมง
35 CFQC Frequency classifier ครั้ง, เที่ยว
36 CVBL Verbal classifier ม้วน, มัด
37 JCRG Coordinating conjunction และ, หรือ, แต่
38 JCMP Comparative conjunction กว่า, เหมือนกับ, เท่ากับ
39 JSBR Subordinating conjunction เพราะว่า, เนื่องจาก, ที่, แม้ว่า, ถ้า
40 RPRE Preposition จาก, ละ, ของ, ใต้, บน
41 INT Interjection โอ้ย,โอ้, เออ, เอ๋, อ๋อ
42 FIXN Nominal prefix การทำงาน, ความสนุกสนาน
43 FIXV Adverbial prefix อย่างเร็ว
44 EAFF Ending for affirmative sentence จ๊ะ, จ้ะ, ค่ะ, ครับ, นะ, น่า, เถอะ
45 EITT Ending for interrogative sentence หรือ, เหรอ, ไหม, มั้ย
46 NEG Negator ไม่, มิได้, ไม่ได้, มิ
47 PUNC Punctuation (, ), “, ,, ;
Virach	Sornlertlamvanich,	Naoto	Takahashi	and	Hitoshi	Isahara.	
Building	a	Thai	Part-Of-Speech	Tagged	Corpus	(ORCHID).	
The	Journal	of	the	Acoustical	Society	of	Japan	(E),	Vol.20,	No.3,	
pp. 189-198, May 1999.
Multi-lingual	Machine	Translation	Project	(MMT)
1987-1992	(+2)
• 6-year project (1987-1992)
• Interlingual approach	MMT	for	
CIJMT
• R&D
− Analysis
− Generation
− Dictionary
− Interlingua
− Integration	system
• Collaboration
− Thailand	(NECTEC,	CU,	KU,	KMUTT,	
KMITL)
− Japan	(NEC,	Fujitsu,	Hitachi,	OKI,	
Sharp,	Mitsubishi,	Toshiba)
− China,	Indonesia,	Malaysia
• 1969	Computerized	Alphabetization	of	
Thai
• 1974	Thai	Transliteration	System
• 1981	ARIANE	Project
− English-Thai	MT
− Ministry	of	University	Affairs	and	Grenoble	
Univ.
• 1986	Establishment	of	NECTEC	
• 1986	TIS620-2529
− Thai	Standard	Character	Code	for	Computer	by	
TISI
• 1987-92	(+2)	NECTEC-CICC	MMT	Project
• 1992-present	Establishment	of	LINKS	at	
NECTEC
− AI	R&D	Center	at	KMITT
− NAiST at	KU
− KIND	at	SIIT
− RDI	at	NECTEC
− SLS	at	CU,	….
MMT	Project
Interlingua	in	MMT
NLP	Applications	and	Services
LEXiTRON,	Royal	Thai	Institute	Dictionary,	EZKey,	ParSit,	Sansarn
② LINKS/RD-I,	NECTEC	1993-2003
[Figure: timeline of period ②, B.E. 2537-2545 (1994-2002)]
LEXiTRON
• LEXiTRON version	1.1
• Corpus-based	dictionary
• Dictionary	for	writing
• Launched	in	1995
• CD-ROM	for	Windows	3.1	Thai	
Edition
• Thai	11,000	entries
• English	9,000	entries
• 6	types	of	dictionaries
− General	word	entry
− Thai	usage	dictionary	(sample	
sentence)
− Synonym-Antonym
− Thai-English	(equivalent)
− Word	class
Virach	Sornlertlamvanich,	Apichit Pittayaratsophon and	Kriangchai Chansaenwilai.	
Thai	Dictionary	Data	Base	Manipulation	using	Multi-indexed	Double	Array	Trie.	
The	5th	Annual	Conference,	NECTEC,	Bangkok.	pp.	197-206,	1993.	(in	Thai)
Thai	Electronic	Dictionary
ORCHID	POS	Tagged	
Corpus
%TTitle: การประชุมทางวิชาการ ครั้งที่ 1
%ETitle: [1st Annual Conference]
%TAuthor:
%EAuthor:
%TInbook: การประชุมทางวิชาการ ครั้งที่ 1, โครงการวิจัยและพัฒนา
อิเล็กทรอนิกส์และคอมพิวเตอร์, ปีงบประมาณ 2531, เล่ม 1
%EInbook: The 1st Annual Conference, Electronics and
Computer Research and Development Project, Fiscal Year
1988, Book 1
%TPublisher: ศูนย์เทคโนโลยีอิเล็กทรอนิกส์และคอมพิวเตอร์
แห่งชาติ, กระทรวงวิทยาศาสตร์ เทคโนโลยีและการพลังงาน
%EPublisher: National Electronics and Computer
Technology Center, Ministry of Science, Technology and
Energy
%Page:
%Year: 1989
%File:
#P1
#1
การประชุมทางวิชาการ ครั้งที่ 1//
การ/FIXN
ประชุม/VACT
ทาง/NCMN
วิชาการ/NCMN
<space>/PUNC
ครั้ง/CFQC
ที่ 1/DONM//
#2
โครงการวิจัยและพัฒนาอิเล็กทรอนิกส์และคอมพิวเตอร์//
โครงการวิจัยและพัฒนา/NCMN
อิเล็กทรอนิกส์/NCMN
และ/JCRG
คอมพิวเตอร์/NCMN//
…
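The body of the sample above can be read with a few lines of code. This sketch assumes the conventions visible in the excerpt: `%` lines are document metadata, `#` lines number paragraphs and sentences, a line ending in `//` without a tag is the raw sentence text, and each remaining line is `word/TAG`, with `//` after the last token of a sentence. The function name `parse_orchid` is illustrative and not part of any ORCHID distribution.

```python
def parse_orchid(text):
    """Return a list of sentences, each a list of (word, tag) pairs."""
    sentences, current = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("%", "#")):
            continue                      # metadata or numbering line
        ends_sentence = line.endswith("//")
        if ends_sentence:
            line = line[:-2]
        word, sep, tag = line.rpartition("/")
        if not sep:
            continue                      # raw sentence text, no tag
        current.append((word, tag))
        if ends_sentence:
            sentences.append(current)
            current = []
    return sentences


SAMPLE = """#P1
#1
การประชุมทางวิชาการ ครั้งที่ 1//
การ/FIXN
ประชุม/VACT
ทาง/NCMN
วิชาการ/NCMN
<space>/PUNC
ครั้ง/CFQC
ที่ 1/DONM//
#2
โครงการวิจัยและพัฒนาอิเล็กทรอนิกส์และคอมพิวเตอร์//
โครงการวิจัยและพัฒนา/NCMN
อิเล็กทรอนิกส์/NCMN
และ/JCRG
คอมพิวเตอร์/NCMN//
"""

parsed = parse_orchid(SAMPLE)
```

Splitting on the last `/` keeps multi-part tokens such as `ที่ 1` intact, which matters because ORCHID tokens may contain spaces.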
• ORCHID	Corpus	(1997)	supported	
by	CRL	Japan
• Source:	NECTEC	Technical	
Report
• Size:	160	documents;	5.75	MB;	
400K	words
• Tag:	XML	tagged	paragraph,	
sentence,	word,	part-of-
speech
• Availability:	for	research
• Difficulties
• Hard	to	find	consensus	in	the	
sentence	boundary, word	
boundary,	and	POS	tag
Virach	Sornlertlamvanich,	Thatsanee Charoenporn and	Hitoshi	Isahara.	
ORCHID:	Thai	Part-Of-Speech	Tagged	Corpus.	Technical	Report	Orchid	
TR-NECTEC-1997-001,	NECTEC,	Thailand,	pp.	5-19,	Dec	1997.
Interlingua	English-Thai	MT
Concept	Composition	and	Decomposition
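[Figure: two interlingua concept graphs for the same sentence]
Composed form: concept c#amaze linked to c#news (this) and c#i; arc labels: object, implement
Decomposed form: concepts c#cause, c#news (this), c#i, c#amazing; arc labels: object, implement, a-object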
This news amazes me. ข่าวนี้ทำให้ฉันประหลาดใจ
English-Thai Web Translation
http://come.to/parsit
http://www.suparsit.com/
• 51,075 visits/month
•138,748 translation-pages/month
Term	Candidate	Extraction	for	Dictionary-less	
Search	Engine
• Virach	Sornlertlamvanich	et	al.	(COLING	2000)	:
- Automatic	Corpus-Based	Thai	Word	Extraction	with	the	C4.5	Learning	
Algorithm
- C4.5-trained decision tree for determining potential word boundaries from MI, entropy and some linguistic information
- Capable	of	discovering	new	words	in	document	without	assistance	
from	static	dictionary
Virach	Sornlertlamvanich,	Tanapong Potipiti and	Thatsanee Charoenporn.	
Automatic	Corpus-based	Thai	Word	Extraction	with	the	C4.5	Learning	Algorithm.	
Proceedings	of	the	18th	International	Conference	on	Computational	Linguistics	(COLING2000),	
Saarbrucken,	Germany,	pp	802-807,	July-August	2000.
Attributes(1) : Left	and	Right	Mutual	Information
High	mutual	information	implies	that	xyz co-occurs	more	than	expected	
by	chance.	If	xyz is	a	word,	its MIL and MIR must	be	high.
…efunction…	and	...function...
$$MI_L(xyz) = \frac{p(xyz)}{p(x)\,p(yz)}, \qquad MI_R(xyz) = \frac{p(xyz)}{p(xy)\,p(z)}$$
where
x is the leftmost character of string xyz
y is the middle substring of xyz
z is the rightmost character of string xyz
p( ) is the probability function.
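A minimal sketch of the two attributes on a toy character corpus follows. It assumes the ratio form p(xyz) / (p(x) p(yz)) for the left value and p(xyz) / (p(xy) p(z)) for the right value, and it estimates p( ) as the relative frequency of a substring among all substrings of the same length. The function names and the toy corpus are assumptions made for illustration.

```python
def count(corpus, s):
    """Number of (possibly overlapping) occurrences of s in corpus."""
    return sum(1 for i in range(len(corpus) - len(s) + 1)
               if corpus.startswith(s, i))


def prob(corpus, s):
    """Relative frequency of s among all substrings of the same length."""
    positions = len(corpus) - len(s) + 1
    return count(corpus, s) / positions if positions > 0 else 0.0


def left_mi(corpus, xyz):
    """p(xyz) / (p(x) * p(yz)), with x the leftmost character."""
    x, yz = xyz[0], xyz[1:]
    return prob(corpus, xyz) / (prob(corpus, x) * prob(corpus, yz))


def right_mi(corpus, xyz):
    """p(xyz) / (p(xy) * p(z)), with z the rightmost character."""
    xy, z = xyz[:-1], xyz[-1]
    return prob(corpus, xyz) / (prob(corpus, xy) * prob(corpus, z))


toy = "abcabc"
mil = left_mi(toy, "abc")
mir = right_mi(toy, "abc")
```

Values well above 1 indicate that the outer character and the rest of the string co-occur more often than independent draws would predict.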
Attributes(2) : Left	and	Right	Entropy
Entropy	shows	the	variety	of	characters	before	and	after	a	word.	If y is	
a	word,	its	left	and	right	entropy	must	be	high.
...?function... and ...?unction...
where
x is the leftmost character of string xyz
y is the middle substring of xyz
z is the rightmost character of string xyz
p( ) is the probability function.
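A is the set of characters.
$$E_L(y) = -\sum_{x \in A} p(xy \mid y)\,\log_2 p(xy \mid y), \qquad E_R(y) = -\sum_{z \in A} p(yz \mid y)\,\log_2 p(yz \mid y)$$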
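The entropy attributes can be sketched the same way. The code below assumes that left entropy is the Shannon entropy of the characters immediately before each occurrence of y, and right entropy that of the characters immediately after; occurrences at the very start or end of the corpus contribute no neighbour. All names and the toy corpus are illustrative assumptions.

```python
import math
from collections import Counter


def neighbours(corpus, y):
    """Count the characters seen directly left and right of each y."""
    left, right = Counter(), Counter()
    start = corpus.find(y)
    while start != -1:
        if start > 0:
            left[corpus[start - 1]] += 1
        end = start + len(y)
        if end < len(corpus):
            right[corpus[end]] += 1
        start = corpus.find(y, start + 1)
    return left, right


def entropy(counter):
    """Shannon entropy (base 2) of a frequency table."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def left_right_entropy(corpus, y):
    left, right = neighbours(corpus, y)
    return entropy(left), entropy(right)


varied = left_right_entropy("xabyabzab", "ab")   # three different left contexts
fixed = left_right_entropy("xabxab", "ab")       # always preceded by "x"
```

A string that always appears after the same character, as in the second call, gets zero left entropy and is therefore more likely to be a fragment of a longer word.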
EZKey
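[Figure: keyboard showing the %~ (T/E) key, the Shift key, and keys carrying two Thai characters each: D (ฏ, ก), F (โ, ด), G (ฌ, เ)]
Typed: .of]dp68 computer vtwidh’jkpwxs,f_
Result: ในโลกยุค computer อะไรก็ง่ายไปหมด_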
The	Names
• LEXiTRON :-
Lexicon	+	Electron
• ParSit :-
Parse	it
• ORCHID	:-
Orchid	=	Ran	(蘭)
• Sansarn logo	:-
Frog	=	Return	of	happiness
カエルは“福帰る”,	幸運が還ってくる
• LinuxTLE,	OfficeTLE :-
TLE	=	Ta-Le	(Sea	series	Linux	distro)
Thai	Language	Extension
• Vaja :-
Speech
Smart-Q,	EZKey,
Multi-lingualism
Language	Observatory,	Asian	WordNet
③ TCL,	NICT	2003-2008
Collaboration	Project
Project
Year
03 04 05 06 07 08 09 10
Asian E-Learning Network (AEN), CICC
Language Observatory Project (LOP), NUT
Intercultural Collaboration Experiments (ICE), KU
Asian Language Resource Network (ALRN), NUT
Asian Language Resources (ALR), NEDO
World Network on Linguistics Diversity (REDILI), UNESCO
Open Standards Promotion, NECTEC, UNDP-APDIP
Asian applied nlp for linguistics Diversity and language
resource Development (ADD)
KuiSci: STKC Research Community for MOST
KuiPoll: Educational Community (BUU, NECTEC)
KuiHerb: Collective Herbal Information (SIL, PSU, NECTEC)
AsianWordNet: WordNet for Asian languages development and
sharing
XPLOG: Experience Log for Local Wisdom Collection
NLP tools and corpora web services
③
TCL’s	Computational	Lexicon:	Representativity
Constraint based

Logical Constraints
Attribute: Value description
Is-a (ISA): a conceptual class of a given word X
Equal (EQU): a word having the same meaning as a given word X
Not-equal (NEQ): a word having the opposite meaning of a given word X
Part-of (POF): a conceptual class specifying a part of a given word X
Whole-of (WOF): a conceptual class referring to the whole of which a given word X is a part

Semantic Constraints
Attribute: Value description
Agent (AGT): an entity initiating the action
Object (OBJ): an entity affected by the action
Instrument (INS): an entity used in the action
Location (LOC): a position or place where an event occurs
Time (TIM): a point or period of time when an event occurs
Synset	Assignment	Algorithm	(CS=4)
• Accept the Synset that includes more than one English Equivalent with confidence score 4.
[Diagram: L0 maps to E0 and E1; membership links (∈) connect E0 and E1 to the Synsets S0, S1, S2]
Example:
L0:	เป้าหมาย
E0:	aim
E1:	target
S0:	purpose,	intent,	intention,	aim,	design
S1:	aim,	object,	objective,	target
S2:	aim
Synset	Assignment	Algorithm	(CS=3)
Example:
L0:	จ้อง
L1:	เพ่งมอง
E0: stare
E1: gaze
S0: stare
S1: gaze,	stare
• Accept the Synset that includes more than one English Equivalent from the synonym of the target language with confidence score 3.
[Diagram: L0 and its Synonym L1 map to E0 and E1; membership links (∈) connect E0 and E1 to the Synsets S0, S1, S2]
Synset	Assignment	Algorithm	(CS=2)
Example:
L0:	สูติแพทย์
E0:	obstetrician
S0:	obstetrician,	accoucheur
• Accept the only Synset that includes the English Equivalent with confidence score 2.
[Diagram: L0 maps to E0; E0 ∈ S0]
Technical	
term
Synset	Assignment	Algorithm	(CS=1)
Example:
L0:	ช่อง
E0:	hole
E1:	canal
S0:	hole,	hollow		
S1:	hole,	trap,	cakehole,	maw,	yap,	gap
S2:	canal,	duct,	epithelial	duct,	channel
• Accept more than one Synset that includes each of the English Equivalent with confidence score 1.
[Diagram: L0 maps to E0 and E1; membership links (∈) connect E0 to S0 and S1, and E1 to S2]
Common	
term
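The four confidence levels can be read as a cascade, tried from the strictest rule down. The sketch below is one possible reading of the slides, not the published algorithm: `assign_synsets`, its parameters and the return value are assumptions, and the rule for CS=3 is interpreted as pooling the English equivalents of the entry with those of its synonym.

```python
def assign_synsets(equivalents, synsets, synonym_equivalents=()):
    """Return (confidence score, ids of accepted synsets).

    equivalents: English equivalents of the entry (E0, E1, ...).
    synsets: mapping from synset id to its list of English words.
    synonym_equivalents: English equivalents of a synonym of the entry.
    """
    own = set(equivalents)

    # CS=4: a synset holding more than one equivalent of the entry itself.
    hits = [sid for sid, words in synsets.items() if len(own & set(words)) > 1]
    if hits:
        return 4, hits

    # CS=3: more than one equivalent once the synonym's equivalents are pooled.
    pooled = own | set(synonym_equivalents)
    hits = [sid for sid, words in synsets.items() if len(pooled & set(words)) > 1]
    if hits:
        return 3, hits

    # CS=2: exactly one synset contains an equivalent.
    # CS=1: several synsets each contain an equivalent.
    hits = [sid for sid, words in synsets.items() if own & set(words)]
    if len(hits) == 1:
        return 2, hits
    if hits:
        return 1, hits
    return 0, []


cs4 = assign_synsets(
    ["aim", "target"],
    {"S0": ["purpose", "intent", "intention", "aim", "design"],
     "S1": ["aim", "object", "objective", "target"],
     "S2": ["aim"]},
)
cs3 = assign_synsets(["stare"], {"S0": ["stare"], "S1": ["gaze", "stare"]},
                     synonym_equivalents=["gaze"])
cs2 = assign_synsets(["obstetrician"], {"S0": ["obstetrician", "accoucheur"]})
cs1 = assign_synsets(
    ["hole", "canal"],
    {"S0": ["hole", "hollow"],
     "S1": ["hole", "trap", "cakehole", "maw", "yap", "gap"],
     "S2": ["canal", "duct", "epithelial duct", "channel"]},
)
```

The inputs are the four examples from the slides (เป้าหมาย, จ้อง, สูติแพทย์, ช่อง), so each call exercises a different rule.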
Asian WordNet Development Process
[Figure: development workflow]
KUI functions: Correction, Voting, Lookup, Translation, Discussion, Addition
Resources shown: WN, GWN, AWN; Thai-English, Indonesian-English and further X-English inputs; merged-WN
ML Applications: Dictionary, Ontology, CL-Search, MT, Summarization, IE/IR, ….
Asian	WordNet
http://www.asianwordnet.org/
• Asian WordNet
• Visualization	of	Asian	WordNet
• Function
• Cross	language	visualization
• 3	modes	of	visualization
• Progress	(May	3,	2010)
• Burmese	
(19949	senses,	11006	u.	words)
• Indonesian	
(26175	senses,	24398	u.	words)
• Japanese	
(58447	senses,	64678	u.	words)
• Korean	
(42274	senses,	26009	u.	words)
• Lao	
(38890	senses,	44032	u.	words)
• Mongolian	
(1624	senses,	1574	u.	words)
• Nepali	
(41	senses,	42	u.	words)
• Sinhala	
(268	senses,	119	u.	words)
• Sundanese
(69	senses,	52	u.	words)
• Thai	
(71139	senses,	69998	u.	words)
• Collaboration
• TCL
• ADD	members
Digitalization
Linked	Open	Data,	Digitized	Thailand,	Thailand-1-Click
④ NECTEC	2009-2013
Semantic	Link	Generation
•Semantic	Representation	of	the	description
•Keyword	Extraction
• Extract	keywords	in	text	documents	and	link	them	to	appropriate	
articles
•Semantic	Relation	Extraction
• Extract common syntactic patterns between two keywords and generalize them to a triple (ei, rij, ej)
• Linked Data
– Set of triples (ei, rij, ej)
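To make the triple form concrete, here is a minimal sketch that turns sentences into (ei, rij, ej) triples with hand-written surface patterns. The relation names ISBUILTIN and ISLOCATEDAT are taken from the knowledge-map slide; the two regular expressions, the romanized English sentences and the function name `extract_triples` are assumptions for illustration and stand in for the generalized syntactic patterns mentioned above.

```python
import re

# Each pattern links two keyword slots (e1, e2) with a relation label.
PATTERNS = [
    (re.compile(r"(?P<e1>.+?) was built in (?P<e2>.+)"), "ISBUILTIN"),
    (re.compile(r"(?P<e1>.+?) is located (?:at|in) (?P<e2>.+)"), "ISLOCATEDAT"),
]


def extract_triples(sentences):
    """Return a list of (entity, relation, entity) triples."""
    triples = []
    for sentence in sentences:
        text = sentence.strip().rstrip(".")
        for pattern, relation in PATTERNS:
            match = pattern.fullmatch(text)
            if match:
                triples.append((match["e1"], relation, match["e2"]))
                break
    return triples


linked_data = extract_triples([
    "Phra Chedi Klang Nam was built in B.E. 2403.",
    "Phra Chedi Klang Nam is located at Tambon Pak Nam.",
    "This sentence matches no pattern.",
])
```

Because every triple shares entity strings with others, the resulting set can be loaded as a graph in which keywords are nodes and relations are labeled edges.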
Virach	Sornlertlamvanich	and	Canasai Kruengkrai.	
Effectiveness	of	Keyword	and	Semantic	Relation	Extraction	for	Knowledge	Map	Generation	,	
Proceedings	of	The	Second	International	Workshop	on	Worldwide	
Language	Service	Infrastructure	(WLSI),	Kyoto	University,	Kyoto,	Japan,	January	22-23,	2015.
Types	of	Semantic	Relation
description title
tag
Knowledge	Map
Infobox
Knowledge	map
ISBUILTIN(พระเจดีย์กลางน้ำ, พ.ศ.2403)
ISLOCATEDAT(พระเจดีย์กลางน้ำ, ตำบลปากน้ำ)
Infobox
Knowledge	map
Creator
Making
Product
Shop
Semantically	Enhanced
Cultural	Database
[Place,	Person,	Artifact]
Knowledging
Digital	Content	Technology
Projects	in	Digitized	Thailand,	2009
• DT	PaaS on	the	Cloud
• Digitized	Thailand
(http://www.digitized-thailand.org/)
• Digitized	Lanna	
(http://www.digitized-lanna.com/)
• Digitized	Isan	
(http://www.digitized-isan.com/)
Digitized	Thailand:	The	Ultimate	Goal
• DT	is	a	framework	for	collaboration	in	technology	and	content	
development
• DT	is	a	platform	for	digital	content	sharing
• Toward	creative	economy,	DT	PaaS will	be	established
Data,	Data,	Data
NLP,	Big	Data,	Deep	Learning,	Social	Computing,	IoT,	AI
⑤ SIIT	2014-…
NLP	Challenges
• The Internet, Big Data, Machine Learning and Deep Learning have opened up new possibilities.
Facebook:-
Adds 0.5 petabyte (1 PB = 10^15 bytes) of data every 24 hours
Twitter:-
Adds 340 million tweets per day
YouTube:-
Adds 100 hours of new videos every minute
Germin8,	Social	Intelligence
The	Evolution	of	Communication
NLP	Challenges
Data	Community	DC	(DC2)
Steven Bird, Ewan Klein and Edward Loper (2009), Natural Language Processing with Python. O’Reilly Media Inc.
Data	Data	Data!!!
• Drastic increase in the number of social network users
• Keywords	in	the	contents	express	
the	concepts	of	the	talk
• Social	media	texts	are	input	
in	a	time	sequence	
• But,	social	media	texts	
are	normally	short,	incomplete	
and	diverse
NLP,	Big	Data,	Deep	Learning,	Social	Computing,	IoT,	AI
