SlideShare a Scribd company logo
- Development of Korean NLP
- Largely follows classical computaional
linguistics approaches
- NLP pipeline to downstream tasks
- Why & how Korean NLP studies have
become popular (at least domestic)
- Challenging syntax and morphology
(pro-drop, agglutinative)
- Non latin alphabet nor Kanji-oriented
writing system
- Motivation of model and corpus studies
of own language
Korean NLP Paradigm
Past and Ongoing Projects
- KAIST & NIKL (Corpora & treebanks)
- ExoBrain by ETRI (CL and QA)
- AI HUB by NIA (Nationwide datahub)
- LDC (Dataset catalogs)
- Open corpora by individuals &
organizations in academia/industry
(Downstream tasks & Benchmarks)
Revisiting Korean
Corpus Studies through
Technological Advances
Won Ik Cho
Sangwhan Moon
Youngsook Song
Seoul National University
Tokyo Institute of Technology
Sionic AI Inc.
Diachronic Overview of (Open) Korean Corpora (1990s – 2023)
• Large-scale, raw text to small, specific, annotated text
- Raw & annotated corpora, Treebanks (of syntactic area) to Downstream
tasks related to language understanding, and to Benchmarks (set of tasks)
• Token-annotated classification to document-annotated
classification and span/generation
- Model development tends to develop from categorical priority (statistical
models and perceptrons) to ones for translation/generation (PLMs, LLMs)
- Capacity of model becomes larger (lowering computation budget)
• Written or spoken (web) text to texts in various areas
- Flourishing data provide from web domain (e.g., social media) and
increasing demand from areas beyond linguistics (e.g., finance, law)
Trend of Korean Corpus Studies
- Large-scale corpora + Treebanks (1990s-2000s; Usually driven by institutes)
- Downstream tasks of NLP pipeline and parallel corpora (2000s-)
- Sentiment analysis, QA, entailment, and similarity datasets (2010s-)
- Dialogue studies, offensive language, societal bias and fairness (2020s-)
- Benchmark studies for PLMs + Pretraining corpora (2020s-)
- Benchmarks for LLM evaluation (2022-)
Our Repository for
Open Korean Corpora
Studies 

More Related Content

Similar to 2312 PACLIC

Ontologies Fmi 042010
Ontologies Fmi 042010Ontologies Fmi 042010
Ontologies Fmi 042010
Mariana Damova, Ph.D
 
OOR Architecture - Towards a Network of Linked Ontology Repositories
OOR Architecture - Towards a Network of Linked Ontology RepositoriesOOR Architecture - Towards a Network of Linked Ontology Repositories
OOR Architecture - Towards a Network of Linked Ontology Repositories
Kim Viljanen
 
The State of #NLProc
The State of #NLProcThe State of #NLProc
The State of #NLProc
Vsevolod Dyomkin
 
Update From OCLC Research May 2008
Update From OCLC Research May 2008Update From OCLC Research May 2008
Update From OCLC Research May 2008
Nancy Elkington
 
Nltk natural language toolkit overview and application @ PyCon.tw 2012
Nltk  natural language toolkit overview and application @ PyCon.tw 2012Nltk  natural language toolkit overview and application @ PyCon.tw 2012
Nltk natural language toolkit overview and application @ PyCon.tw 2012
Jimmy Lai
 
BT02.pptx
BT02.pptxBT02.pptx
BT02.pptx
ThAnhonc
 
Innovative methods for data integration: Linked Data and NLP
Innovative methods for data integration: Linked Data and NLPInnovative methods for data integration: Linked Data and NLP
Innovative methods for data integration: Linked Data and NLP
ariadnenetwork
 
Cwn aat talk
Cwn aat talkCwn aat talk
Cwn aat talk
AAT Taiwan
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...
alessio_ferrari
 
Introduction about AAT-Taiwan Project
Introduction about AAT-Taiwan ProjectIntroduction about AAT-Taiwan Project
Introduction about AAT-Taiwan Project
AAT Taiwan
 
SoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming textSoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming text
Sujit Pal
 
Using pedagogic corpora in ELT
Using pedagogic corpora in ELTUsing pedagogic corpora in ELT
Using pedagogic corpora in ELT
Pascual Pérez-Paredes
 
NJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary RepositoryNJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary Repository
Gong Cheng
 
RDF Data and Image Annotations in ResearchSpace (slides)
RDF Data and Image Annotations in ResearchSpace (slides)RDF Data and Image Annotations in ResearchSpace (slides)
RDF Data and Image Annotations in ResearchSpace (slides)
Vladimir Alexiev, PhD, PMP
 
Grosof haley-talk-semtech2013-ver6-10-13
Grosof haley-talk-semtech2013-ver6-10-13Grosof haley-talk-semtech2013-ver6-10-13
Grosof haley-talk-semtech2013-ver6-10-13
Brian Ulicny
 
Ontology Engineering for Systems Engineering
Ontology Engineering for Systems EngineeringOntology Engineering for Systems Engineering
Ontology Engineering for Systems Engineering
Anatoly Levenchuk
 
A Metadata Application Profile for KOS Vocabulary Registries (KOS-AP)
A Metadata Application Profile for KOS Vocabulary Registries (KOS-AP)A Metadata Application Profile for KOS Vocabulary Registries (KOS-AP)
A Metadata Application Profile for KOS Vocabulary Registries (KOS-AP)
Marcia Zeng
 
Sinmin Literature Review Presentation
Sinmin Literature Review PresentationSinmin Literature Review Presentation
Sinmin Literature Review Presentation
Chamila Wijayarathna
 
1910 JK27
1910 JK271910 JK27
1910 JK27
WarNik Chow
 
김문형-Curriculum Vitae
김문형-Curriculum Vitae김문형-Curriculum Vitae
김문형-Curriculum Vitae
Munhyong Kim
 

Similar to 2312 PACLIC (20)

Ontologies Fmi 042010
Ontologies Fmi 042010Ontologies Fmi 042010
Ontologies Fmi 042010
 
OOR Architecture - Towards a Network of Linked Ontology Repositories
OOR Architecture - Towards a Network of Linked Ontology RepositoriesOOR Architecture - Towards a Network of Linked Ontology Repositories
OOR Architecture - Towards a Network of Linked Ontology Repositories
 
The State of #NLProc
The State of #NLProcThe State of #NLProc
The State of #NLProc
 
Update From OCLC Research May 2008
Update From OCLC Research May 2008Update From OCLC Research May 2008
Update From OCLC Research May 2008
 
Nltk natural language toolkit overview and application @ PyCon.tw 2012
Nltk  natural language toolkit overview and application @ PyCon.tw 2012Nltk  natural language toolkit overview and application @ PyCon.tw 2012
Nltk natural language toolkit overview and application @ PyCon.tw 2012
 
BT02.pptx
BT02.pptxBT02.pptx
BT02.pptx
 
Innovative methods for data integration: Linked Data and NLP
Innovative methods for data integration: Linked Data and NLPInnovative methods for data integration: Linked Data and NLP
Innovative methods for data integration: Linked Data and NLP
 
Cwn aat talk
Cwn aat talkCwn aat talk
Cwn aat talk
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...
 
Introduction about AAT-Taiwan Project
Introduction about AAT-Taiwan ProjectIntroduction about AAT-Taiwan Project
Introduction about AAT-Taiwan Project
 
SoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming textSoDA v2 - Named Entity Recognition from streaming text
SoDA v2 - Named Entity Recognition from streaming text
 
Using pedagogic corpora in ELT
Using pedagogic corpora in ELTUsing pedagogic corpora in ELT
Using pedagogic corpora in ELT
 
NJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary RepositoryNJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary Repository
 
RDF Data and Image Annotations in ResearchSpace (slides)
RDF Data and Image Annotations in ResearchSpace (slides)RDF Data and Image Annotations in ResearchSpace (slides)
RDF Data and Image Annotations in ResearchSpace (slides)
 
Grosof haley-talk-semtech2013-ver6-10-13
Grosof haley-talk-semtech2013-ver6-10-13Grosof haley-talk-semtech2013-ver6-10-13
Grosof haley-talk-semtech2013-ver6-10-13
 
Ontology Engineering for Systems Engineering
Ontology Engineering for Systems EngineeringOntology Engineering for Systems Engineering
Ontology Engineering for Systems Engineering
 
A Metadata Application Profile for KOS Vocabulary Registries (KOS-AP)
A Metadata Application Profile for KOS Vocabulary Registries (KOS-AP)A Metadata Application Profile for KOS Vocabulary Registries (KOS-AP)
A Metadata Application Profile for KOS Vocabulary Registries (KOS-AP)
 
Sinmin Literature Review Presentation
Sinmin Literature Review PresentationSinmin Literature Review Presentation
Sinmin Literature Review Presentation
 
1910 JK27
1910 JK271910 JK27
1910 JK27
 
김문형-Curriculum Vitae
김문형-Curriculum Vitae김문형-Curriculum Vitae
김문형-Curriculum Vitae
 

More from WarNik Chow

2311 EAAMO
2311 EAAMO2311 EAAMO
2311 EAAMO
WarNik Chow
 
2211 HCOMP
2211 HCOMP2211 HCOMP
2211 HCOMP
WarNik Chow
 
2211 APSIPA
2211 APSIPA2211 APSIPA
2211 APSIPA
WarNik Chow
 
2211 AACL
2211 AACL2211 AACL
2211 AACL
WarNik Chow
 
2210 CODI
2210 CODI2210 CODI
2210 CODI
WarNik Chow
 
2206 FAccT_inperson
2206 FAccT_inperson2206 FAccT_inperson
2206 FAccT_inperson
WarNik Chow
 
2206 Modupop!
2206 Modupop!2206 Modupop!
2206 Modupop!
WarNik Chow
 
2204 Kakao talk on Hate speech dataset
2204 Kakao talk on Hate speech dataset2204 Kakao talk on Hate speech dataset
2204 Kakao talk on Hate speech dataset
WarNik Chow
 
2108 [LangCon2021] kosp2e
2108 [LangCon2021] kosp2e2108 [LangCon2021] kosp2e
2108 [LangCon2021] kosp2e
WarNik Chow
 
2106 PRSLLS
2106 PRSLLS2106 PRSLLS
2106 PRSLLS
WarNik Chow
 
2106 JWLLP
2106 JWLLP2106 JWLLP
2106 JWLLP
WarNik Chow
 
2106 ACM DIS
2106 ACM DIS2106 ACM DIS
2106 ACM DIS
WarNik Chow
 
2104 Talk @SSU
2104 Talk @SSU2104 Talk @SSU
2104 Talk @SSU
WarNik Chow
 
2103 ACM FAccT
2103 ACM FAccT2103 ACM FAccT
2103 ACM FAccT
WarNik Chow
 
2102 Redone seminar
2102 Redone seminar2102 Redone seminar
2102 Redone seminar
WarNik Chow
 
2011 NLP-OSS
2011 NLP-OSS2011 NLP-OSS
2011 NLP-OSS
WarNik Chow
 
2010 INTERSPEECH
2010 INTERSPEECH 2010 INTERSPEECH
2010 INTERSPEECH
WarNik Chow
 
2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories
WarNik Chow
 
2010 HCLT Hate Speech
2010 HCLT Hate Speech2010 HCLT Hate Speech
2010 HCLT Hate Speech
WarNik Chow
 
2009 DevC Seongnam - NLP
2009 DevC Seongnam - NLP2009 DevC Seongnam - NLP
2009 DevC Seongnam - NLP
WarNik Chow
 

More from WarNik Chow (20)

2311 EAAMO
2311 EAAMO2311 EAAMO
2311 EAAMO
 
2211 HCOMP
2211 HCOMP2211 HCOMP
2211 HCOMP
 
2211 APSIPA
2211 APSIPA2211 APSIPA
2211 APSIPA
 
2211 AACL
2211 AACL2211 AACL
2211 AACL
 
2210 CODI
2210 CODI2210 CODI
2210 CODI
 
2206 FAccT_inperson
2206 FAccT_inperson2206 FAccT_inperson
2206 FAccT_inperson
 
2206 Modupop!
2206 Modupop!2206 Modupop!
2206 Modupop!
 
2204 Kakao talk on Hate speech dataset
2204 Kakao talk on Hate speech dataset2204 Kakao talk on Hate speech dataset
2204 Kakao talk on Hate speech dataset
 
2108 [LangCon2021] kosp2e
2108 [LangCon2021] kosp2e2108 [LangCon2021] kosp2e
2108 [LangCon2021] kosp2e
 
2106 PRSLLS
2106 PRSLLS2106 PRSLLS
2106 PRSLLS
 
2106 JWLLP
2106 JWLLP2106 JWLLP
2106 JWLLP
 
2106 ACM DIS
2106 ACM DIS2106 ACM DIS
2106 ACM DIS
 
2104 Talk @SSU
2104 Talk @SSU2104 Talk @SSU
2104 Talk @SSU
 
2103 ACM FAccT
2103 ACM FAccT2103 ACM FAccT
2103 ACM FAccT
 
2102 Redone seminar
2102 Redone seminar2102 Redone seminar
2102 Redone seminar
 
2011 NLP-OSS
2011 NLP-OSS2011 NLP-OSS
2011 NLP-OSS
 
2010 INTERSPEECH
2010 INTERSPEECH 2010 INTERSPEECH
2010 INTERSPEECH
 
2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories2010 PACLIC - pay attention to categories
2010 PACLIC - pay attention to categories
 
2010 HCLT Hate Speech
2010 HCLT Hate Speech2010 HCLT Hate Speech
2010 HCLT Hate Speech
 
2009 DevC Seongnam - NLP
2009 DevC Seongnam - NLP2009 DevC Seongnam - NLP
2009 DevC Seongnam - NLP
 

Recently uploaded

M44.pdf dairy management farm report of an
M44.pdf dairy management farm report of anM44.pdf dairy management farm report of an
M44.pdf dairy management farm report of an
ManjuBv2
 
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
bhupeshkumar0889
 
Biometric Question Bank 2021 - 1 Soln-1.pdf
Biometric Question Bank 2021 - 1 Soln-1.pdfBiometric Question Bank 2021 - 1 Soln-1.pdf
Biometric Question Bank 2021 - 1 Soln-1.pdf
Joel Ngushwai
 
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
sharonblush
 
Introduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdfIntroduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdf
kihus38
 
Supervised Learning (Data Science).pptx
Supervised Learning  (Data Science).pptxSupervised Learning  (Data Science).pptx
Supervised Learning (Data Science).pptx
TARIKU ENDALE
 
bai-tap-tieng-anh-lop-12-unit-4-the-mass-media (1).doc
bai-tap-tieng-anh-lop-12-unit-4-the-mass-media (1).docbai-tap-tieng-anh-lop-12-unit-4-the-mass-media (1).doc
bai-tap-tieng-anh-lop-12-unit-4-the-mass-media (1).doc
PhngThLmHnh
 
DataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptxDataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptx
Kanchana Weerasinghe
 
Oracle Database Desupported Features on 23ai (Part A)
Oracle Database Desupported Features on 23ai (Part A)Oracle Database Desupported Features on 23ai (Part A)
Oracle Database Desupported Features on 23ai (Part A)
Alireza Kamrani
 
NPS_Presentation_V3.pptx it is regarding National pension scheme
NPS_Presentation_V3.pptx it is regarding National pension schemeNPS_Presentation_V3.pptx it is regarding National pension scheme
NPS_Presentation_V3.pptx it is regarding National pension scheme
ASISHSABAT3
 
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
revolutionary575
 
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
kinni singh$A17
 
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
tanupasswan6
 
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
janvikumar4133
 
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
dizzycaye
 
the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...
huseindihon
 
Harendra Singh, AI Strategy and Consulting Portfolio
Harendra Singh, AI Strategy and Consulting PortfolioHarendra Singh, AI Strategy and Consulting Portfolio
Harendra Singh, AI Strategy and Consulting Portfolio
harendmgr
 
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
birajmohan012
 
potential development of the A* search algorithm specifically
potential development of the A* search algorithm specificallypotential development of the A* search algorithm specifically
potential development of the A* search algorithm specifically
huseindihon
 
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy DsouzaOpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata
 

Recently uploaded (20)

M44.pdf dairy management farm report of an
M44.pdf dairy management farm report of anM44.pdf dairy management farm report of an
M44.pdf dairy management farm report of an
 
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
 
Biometric Question Bank 2021 - 1 Soln-1.pdf
Biometric Question Bank 2021 - 1 Soln-1.pdfBiometric Question Bank 2021 - 1 Soln-1.pdf
Biometric Question Bank 2021 - 1 Soln-1.pdf
 
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
 
Introduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdfIntroduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdf
 
Supervised Learning (Data Science).pptx
Supervised Learning  (Data Science).pptxSupervised Learning  (Data Science).pptx
Supervised Learning (Data Science).pptx
 
bai-tap-tieng-anh-lop-12-unit-4-the-mass-media (1).doc
bai-tap-tieng-anh-lop-12-unit-4-the-mass-media (1).docbai-tap-tieng-anh-lop-12-unit-4-the-mass-media (1).doc
bai-tap-tieng-anh-lop-12-unit-4-the-mass-media (1).doc
 
DataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptxDataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptx
 
Oracle Database Desupported Features on 23ai (Part A)
Oracle Database Desupported Features on 23ai (Part A)Oracle Database Desupported Features on 23ai (Part A)
Oracle Database Desupported Features on 23ai (Part A)
 
NPS_Presentation_V3.pptx it is regarding National pension scheme
NPS_Presentation_V3.pptx it is regarding National pension schemeNPS_Presentation_V3.pptx it is regarding National pension scheme
NPS_Presentation_V3.pptx it is regarding National pension scheme
 
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
 
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
 
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
New Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And N...
 
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
 
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
 
the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...
 
Harendra Singh, AI Strategy and Consulting Portfolio
Harendra Singh, AI Strategy and Consulting PortfolioHarendra Singh, AI Strategy and Consulting Portfolio
Harendra Singh, AI Strategy and Consulting Portfolio
 
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
 
potential development of the A* search algorithm specifically
potential development of the A* search algorithm specificallypotential development of the A* search algorithm specifically
potential development of the A* search algorithm specifically
 
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy DsouzaOpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
 

2312 PACLIC

  • 1. - Development of Korean NLP - Largely follows classical computaional linguistics approaches - NLP pipeline to downstream tasks - Why & how Korean NLP studies have become popular (at least domestic) - Challenging syntax and morphology (pro-drop, agglutinative) - Non latin alphabet nor Kanji-oriented writing system - Motivation of model and corpus studies of own language Korean NLP Paradigm Past and Ongoing Projects - KAIST & NIKL (Corpora & treebanks) - ExoBrain by ETRI (CL and QA) - AI HUB by NIA (Nationwide datahub) - LDC (Dataset catalogs) - Open corpora by individuals & organizations in academia/industry (Downstream tasks & Benchmarks) Revisiting Korean Corpus Studies through Technological Advances Won Ik Cho Sangwhan Moon Youngsook Song Seoul National University Tokyo Institute of Technology Sionic AI Inc. Diachronic Overview of (Open) Korean Corpora (1990s – 2023) • Large-scale, raw text to small, specific, annotated text - Raw & annotated corpora, Treebanks (of syntactic area) to Downstream tasks related to language understanding, and to Benchmarks (set of tasks) • Token-annotated classification to document-annotated classification and span/generation - Model development tends to develop from categorical priority (statistical models and perceptrons) to ones for translation/generation (PLMs, LLMs) - Capacity of model becomes larger (lowering computation budget) • Written or spoken (web) text to texts in various areas - Flourishing data provide from web domain (e.g., social media) and increasing demand from areas beyond linguistics (e.g., finance, law) Trend of Korean Corpus Studies - Large-scale corpora + Treebanks (1990s-2000s; Usually driven by institutes) - Downstream tasks of NLP pipeline and parallel corpora (2000s-) - Sentiment analysis, QA, entailment, and similarity datasets (2010s-) - Dialogue studies, offensive language, societal bias and fairness (2020s-) - Benchmark studies for PLMs + Pretraining corpora (2020s-) - Benchmarks for LLM evaluation (2022-) Our Repository for Open Korean Corpora Studies 