
This document summarizes an effort to curate open Korean natural language datasets for global users. It provides an overview of 32 open Korean datasets, described by criteria such as documentation status and license terms for use and distribution. The curation is intended to be maintained and updated on arXiv and GitHub as a living document. It acknowledges the original dataset creators and the Ko-NLP project for hosting the work. The overview aims to address the need for more openly available Korean datasets and resources that support non-Korean NLP researchers.
