
Demonstration of the Online Local Assembly Minutes Search System

Workshop 5: Hansards as a dialectal resource, the Sixteenth International Conference on Methods in Dialectology (METHODS XVI), 11th 2017, Tachikawa

  1. 1. Demonstration of the Online Local Assembly Minutes Search System KEIICHI TAKAMARU
  2. 2. Local Assemblies in Japan
     Number of local governments: Prefecture 47; City 790; Specified districts 23; Town and village 928; Total 1,788
     Number of representatives: Prefecture 2,687; City 18,654; Specified districts 902; Town and village 11,271; Total 33,514
     Source: http://www.soumu.go.jp/senkyo/senkyo_s/data/syozoku/h27.html
  3. 3. Web publishing of local assembly minutes
     Rate of web publishing: 86%
     ◦ Prefectures: 100%
     ◦ Cities and specified districts: 100%
     ◦ Towns and villages: 73%
     Investigation by Otaru University of Commerce (May 2016)
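The overall figure can be sanity-checked against the per-category rates. The sketch below assumes that the 86% is a count-weighted average of the three category rates and that the numbers of local governments from the "Local Assemblies in Japan" slide apply to the May 2016 survey; the slides do not confirm either point.

```python
# Numbers of local governments from the "Local Assemblies in Japan" slide.
# Assumption: the May 2016 survey covered the same set of governments.
counts = {
    "prefectures": 47,
    "cities_and_specified_districts": 790 + 23,
    "towns_and_villages": 928,
}

# Web publishing rates per category, as stated on the slide.
rates = {
    "prefectures": 1.00,
    "cities_and_specified_districts": 1.00,
    "towns_and_villages": 0.73,
}


def overall_rate(counts, rates):
    """Count-weighted average of the per-category publishing rates."""
    total = sum(counts.values())
    published = sum(counts[name] * rates[name] for name in counts)
    return published / total


print(f"{overall_rate(counts, rates):.0%}")  # prints 86%
```

Under these assumptions the weighted average reproduces the reported 86%, which suggests the overall rate is dominated by the towns and villages.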
  4. 4. Plenary session and committee (photo: Nagano Prefecture)
     Plenary session
     ◦ Questions to the head of the local government
     ◦ Discussing and voting on bills
     Committee
     ◦ Examining a bill before it is discussed and voted on in a plenary session
  5. 5. Local assembly minutes on Utsunomiya City's website (screenshot; labeled fields: Year, Meeting Name, Month & Day, Speaker's Name)
  6. 6. Utterances recorded in local assembly minutes
     Assembly minutes:
     ◦ Transcriptions of the utterances made in a meeting
     Some expressions that make the minutes difficult to read are modified:
     ◦ Fillers /a:/ /e:/
     ◦ Obvious slips of the tongue
     ◦ Obvious colloquial expressions
     ◦ Some dialectal expressions, which are modified as colloquial expressions
     Local assembly minutes contain rich dialectal expressions.
  7. 7. Are assemblypersons native speakers?
     According to Takeyasu (2004):
     ◦ Survey of local representatives
     ◦ Number of responses: 16,844
     ◦ Survey period: February to April 2002
     87.7% of representatives
     ◦ Live in the prefecture where they were born
     75.5% of representatives
     ◦ Live in the city (or town or village) where they were born
     Source: 竹安栄子 (2004), 「地方議員のジェンダー差異 : 「2002年全国地方議員調査」結果の分析より」 ("Gender differences among local assembly members: from an analysis of the results of the 2002 National Survey of Local Assembly Members")
  8. 8. Construction of a Corpus of Local Assembly Minutes
     1) Scraping local assembly minutes published on the web (each utterance is stored with its municipality, date, and speaker)
     2) Sharing the corpus
     3) Interdisciplinary research using the corpus (politics, sociology, linguistics, computer science, and so on)
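A corpus record of this kind can be pictured as an utterance plus its metadata. The following is a minimal sketch, not the project's actual schema: the class name, field names, and sample values are all hypothetical, and only the four pieces of information named on the slide (utterance, municipality, date, speaker) are modeled.

```python
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class Utterance:
    """One utterance with the metadata named on the slide (field names are hypothetical)."""
    text: str
    municipality: str
    meeting_date: date
    speaker: str


def to_document(utterance):
    """Convert a record into a JSON-friendly dict, e.g. for a search index."""
    doc = asdict(utterance)
    doc["meeting_date"] = utterance.meeting_date.isoformat()
    return doc


# Placeholder values for illustration only; they are not taken from the corpus.
sample = Utterance(
    text="原発について質問します。",
    municipality="Example City",
    meeting_date=date(2012, 3, 1),
    speaker="Speaker A",
)
print(to_document(sample))
```

Keeping the metadata next to each utterance is what makes per-municipality and per-year aggregation possible later on.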
  9. 9. Corpus of Local Assembly Minutes
  10. 10. Outline of the System Architecture
      Corpus of Local Assembly Minutes → Indexing → Utterance Index in Elasticsearch (an open-source full-text search engine)
      Web-based Visualization System: ① Input a search query → ② Request → ③ Response → ④ Visualized output
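The request in step ② could take the form of an Elasticsearch Query DSL body. The slides do not show the actual queries, so the sketch below is only one plausible shape: the field name `utterance` and the helper `build_query` are hypothetical, while `bool`, `must`, `should`, `minimum_should_match`, and `match_phrase` are standard Query DSL constructs.

```python
import json


def build_query(terms, operator="AND", field="utterance", size=20):
    """Build a Query DSL body that combines phrase matches with AND or OR.

    The field name and the result size are assumptions for illustration.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    clauses = [{"match_phrase": {field: term}} for term in terms]
    if operator == "AND":
        bool_query = {"must": clauses}
    else:
        bool_query = {"should": clauses, "minimum_should_match": 1}
    return {"query": {"bool": bool_query}, "size": size}


body = build_query(["原発"])
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The resulting dictionary would be sent as the JSON body of a search request; the response (step ③) would then be rendered by the visualization layer.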
  11. 11. Corpus development project (2010–2014)
      Created scraping programs for four major web publication systems for assembly minutes.
      Collected as many assembly minutes as possible from municipalities' websites:
      ◦ Number of municipalities: 425
      ◦ Number of sentences: 125 million
      ◦ Number of characters: 8.1 billion
  12. 12. The amount of collected data from each prefecture (chart; axes: prefecture, the number of collected sentences, the number of collected municipalities)
  13. 13. Periods of collected data stored in the corpus (chart; axes: collected periods (year), the number of municipalities)
  14. 14. Web-based visualization system
      1. Full-text search for utterances
      2. Context word extraction using KWIC (for prefectures only so far)
      3. Map visualization with full-text search
      4. Cross tabulation output with full-text search
  15. 15. Input of full-text search for utterances (screenshot; input: 原発, "nuclear power plant"; options: N-gram / Exact match, AND / OR; buttons: Search, Export)
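The difference between the N-gram and Exact match options can be illustrated with character bigrams. How the real system implements its N-gram mode is not described in the slides; the sketch below assumes one common interpretation, in which a text matches when every character n-gram of the query occurs somewhere in it. The function names and the sample text are hypothetical.

```python
def char_ngrams(text, n=2):
    """Split a string into overlapping character n-grams."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_match(query, text, n=2):
    """True if every n-gram of the query occurs in the text (in any position)."""
    grams = char_ngrams(query, n) or [query]  # fall back for very short queries
    return all(gram in text for gram in grams)


def exact_match(query, text):
    """True only if the query occurs in the text as one contiguous string."""
    return query in text


query = "特にも"
text = "特に、私にも"  # made-up sentence fragment for illustration
print(char_ngrams(query))        # ['特に', 'にも']
print(ngram_match(query, text))  # True: both bigrams occur separately
print(exact_match(query, text))  # False: the contiguous form is absent
```

Under this interpretation, N-gram search has higher recall but can return false positives, so exact matching is the safer choice when counting a specific dialectal form.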
  16. 16. Output of full-text search (screenshot; columns: Utterance, Speaker's Name, Municipality, Year, Meeting Name, Month & Day)
  17. 17. Context word extraction using KWIC (only for minutes of prefectural assemblies) (screenshot; node: "nuclear power plant"; columns: Preceding Context, Following Context)
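A KWIC (keyword in context) listing places each occurrence of the node between its preceding and following context. The sketch below is a simplified, character-based version: it assumes a fixed-width character window, whereas extracting context words from Japanese text would additionally need a morphological analyzer, which is not shown. The function name and the `width` parameter are hypothetical.

```python
def kwic(texts, node, width=10):
    """Return (preceding, node, following) tuples for every occurrence of node."""
    rows = []
    for text in texts:
        start = text.find(node)
        while start != -1:
            end = start + len(node)
            preceding = text[max(0, start - width):start]
            following = text[end:end + width]
            rows.append((preceding, node, following))
            start = text.find(node, start + 1)
    return rows


# Made-up sentence for illustration only.
for preceding, node, following in kwic(["県内の原発の安全対策について伺います。"], "原発", width=5):
    print(f"{preceding:>5} [{node}] {following}")
```

Sorting the rows by the preceding or following context is the usual next step for spotting recurring collocations.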
  18. 18. Map visualization with full-text search (screenshot; input: 原発, "nuclear power plant"; options: N-gram / Exact match, AND / OR; display: Frequency / Rate)
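The Frequency and Rate display modes can be thought of as two aggregations over the same search results. The slides do not define how the rate is computed, so the sketch below assumes it is the share of a prefecture's utterances that contain the query; the function name and the record layout are hypothetical.

```python
from collections import Counter


def frequency_and_rate(records, query):
    """Aggregate hits per prefecture.

    records: iterable of (prefecture, utterance_text) pairs.
    Returns {prefecture: (frequency, rate)}, where rate is assumed to be
    hits divided by the number of utterances collected for that prefecture.
    """
    hits = Counter()
    totals = Counter()
    for prefecture, text in records:
        totals[prefecture] += 1
        if query in text:
            hits[prefecture] += 1
    return {p: (hits[p], hits[p] / totals[p]) for p in totals}


# Placeholder records for illustration only.
records = [("P1", "原発の話"), ("P1", "別の話"), ("P2", "原発")]
print(frequency_and_rate(records, "原発"))
```

A normalized rate is useful here because the amount of collected data differs from prefecture to prefecture, so raw frequencies alone would favor prefectures with more text.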
  19. 19. Cross tabulation output with full-text search (columns: year; rows: prefecture; cells: frequency of search results)
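A prefecture-by-year cross tabulation of search hits can be built with a single counter keyed by (prefecture, year). This is a minimal sketch assuming simple substring matching and a hypothetical record layout of (prefecture, year, text); it is not the system's actual implementation.

```python
from collections import Counter


def cross_tabulate(records, query):
    """Count matching utterances per (prefecture, year).

    Returns (prefectures, years, table), where table[i][j] is the number of
    hits for prefectures[i] in years[j].
    """
    counts = Counter(
        (prefecture, year)
        for prefecture, year, text in records
        if query in text
    )
    prefectures = sorted({p for p, _ in counts})
    years = sorted({y for _, y in counts})
    table = [[counts[(p, y)] for y in years] for p in prefectures]
    return prefectures, years, table


# Placeholder records for illustration only.
records = [("P1", 2011, "原発"), ("P1", 2011, "原発の話"), ("P2", 2012, "原発")]
prefectures, years, table = cross_tabulate(records, "原発")
print(years)
for prefecture, row in zip(prefectures, table):
    print(prefecture, row)
```

Exporting `table` row by row yields the same layout as the slide: prefectures as rows and years as columns.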
  20. 20. Dialect study using map visualization with full-text search
  21. 21. Kanji compound "passing by": 「離合」 /rigoo/
  22. 22. Honorific expression of "be doing": 「してみえる」 /shitemieru/
  23. 23. "Bring to an end": 「終わす」 /owasu/
  24. 24. "Especially": 「特にも」 /tokunimo/
  25. 25. Second person pronouns: 「あんた」 /aNta/ (left), 「おめえ」 /omee/ (right)
      Source: Fumio Inoue (2016), "Appearance of Japanese Second Person Pronouns in Regional Conference Minutes", Meikai Japanese Language Journal (21), 1-16
  26. 26. Onomatopoeia: 「ぴしゃっ」 /pishaQ/
  27. 27. Onomatopoeia: 「ばくっ」 /bakuQ/
  28. 28. Demonstration of the online local assembly minutes search system
      ◦ A linguistic map is drawn directly from the corpus.
      ◦ It is a powerful tool for discovering the geographical distribution of words and phrases.
      ◦ The data are rich; however, the amount of collected data is uneven.
      ◦ A balanced corpus is being prepared for strict statistical analysis.
        ◦ Municipality: assembly minutes for all prefectures and all prefectural capital cities
        ◦ Period: 4 years (the term of a representative)
