Demonstration of the Online Local Assembly Minutes Search System


Workshop 5: Hansards as a dialectal resource, the Sixteenth International Conference on Methods in Dialectology (METHODS XVI), 11th 2017, Tachikawa


  1. Demonstration of the Online Local Assembly Minutes Search System
     Keiichi Takamaru
  2. Local Assemblies in Japan
     Number of local governments: prefectures 47; cities 790; specified districts 23; towns and villages 928; total 1,788
     Number of representatives: prefectures 2,687; cities 18,654; specified districts 902; towns and villages 11,271; total 33,514
     Source: http://www.soumu.go.jp/senkyo/senkyo_s/data/syozoku/h27.html
  3. Web publishing of local assembly minutes
     Rate of web publishing: 86%
     ◦ Prefectures: 100%
     ◦ Cities and specified districts: 100%
     ◦ Towns and villages: 73%
     Investigation by Otaru University of Commerce (May 2016)
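The overall 86% on slide 3 appears consistent with a weighted average of the per-category rates, weighted by the local government counts on slide 2. The check below is a sketch under that assumption: the slides do not state how the overall rate was computed, and the dictionary keys are illustrative names only.

```python
# Counts of local governments from slide 2 (cities and specified districts combined,
# because slide 3 reports a single rate for both).
counts = {
    "prefecture": 47,
    "city_and_specified_district": 790 + 23,
    "town_and_village": 928,
}

# Web publishing rates per category from slide 3.
rates = {
    "prefecture": 1.00,
    "city_and_specified_district": 1.00,
    "town_and_village": 0.73,
}

# Assumption: the overall rate is the count-weighted average of the category rates.
published = sum(counts[key] * rates[key] for key in counts)
overall = published / sum(counts.values())

print(f"Estimated overall rate: {overall:.1%}")
```

Under this assumption the estimate rounds to the 86% reported on the slide.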
  4. Plenary session and committee (photo: Nagano Prefecture)
     Plenary session
     ◦ Questions to the head of the local government
     ◦ Discussing and voting on bills
     Committee
     ◦ Examining a bill before it is discussed and voted on in a plenary session
  5. Local assembly minutes on Utsunomiya City's website
     Screenshot labels: Year, Meeting Name, Month & Day, Speaker's Name
  6. Utterances recorded in local assembly minutes
     Assembly minutes are a transcription of the utterances made in a meeting.
     Some expressions that make the minutes difficult to read are modified:
     ◦ Fillers /a:/ /e:/
     ◦ Obvious slips of the tongue
     ◦ Obvious colloquial expressions
     ◦ Some dialectal expressions are modified as colloquial expressions.
     Local assembly minutes contain rich dialectal expressions.
  7. Are assemblypersons native speakers?
     According to Takeyasu (2004):
     ◦ Survey of local representatives
     ◦ Number of responses: 16,844
     ◦ Survey period: February to April 2002
     87.7% of representatives live in their birth prefectures.
     75.5% of representatives live in their birth cities (or towns or villages).
     Reference: 竹安栄子(2004)「地方議員のジェンダー差異 : 「2002年全国地方議員調査」結果の分析より」 [Takeyasu (2004), "Gender differences among local assembly members: an analysis of the results of the 2002 National Survey of Local Assembly Members"]
  8. Construction of a Corpus of Local Assembly Minutes
     1) Scraping local assembly minutes published on the web. Each utterance is stored with its municipality, date, and speaker.
     2) Sharing the corpus
     3) Interdisciplinary research using the corpus (politics, sociology, linguistics, computer science, and so on)
  9. Corpus of Local Assembly Minutes
  10. Outline of the System Architecture
      ◦ Corpus of Local Assembly Minutes: utterances are indexed into an utterance index
      ◦ Elasticsearch: an open-source full-text search engine that holds the utterance index
      ◦ Web-based Visualization System: ① input a search query, ② request, ③ response, ④ visualized output
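As a rough illustration of step ② on slide 10, the sketch below builds an Elasticsearch Query DSL request body for an AND or OR search over utterances. It only constructs the JSON and sends nothing. The field name `utterance` and the helper `build_query` are hypothetical: the slides do not show the actual index mapping or the queries the system issues.

```python
import json


def build_query(terms, operator="AND", field="utterance"):
    """Build a bool query body; `field` is a hypothetical field name."""
    # One phrase clause per search term.
    clauses = [{"match_phrase": {field: term}} for term in terms]
    if operator == "AND":
        bool_query = {"must": clauses}  # every term must match
    else:
        # At least one term must match.
        bool_query = {"should": clauses, "minimum_should_match": 1}
    return {"query": {"bool": bool_query}}


body = build_query(["原発", "再稼働"], operator="AND")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Whether a phrase clause behaves like the interface's "Exact match" or "N-gram" option depends on how the field is analyzed at index time, which the slides do not specify.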
  11. Corpus development project (2010–2014)
      Created scraping programs for the four major web publication systems for assembly minutes.
      Collected as many assembly minutes as possible from municipalities' websites:
      ◦ Number of municipalities: 425
      ◦ Number of sentences: 125 million
      ◦ Number of characters: 8.1 billion
  12. The amount of collected data from each prefecture
      Chart by prefecture: the number of collected sentences and the number of collected municipalities
  13. Periods of collected data stored in the corpus
      Chart: the number of municipalities by collected period (years)
  14. Web-based visualization system
      1. Full-text search for utterances
      2. Context word extraction using KWIC (for prefectures only so far)
      3. Map visualization with full-text search
      4. Cross tabulation output with full-text search
  15. Input of full-text search for utterances
      Input: 原発 ("nuclear power plant")
      Options: N-gram / Exact match; AND / OR
      Controls: Search button, Export menu
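The slides do not explain how the N-gram option on slide 15 is implemented. A common technique for full-text search over unsegmented Japanese text is a character n-gram index, and the sketch below shows that technique in a toy in-memory form, with AND/OR combination of terms. All function names and the sample utterances are invented for illustration; the real system delegates indexing to Elasticsearch.

```python
from collections import defaultdict


def char_ngrams(text, n=2):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(utterances, n=2):
    """Map each character n-gram to the set of utterance ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(utterances):
        for gram in char_ngrams(text, n):
            index[gram].add(doc_id)
    return index


def search_term(index, utterances, term, n=2):
    """Find utterances containing `term` via the n-gram index."""
    grams = char_ngrams(term, n)
    if not grams:  # term shorter than n: fall back to a linear scan
        return {i for i, text in enumerate(utterances) if term in text}
    # Candidates contain every n-gram of the term...
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # ...then verify that the n-grams are actually contiguous.
    return {i for i in candidates if term in utterances[i]}


def search(index, utterances, terms, operator="AND", n=2):
    """Combine per-term results with AND (intersection) or OR (union)."""
    hits = [search_term(index, utterances, t, n) for t in terms]
    if not hits:
        return set()
    return set.intersection(*hits) if operator == "AND" else set.union(*hits)


# Invented sample utterances.
utterances = [
    "原発の再稼働について伺います",
    "道路の離合場所を整備します",
    "原発事故の避難計画",
]
index = build_index(utterances)
print(sorted(search(index, utterances, ["原発"])))
print(sorted(search(index, utterances, ["原発", "避難"], operator="AND")))
print(sorted(search(index, utterances, ["離合", "避難"], operator="OR")))
```

The verification step in `search_term` matters because sharing all n-grams of a term does not guarantee that they occur next to each other in the utterance.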
  16. Output of full-text search
      Result columns: Utterance, Speaker's Name, Municipality, Year, Meeting Name, Month & Day
  17. Context word extraction using KWIC (only for minutes of prefecture assemblies)
      Columns: Preceding Context, Node ("nuclear power plant"), Following Context
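The KWIC (keyword in context) layout on slide 17 can be sketched as follows. This is a minimal character-window version, not the system's implementation: the function name `kwic`, the `width` parameter, and the sample sentence are assumptions, and the real system presumably works on analyzed text to extract context words.

```python
def kwic(text, node, width=10):
    """Return (preceding, node, following) tuples for each occurrence of `node`."""
    rows = []
    start = text.find(node)
    while start != -1:
        end = start + len(node)
        preceding = text[max(0, start - width):start]  # up to `width` chars before
        following = text[end:end + width]              # up to `width` chars after
        rows.append((preceding, node, following))
        start = text.find(node, end)  # continue after this occurrence
    return rows


# Invented sample text.
sample = "原発の再稼働に反対します。原発事故の検証が必要です。"
for preceding, node, following in kwic(sample, "原発", width=6):
    print(f"{preceding:>6} | {node} | {following}")
```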
  18. Map visualization with full-text search
      Input: 原発 ("nuclear power plant")
      Options: N-gram / Exact match; AND / OR
      Display: Frequency / Rate
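The slides do not define the Frequency and Rate options on slide 18. One plausible reading, assumed here, is that frequency is the raw hit count per prefecture and rate is that count normalized by the amount of text collected for the prefecture. The function name and all figures below are invented for illustration.

```python
def frequency_and_rate(hits, totals, per=1_000_000):
    """Return {prefecture: (frequency, rate per `per` characters)}.

    Assumption: rate = hits normalized by corpus size for that prefecture.
    """
    result = {}
    for prefecture, total in totals.items():
        freq = hits.get(prefecture, 0)
        rate = freq * per / total if total else 0.0
        result[prefecture] = (freq, rate)
    return result


# Invented numbers: hit counts and collected characters per prefecture.
hits = {"Fukui": 120, "Tokyo": 150}
totals = {"Fukui": 2_000_000, "Tokyo": 10_000_000}

for prefecture, (freq, rate) in frequency_and_rate(hits, totals).items():
    print(f"{prefecture}: frequency={freq}, rate={rate:.1f} per million characters")
```

In this invented example Tokyo has the higher raw frequency but Fukui has the higher rate, which is why a normalized view is useful when the amount of collected data differs between prefectures.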
  19. Cross tabulation output with full-text search
      Columns: year; rows: prefecture; cells: frequency of search results
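The cross tabulation on slide 19 (rows: prefecture, columns: year, cells: frequency of search results) can be sketched by counting search hits per (prefecture, year) pair. The record layout and the sample hits are hypothetical; the slides do not show how the system aggregates results.

```python
from collections import Counter


def cross_tabulate(records):
    """Count hits per (prefecture, year) and return row labels, column labels, table."""
    counts = Counter((r["prefecture"], r["year"]) for r in records)
    rows = sorted({prefecture for prefecture, _ in counts})
    cols = sorted({year for _, year in counts})
    # Counter returns 0 for combinations with no hits.
    table = [[counts[(p, y)] for y in cols] for p in rows]
    return rows, cols, table


# Invented search hits, one record per matching utterance.
records = [
    {"prefecture": "Fukui", "year": 2011},
    {"prefecture": "Fukui", "year": 2011},
    {"prefecture": "Fukui", "year": 2012},
    {"prefecture": "Tokyo", "year": 2012},
]

rows, cols, table = cross_tabulate(records)
print("prefecture", *cols)
for label, row in zip(rows, table):
    print(label, *row)
```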
  20. Dialect study using map visualization with full-text search
  21. Kanji compound "passing by": 「離合」 /rigoo/
  22. Honorific expression of "be doing": 「してみえる」 /shitemieru/
  23. "Bring to an end": 「終わす」 /owasu/
  24. "Especially": 「特にも」 /tokunimo/
  25. Second person pronouns: 「あんた」 /aNta/ (left), 「おめえ」 /omee/ (right)
      Reference: Fumio Inoue (2016), "Appearance of Japanese Second Person Pronouns in Regional Conference Minutes", Meikai Japanese Language Journal (21), 1–16
  26. Onomatopoeia: 「ぴしゃっ」 /pishaQ/
  27. Onomatopoeia: 「ばくっ」 /bakuQ/
  28. Demonstration of the online local assembly minutes search system
      ◦ A linguistic map is drawn directly from the corpus.
      ◦ It is a powerful tool for discovering the geographical distribution of words and phrases.
      ◦ Rich data is available; however, the amount of collected data is uneven.
      ◦ A balanced corpus is being prepared for strict statistical analysis.
        ◦ Municipalities: assembly minutes for all prefectures and all prefectural capital cities
        ◦ Period: 4 years (the term of a representative)
