Demonstration of the Online Local Assembly Minutes Search System
1. Demonstration of the Online Local Assembly Minutes Search System
KEIICHI TAKAMARU
2. Local Assemblies in Japan
                              Prefecture   City     Specified districts   Town and village   Total
Number of local governments   47           790      23                    928                1,788
Number of representatives     2,687        18,654   902                   11,271             33,514
Source: http://www.soumu.go.jp/senkyo/senkyo_s/data/syozoku/h27.html
3. Web publishing of local assembly minutes
Rate of web publishing: 86%
◦ Prefectures: 100%
◦ Cities and specified districts: 100%
◦ Towns and villages: 73%
Source: survey by Otaru University of Commerce (May 2016)
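As a rough consistency check, the overall rate can be reproduced by weighting the per-category rates with the numbers of local governments from slide 2. This is only a sketch: it assumes the survey covered the same 1,788 governments, which the slide does not state.

```python
# Numbers of local governments (slide 2) and web publishing rates (slide 3).
# Assumption: the survey covers the same set of governments as these counts.
counts = {"prefecture": 47, "city": 790, "specified district": 23, "town and village": 928}
rates = {"prefecture": 1.00, "city": 1.00, "specified district": 1.00, "town and village": 0.73}

# Weighted average: governments that publish, divided by all governments.
publishing = sum(counts[k] * rates[k] for k in counts)
overall = publishing / sum(counts.values())
print(f"{overall:.1%}")
```

Under that assumption the weighted average comes out at roughly 86%, in line with the headline figure.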
4. Plenary session and committee
Plenary session
◦ Questions to the head of the local government
◦ Discussing and voting on bills
Committee
◦ Examining a bill before it is discussed and voted on in a plenary session
(Photo: Nagano Prefecture)
5. Local assembly minutes on Utsunomiya City's web site
[Screenshot: minutes page, with labels marking Year, Meeting Name, Month & Day, and Speaker's Name]
6. Utterances recorded in local assembly minutes
Assembly minutes:
◦ Transcriptions of the utterances made in a meeting
Some expressions that make the minutes difficult to read are modified:
◦ Fillers (/a:/, /e:/)
◦ Obvious slips of the tongue
◦ Obvious colloquial expressions
◦ Some dialectal expressions, which are rewritten as colloquial expressions
Local assembly minutes contain rich dialectal expressions.
7. Are assemblypersons native speakers?
According to Takeyasu (2004):
◦ Survey of local representatives
◦ Number of responses: 16,844
◦ Survey period: February to April 2002
87.7% of representatives
◦ Live in the prefecture where they were born
75.5% of representatives
◦ Live in the city, town, or village where they were born
Reference: Takeyasu (2004), "Gender differences among local assembly members: an analysis of the results of the 2002 National Survey of Local Assembly Members" (竹安栄子「地方議員のジェンダー差異 : 「2002年全国地方議員調査」結果の分析より」)
8. Construction of a Corpus of Local Assembly Minutes
1) Scraping local assembly minutes published on the web
◦ Corpus of Local Assembly Minutes: each utterance is stored with its municipality, date, and speaker
2) Sharing the corpus
3) Interdisciplinary research using the corpus
◦ Politics, sociology, linguistics, computer science, and so on
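A corpus record of this shape (utterance plus municipality, date, and speaker) could be modeled as follows. This is a minimal sketch: the class name, field names, and sample values are hypothetical and are not taken from the actual corpus schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Utterance:
    """One corpus record: an utterance and the metadata listed on the slide."""
    text: str            # transcribed utterance
    municipality: str    # assembly the utterance was made in
    meeting_date: date   # date of the meeting
    speaker: str         # speaker's name as recorded in the minutes


# Placeholder values for illustration only, not real corpus data.
example = Utterance(
    text="(utterance text)",
    municipality="Utsunomiya",
    meeting_date=date(2012, 6, 15),
    speaker="(speaker name)",
)
print(example.municipality, example.meeting_date.year)
```

Keeping the metadata on every utterance is what lets a single full-text hit be placed on a map or in a cross tabulation later.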
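10. Outline of the System Architecture
Corpus of Local Assembly Minutes
◦ Utterances are indexed into Elasticsearch, an open-source full-text search engine (utterance index)
Web-based Visualization System
◦ ① The user inputs a search query
◦ ② The system sends a request to Elasticsearch
◦ ③ Elasticsearch returns a response
◦ ④ The system shows the visualized output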
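The indexing and search steps can be illustrated without Elasticsearch by a toy character-bigram inverted index. This sketch only shows the general idea behind n-gram full-text search; it is not the system's actual index configuration, and the sample utterances are made up.

```python
from collections import defaultdict


def char_ngrams(text, n=2):
    """Return the set of character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(utterances, n=2):
    """Map each character n-gram to the ids of the utterances containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(utterances):
        for gram in char_ngrams(text, n):
            index[gram].add(doc_id)
    return index


def search(index, utterances, query, n=2):
    """Return ids of utterances that contain query as a substring."""
    grams = char_ngrams(query, n)
    if not grams:  # query shorter than n: fall back to scanning everything
        candidates = set(range(len(utterances)))
    else:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Sharing all n-grams is necessary but not sufficient, so verify each hit.
    return sorted(i for i in candidates if query in utterances[i])


# Made-up sample utterances, for illustration only.
utterances = ["原発の安全対策について", "発電所の原因調査", "予算案について"]
index = build_index(utterances)
print(search(index, utterances, "原発"))
```

Character n-grams are a common choice for Japanese text because words are not separated by spaces, so no morphological analysis is needed before indexing.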
11. Corpus development project (2010–2014)
Created scraping programs for four major assembly minutes web publication systems
Collected as many assembly minutes as possible from municipalities' web sites
◦ Number of municipalities: 425
◦ Number of sentences: 125 million
◦ Number of characters: 8.1 billion
12. The amount of collected data from each prefecture
[Figure: number of collected sentences and number of collected municipalities, by prefecture]
13. Periods of collected data stored in the corpus
[Figure: number of municipalities by collected period (years)]
14. Web-based visualization system
1. Full-text search for utterances
2. Context word extraction using KWIC (for prefectures only so far)
3. Map visualization with full-text search
4. Cross tabulation output with full-text search
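The cross tabulation output in item 4 can be pictured as counting search hits for each pair of metadata values. The sketch below assumes municipality by year as the two axes; the slides do not say which axes the system offers, and the hit records are invented.

```python
from collections import Counter


def cross_tabulate(hits, row_key, col_key):
    """Count hits for every (row, column) pair and return a nested dict."""
    counts = Counter((h[row_key], h[col_key]) for h in hits)
    table = {}
    for (row, col), n in counts.items():
        table.setdefault(row, {})[col] = n
    return table


# Invented hit records; keys are hypothetical metadata field names.
hits = [
    {"municipality": "A", "year": 2011},
    {"municipality": "A", "year": 2011},
    {"municipality": "A", "year": 2012},
    {"municipality": "B", "year": 2012},
]
table = cross_tabulate(hits, "municipality", "year")
print(table)
```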
15. Input of full-text search for utterances
[Screenshot: search form]
◦ Input: 原発 (nuclear power plant)
◦ Options: N-gram, Exact match, AND, OR
◦ Buttons: Search, Export ▼
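The AND and OR options map naturally onto an Elasticsearch bool query. The function below is a sketch of how such a request body might be built: the field name `utterance` and the function itself are assumptions, and the difference between the N-gram and Exact match options (which depends on how the index is analyzed) is not modeled here.

```python
def build_query(terms, operator="AND", field="utterance"):
    """Build an Elasticsearch bool query body from terms joined by AND or OR.

    `field` is a hypothetical field name, not the system's actual mapping.
    """
    clauses = [{"match_phrase": {field: term}} for term in terms]
    if operator == "AND":
        # every phrase must appear in the utterance
        return {"query": {"bool": {"must": clauses}}}
    if operator == "OR":
        # at least one phrase must appear in the utterance
        return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}
    raise ValueError(f"unknown operator: {operator!r}")


body = build_query(["原発", "再稼働"], operator="OR")
print(body)
```

The resulting dictionary would be sent as the JSON body of a search request against the utterance index.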
16. Output of full-text search
[Screenshot: result list; each row shows Utterance, Speaker's Name, Municipality, Year, Meeting Name, and Month & Day]
17. Context word extraction using KWIC
(Only for minutes of prefectural assemblies)
[Screenshot: KWIC output with columns Preceding Context, Node, and Following Context; node: nuclear power plant]
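A KWIC (keyword in context) view lists every occurrence of the node together with the text before and after it. The sketch below uses a fixed character window; the actual system extracts context words, presumably after word segmentation, so this only shows the basic layout. The sample sentence is made up.

```python
def kwic(text, node, width=10):
    """Return (preceding, node, following) for every occurrence of node."""
    rows = []
    start = text.find(node)
    while start != -1:
        end = start + len(node)
        preceding = text[max(0, start - width):start]
        following = text[end:end + width]
        rows.append((preceding, node, following))
        start = text.find(node, start + 1)
    return rows


# Made-up sample sentence, for illustration only.
sample = "県内の原発の安全性と原発事故への対応"
for preceding, node, following in kwic(sample, "原発", width=3):
    print(f"{preceding} [{node}] {following}")
```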
18. Map visualization with full-text search
[Screenshot: search form and map]
◦ Input: 原発 (nuclear power plant)
◦ Options: N-gram, Exact match, AND, OR
◦ Display: Frequency, Rate
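The two display modes can be read as a raw hit count and a normalized value per region. The sketch below assumes that the rate is the number of hits divided by the total number of utterances collected for the region; the slides do not define the denominator, and the numbers are invented.

```python
def frequency_and_rate(hits_by_region, totals_by_region):
    """Return {region: (frequency, rate)} with rate = hits / total utterances."""
    result = {}
    for region, total in totals_by_region.items():
        freq = hits_by_region.get(region, 0)
        rate = freq / total if total else 0.0
        result[region] = (freq, rate)
    return result


# Invented numbers: X and Y have the same frequency but very different rates.
hits = {"X": 50, "Y": 50}
totals = {"X": 1000, "Y": 10000, "Z": 500}
stats = frequency_and_rate(hits, totals)
print(stats)
```

A normalized value matters here because the amount of collected data differs from prefecture to prefecture, so raw frequencies alone would favor regions with more collected minutes.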
25. Second Person Pronouns
[Maps: 「あんた」 /aNta/ (left), 「おめえ」 /omee/ (right)]
Fumio Inoue (2016), "Appearance of Japanese Second Person Pronouns in Regional Conference Minutes", Meikai Japanese Language Journal (21), 1-16
28. Demonstration of the online local assembly minutes search system
◦ A linguistic map is drawn directly from the corpus.
◦ It is a powerful tool for discovering the geographical distribution of words and phrases.
◦ Rich data are available; however, the amount of collected data is uneven.
◦ A balanced corpus is being prepared for strict statistical analysis.
  ◦ Municipalities: assembly minutes for all prefectures and all prefectural capital cities
  ◦ Period: 4 years (one term of a representative)