1. Asialex 2011 Kyoto, Japan 1
Development of the Thesaurus of Classical
Japanese Poetic Vocabulary
Hilofumi Yamamoto
Tokyo Institute of Technology
Makiro Tanaka
National Institute of Japanese Language and Linguistics
22nd Aug. 2011
Outline
1. Purpose of Study
• Connotation of classical poetic vocabulary
• Longitudinal study of transition of vocabulary
2. Development of Thesaurus
3. Applications
Waka: Japanese Poetry
Tatsuta-Hime
tamukuru KAMI no / arebakoso
aki no konoha no / nusa to chirurame
because Princess Tatsuta
has a god to whom she offers brocades,
the leaves of trees
in autumn will scatter
as an offering.
Prince Kanemi
No. 298 in the Kokinshū
Problem: Orthography
in Chinese characters
in hiragana
→ All Tatsuta (place name)
Problem: Unit size / attribution
The unit size and meaning of a word depend on context.
• unit → or (Nakano, 1998)
• orthography →
(sad)
• attributions → ∈ plant or ∈ food
(unohana = a deutzia or bean curd refuse)
An Item of Thesaurus: God
BG-01-2030-01-030-A- -
↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑
(1) (2) (3) (4) (5) (6) (7) (8)
Figure 1: Structure of an item of the BG database in the case of kami (god):
(1) database ID (BG = short-unit general vocabulary);
(2) part of speech ID (01 = noun);
(3) group ID (2030 = Shinto deities and Buddhas);
(4) field ID;
(5) exact ID (030 = god);
(6) era-flag (A = contemporary, C = classic);
(7) Chinese character reading;
(8) Chinese character
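A meta-code can be split mechanically into fields (1) to (6) listed above. The following is a minimal Python sketch, not the project's own tooling: `MetaCode` and `parse_meta_code` are hypothetical names, and fields (7) and (8), the reading and the Chinese character, are ignored because their encoding is not shown in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaCode:
    """Fields (1)-(6) of a meta-code such as BG-01-2030-01-030-A (hypothetical type)."""
    database: str  # (1) e.g. "BG" = short-unit general vocabulary
    pos: str       # (2) e.g. "01" = noun
    group: str     # (3) e.g. "2030" = Shinto deities and Buddhas
    field: str     # (4) field ID
    exact: str     # (5) e.g. "030" = god
    era: str       # (6) "A" = contemporary, "C" = classic


def parse_meta_code(code: str) -> MetaCode:
    """Split on hyphens; anything after field (6) (reading, character) is dropped."""
    parts = code.split("-")
    if len(parts) < 6:
        raise ValueError(f"expected at least six hyphen-separated fields: {code!r}")
    item = MetaCode(*parts[:6])
    if item.era not in ("A", "C"):
        raise ValueError(f"unknown era flag {item.era!r} in {code!r}")
    return item


if __name__ == "__main__":
    # prints MetaCode(database='BG', pos='01', group='2030', field='01', exact='030', era='A')
    print(parse_meta_code("BG-01-2030-01-030-A- -"))
```

Keeping the fields as strings preserves leading zeros such as "01" and "030", which matter when codes are compared character by character.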
Development: Thesaurus, KH, and t2c
• Thesaurus for classical poetic vocabulary
• KH (tokenizer)
• t2c (token to code converter)
Materials: the Hachidaishū
• The Hachidaishū: eight anthologies compiled by imperial order during ca. 905–1205.
• The database: compiled by the National Institute of Japanese Literature, Japan.
• The old texts are based on the Shōhobonban version of the Hachidaishū.
[Figure: the eight anthologies of the Hachidaishū plotted on a timeline of compilation dates, 900–1250]
Methods: Flowchart of data processing
(A) Corpus development → (B) Tokenisation → (C) Meta-code conversion → (D) Mathematical modelling → (E) Subtraction: CT − OP → (F) Visualisation
Development: Thesaurus
Poem texts → kh (tokeniser) → t2c (code tagger) → Hachidaishū thesaurus
• kh consults a dictionary (general, place names, personal names, etc.); unknown entries are added to the dictionary.
• t2c assigns codes from the thesaurus; new thesaurus codes are added as needed.
(A) Corpus: Poems (OP)
KW00029800|A|KANEMI NO Ō=kanemi no ō
KW00029800|B|Tatsutahime[NOUN-PLNAME:TATSUTAHIME]/→
tamukuru[KASHIMO2-ATTR:TAMUkuru],kami[NOUN:KAMI]→
no[SUB]are[RAHEN-REAL]ba[CAUS]koso[KP]/→
aki[NOUN:AKI]no[CON],konoha[NOUN:KOnoHA]no[SUB]/→
nusa[NOUN:NUSA]to[P-CRD],chiru[RA4DAN-FF:CHIru]→
rame[CJR-REAL]/
Figure 2: Format of the database of a poem: → indicates that the record continues on the next line without a break; the first line, which includes |A|, gives the name of the poet; the second line, which includes |B|, gives the contents of the poem and added information.
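The |B| record format can be read with a short parser. This is a minimal Python sketch under stated assumptions: every token is taken to have the shape `surface[TAG]` or `surface[TAG:LEMMA]`, and `/` and `,` are treated purely as separators without interpreting what they mark. `Token`, `join_record`, and `parse_b_record` are hypothetical names, not the project's kh or t2c tools.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    surface: str
    tag: str
    lemma: Optional[str]  # None when the tag has no ":LEMMA" part, e.g. no[SUB]


# surface[TAG] or surface[TAG:LEMMA]; "/", "," and "|" only separate tokens here
TOKEN_RE = re.compile(r"([^\[\]/,|]+)\[([^\]:]+)(?::([^\]]+))?\]")

# The |B| record of Figure 2, as printed across several lines
SAMPLE_LINES = [
    "KW00029800|B|Tatsutahime[NOUN-PLNAME:TATSUTAHIME]/→",
    "tamukuru[KASHIMO2-ATTR:TAMUkuru],kami[NOUN:KAMI]→",
    "no[SUB]are[RAHEN-REAL]ba[CAUS]koso[KP]/→",
    "aki[NOUN:AKI]no[CON],konoha[NOUN:KOnoHA]no[SUB]/→",
    "nusa[NOUN:NUSA]to[P-CRD],chiru[RA4DAN-FF:CHIru]→",
    "rame[CJR-REAL]/",
]


def join_record(lines: List[str]) -> str:
    """Rejoin a record whose physical lines end in '→' (continued without a break)."""
    return "".join(line.rstrip().removesuffix("→") for line in lines)


def parse_b_record(record: str) -> Tuple[str, List[Token]]:
    """Return (poem ID, tokens) for a |B| record."""
    poem_id, kind, body = record.split("|", 2)
    if kind != "B":
        raise ValueError(f"not a |B| record: {record[:30]!r}")
    tokens = [Token(*match.groups()) for match in TOKEN_RE.finditer(body)]
    return poem_id, tokens


if __name__ == "__main__":
    poem_id, tokens = parse_b_record(join_record(SAMPLE_LINES))
    print(poem_id)
    for token in tokens:
        print(token.surface, token.tag, token.lemma or "-")
```

For the sample record this yields 15 tokens, from Tatsutahime to rame; `str.removesuffix` requires Python 3.9 or later.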
(A) Corpus: Translations (CT)
$A|000298
$B| →
$C|
$D| →
$I| →
→
Figure 3: Format of the database of a CT
(B) Tokenisation:
original text
↓
tokenising
↓
converting into predicative form
Figure 4: Tokenisation of poem texts
(C) meta-code conversion
CH-29-2130-01-010-A Tatsutahime Princess-Tatsuta
CH-29-0000-14-010-A -- -- Tatsuta Tatsuta
BG-01-2030-01-101-A -- -- hime princess
BG-02-3770-04-080-C tamukuru present(verb)
BG-01-5730-02-010-A -- -- te hand
BG-02-1700-01-040-A -- -- mukeru for
BG-01-2030-01-030-A kami god
BG-08-0061-07-010-A no SUB (particle)
BG-02-1200-01-010-C are be
BG-08-0064-26-010-A ba because (particle)
BG-04-1120-05-150-A -- -- ba because (reason)
BG-08-0065-01-010-A koso KP (emphasis)
Figure 5: Meta-code conversion in the case of OP
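At its core, the conversion attaches a thesaurus code to each token and sets aside tokens the thesaurus does not know yet. A minimal Python sketch, assuming a lookup keyed on the (word, POS tag) pair from the |B| records; `THESAURUS` is a hypothetical excerpt that reuses only codes from Figure 5 and omits the secondary codes (rows marked -- --) for components such as te and mukeru. `tag_with_codes` is likewise hypothetical and does not describe how t2c is actually implemented.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (word, POS tag), as in the |B| records

# Hypothetical excerpt of a thesaurus; every code below appears in Figure 5.
THESAURUS: Dict[Key, str] = {
    ("Tatsutahime", "NOUN-PLNAME"): "CH-29-2130-01-010-A",
    ("tamukuru", "KASHIMO2-ATTR"): "BG-02-3770-04-080-C",
    ("kami", "NOUN"): "BG-01-2030-01-030-A",
    ("no", "SUB"): "BG-08-0061-07-010-A",
    ("are", "RAHEN-REAL"): "BG-02-1200-01-010-C",
    ("ba", "CAUS"): "BG-08-0064-26-010-A",
    ("koso", "KP"): "BG-08-0065-01-010-A",
}


def tag_with_codes(
    tokens: List[Key], thesaurus: Dict[Key, str]
) -> Tuple[List[Tuple[Key, str]], List[Key]]:
    """Attach a meta-code to each known token; collect unknown tokens separately."""
    coded: List[Tuple[Key, str]] = []
    unknown: List[Key] = []
    for token in tokens:
        code = thesaurus.get(token)
        if code is None:
            unknown.append(token)  # candidate for a new thesaurus entry
        else:
            coded.append((token, code))
    return coded, unknown


if __name__ == "__main__":
    coded, unknown = tag_with_codes([("kami", "NOUN"), ("aki", "NOUN")], THESAURUS)
    print(coded)
    print(unknown)
```

Keying on the tag as well as the word keeps the particle no[SUB] apart from no[CON]; whether the real converter does the same is an assumption.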
(C) Structure of meta-code-1
BG-01-2030-01-030-A- -
↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑
(1) (2) (3) (4) (5) (6) (7) (8)
Figure 6: Structure of an item of the BG database in the case of kami (god):
(1) database ID (BG = short-unit general vocabulary);
(2) part of speech ID (01 = noun);
(3) group ID (2030 = Shinto deities and Buddhas);
(4) field ID;
(5) exact ID (030 = god);
(6) era-flag (A = contemporary, C = classic);
(7) Chinese character reading;
(8) Chinese character
(C) Structure of the meta-code-2
yononaka (world) BG-01-2600-01-020-A (1)
= yo (world) BG-01-2610-01-040-A (2)
+ no (of) BG-08-0010-01-021-A (3)
+ naka (inside) BG-01-1770-01-080-A (4)
Figure 7: Structure of an item of the semantic table in the case
of a compound word, yononaka (world)
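A compound entry like this can be held in memory as the compound's own code, (1), plus the codes of its components, (2) to (4), so that a token can be compared either as a whole word or through its parts. A minimal Python sketch under that assumption; `COMPOUNDS` and `codes_for` are hypothetical names, and the codes are those of yononaka above.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (meta-code, gloss)

# yononaka as data: the compound's own code plus the codes of its components
COMPOUNDS: Dict[str, Tuple[Entry, List[Entry]]] = {
    "yononaka": (
        ("BG-01-2600-01-020-A", "world"),
        [
            ("BG-01-2610-01-040-A", "yo (world)"),
            ("BG-08-0010-01-021-A", "no (of)"),
            ("BG-01-1770-01-080-A", "naka (inside)"),
        ],
    ),
}


def codes_for(word: str) -> List[str]:
    """Return the compound's own code followed by its component codes (empty if unlisted)."""
    if word not in COMPOUNDS:
        return []
    (head_code, _gloss), parts = COMPOUNDS[word]
    return [head_code] + [code for code, _ in parts]


if __name__ == "__main__":
    for code in codes_for("yononaka"):
        print(code)
```

With both levels available, a translation that uses only yo (world) can still be linked to yononaka through BG-01-2610-01-040-A.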
(C) meta-code conversion-3
CH-29-2130-01-010-A Tatsutahime Princess-Tatsuta
CH-29-0000-14-010-A -- -- Tatsuta Tatsuta
BG-01-2030-01-101-A -- -- hime princess
BG-02-3770-04-080-C tamukuru present(verb)
BG-01-5730-02-010-A -- -- te hand
BG-02-1700-01-040-A -- -- mukeru for
BG-01-2030-01-030-A kami god
BG-08-0061-07-010-A no SUB (particle)
BG-02-1200-01-010-C are be
BG-08-0064-26-010-A ba because (particle)
BG-04-1120-05-150-A -- -- ba because (reason)
BG-08-0065-01-010-A koso KP (emphasis)
Figure 8: Meta-code conversion in the case of OP
poet (10th-century field of experience) writes OP → expert reader (20th-century field of experience, expert) reads OP and writes CT → novice reader (20th-century field of experience, novice) reads CT; OP and CT are compared
Figure 9: Schema of the relationship between OP and CT
+-------- # of pair
| +----- value of matching level, exact=17, field=13, group=10
| | +-- # of POS
| | |
| | | # of element of OP ----+ +- # of element of CT
| | | element of OP -+ | | +--- element of CT
| | | | | | |
1 17 11 00 <-> 12 (Tatsutahime)
2 17 47 04 <-> 25 (hand)
3 17 47 05 <-> 26 (toward)
4 17 2 06 <-> 32 (god)
5 10 61 07 <-> 33 (SUB)
6 17 47 08 <-> 34 (be)
7 10 64 09 <-> 35 (because)
8 17 65 11 <-> 36 (EM)
9 17 2 12 <-> 38 (autumn)
10 17 71 13 <-> 39 (CON)
11 17 2 14 <-> 40 (leaf of tree)
12 17 2 19 <-> 45 (present)
13 17 61 20 <-> 46 (CRD)
14 17 47 21 <-> 49 (fall)
15 13 74 22 <-> 54 (CJR)
Figure 10: Example of the matching process
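The three matching levels coincide with the lengths of the code prefixes up to the group ID (`BG-01-2030`, 10 characters), the field ID (`BG-01-2030-01`, 13), and the exact ID (`BG-01-2030-01-030`, 17). The following Python sketch assumes, on that basis only, that a pair's level is the longest of these prefixes on which the OP code and the CT code agree; the slides do not state the rule, and `match_level` is a hypothetical name.

```python
# Prefix lengths of a meta-code such as BG-01-2030-01-030-A:
#   17 -> through the exact ID, 13 -> through the field ID, 10 -> through the group ID
LEVELS = (17, 13, 10)


def match_level(op_code: str, ct_code: str) -> int:
    """Return 17 (exact), 13 (field), 10 (group) or 0 (no match) for two meta-codes.

    Assumption: the level is the longest of the prefixes above that both codes share;
    the era flag after the exact ID is not compared.
    """
    for length in LEVELS:
        if op_code[:length] == ct_code[:length]:
            return length
    return 0


if __name__ == "__main__":
    kami = "BG-01-2030-01-030-A"  # god
    hime = "BG-01-2030-01-101-A"  # princess
    te = "BG-01-5730-02-010-A"    # hand
    print(match_level(kami, kami), match_level(kami, hime), match_level(kami, te))  # 17 13 0
```

Under this reading, kami and hime share the group and field IDs but not the exact ID, so they would pair at the field level.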
Residual
Figure 11: Example of the matching process in the case of kks 298 in Komachiya (1982)
Components of OP
Table 2: Result of subtracting the elements of OP(298) from those of CT(298, koma); it gives the ratios of the components of OP(298).
OP (valid number of elements) = 16
E (ratio of exact match) 12/16 = 0.750
F (ratio of field match) 1/16 = 0.062
G (ratio of group match) 2/16 = 0.125
T (ratio of total match) 15/16 = 0.938
U (ratio of unmatched OP) 1 - T = 0.062
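The ratios in Table 2 follow directly from the matching levels listed in Figure 10 (twelve pairs at 17, one at 13, two at 10) and the 16 valid OP elements. A minimal Python check; `op_ratios` and `FIG10_LEVELS` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List

EXACT, FIELD, GROUP = 17, 13, 10

# Matching levels of the 15 pairs in Figure 10, in order
FIG10_LEVELS: List[int] = [17, 17, 17, 17, 10, 17, 10, 17, 17, 17, 17, 17, 17, 17, 13]


def op_ratios(levels: List[int], n_op: int) -> Dict[str, float]:
    """Ratios of OP elements matched exactly, by field, by group, in total, and unmatched."""
    counts = Counter(levels)
    e = counts[EXACT] / n_op
    f = counts[FIELD] / n_op
    g = counts[GROUP] / n_op
    t = e + f + g
    return {"E": e, "F": f, "G": g, "T": t, "U": 1 - t}


if __name__ == "__main__":
    for name, value in op_ratios(FIG10_LEVELS, n_op=16).items():
        print(f"{name} = {value:.4f}")
```

It prints E = 0.7500, F = 0.0625, G = 0.1250, T = 0.9375 and U = 0.0625; Table 2 shows 0.0625 cut to three decimals as 0.062.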
Calculation of Residual Rate
\begin{align}
D &= 1 - \frac{\mathrm{OP}}{\mathrm{CT}} \tag{1}\\
  &= 1 - \frac{16}{41} \tag{2}\\
  &\approx 0.61 \tag{3}
\end{align}
Components of CT
Table 3: Components of CT in the case of kks 298 by Komachiya (1982); fabs(D−H) denotes the absolute value of the practical value, D, minus the theoretical value, H.
CT (valid number of elements) = 41
W (ratio of original word use) 12/41 = 0.293 (E/CT)
A (ratio of annotation) 1 − 0.293 = 0.707 (1 − W)
--- breakdown of the annotation ---
P1 (ratio of FG paraphrased) (0.062 + 0.125) × 16/41 = 0.073 ((F + G) × OP/CT)
P2 (ratio of U paraphrased) (0.707 − 0.073) × 0.062 = 0.040 ((A − P1) × U)
D (ratio of purely added) 0.707 − (0.073 + 0.040) = 0.595 (A − (P1 + P2))
H (theoretical value of D) 1 − 16/41 = 0.610 (1 − OP/CT)
Gap fabs(0.595 − 0.610) = 0.015 (fabs(D − H))
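The chain of ratios in Table 3 can be recomputed from element counts. A minimal Python sketch; `ct_components` is a hypothetical helper, and P1 is computed here as the number of field- and group-matched elements over CT, an assumption chosen because it reproduces the 0.073 in Table 3 and the 3 elements shown for P1 in Figure 12. The unmatched ratio U is passed unrounded as 1/16.

```python
from typing import Dict


def ct_components(n_op: int, n_ct: int, n_exact: int, n_fg: int, u: float) -> Dict[str, float]:
    """Recompute the Table 3 ratios for one OP/CT pair.

    Assumption: P1 = (field- and group-matched elements) / CT.
    """
    w = n_exact / n_ct       # W: original word use
    a = 1 - w                # A: annotation
    p1 = n_fg / n_ct         # P1: field/group matches paraphrased
    p2 = (a - p1) * u        # P2: unmatched OP elements paraphrased
    d = a - (p1 + p2)        # D: purely added (practical value)
    h = 1 - n_op / n_ct      # H: theoretical value of D
    return {"W": w, "A": a, "P1": p1, "P2": p2, "D": d, "H": h, "Gap": abs(d - h)}


if __name__ == "__main__":
    result = ct_components(n_op=16, n_ct=41, n_exact=12, n_fg=3, u=1 / 16)
    for name, value in result.items():
        print(f"{name:>3} = {value:.3f}")
```

For kks 298 and Komachiya (1982) it prints W = 0.293, A = 0.707, P1 = 0.073, P2 = 0.040, D = 0.595, H = 0.610 and Gap = 0.015, the values in Table 3.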
Subtraction: CT − OP
OP(298): 16 elements
• Exact 12 (75.0%), Group 2 (12.5%), Field 1 (6.2%), Unmatched 1 (6.2%)
CT(298, koma): 41 elements
• W 12 (29.3%), P1 3 (7.3%), P2 1 (4.0%), D 25 (59.5%)
Figure 12: Pie charts illustrating the components of OP(298) and CT(298, koma)
[Figure: network graphs of words linked to warbler and cuckoo in the CT, with numeric edge labels; panels: warbler-CT-23-229-3.73-15, cuckoo-CT-40-370-3.27-16]
Conclusion
The thesaurus annotated with meta-codes allows researchers
1. to identify different orthographies as the same word;
2. to attach an alternative semantic ID to a word which has the
same form but has more than one meaning (polysemic word);
3. to attach meta-codes not only to tokens recognised as single (simple) words but also to longer tokens such as compounds;
4. to indicate the similarity between tokens;
5. to detect common or different tokens among two or more texts, which reveals the similarities or differences between those texts;
6. to indicate the relative differences between two words in literary
works.
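Once every token carries a meta-code, detecting common or different tokens between texts reduces to set operations. A minimal Python sketch, assuming (as an illustration, not from the slides) that similarity can be coarsened by comparing only a prefix of each code: 17 characters reach the exact ID, 13 the field ID, and 10 the group ID in a code such as BG-01-2030-01-030-A. `compare_texts` is a hypothetical name.

```python
from typing import Iterable, Set, Tuple


def compare_texts(
    codes_a: Iterable[str], codes_b: Iterable[str], level: int = 17
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split the meta-codes of two texts into (shared, only in A, only in B).

    Codes are truncated to `level` characters first, so a smaller level
    treats codes in the same field (13) or group (10) as the same.
    """
    a = {code[:level] for code in codes_a}
    b = {code[:level] for code in codes_b}
    return a & b, a - b, b - a


if __name__ == "__main__":
    text_a = ["BG-01-2030-01-030-A", "BG-08-0065-01-010-A"]  # kami, koso
    text_b = ["BG-01-2030-01-101-A"]                         # hime
    print(compare_texts(text_a, text_b, level=17))
    print(compare_texts(text_a, text_b, level=13))
```

At level 17, kami and hime count as different tokens; at level 13 they fall into the same field and are reported as shared.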
Questions
• Computer Modelling of Classical Japanese Poetic
Vocabulary
http://etymology.jp/waka/poem.cgi
• Inquiry:
Hilofumi Yamamoto
yamagen@ryu.titech.ac.jp
• Thank you.