SlideShare a Scribd company logo
Lyric Mining: Word, Rhyme & Concept
                               Co-occurrence Analysis
    Karthika Ranganathan, T.V Geetha, Ranjani Parthasarathi & Madhan Karky
                                         Tamil Computing Lab (TaCoLa),
                         College of Engineering Guindy, Anna University, Chennai.
                             karthika.cyr@gmail.com, madhankarky@gmail.com

ABSTRACT
Computational creativity is one area of NLP which requires extensive analysis of large datasets.
Laalalaa [1] framework for Lyric analysis and generation proposed a lyric analysis subsystem that
required statistical analysis of Tamil lyrics. In this paper, we propose a data analysis model for words,
rhymes and their usage in Tamil lyrics. The proposed analysis model extracts the root words from
lyrics using a morphological analyzer [2] to compute the word frequency across the lyric dataset. The
words in their unanalyzed form are used for computing the frequent rhyme, alliteration and end-
rhyme pairs using adapted apriori algorithm. Frequent co-occurring concepts in lyrics are also
computed using Agaraadhi, an on-line Tamil dictionary. Presenting the results, this paper concludes
by discussing the need of such an analysis to compute freshness, pleasantness of a lyric and using
these statistics for Lyric Generation.

Keywords : Tamil Lyrics, Morphological Analyser, Apriori algorithm.

I. INTRODUCTION
Tamil is one of the world's oldest languages and has a Classical status. Numerous forms of literature
exist in Tamil language of which, lyrics play a vital role in taking the language to every house hold in
form of original film soundtracks, jingles, private albums, and commercials. With over thousands of
lyrics being created every year, we do not have proper tools to model and analyse lyrics. Such an
analysis framework would enable one to see various patterns of words, combinations and thoughts
used over time. The analysis framework will also make it possible to generate fresh lyrics where the
freshness can be associated with the concepts and thoughts associated with the lyric.

In this paper, we discuss about Tamil lyric Analysis on the basis of word usage, rhyme usage and the
co-occurrence of word. The frequency of word usage is identified by considering a morphological root
of the word using morphological analyser, instead of considering terms. For analysing the frequent
rhyme, alliteration and end-rhyme pairs, we adapted Apriori algorithm [4]. To identify the co-
occurring concepts in lyrics, we used “Agaraadhi”, an on-line Tamil dictionary Framework [3] and a
new algorithm has been proposed to compute the frequent usage of co-occurring concepts in lyrics.

The rest of this paper has been organized as follows. In Section 2, we explain about the algorithm and
tools. In Section 3, we explain the methodology and in Section 4, we discuss our results. Conclusions
and future extensions to this work are presented in section 5.




                                                     276
2. MORPHOLOGICAL ANALYSIS AND APRIORI ALGORITHM
Morphological analysis is the process of segmenting words into morphemes and identifying its
grammatical categories. For a given word, morphological analyser (MA) generates its root word and
its grammatical information. The role of morphological analyser in the proposed work is to identify
the noun and verb morphology of a given word, examples are as follows

Example 1, for noun morphology:

                                       இராமைன (Ramanai)

                                     இராம           (Raman)    + ஐ (ai)

                                     Entity                     + Accusative Case



        Example 2, for verb morphology:

                                        ெச     றா          (Senraan)

                                  ெச      (sel) +     (R) + ஆ          (Aan)

                 Verb + Past Tense Marker + Third Person Masculine Singular Suffix



In the proposed work, the frequency of a word is identified by considering the variations of a noun in
terms of its morphology. For instance, the variations of Ramanai such as Ramanaal, Ramanukku,
Ramanin, Ramanadhu are also counted for the word Raman.

The Apriori Algorithm is an influential algorithm for mining frequent item sets for Boolean
association rules [4]. Apriori uses a "bottom up" approach, where frequent subsets are extended one
item at a time (a step known as candidate generation), and groups of candidates are tested against the
data. The algorithm terminates when no further successful extensions are found. It has objective
measures: support and confidence. The support of an association pattern is the percentage of task-
relevant data transactions for the apparent pattern. Confidence can be defined as the measure of
certainty or trustworthiness associated with each discovered pattern.

3. LYRIC ANALYSIS
(i) Word Analysis

The frequency of words is used to associate a popularity score for each word. This score is proposed to
be used for lyric generation part of the frame work proposed in [1]. In this work, the popularity score
of a word has been identified from lyrics. In lyrics, the words are mainly attached with the suffix. So,
the root words are taken into consideration for determining its frequency count. The root words are
identified using morphological analyser. The algorithm to find the word usage is illustrated below:

Algorithm
Let LD is the set of lyric dataset and LS denotes the set of sentences of lyric dataset and LW denotes the
set of words in the lyric dataset. WC denotes the word count across all lyric dataset. Let m be the total
number of sentences in lyrics and n be the total number of words in lyrics.

                                                     277
a) Given a Lyric dataset LD

b) For each LSi ← 1 to m

        Split the sentence LS into words LW

c) For each LWj ← 1 to n

        RW     ← ProcessMorphAnalyser(LWj )


d) Let RW be the root word

        if RW exist, then add into the word count list ( WC)

        else add into the word count list ( WC)

e)ReturnWC.
Here ProcessMorphAnalyser(LWj ) returns the root of the given word.

(ii) Rhyme Analysis

Alliteration (Monai) is the repetition of the same letter at the beginning of words. The rhyme
(Edhugai) is defined as the repetition of the same letter at the second position of words. The end
rhyme (iyaibu) is defined as the repetition of the same letter at the last position of words. The example
of alliteration, rhyme and end rhyme is given below:

Examples:

உய      and உ         rhyme in alliteration (monai) as they start with the same letter.

இதய        and காத         rhyme in rhyme (edhugai) as they share the same second letter.

யா ைக and வா               ைக rhyme in end – rhyme (iyaibu) as they share the same last letter.

We have adapted apriori algorithm to find the frequency count of rhyme, alliteration and end rhyme
pairs of Tamil lyrics which has been illustrated below:

Algorithm:

Let LD be the set of lyric dataset and LS denote the set of sentences of Lyric dataset. Let m be the total
number of sentences in lyrics. Let PC1 denote the count of alliteration and PC2 denote the count of
rhyme and PC3 denote the count of end – rhyme.

a) Given lyric dataset LD .

b) For each LS ← 1 to m

        Join the pair of sentences (LP)

c) For each LP

        Consider the first (LP1) and last words (LP2)

d) Rhyme (LP1, LP2)

e) return PC

                                                     278
Algorithm : Rhyme (LP1, LP2)

a) Let k denote the ith character. Let ML denote the alliteration (monai) list and RL denote the rhyme
(edhugai) list and EL denote the end – rhyme (iyaibu) list.

b) For ∀i,

        if k = 1,   if LP1(k) = LP2(k), then add into ML list and increment PC1

        if k = 2,   if LP1(k) = LP2(k), then add into RL list and increment PC2

        if k = i - 1, if LP1(k) = LP2(k), then add into EL list and increment PC3

(iii) Co-occurrence concept Analysis

Co-occurrence is defining the frequent occurrence of two terms from a text corpus on the either side in
a certain order. This word information in NLP system is extremely high. It is very important for
cancelling the ambiguous and the polysemy of words to improve the accuracy of the entire system [5].

In this method, to improve the efficiency of co-occurrence, we have been considering the concept of
each word. The concept for each word has been identified using the Agaraadhi, an on-line Tamil
dictionary. The example for concept word which has been in lyric is given below:

Example:     The    word   "நில        ”which     has   the   concept    ெவ         ண லா,   மதி,   மாத ,
  ைண ேகா            , ெவ    ணல        ,அ      லி, அ        லிமா     .

By considering these concepts, we have been determining the co-occurring words using our own
algorithm is described below:

Algorithm :

Let LD denote the set of Lyric dataset and W denote each word in lyric. Let CW denote the set of
concepts for each word. Let WC denote the word count.

a) If the word CW identify, then consider the next word.

        Increment the count WC

b) Else the word W and consider the next word.

        Increment the count WC

c) return WC

4. RESULTS

The lyric corpus of more than two thousand songs were analysed for the word usage, rhyme usage
and Co-occurence concepts usage. The analysed results are given below:




                                                     279
Table 1 shows the list of top 10 usage words in lyrics.


                       WORDS             USAGE             WORDS              USAGE
                       ந                 2009              வா                 1062
                       எ                 1941              ஒ                  987
                       நா                1645              க                  965
                       உ                 1556                                 857
                       காத               1153              இ    ைல            793


Table 2 shows the list of top 10 rhyme words in lyrics.


     EDHUGAI                USAGE        MONAI                        USAGE          IYAIBU          USAGE

     எ    ,உ                107975       எ      ,எ    ைன              33492          எ    ,உ         111552

     நா    ,எ               80125        உ      ைன,உ                  26289          நா    ,எ        83435

     நா    ,உ               61204        எ த         ,எ               16478          நா    ,உ        63543

     எ    ,உ    ைன          33731        எ      ,எ    ன               15405          எ    ,உ த       18411

     எ    ,எ    ைன          32570        உ த         ,உ               14001          எ த       ,எ    16478

     உ    ைன,உ              25747        உய ,உ                        11640          உ த        ,உ   14001

     எ    ைன,உ              24867        இ த,இ                        10993          எ த       ,உ    12524

     நா    ,உ   ைன          19297        என      ,எ                   9985           நா    ,உ த      10524

     நா    ,எ   ைன          18935        எ த,எ                        9976           எ த       ,நா   9367

     எ    ,எ    ன           14486        ந,ந                          9962           இ த,அ த         4818

Table 3 shows the list of top 10 co-occurring concept words in lyrics.

                      CO-OCCURING WORDS                         USAGE
                      அ ேப,அ ேப                                 638
                      சி    ன,சி     ன                          530
                      வா,வா                                     506
                      எ     ,காத                                478
                      ஒேர,ஒ                                     469
                      ந      ,நா                                434
                      உ     ைன,நா                               419
                      தமி ,எ         க                          367
                      ஒ ,நா                                     302
                      ந,எ    ைன                                 287


                                                          280
By analysing those data, this shows that the most of lyrics which predicts the emotion of happiness
and love. In the adapted apriori algorithm, the support which represents the total number of pair
words with the total number of combination of sentences and the confidence which described the total
number of pair words with the total number of pair of sentences. The results may vary if the number
of lyrics used for the analysis is increased.

5. CONCLUSIONS AND FUTURE WORK
In this paper, we discussed the usage of words and rhymes in the lyrics dataset. The adapted apriori
algorithm has been used to detect the frequency count for rhyme, alliteration and end word pairs. This
analysis has been mainly used in the lyric generation and computing freshness scoring for lyrics.
Frequent co-occurring concept is also been identified for the development of lyric extraction, semantic
relationship, word sense identification and sentence similarity. Possible extensions of this work could
be the detection of emotions in lyrics by genre classification and identify genre specific rhymes and
concept co-occurrence.

REFERENCES
        Sowmiya Dharmalingam, Madhan Karky, “LaaLaLaa – A Tamil Lyric Analysis and
        Generation Framework” in World Classical Tamil Conference – June 2010, Coimbatore.

        Anandan P, Ranjani Parthasarathy, Geetha, T.V. “Morphological analyzer for Tamil”. ICON
        2002.

        Agaraadhi Online Tamil Dictionary. http://www.agaraadhi.com, Last accessed date 25h April
        2011.

        HAN Feng, ZHANG Shu-mao, DU Ying-shuang, “The analysis and improvement of Apriori
        algorithm”, Journal of Communication and Computer, ISSN1548-7709, USA, Sep. 2008,
        Volume 5, No.9 (Serial No.46).

        EI-Sayed Atlam, Elmarhomy Ghada, Masao Fuketa, Kazuhiro Morita and Jun-ichi Aoe, “New
        Hierarchy Technique Using Co-Occurrence Word Information”, International Journal of
        Information Processing and Management, Volume 40 Issue 6, November 2004.




                                                 281

More Related Content

What's hot

NLP_KASHK:Text Normalization
NLP_KASHK:Text NormalizationNLP_KASHK:Text Normalization
NLP_KASHK:Text Normalization
Hemantha Kulathilake
 
Aspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion AnalysisAspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion Analysis
Seth Grimes
 
Ijartes v1-i1-002
Ijartes v1-i1-002Ijartes v1-i1-002
Ijartes v1-i1-002
IJARTES
 
Design of a rule based hindi lemmatizer
Design of a rule based hindi lemmatizerDesign of a rule based hindi lemmatizer
Design of a rule based hindi lemmatizer
csandit
 
DESIGN OF A RULE BASED HINDI LEMMATIZER
DESIGN OF A RULE BASED HINDI LEMMATIZERDESIGN OF A RULE BASED HINDI LEMMATIZER
DESIGN OF A RULE BASED HINDI LEMMATIZER
csandit
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlp
eSAT Publishing House
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlp
eSAT Journals
 
Spell checker using Natural language processing
Spell checker using Natural language processing Spell checker using Natural language processing
Spell checker using Natural language processing
Sandeep Wakchaure
 
Verb based manipuri sentiment analysis
Verb based manipuri sentiment analysisVerb based manipuri sentiment analysis
Verb based manipuri sentiment analysis
ijnlc
 
Nlp
NlpNlp
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
csandit
 
Quality estimation of machine translation outputs through stemming
Quality estimation of machine translation outputs through stemmingQuality estimation of machine translation outputs through stemming
Quality estimation of machine translation outputs through stemming
ijcsa
 
Pxc3898474
Pxc3898474Pxc3898474
Pxc3898474
Sivajyothi Chandra
 
Sanskrit in Natural Language Processing
Sanskrit in Natural Language ProcessingSanskrit in Natural Language Processing
Sanskrit in Natural Language Processing
Hitesh Joshi
 
Implementation Of Syntax Parser For English Language Using Grammar Rules
Implementation Of Syntax Parser For English Language Using Grammar RulesImplementation Of Syntax Parser For English Language Using Grammar Rules
Implementation Of Syntax Parser For English Language Using Grammar Rules
IJERA Editor
 
IRJET- A System for Determining Sarcasm in Tweets: Sarcasm Detector
IRJET-  	  A System for Determining Sarcasm in Tweets: Sarcasm DetectorIRJET-  	  A System for Determining Sarcasm in Tweets: Sarcasm Detector
IRJET- A System for Determining Sarcasm in Tweets: Sarcasm Detector
IRJET Journal
 
A survey on parallel corpora alignment
A survey on parallel corpora alignment A survey on parallel corpora alignment
A survey on parallel corpora alignment
andrefsantos
 

What's hot (19)

NLP_KASHK:Text Normalization
NLP_KASHK:Text NormalizationNLP_KASHK:Text Normalization
NLP_KASHK:Text Normalization
 
Aspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion AnalysisAspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion Analysis
 
Ijartes v1-i1-002
Ijartes v1-i1-002Ijartes v1-i1-002
Ijartes v1-i1-002
 
Design of a rule based hindi lemmatizer
Design of a rule based hindi lemmatizerDesign of a rule based hindi lemmatizer
Design of a rule based hindi lemmatizer
 
DESIGN OF A RULE BASED HINDI LEMMATIZER
DESIGN OF A RULE BASED HINDI LEMMATIZERDESIGN OF A RULE BASED HINDI LEMMATIZER
DESIGN OF A RULE BASED HINDI LEMMATIZER
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlp
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlp
 
FIRE2014_IIT-P
FIRE2014_IIT-PFIRE2014_IIT-P
FIRE2014_IIT-P
 
Spell checker using Natural language processing
Spell checker using Natural language processing Spell checker using Natural language processing
Spell checker using Natural language processing
 
Verb based manipuri sentiment analysis
Verb based manipuri sentiment analysisVerb based manipuri sentiment analysis
Verb based manipuri sentiment analysis
 
Nlp
NlpNlp
Nlp
 
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
 
Quality estimation of machine translation outputs through stemming
Quality estimation of machine translation outputs through stemmingQuality estimation of machine translation outputs through stemming
Quality estimation of machine translation outputs through stemming
 
Pxc3898474
Pxc3898474Pxc3898474
Pxc3898474
 
Sanskrit in Natural Language Processing
Sanskrit in Natural Language ProcessingSanskrit in Natural Language Processing
Sanskrit in Natural Language Processing
 
mtp_stageII_final
mtp_stageII_finalmtp_stageII_final
mtp_stageII_final
 
Implementation Of Syntax Parser For English Language Using Grammar Rules
Implementation Of Syntax Parser For English Language Using Grammar RulesImplementation Of Syntax Parser For English Language Using Grammar Rules
Implementation Of Syntax Parser For English Language Using Grammar Rules
 
IRJET- A System for Determining Sarcasm in Tweets: Sarcasm Detector
IRJET-  	  A System for Determining Sarcasm in Tweets: Sarcasm DetectorIRJET-  	  A System for Determining Sarcasm in Tweets: Sarcasm Detector
IRJET- A System for Determining Sarcasm in Tweets: Sarcasm Detector
 
A survey on parallel corpora alignment
A survey on parallel corpora alignment A survey on parallel corpora alignment
A survey on parallel corpora alignment
 

Similar to I3 madankarky2 karthika

CLASSIFICATION OF TAMIL POETRY USING CONTEXT FREE GRAMMAR USING TAMIL GRAMMAR...
CLASSIFICATION OF TAMIL POETRY USING CONTEXT FREE GRAMMAR USING TAMIL GRAMMAR...CLASSIFICATION OF TAMIL POETRY USING CONTEXT FREE GRAMMAR USING TAMIL GRAMMAR...
CLASSIFICATION OF TAMIL POETRY USING CONTEXT FREE GRAMMAR USING TAMIL GRAMMAR...
cscpconf
 
Classification of tamil poetry using context free grammar using tamil grammar...
Classification of tamil poetry using context free grammar using tamil grammar...Classification of tamil poetry using context free grammar using tamil grammar...
Classification of tamil poetry using context free grammar using tamil grammar...
csandit
 
Sanskrit parser Project Report
Sanskrit parser Project ReportSanskrit parser Project Report
Sanskrit parser Project Report
Laxmi Kant Yadav
 
A Statistical Model for Morphology Inspired by the Amis Language
A Statistical Model for Morphology Inspired by the Amis LanguageA Statistical Model for Morphology Inspired by the Amis Language
A Statistical Model for Morphology Inspired by the Amis Language
dannyijwest
 
A statistical model for morphology inspired by the Amis language
A statistical model for morphology inspired by the Amis languageA statistical model for morphology inspired by the Amis language
A statistical model for morphology inspired by the Amis language
IJwest
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
GeekNightHyderabad
 
Developing links of compound sentences for parsing through marathi link gramm...
Developing links of compound sentences for parsing through marathi link gramm...Developing links of compound sentences for parsing through marathi link gramm...
Developing links of compound sentences for parsing through marathi link gramm...
ijnlc
 
Automatic Generation of Compound Word Lexicon for Marathi Speech Synthesis
Automatic Generation of Compound Word Lexicon for Marathi Speech SynthesisAutomatic Generation of Compound Word Lexicon for Marathi Speech Synthesis
Automatic Generation of Compound Word Lexicon for Marathi Speech Synthesis
iosrjce
 
Phonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech SystemsPhonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech Systems
paperpublications3
 
Unit-1 PPL PPTvvhvmmmmmmmmmmmmmmmmmmmmmm
Unit-1 PPL PPTvvhvmmmmmmmmmmmmmmmmmmmmmmUnit-1 PPL PPTvvhvmmmmmmmmmmmmmmmmmmmmmm
Unit-1 PPL PPTvvhvmmmmmmmmmmmmmmmmmmmmmm
DhruvKushwaha12
 
Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)
Dhabal Sethi
 
Analysis of lexico syntactic patterns for antonym pair extraction from a turk...
Analysis of lexico syntactic patterns for antonym pair extraction from a turk...Analysis of lexico syntactic patterns for antonym pair extraction from a turk...
Analysis of lexico syntactic patterns for antonym pair extraction from a turk...
csandit
 
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...
cscpconf
 
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAIDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
ijistjournal
 
Identifying the semantic relations on
Identifying the semantic relations onIdentifying the semantic relations on
Identifying the semantic relations on
ijistjournal
 
Ceis 3
Ceis 3Ceis 3
Paper id 25201466
Paper id 25201466Paper id 25201466
Paper id 25201466IJRAT
 
Statistically-Enhanced New Word Identification
Statistically-Enhanced New Word IdentificationStatistically-Enhanced New Word Identification
Statistically-Enhanced New Word IdentificationAndi Wu
 
An implementation of apertium based assamese morphological analyzer
An implementation of apertium based assamese morphological analyzerAn implementation of apertium based assamese morphological analyzer
An implementation of apertium based assamese morphological analyzer
ijnlc
 

Similar to I3 madankarky2 karthika (20)

CLASSIFICATION OF TAMIL POETRY USING CONTEXT FREE GRAMMAR USING TAMIL GRAMMAR...
CLASSIFICATION OF TAMIL POETRY USING CONTEXT FREE GRAMMAR USING TAMIL GRAMMAR...CLASSIFICATION OF TAMIL POETRY USING CONTEXT FREE GRAMMAR USING TAMIL GRAMMAR...
CLASSIFICATION OF TAMIL POETRY USING CONTEXT FREE GRAMMAR USING TAMIL GRAMMAR...
 
Classification of tamil poetry using context free grammar using tamil grammar...
Classification of tamil poetry using context free grammar using tamil grammar...Classification of tamil poetry using context free grammar using tamil grammar...
Classification of tamil poetry using context free grammar using tamil grammar...
 
Sanskrit parser Project Report
Sanskrit parser Project ReportSanskrit parser Project Report
Sanskrit parser Project Report
 
A Statistical Model for Morphology Inspired by the Amis Language
A Statistical Model for Morphology Inspired by the Amis LanguageA Statistical Model for Morphology Inspired by the Amis Language
A Statistical Model for Morphology Inspired by the Amis Language
 
A statistical model for morphology inspired by the Amis language
A statistical model for morphology inspired by the Amis languageA statistical model for morphology inspired by the Amis language
A statistical model for morphology inspired by the Amis language
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Developing links of compound sentences for parsing through marathi link gramm...
Developing links of compound sentences for parsing through marathi link gramm...Developing links of compound sentences for parsing through marathi link gramm...
Developing links of compound sentences for parsing through marathi link gramm...
 
Automatic Generation of Compound Word Lexicon for Marathi Speech Synthesis
Automatic Generation of Compound Word Lexicon for Marathi Speech SynthesisAutomatic Generation of Compound Word Lexicon for Marathi Speech Synthesis
Automatic Generation of Compound Word Lexicon for Marathi Speech Synthesis
 
Phonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech SystemsPhonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech Systems
 
Unit-1 PPL PPTvvhvmmmmmmmmmmmmmmmmmmmmmm
Unit-1 PPL PPTvvhvmmmmmmmmmmmmmmmmmmmmmmUnit-1 PPL PPTvvhvmmmmmmmmmmmmmmmmmmmmmm
Unit-1 PPL PPTvvhvmmmmmmmmmmmmmmmmmmmmmm
 
Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)
 
Analysis of lexico syntactic patterns for antonym pair extraction from a turk...
Analysis of lexico syntactic patterns for antonym pair extraction from a turk...Analysis of lexico syntactic patterns for antonym pair extraction from a turk...
Analysis of lexico syntactic patterns for antonym pair extraction from a turk...
 
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURK...
 
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAIDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
 
Identifying the semantic relations on
Identifying the semantic relations onIdentifying the semantic relations on
Identifying the semantic relations on
 
Ceis 3
Ceis 3Ceis 3
Ceis 3
 
Paper id 25201466
Paper id 25201466Paper id 25201466
Paper id 25201466
 
Statistically-Enhanced New Word Identification
Statistically-Enhanced New Word IdentificationStatistically-Enhanced New Word Identification
Statistically-Enhanced New Word Identification
 
Rap Lyric Generator
Rap Lyric GeneratorRap Lyric Generator
Rap Lyric Generator
 
An implementation of apertium based assamese morphological analyzer
An implementation of apertium based assamese morphological analyzerAn implementation of apertium based assamese morphological analyzer
An implementation of apertium based assamese morphological analyzer
 

More from Jasline Presilda (20)

I6 mala3 sowmya
I6 mala3 sowmyaI6 mala3 sowmya
I6 mala3 sowmya
 
I4 madankarky3 subalalitha
I4 madankarky3 subalalithaI4 madankarky3 subalalitha
I4 madankarky3 subalalitha
 
I2 madankarky1 jharibabu
I2 madankarky1 jharibabuI2 madankarky1 jharibabu
I2 madankarky1 jharibabu
 
I1 geetha3 revathi
I1 geetha3 revathiI1 geetha3 revathi
I1 geetha3 revathi
 
Hari tamil-complete details
Hari tamil-complete detailsHari tamil-complete details
Hari tamil-complete details
 
H4 neelavathy
H4 neelavathyH4 neelavathy
H4 neelavathy
 
H3 anuraj
H3 anurajH3 anuraj
H3 anuraj
 
H2 gunasekaran
H2 gunasekaranH2 gunasekaran
H2 gunasekaran
 
H1 iniya nehru
H1 iniya nehruH1 iniya nehru
H1 iniya nehru
 
G3 chandrakala
G3 chandrakalaG3 chandrakala
G3 chandrakala
 
G2 selvakumar
G2 selvakumarG2 selvakumar
G2 selvakumar
 
G1 nmurugaiyan
G1 nmurugaiyanG1 nmurugaiyan
G1 nmurugaiyan
 
Front matter
Front matterFront matter
Front matter
 
F2 pvairam sarathy
F2 pvairam sarathyF2 pvairam sarathy
F2 pvairam sarathy
 
F1 ferdinjoe
F1 ferdinjoeF1 ferdinjoe
F1 ferdinjoe
 
Emerging
EmergingEmerging
Emerging
 
E3 ilangkumaran
E3 ilangkumaranE3 ilangkumaran
E3 ilangkumaran
 
E2 tamilselvan
E2 tamilselvanE2 tamilselvan
E2 tamilselvan
 
E1 geetha2 karthikeyan
E1 geetha2 karthikeyanE1 geetha2 karthikeyan
E1 geetha2 karthikeyan
 
D5 radha chellappan
D5 radha chellappanD5 radha chellappan
D5 radha chellappan
 

Recently uploaded

"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
kimdan468
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
Nguyen Thanh Tu Collection
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
Levi Shapiro
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
Scholarhat
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
DhatriParmar
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 

Recently uploaded (20)

"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 

I3 madankarky2 karthika

  • 1.
  • 2. Lyric Mining: Word, Rhyme & Concept Co-occurrence Analysis Karthika Ranganathan, T.V Geetha, Ranjani Parthasarathi & Madhan Karky Tamil Computing Lab (TaCoLa), College of Engineering Guindy, Anna University, Chennai. karthika.cyr@gmail.com, madhankarky@gmail.com ABSTRACT Computational creativity is one area of NLP which requires extensive analysis of large datasets. Laalalaa [1] framework for Lyric analysis and generation proposed a lyric analysis subsystem that required statistical analysis of Tamil lyrics. In this paper, we propose a data analysis model for words, rhymes and their usage in Tamil lyrics. The proposed analysis model extracts the root words from lyrics using a morphological analyzer [2] to compute the word frequency across the lyric dataset. The words in their unanalyzed form are used for computing the frequent rhyme, alliteration and end- rhyme pairs using adapted apriori algorithm. Frequent co-occurring concepts in lyrics are also computed using Agaraadhi, an on-line Tamil dictionary. Presenting the results, this paper concludes by discussing the need of such an analysis to compute freshness, pleasantness of a lyric and using these statistics for Lyric Generation. Keywords : Tamil Lyrics, Morphological Analyser, Apriori algorithm. I. INTRODUCTION Tamil is one of the world's oldest languages and has a Classical status. Numerous forms of literature exist in Tamil language of which, lyrics play a vital role in taking the language to every house hold in form of original film soundtracks, jingles, private albums, and commercials. With over thousands of lyrics being created every year, we do not have proper tools to model and analyse lyrics. Such an analysis framework would enable one to see various patterns of words, combinations and thoughts used over time. The analysis framework will also make it possible to generate fresh lyrics where the freshness can be associated with the concepts and thoughts associated with the lyric. In this paper, we discuss about Tamil lyric Analysis on the basis of word usage, rhyme usage and the co-occurrence of word. The frequency of word usage is identified by considering a morphological root of the word using morphological analyser, instead of considering terms. For analysing the frequent rhyme, alliteration and end-rhyme pairs, we adapted Apriori algorithm [4]. To identify the co- occurring concepts in lyrics, we used “Agaraadhi”, an on-line Tamil dictionary Framework [3] and a new algorithm has been proposed to compute the frequent usage of co-occurring concepts in lyrics. The rest of this paper has been organized as follows. In Section 2, we explain about the algorithm and tools. In Section 3, we explain the methodology and in Section 4, we discuss our results. Conclusions and future extensions to this work are presented in section 5. 276
  • 3. 2. MORPHOLOGICAL ANALYSIS AND APRIORI ALGORITHM Morphological analysis is the process of segmenting words into morphemes and identifying its grammatical categories. For a given word, morphological analyser (MA) generates its root word and its grammatical information. The role of morphological analyser in the proposed work is to identify the noun and verb morphology of a given word, examples are as follows Example 1, for noun morphology: இராமைன (Ramanai) இராம (Raman) + ஐ (ai) Entity + Accusative Case Example 2, for verb morphology: ெச றா (Senraan) ெச (sel) + (R) + ஆ (Aan) Verb + Past Tense Marker + Third Person Masculine Singular Suffix In the proposed work, the frequency of a word is identified by considering the variations of a noun in terms of its morphology. For instance, the variations of Ramanai such as Ramanaal, Ramanukku, Ramanin, Ramanadhu are also counted for the word Raman. The Apriori Algorithm is an influential algorithm for mining frequent item sets for Boolean association rules [4]. Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found. It has objective measures: support and confidence. The support of an association pattern is the percentage of task- relevant data transactions for the apparent pattern. Confidence can be defined as the measure of certainty or trustworthiness associated with each discovered pattern. 3. LYRIC ANALYSIS (i) Word Analysis The frequency of words is used to associate a popularity score for each word. This score is proposed to be used for lyric generation part of the frame work proposed in [1]. In this work, the popularity score of a word has been identified from lyrics. In lyrics, the words are mainly attached with the suffix. So, the root words are taken into consideration for determining its frequency count. The root words are identified using morphological analyser. The algorithm to find the word usage is illustrated below: Algorithm Let LD is the set of lyric dataset and LS denotes the set of sentences of lyric dataset and LW denotes the set of words in the lyric dataset. WC denotes the word count across all lyric dataset. Let m be the total number of sentences in lyrics and n be the total number of words in lyrics. 277
  • 4. a) Given a Lyric dataset LD b) For each LSi ← 1 to m Split the sentence LS into words LW c) For each LWj ← 1 to n RW ← ProcessMorphAnalyser(LWj ) d) Let RW be the root word if RW exist, then add into the word count list ( WC) else add into the word count list ( WC) e)ReturnWC. Here ProcessMorphAnalyser(LWj ) returns the root of the given word. (ii) Rhyme Analysis Alliteration (Monai) is the repetition of the same letter at the beginning of words. The rhyme (Edhugai) is defined as the repetition of the same letter at the second position of words. The end rhyme (iyaibu) is defined as the repetition of the same letter at the last position of words. The example of alliteration, rhyme and end rhyme is given below: Examples: உய and உ rhyme in alliteration (monai) as they start with the same letter. இதய and காத rhyme in rhyme (edhugai) as they share the same second letter. யா ைக and வா ைக rhyme in end – rhyme (iyaibu) as they share the same last letter. We have adapted apriori algorithm to find the frequency count of rhyme, alliteration and end rhyme pairs of Tamil lyrics which has been illustrated below: Algorithm: Let LD be the set of lyric dataset and LS denote the set of sentences of Lyric dataset. Let m be the total number of sentences in lyrics. Let PC1 denote the count of alliteration and PC2 denote the count of rhyme and PC3 denote the count of end – rhyme. a) Given lyric dataset LD . b) For each LS ← 1 to m Join the pair of sentences (LP) c) For each LP Consider the first (LP1) and last words (LP2) d) Rhyme (LP1, LP2) e) return PC 278
  • 5. Algorithm : Rhyme (LP1, LP2) a) Let k denote the ith character. Let ML denote the alliteration (monai) list and RL denote the rhyme (edhugai) list and EL denote the end – rhyme (iyaibu) list. b) For ∀i, if k = 1, if LP1(k) = LP2(k), then add into ML list and increment PC1 if k = 2, if LP1(k) = LP2(k), then add into RL list and increment PC2 if k = i - 1, if LP1(k) = LP2(k), then add into EL list and increment PC3 (iii) Co-occurrence concept Analysis Co-occurrence is defining the frequent occurrence of two terms from a text corpus on the either side in a certain order. This word information in NLP system is extremely high. It is very important for cancelling the ambiguous and the polysemy of words to improve the accuracy of the entire system [5]. In this method, to improve the efficiency of co-occurrence, we have been considering the concept of each word. The concept for each word has been identified using the Agaraadhi, an on-line Tamil dictionary. The example for concept word which has been in lyric is given below: Example: The word "நில ”which has the concept ெவ ண லா, மதி, மாத , ைண ேகா , ெவ ணல ,அ லி, அ லிமா . By considering these concepts, we have been determining the co-occurring words using our own algorithm is described below: Algorithm : Let LD denote the set of Lyric dataset and W denote each word in lyric. Let CW denote the set of concepts for each word. Let WC denote the word count. a) If the word CW identify, then consider the next word. Increment the count WC b) Else the word W and consider the next word. Increment the count WC c) return WC 4. RESULTS The lyric corpus of more than two thousand songs were analysed for the word usage, rhyme usage and Co-occurence concepts usage. The analysed results are given below: 279
  • 6. Table 1 shows the list of top 10 usage words in lyrics. WORDS USAGE WORDS USAGE ந 2009 வா 1062 எ 1941 ஒ 987 நா 1645 க 965 உ 1556 857 காத 1153 இ ைல 793 Table 2 shows the list of top 10 rhyme words in lyrics. EDHUGAI USAGE MONAI USAGE IYAIBU USAGE எ ,உ 107975 எ ,எ ைன 33492 எ ,உ 111552 நா ,எ 80125 உ ைன,உ 26289 நா ,எ 83435 நா ,உ 61204 எ த ,எ 16478 நா ,உ 63543 எ ,உ ைன 33731 எ ,எ ன 15405 எ ,உ த 18411 எ ,எ ைன 32570 உ த ,உ 14001 எ த ,எ 16478 உ ைன,உ 25747 உய ,உ 11640 உ த ,உ 14001 எ ைன,உ 24867 இ த,இ 10993 எ த ,உ 12524 நா ,உ ைன 19297 என ,எ 9985 நா ,உ த 10524 நா ,எ ைன 18935 எ த,எ 9976 எ த ,நா 9367 எ ,எ ன 14486 ந,ந 9962 இ த,அ த 4818 Table 3 shows the list of top 10 co-occurring concept words in lyrics. CO-OCCURING WORDS USAGE அ ேப,அ ேப 638 சி ன,சி ன 530 வா,வா 506 எ ,காத 478 ஒேர,ஒ 469 ந ,நா 434 உ ைன,நா 419 தமி ,எ க 367 ஒ ,நா 302 ந,எ ைன 287 280
  • 7. By analysing those data, this shows that the most of lyrics which predicts the emotion of happiness and love. In the adapted apriori algorithm, the support which represents the total number of pair words with the total number of combination of sentences and the confidence which described the total number of pair words with the total number of pair of sentences. The results may vary if the number of lyrics used for the analysis is increased. 5. CONCLUSIONS AND FUTURE WORK In this paper, we discussed the usage of words and rhymes in the lyrics dataset. The adapted apriori algorithm has been used to detect the frequency count for rhyme, alliteration and end word pairs. This analysis has been mainly used in the lyric generation and computing freshness scoring for lyrics. Frequent co-occurring concept is also been identified for the development of lyric extraction, semantic relationship, word sense identification and sentence similarity. Possible extensions of this work could be the detection of emotions in lyrics by genre classification and identify genre specific rhymes and concept co-occurrence. REFERENCES Sowmiya Dharmalingam, Madhan Karky, “LaaLaLaa – A Tamil Lyric Analysis and Generation Framework” in World Classical Tamil Conference – June 2010, Coimbatore. Anandan P, Ranjani Parthasarathy, Geetha, T.V. “Morphological analyzer for Tamil”. ICON 2002. Agaraadhi Online Tamil Dictionary. http://www.agaraadhi.com, Last accessed date 25h April 2011. HAN Feng, ZHANG Shu-mao, DU Ying-shuang, “The analysis and improvement of Apriori algorithm”, Journal of Communication and Computer, ISSN1548-7709, USA, Sep. 2008, Volume 5, No.9 (Serial No.46). EI-Sayed Atlam, Elmarhomy Ghada, Masao Fuketa, Kazuhiro Morita and Jun-ichi Aoe, “New Hierarchy Technique Using Co-Occurrence Word Information”, International Journal of Information Processing and Management, Volume 40 Issue 6, November 2004. 281