SlideShare a Scribd company logo
1 of 6
Download to read offline
Tamil Summary Generation for a Cricket Match
                 J. Jai Hari Raju, P. Indhu Reka, K.K Nandavi, Dr. Madhan Karky
                                  Tamil Computing Lab (TaCoLa),
                     College of Engineering Guindy, Anna University, Chennai.
   jaihari1989@gmail.com, p.indhu@gmail.com, ashwathas@gmail.com, madhankarky@gmail.com



Abstract
Cricket is one of the most followed sports in the Indian subcontinent. There is a wide requirement for
natural language descriptions, which summarize a cricket match effectively. The process of generating
match summaries from statistical data is a manual process. The objective of this paper is to propose a
framework for automatic analysis and summary generation for a cricket match in Tamil, with the
scorecard of the match as the input. Data analytics is performed on the statistical match data, to mine
all frequently occurring patterns. The paper proposes a parameter called Interestingness, which
quantifies the interestingness of the match. The paper also proposes a customization model for the
summary. We propose an evaluation parameter called humanness, which quantifies the similarity
between the output and a manually written summary. Discussing the results and analyzing the
summaries generated for matches based on scorecards, this paper concludes with proposing some
extensions for future developments.

1. Introduction
The number of websites which facilitate people to follow and analyze sports has increased manyfold.
Among them, there are an exceptionally large number of sites devoted to Cricket. Mostly these
involve participation of experts, who present their views and summaries in English about cricket
matches. There are no such sites which provide similar services in Tamil. In this case it is also
desirable if there is an alternative for human creativity. As a solution the paper proposes an
automated Tamil summary generation framework which is capable of analyzing and generating a
Tamil summary about a cricket match, provided the score card as the input. This paper discusses the
overall architecture and implementation details of such a framework.

The large amount of data in this domain makes it possible to apply data mining and data analytics
techniques. The input scorecard is analyzed to construct feature vectors, which are then subjected to
data mining. Based on the various parameters identified, the interestingness of the match is
quantified.

The summary generation part involves extraction of key players and events from a match.
Appropriate sentences are then synthesized to express these selected events. The sentence constructs
and the vocabulary used are chosen based on the linguistic ability specified by the user. Then the
sentences are combined in to a meaningful summary.

The results of the system, i.e. the summaries, are evaluated based on the Humanness parameter. This
parameter gives the degree of similarity between the generated summary and the manually written
summary, with which it is compared. This value helps us decide, the level of creativity achieved by


                                                 271
the system. In section 2 we provide an overview of the literature survey conducted. In section 3 we
discuss the design of the various modules of the framework. In section 4 we discuss the
implementation of the proposed framework and the results obtained from the analysis. Finally we
conclude in section 5 with extensions to the current framework and directions for further studies in
the field of Tamil Summary Generation Systems.

2. Background
In the literature there are existing works on summary generation from statistical data. Alice Oh et al.
generated multiple stories about a single baseball game based on different perspectives using a
reordering algorithm [1]. Ehud Reiter et al. in their book building natural language generation systems
explain the difference between natural language generation and natural language processing and also
describe the various steps involved in the natural language generation process with examples
[2].Jacques Robin et al. presented a system (called STREAK) for summarizing data in natural
language. It focuses on basketball game to design and evaluate the system [3]. L. Bourbeau et al. came
up with the FoG (Forecast Generator) using the streamlined version of the Meaning-Text linguistic
model. This system was capable of generating weather forecasts in both English and French [4].

3. Summary Generation Framework
The Tamil Cricket Summary Generator consists of the following major components:
    •   Data Gathering and Modeling module
    •   Data Mining and Data Analytics module
    •   Summary Generator
    •   Evaluator
    Figure (1) given below depicts the Summary Generation framework.




                         Figure 1: Tamil Cricket Summary Generator Framework

                                                 272
3.1 Data Gathering and Modeling Module

Data gathering is the first step of the system. The data to be gathered is present in internet. This
module has a custom designed parser, for the tag structure of the site. The user must provide the URL
from where the particular match’s data can be obtained. The module checks whether the match has
already been processed. If not the parser parses the page and retrieves the statistical data. The
statistical data is then modeled in the form of the predefined feature vectors.

3.2 Data Mining and Analytics Module

Modified version of Apriori algorithm is used to find the association rules from the feature vectors.
After performing mathematical analysis using correlation of variance (CoV), CoV is plotted against
average to give an idea about how consistent the player is. The interestingness of the match is
calculated based on the weighted average of the scores assigned to the factors identified, they include
the Winning margin, Team history, Individual records made, High run rate, Series state, Relative
position in international ranking, Reaction in social networks etc.

3.3 Summary Generator

The summary generator part of the framework consists of the following sub modules Content
Determiner, Aggregator, Tamil Morphological Generator and Layout Determiner. The events to be
included in the summary are not predefined and are not the same for every match. Based on the
interestingness of the total match, the interestingness of the individual events and the expert level
chosen by the user, particular events are chosen to be included in the summary. The content
determiner is responsible for identifying those facts which are worth mentioning in the summary.

Aggregation of relevant events from other matches in the summary will make it more readable and
interesting. The aggregator performs this function. It chooses events based on their similarity and
coherence and aggregates them with the key events selected in the content determiner module.

As a next step, the sentences used to describe the events are synthesized. The sentence which is the
most apt to the current event under consideration is selected. The vocabulary used in the sentence and
the depth to which an event is discussed is also varied based on the expert level of the user. The nouns
in the key events are passed to the morphological generator along with the desired case endings and
the generated variants are added to the sentences.

The layout determiner module chooses the layout of the summary to be generated. The layout is
varied based on the interestingness of the match. The sentences are aggregated in the fashion of the
layout selected and the final output summary is passed to Evaluator.

3.4 Evaluator

The summary generated by the system is evaluated based on its degree of similarity with human
written summaries. The summaries are compared based on two parameters, the Nouns Mentioned
and the Events Mentioned.

The nouns and the events in the summaries are extracted along with their absolute positions. The
events in the summary are modeled as a set consisting of, one or more Performers (the persons who
takes part in the event), Numeral (the numeric part involved in the event e.g. 4 wickets) and a

                                                  273
Descriptor (the action connecting the Performer and the Numeral). Their absolute positions refer to
the sentence number in which they are mentioned. Then these absolute positions are normalized
based on the total number of sentences present in the summary. Three different scores are calculated
they are,

  ◦   Similarity Score: The ratio of the number of nouns and events mentioned in both the summaries
      to the total number of nouns and events mentioned at least in one summary.

  ◦   Count Score: The ratio of the number of nouns and events mentioned in the system generated
      summary to the number of nouns and events mentioned in the human written summary

  ◦   Closeness Score: The degree of closeness, in terms of the normalized positions of the nouns and
      events mentioned in both the summaries.

A weighted average of these three scores yields the final humanness score.

4. Implementation
To implement the proposed framework, espncricinfo a reliable and prominent site for Cricket data is
chosen as the source of input. The frame work was implemented in java. The URL of the match for
which the summary is to be generated is obtained from the user. The feature vectors designed for
modeling a match are stored as rows with unique identities, in the back end oracle database. The
patterns which are generated as a result of data mining are validated based on the support and
confidence parameters. As a design decision all nouns are stored in English and are translated on the
fly using a constantly updated look up database. This decision was taken to allow interoperability and
easy extension of the system to other languages in future. The sentence pattern files are stored
external to the system, so as to allow modifications without changes in the system. The summary
generated for the match is stored in the back end, indexed with the unique identity assigned already.
The user interface is designed to be simple and robust. It allows the users to search matches based on
various parameters and also to save their preferences.

4.1 Results

Score cards of 90 One Day International matches where retrieved and their summaries were
generated. These include matches between 9 countries. Both individual matches and series were
considered. A large number of hidden patterns in cricket domain have been retrieved based on the
algorithm used. The patterns have been validated and the ones which are interesting have been
reported. The factors contributing to the interestingness of the match have been identified and the
weights associated with them have been found. The consistency of a player has been modelled and
consistency analysis of a player is done to analyse his performance.

The difference in the language used and the events mentioned in the summary is pronounced when
the user opts for an expert level. Similar facts occurring in the past have been identified and added to
the summary. Each summary was compared with two human written summaries, one an expert
summary and other an average summary, their cumulative scores were considered. The humanness
score of the summaries tend to be in the range of 70% to 85%. The recurrence of layouts is also
minimal, which reflects the fact that the summaries generated are not monotonous.


                                                 274
5. Conclusion and future work
In this paper we have proposed the framework for an Automated Tamil Cricket Summary Generator.
The current implementation of the system can be enhanced by adding machine learning capabilities to
make the summaries more human and interesting. The system can be extended to produce summaries
in multiple languages apart from Tamil. The system can be enhanced to generate summaries about
the match in real time. As a next level the system can be modified for summary generation in other
sports too.




                   Figure 2: Screenshot of the Tamil Cricket Summary Generation System


The frame work can be used as a guideline to develop summary generation systems, which can be
applied for any domain where frequent numerical reports are used. (Weather Prediction, Industrial
Quality Testing etc)

References
        Alice Oh and Howard Shrobe, “Generating baseball summaries from multiple perspectives by
        reordering content,” in Proc. 5th International Natural Language Generation Conference,
        2008, pp. 173-176.

        Ehud Reiter and Robert Dale, “Building natural language generation systems,” Cambridge:
        Cambridge University Press, 2000.

        Jacques Robin and Kathleen McKeown, “Empirically Designing and Evaluating a New
        Revision-Based Model for Summary Generation,” Department of Computer Science,
        Columbia University, 1996, vol. 85, pp.135-179.

        L. Bourbeau, D. Carcagno, E. Goldberg, R. Kittredge and A. Polguere. “Bilingual generation of
        weather forecasts in an operations environment,” In Proc. 13th International Conference on
        Computational Linguistics, Helsinki University, Finland, 1990. COLING



                                                  275

More Related Content

What's hot

Group 04 te_a_mini project_ report
Group 04 te_a_mini project_ reportGroup 04 te_a_mini project_ report
Group 04 te_a_mini project_ reportAdityaSingh1722
 
Context Sensitive Search String Composition Algorithm using User Intention to...
Context Sensitive Search String Composition Algorithm using User Intention to...Context Sensitive Search String Composition Algorithm using User Intention to...
Context Sensitive Search String Composition Algorithm using User Intention to...IJECEIAES
 
IRJET- Missing Value Evaluation in SQL Queries: A Survey
IRJET- 	  Missing Value Evaluation in SQL Queries: A SurveyIRJET- 	  Missing Value Evaluation in SQL Queries: A Survey
IRJET- Missing Value Evaluation in SQL Queries: A SurveyIRJET Journal
 
Tweet Summarization and Segmentation: A Survey
Tweet Summarization and Segmentation: A SurveyTweet Summarization and Segmentation: A Survey
Tweet Summarization and Segmentation: A Surveyvivatechijri
 
Multi Similarity Measure based Result Merging Strategies in Meta Search Engine
Multi Similarity Measure based Result Merging Strategies in Meta Search EngineMulti Similarity Measure based Result Merging Strategies in Meta Search Engine
Multi Similarity Measure based Result Merging Strategies in Meta Search EngineIDES Editor
 
IRJET- Review on Information Retrieval for Desktop Search Engine
IRJET-  	  Review on Information Retrieval for Desktop Search EngineIRJET-  	  Review on Information Retrieval for Desktop Search Engine
IRJET- Review on Information Retrieval for Desktop Search EngineIRJET Journal
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics DomainDrjabez
 
IRJET- Classifying Twitter Data in Multiple Classes based on Sentiment Class ...
IRJET- Classifying Twitter Data in Multiple Classes based on Sentiment Class ...IRJET- Classifying Twitter Data in Multiple Classes based on Sentiment Class ...
IRJET- Classifying Twitter Data in Multiple Classes based on Sentiment Class ...IRJET Journal
 
2014 USA
2014 USA2014 USA
2014 USALI HE
 
IRJET - Sentiment Analysis of Posts and Comments of OSN
IRJET -  	  Sentiment Analysis of Posts and Comments of OSNIRJET -  	  Sentiment Analysis of Posts and Comments of OSN
IRJET - Sentiment Analysis of Posts and Comments of OSNIRJET Journal
 
Ontological approach for improving semantic web search results
Ontological approach for improving semantic web search resultsOntological approach for improving semantic web search results
Ontological approach for improving semantic web search resultseSAT Publishing House
 
Enhancing the performance of cluster based text summarization using support v...
Enhancing the performance of cluster based text summarization using support v...Enhancing the performance of cluster based text summarization using support v...
Enhancing the performance of cluster based text summarization using support v...eSAT Journals
 
Graph based forcasting for social network
Graph based forcasting for social networkGraph based forcasting for social network
Graph based forcasting for social networkAshenafi Workie
 
Business recommendation based on collaborative filtering and feature engineer...
Business recommendation based on collaborative filtering and feature engineer...Business recommendation based on collaborative filtering and feature engineer...
Business recommendation based on collaborative filtering and feature engineer...IJECEIAES
 
Web Based Fuzzy Clustering Analysis
Web Based Fuzzy Clustering AnalysisWeb Based Fuzzy Clustering Analysis
Web Based Fuzzy Clustering Analysisinventy
 
IRJET- Personality Prediction System using AI
IRJET- Personality Prediction System using AIIRJET- Personality Prediction System using AI
IRJET- Personality Prediction System using AIIRJET Journal
 
Incentive Compatible Privacy Preserving Data Analysis
Incentive Compatible Privacy Preserving Data AnalysisIncentive Compatible Privacy Preserving Data Analysis
Incentive Compatible Privacy Preserving Data Analysisrupasri mupparthi
 

What's hot (19)

Query expansion
Query expansionQuery expansion
Query expansion
 
Group 04 te_a_mini project_ report
Group 04 te_a_mini project_ reportGroup 04 te_a_mini project_ report
Group 04 te_a_mini project_ report
 
Context Sensitive Search String Composition Algorithm using User Intention to...
Context Sensitive Search String Composition Algorithm using User Intention to...Context Sensitive Search String Composition Algorithm using User Intention to...
Context Sensitive Search String Composition Algorithm using User Intention to...
 
IRJET- Missing Value Evaluation in SQL Queries: A Survey
IRJET- 	  Missing Value Evaluation in SQL Queries: A SurveyIRJET- 	  Missing Value Evaluation in SQL Queries: A Survey
IRJET- Missing Value Evaluation in SQL Queries: A Survey
 
Tweet Summarization and Segmentation: A Survey
Tweet Summarization and Segmentation: A SurveyTweet Summarization and Segmentation: A Survey
Tweet Summarization and Segmentation: A Survey
 
Multi Similarity Measure based Result Merging Strategies in Meta Search Engine
Multi Similarity Measure based Result Merging Strategies in Meta Search EngineMulti Similarity Measure based Result Merging Strategies in Meta Search Engine
Multi Similarity Measure based Result Merging Strategies in Meta Search Engine
 
IRJET- Review on Information Retrieval for Desktop Search Engine
IRJET-  	  Review on Information Retrieval for Desktop Search EngineIRJET-  	  Review on Information Retrieval for Desktop Search Engine
IRJET- Review on Information Retrieval for Desktop Search Engine
 
Naresh sharma
Naresh sharmaNaresh sharma
Naresh sharma
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics Domain
 
IRJET- Classifying Twitter Data in Multiple Classes based on Sentiment Class ...
IRJET- Classifying Twitter Data in Multiple Classes based on Sentiment Class ...IRJET- Classifying Twitter Data in Multiple Classes based on Sentiment Class ...
IRJET- Classifying Twitter Data in Multiple Classes based on Sentiment Class ...
 
2014 USA
2014 USA2014 USA
2014 USA
 
IRJET - Sentiment Analysis of Posts and Comments of OSN
IRJET -  	  Sentiment Analysis of Posts and Comments of OSNIRJET -  	  Sentiment Analysis of Posts and Comments of OSN
IRJET - Sentiment Analysis of Posts and Comments of OSN
 
Ontological approach for improving semantic web search results
Ontological approach for improving semantic web search resultsOntological approach for improving semantic web search results
Ontological approach for improving semantic web search results
 
Enhancing the performance of cluster based text summarization using support v...
Enhancing the performance of cluster based text summarization using support v...Enhancing the performance of cluster based text summarization using support v...
Enhancing the performance of cluster based text summarization using support v...
 
Graph based forcasting for social network
Graph based forcasting for social networkGraph based forcasting for social network
Graph based forcasting for social network
 
Business recommendation based on collaborative filtering and feature engineer...
Business recommendation based on collaborative filtering and feature engineer...Business recommendation based on collaborative filtering and feature engineer...
Business recommendation based on collaborative filtering and feature engineer...
 
Web Based Fuzzy Clustering Analysis
Web Based Fuzzy Clustering AnalysisWeb Based Fuzzy Clustering Analysis
Web Based Fuzzy Clustering Analysis
 
IRJET- Personality Prediction System using AI
IRJET- Personality Prediction System using AIIRJET- Personality Prediction System using AI
IRJET- Personality Prediction System using AI
 
Incentive Compatible Privacy Preserving Data Analysis
Incentive Compatible Privacy Preserving Data AnalysisIncentive Compatible Privacy Preserving Data Analysis
Incentive Compatible Privacy Preserving Data Analysis
 

Viewers also liked (9)

I1 geetha3 revathi
I1 geetha3 revathiI1 geetha3 revathi
I1 geetha3 revathi
 
I5 geetha4 suraiya
I5 geetha4 suraiyaI5 geetha4 suraiya
I5 geetha4 suraiya
 
Hari tamil-complete details
Hari tamil-complete detailsHari tamil-complete details
Hari tamil-complete details
 
I4 madankarky3 subalalitha
I4 madankarky3 subalalithaI4 madankarky3 subalalitha
I4 madankarky3 subalalitha
 
C2 mala2 janani
C2 mala2 jananiC2 mala2 janani
C2 mala2 janani
 
I6 mala3 sowmya
I6 mala3 sowmyaI6 mala3 sowmya
I6 mala3 sowmya
 
H4 neelavathy
H4 neelavathyH4 neelavathy
H4 neelavathy
 
I3 madankarky2 karthika
I3 madankarky2 karthikaI3 madankarky2 karthika
I3 madankarky2 karthika
 
10 11 ry1-g
10 11 ry1-g10 11 ry1-g
10 11 ry1-g
 

Similar to I2 madankarky1 jharibabu

Co-Extracting Opinions from Online Reviews
Co-Extracting Opinions from Online ReviewsCo-Extracting Opinions from Online Reviews
Co-Extracting Opinions from Online ReviewsEditor IJCATR
 
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...journalBEEI
 
Query Sensitive Comparative Summarization of Search Results Using Concept Bas...
Query Sensitive Comparative Summarization of Search Results Using Concept Bas...Query Sensitive Comparative Summarization of Search Results Using Concept Bas...
Query Sensitive Comparative Summarization of Search Results Using Concept Bas...CSEIJJournal
 
QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS...
QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS...QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS...
QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS...cseij
 
An effective citation metadata extraction process based on BibPro parser
An effective citation metadata extraction process based on BibPro parserAn effective citation metadata extraction process based on BibPro parser
An effective citation metadata extraction process based on BibPro parserIOSR Journals
 
IRJET- Sentimental Prediction of Users Perspective through Live Streaming : T...
IRJET- Sentimental Prediction of Users Perspective through Live Streaming : T...IRJET- Sentimental Prediction of Users Perspective through Live Streaming : T...
IRJET- Sentimental Prediction of Users Perspective through Live Streaming : T...IRJET Journal
 
Cricket Score and Winning Prediction
Cricket Score and Winning PredictionCricket Score and Winning Prediction
Cricket Score and Winning PredictionIRJET Journal
 
COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS...
COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS...COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS...
COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS...IJCSEA Journal
 
COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS...
COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS...COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS...
COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS...IJCSEA Journal
 
Extraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity MiningExtraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity Miningiosrjce
 
An effective search on web log from most popular downloaded content
An effective search on web log from most popular downloaded contentAn effective search on web log from most popular downloaded content
An effective search on web log from most popular downloaded contentijdpsjournal
 
LINKING SOFTWARE DEVELOPMENT PHASE AND PRODUCT ATTRIBUTES WITH USER EVALUATIO...
LINKING SOFTWARE DEVELOPMENT PHASE AND PRODUCT ATTRIBUTES WITH USER EVALUATIO...LINKING SOFTWARE DEVELOPMENT PHASE AND PRODUCT ATTRIBUTES WITH USER EVALUATIO...
LINKING SOFTWARE DEVELOPMENT PHASE AND PRODUCT ATTRIBUTES WITH USER EVALUATIO...cscpconf
 
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank AlgorithmEnhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithmijnlc
 
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank AlgorithmEnhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithmkevig
 
META SEARCH ENGINE WITH AN INTELLIGENT INTERFACE FOR INFORMATION RETRIEVAL ON...
META SEARCH ENGINE WITH AN INTELLIGENT INTERFACE FOR INFORMATION RETRIEVAL ON...META SEARCH ENGINE WITH AN INTELLIGENT INTERFACE FOR INFORMATION RETRIEVAL ON...
META SEARCH ENGINE WITH AN INTELLIGENT INTERFACE FOR INFORMATION RETRIEVAL ON...ijcseit
 
LSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State University
LSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State UniversityLSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State University
LSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State Universitydhabalia
 
Sentimental Analysis For Electronic Product Review
Sentimental Analysis For Electronic Product ReviewSentimental Analysis For Electronic Product Review
Sentimental Analysis For Electronic Product ReviewIRJET Journal
 
Elg 5100 project report anurag & jayanshu
Elg 5100 project report   anurag & jayanshuElg 5100 project report   anurag & jayanshu
Elg 5100 project report anurag & jayanshuAnurag Das
 

Similar to I2 madankarky1 jharibabu (20)

Co-Extracting Opinions from Online Reviews
Co-Extracting Opinions from Online ReviewsCo-Extracting Opinions from Online Reviews
Co-Extracting Opinions from Online Reviews
 
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
 
Query Sensitive Comparative Summarization of Search Results Using Concept Bas...
Query Sensitive Comparative Summarization of Search Results Using Concept Bas...Query Sensitive Comparative Summarization of Search Results Using Concept Bas...
Query Sensitive Comparative Summarization of Search Results Using Concept Bas...
 
QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS...
QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS...QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS...
QUERY SENSITIVE COMPARATIVE SUMMARIZATION OF SEARCH RESULTS USING CONCEPT BAS...
 
An effective citation metadata extraction process based on BibPro parser
An effective citation metadata extraction process based on BibPro parserAn effective citation metadata extraction process based on BibPro parser
An effective citation metadata extraction process based on BibPro parser
 
IRJET- Sentimental Prediction of Users Perspective through Live Streaming : T...
IRJET- Sentimental Prediction of Users Perspective through Live Streaming : T...IRJET- Sentimental Prediction of Users Perspective through Live Streaming : T...
IRJET- Sentimental Prediction of Users Perspective through Live Streaming : T...
 
Cricket Score and Winning Prediction
Cricket Score and Winning PredictionCricket Score and Winning Prediction
Cricket Score and Winning Prediction
 
COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS...
COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS...COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS...
COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS...
 
COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS...
COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS...COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS...
COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS...
 
Extraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity MiningExtraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity Mining
 
E017252831
E017252831E017252831
E017252831
 
An effective search on web log from most popular downloaded content
An effective search on web log from most popular downloaded contentAn effective search on web log from most popular downloaded content
An effective search on web log from most popular downloaded content
 
LINKING SOFTWARE DEVELOPMENT PHASE AND PRODUCT ATTRIBUTES WITH USER EVALUATIO...
LINKING SOFTWARE DEVELOPMENT PHASE AND PRODUCT ATTRIBUTES WITH USER EVALUATIO...LINKING SOFTWARE DEVELOPMENT PHASE AND PRODUCT ATTRIBUTES WITH USER EVALUATIO...
LINKING SOFTWARE DEVELOPMENT PHASE AND PRODUCT ATTRIBUTES WITH USER EVALUATIO...
 
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank AlgorithmEnhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
 
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank AlgorithmEnhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
 
Cricket 2
Cricket 2Cricket 2
Cricket 2
 
META SEARCH ENGINE WITH AN INTELLIGENT INTERFACE FOR INFORMATION RETRIEVAL ON...
META SEARCH ENGINE WITH AN INTELLIGENT INTERFACE FOR INFORMATION RETRIEVAL ON...META SEARCH ENGINE WITH AN INTELLIGENT INTERFACE FOR INFORMATION RETRIEVAL ON...
META SEARCH ENGINE WITH AN INTELLIGENT INTERFACE FOR INFORMATION RETRIEVAL ON...
 
LSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State University
LSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State UniversityLSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State University
LSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State University
 
Sentimental Analysis For Electronic Product Review
Sentimental Analysis For Electronic Product ReviewSentimental Analysis For Electronic Product Review
Sentimental Analysis For Electronic Product Review
 
Elg 5100 project report anurag & jayanshu
Elg 5100 project report   anurag & jayanshuElg 5100 project report   anurag & jayanshu
Elg 5100 project report anurag & jayanshu
 

More from Jasline Presilda (20)

H3 anuraj
H3 anurajH3 anuraj
H3 anuraj
 
H2 gunasekaran
H2 gunasekaranH2 gunasekaran
H2 gunasekaran
 
H1 iniya nehru
H1 iniya nehruH1 iniya nehru
H1 iniya nehru
 
G3 chandrakala
G3 chandrakalaG3 chandrakala
G3 chandrakala
 
G2 selvakumar
G2 selvakumarG2 selvakumar
G2 selvakumar
 
G1 nmurugaiyan
G1 nmurugaiyanG1 nmurugaiyan
G1 nmurugaiyan
 
Front matter
Front matterFront matter
Front matter
 
F2 pvairam sarathy
F2 pvairam sarathyF2 pvairam sarathy
F2 pvairam sarathy
 
F1 ferdinjoe
F1 ferdinjoeF1 ferdinjoe
F1 ferdinjoe
 
Emerging
EmergingEmerging
Emerging
 
E3 ilangkumaran
E3 ilangkumaranE3 ilangkumaran
E3 ilangkumaran
 
E2 tamilselvan
E2 tamilselvanE2 tamilselvan
E2 tamilselvan
 
E1 geetha2 karthikeyan
E1 geetha2 karthikeyanE1 geetha2 karthikeyan
E1 geetha2 karthikeyan
 
D5 radha chellappan
D5 radha chellappanD5 radha chellappan
D5 radha chellappan
 
D4 sundaram
D4 sundaramD4 sundaram
D4 sundaram
 
D3 dhanalakshmi
D3 dhanalakshmiD3 dhanalakshmi
D3 dhanalakshmi
 
D2 anandkumar
D2 anandkumarD2 anandkumar
D2 anandkumar
 
D1 singaravelu
D1 singaraveluD1 singaravelu
D1 singaravelu
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguistics
 
C8 akumaran
C8 akumaranC8 akumaran
C8 akumaran
 

Recently uploaded

A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 

Recently uploaded (20)

A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 

I2 madankarky1 jharibabu

  • 1.
  • 2. Tamil Summary Generation for a Cricket Match J. Jai Hari Raju, P. Indhu Reka, K.K Nandavi, Dr. Madhan Karky Tamil Computing Lab (TaCoLa), College of Engineering Guindy, Anna University, Chennai. jaihari1989@gmail.com, p.indhu@gmail.com, ashwathas@gmail.com, madhankarky@gmail.com Abstract Cricket is one of the most followed sports in the Indian subcontinent. There is a wide requirement for natural language descriptions, which summarize a cricket match effectively. The process of generating match summaries from statistical data is a manual process. The objective of this paper is to propose a framework for automatic analysis and summary generation for a cricket match in Tamil, with the scorecard of the match as the input. Data analytics is performed on the statistical match data, to mine all frequently occurring patterns. The paper proposes a parameter called Interestingness, which quantifies the interestingness of the match. The paper also proposes a customization model for the summary. We propose an evaluation parameter called humanness, which quantifies the similarity between the output and a manually written summary. Discussing the results and analyzing the summaries generated for matches based on scorecards, this paper concludes with proposing some extensions for future developments. 1. Introduction The number of websites which facilitate people to follow and analyze sports has increased manyfold. Among them, there are an exceptionally large number of sites devoted to Cricket. Mostly these involve participation of experts, who present their views and summaries in English about cricket matches. There are no such sites which provide similar services in Tamil. In this case it is also desirable if there is an alternative for human creativity. As a solution the paper proposes an automated Tamil summary generation framework which is capable of analyzing and generating a Tamil summary about a cricket match, provided the score card as the input. This paper discusses the overall architecture and implementation details of such a framework. The large amount of data in this domain makes it possible to apply data mining and data analytics techniques. The input scorecard is analyzed to construct feature vectors, which are then subjected to data mining. Based on the various parameters identified, the interestingness of the match is quantified. The summary generation part involves extraction of key players and events from a match. Appropriate sentences are then synthesized to express these selected events. The sentence constructs and the vocabulary used are chosen based on the linguistic ability specified by the user. Then the sentences are combined in to a meaningful summary. The results of the system, i.e. the summaries, are evaluated based on the Humanness parameter. This parameter gives the degree of similarity between the generated summary and the manually written summary, with which it is compared. This value helps us decide, the level of creativity achieved by 271
  • 3. the system. In section 2 we provide an overview of the literature survey conducted. In section 3 we discuss the design of the various modules of the framework. In section 4 we discuss the implementation of the proposed framework and the results obtained from the analysis. Finally we conclude in section 5 with extensions to the current framework and directions for further studies in the field of Tamil Summary Generation Systems. 2. Background In the literature there are existing works on summary generation from statistical data. Alice Oh et al. generated multiple stories about a single baseball game based on different perspectives using a reordering algorithm [1]. Ehud Reiter et al. in their book building natural language generation systems explain the difference between natural language generation and natural language processing and also describe the various steps involved in the natural language generation process with examples [2].Jacques Robin et al. presented a system (called STREAK) for summarizing data in natural language. It focuses on basketball game to design and evaluate the system [3]. L. Bourbeau et al. came up with the FoG (Forecast Generator) using the streamlined version of the Meaning-Text linguistic model. This system was capable of generating weather forecasts in both English and French [4]. 3. Summary Generation Framework The Tamil Cricket Summary Generator consists of the following major components: • Data Gathering and Modeling module • Data Mining and Data Analytics module • Summary Generator • Evaluator Figure (1) given below depicts the Summary Generation framework. Figure 1: Tamil Cricket Summary Generator Framework 272
  • 4. 3.1 Data Gathering and Modeling Module Data gathering is the first step of the system. The data to be gathered is present in internet. This module has a custom designed parser, for the tag structure of the site. The user must provide the URL from where the particular match’s data can be obtained. The module checks whether the match has already been processed. If not the parser parses the page and retrieves the statistical data. The statistical data is then modeled in the form of the predefined feature vectors. 3.2 Data Mining and Analytics Module Modified version of Apriori algorithm is used to find the association rules from the feature vectors. After performing mathematical analysis using correlation of variance (CoV), CoV is plotted against average to give an idea about how consistent the player is. The interestingness of the match is calculated based on the weighted average of the scores assigned to the factors identified, they include the Winning margin, Team history, Individual records made, High run rate, Series state, Relative position in international ranking, Reaction in social networks etc. 3.3 Summary Generator The summary generator part of the framework consists of the following sub modules Content Determiner, Aggregator, Tamil Morphological Generator and Layout Determiner. The events to be included in the summary are not predefined and are not the same for every match. Based on the interestingness of the total match, the interestingness of the individual events and the expert level chosen by the user, particular events are chosen to be included in the summary. The content determiner is responsible for identifying those facts which are worth mentioning in the summary. Aggregation of relevant events from other matches in the summary will make it more readable and interesting. The aggregator performs this function. It chooses events based on their similarity and coherence and aggregates them with the key events selected in the content determiner module. As a next step, the sentences used to describe the events are synthesized. The sentence which is the most apt to the current event under consideration is selected. The vocabulary used in the sentence and the depth to which an event is discussed is also varied based on the expert level of the user. The nouns in the key events are passed to the morphological generator along with the desired case endings and the generated variants are added to the sentences. The layout determiner module chooses the layout of the summary to be generated. The layout is varied based on the interestingness of the match. The sentences are aggregated in the fashion of the layout selected and the final output summary is passed to Evaluator. 3.4 Evaluator The summary generated by the system is evaluated based on its degree of similarity with human written summaries. The summaries are compared based on two parameters, the Nouns Mentioned and the Events Mentioned. The nouns and the events in the summaries are extracted along with their absolute positions. The events in the summary are modeled as a set consisting of, one or more Performers (the persons who takes part in the event), Numeral (the numeric part involved in the event e.g. 4 wickets) and a 273
  • 5. Descriptor (the action connecting the Performer and the Numeral). Their absolute positions refer to the sentence number in which they are mentioned. Then these absolute positions are normalized based on the total number of sentences present in the summary. Three different scores are calculated they are, ◦ Similarity Score: The ratio of the number of nouns and events mentioned in both the summaries to the total number of nouns and events mentioned at least in one summary. ◦ Count Score: The ratio of the number of nouns and events mentioned in the system generated summary to the number of nouns and events mentioned in the human written summary ◦ Closeness Score: The degree of closeness, in terms of the normalized positions of the nouns and events mentioned in both the summaries. A weighted average of these three scores yields the final humanness score. 4. Implementation To implement the proposed framework, espncricinfo a reliable and prominent site for Cricket data is chosen as the source of input. The frame work was implemented in java. The URL of the match for which the summary is to be generated is obtained from the user. The feature vectors designed for modeling a match are stored as rows with unique identities, in the back end oracle database. The patterns which are generated as a result of data mining are validated based on the support and confidence parameters. As a design decision all nouns are stored in English and are translated on the fly using a constantly updated look up database. This decision was taken to allow interoperability and easy extension of the system to other languages in future. The sentence pattern files are stored external to the system, so as to allow modifications without changes in the system. The summary generated for the match is stored in the back end, indexed with the unique identity assigned already. The user interface is designed to be simple and robust. It allows the users to search matches based on various parameters and also to save their preferences. 4.1 Results Score cards of 90 One Day International matches where retrieved and their summaries were generated. These include matches between 9 countries. Both individual matches and series were considered. A large number of hidden patterns in cricket domain have been retrieved based on the algorithm used. The patterns have been validated and the ones which are interesting have been reported. The factors contributing to the interestingness of the match have been identified and the weights associated with them have been found. The consistency of a player has been modelled and consistency analysis of a player is done to analyse his performance. The difference in the language used and the events mentioned in the summary is pronounced when the user opts for an expert level. Similar facts occurring in the past have been identified and added to the summary. Each summary was compared with two human written summaries, one an expert summary and other an average summary, their cumulative scores were considered. The humanness score of the summaries tend to be in the range of 70% to 85%. The recurrence of layouts is also minimal, which reflects the fact that the summaries generated are not monotonous. 274
  • 6. 5. Conclusion and future work In this paper we have proposed the framework for an Automated Tamil Cricket Summary Generator. The current implementation of the system can be enhanced by adding machine learning capabilities to make the summaries more human and interesting. The system can be extended to produce summaries in multiple languages apart from Tamil. The system can be enhanced to generate summaries about the match in real time. As a next level the system can be modified for summary generation in other sports too. Figure 2: Screenshot of the Tamil Cricket Summary Generation System The frame work can be used as a guideline to develop summary generation systems, which can be applied for any domain where frequent numerical reports are used. (Weather Prediction, Industrial Quality Testing etc) References Alice Oh and Howard Shrobe, “Generating baseball summaries from multiple perspectives by reordering content,” in Proc. 5th International Natural Language Generation Conference, 2008, pp. 173-176. Ehud Reiter and Robert Dale, “Building natural language generation systems,” Cambridge: Cambridge University Press, 2000. Jacques Robin and Kathleen McKeown, “Empirically Designing and Evaluating a New Revision-Based Model for Summary Generation,” Department of Computer Science, Columbia University, 1996, vol. 85, pp.135-179. L. Bourbeau, D. Carcagno, E. Goldberg, R. Kittredge and A. Polguere. “Bilingual generation of weather forecasts in an operations environment,” In Proc. 13th International Conference on Computational Linguistics, Helsinki University, Finland, 1990. COLING 275