This document presents a graphical model for generating sports news snippets from large collections of sports commentary text. The model builds an entity graph by extracting named entities such as teams, players, and stadiums from the commentary, using a reference database to resolve the names. It then performs a depth-first search of the graph and synthesizes a short summary that answers predefined questions (league, teams, winner, half-time and full-time score) from the extracted information. In human evaluation the entity graph model produces more readable and syntactically correct summaries than the supervised baseline, while its ROUGE F-score is close to the supervised baseline and below LexRank. The model also requires that the commentary contain sufficient domain knowledge.
A graphical model for football story snippet synthesis
1. A Graphical Model for Football Story Snippet
Synthesis from Large Scale Commentary
(Paper ID: 179)
Anirudh Vyas, Sangram Gaikwad*, Chiranjoy
Chattopadhyay**
* Tata Consultancy Services, Research Lab, Kolkata
** Indian Institute of Technology Jodhpur
7th International Conference on Pattern
Recognition and Machine Intelligence
3. Motivation
1. GBs of data in the form of live commentary text
2. Patterns in the news reported by journalists
4. Problem Statement
• Generate sports news snippets from sports commentary
scripts that are on par with the news reported by sports
journalists.
• Input: A large collection of text-based commentary on
football for a particular game
• Output: A story snippet for the same match
5. Known Results
• Data Collection: web scraping from ESPN and CricInfo, and data cleaning
• Feature Generation: similarity with previous and next sentences; TF-IDF; position
• Training Model: supervised machine learning using Random Forest
• Summary Generation: select the top 10 sentences according to the predicted rank
• Results: evaluate the summary using precision, recall and F-score
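The feature-generation and selection stages of this supervised baseline can be sketched as follows. This is a minimal, standard-library-only illustration rather than the authors' code: the function names (`tokenize`, `sentence_features`, `top_k`), the smoothed idf formula, and the feature names are assumptions, and the Random Forest ranker is left out, so `top_k` simply takes scores from whatever model is trained on these features.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"


def tokenize(sentence):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(PUNCT).lower() for w in sentence.split())
    return [w for w in words if w]


def tfidf_vectors(sentences):
    """Return one {term: tf-idf weight} dict per sentence."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    # Assumed smoothing: terms present in every sentence keep a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def sentence_features(sentences):
    """Per-sentence features: neighbour similarity, TF-IDF mass, position."""
    vecs = tfidf_vectors(sentences)
    n = len(sentences)
    feats = []
    for i, v in enumerate(vecs):
        feats.append({
            "sim_prev": cosine(v, vecs[i - 1]) if i > 0 else 0.0,
            "sim_next": cosine(v, vecs[i + 1]) if i < n - 1 else 0.0,
            "tfidf_sum": sum(v.values()),
            "position": i / (n - 1) if n > 1 else 0.0,
        })
    return feats


def top_k(sentences, scores, k=10):
    """Pick the k best-scoring sentences and return them in original order."""
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

Returning the selected sentences in their original order keeps the extract chronological, which matters for match commentary.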
6. ISSUES
• Summary is not coherent
• Poor human readability scores
• The summary is extractive, so it carries a lot of useless information
• Redundant information
8. Referencing Database
• Stanford NER failed
• Used open-football database* to reference names of
teams, players and stadiums
• Crucial for graph creation
* https://openfootball.github.io/
[Figure: named-entity tags (ORGANISATION, PERSON) on the commentary fragment "the newly-named Estadio de la Ceramica and it is a real cracker of an affair as an exciting Villarreal team"]
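Referencing names against a database can be illustrated with a simple dictionary lookup. The sketch below is an assumption about how such a lookup might work, not the authors' implementation: the `REFERENCE` table is a hypothetical hand-written stand-in for data that would be loaded from the open-football database, and `build_patterns` and `find_entities` are illustrative names.

```python
import re

# Hypothetical miniature reference table; real entries would come from
# the open-football data rather than being typed in by hand.
REFERENCE = {
    "team": ["Villarreal", "Barcelona"],
    "player": ["Sansone", "Messi"],
    "stadium": ["Estadio de la Ceramica"],
}


def build_patterns(reference):
    """Compile one case-insensitive, word-bounded pattern per known name."""
    patterns = []
    for entity_type, names in reference.items():
        for name in names:
            pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
            patterns.append((entity_type, name, pattern))
    return patterns


def find_entities(text, patterns):
    """Return (start offset, canonical name, type) tuples sorted by position."""
    hits = []
    for entity_type, name, pattern in patterns:
        for match in pattern.finditer(text):
            hits.append((match.start(), name, entity_type))
    return sorted(hits)


commentary = ("the newly-named Estadio de la Ceramica and it is a real cracker "
              "of an affair as an exciting Villarreal team")
entities = find_entities(commentary, build_patterns(REFERENCE))
```

An exact-match lookup like this only finds names that appear verbatim in the reference table.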
9. SYNTHESIS EXAMPLE
Villarreal tied the match against Barcelona in the la liga league with 1 goal(s)
each. Villarreal was playing at home in their stadium Estadio de la Ceramica.
The score at half time was Villarreal 0-0 Barcelona. Sansone scored a
magnificent goal for Villarreal at 50 min. Messi scored a magnificent goal for
Barcelona at 90 min. The score at the end of the match was Villarreal 1-1
Barcelona.
Query: Which player scored which goal, and at what time?
10. TRAVERSAL AND SYNTHESIS ALGORITHM
1. Depth First Search
2. Query-based searching
    1. What league or tournament is the match being played in?
    2. Which 2 teams are playing the match?
    3. Who won the match?
    4. What was the score at half-time and full-time?
3. Snippet formats based on the amount of information obtained in the graph
4. Summary obtained by filling in information in the snippet format
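The traversal and template-filling steps can be sketched as below. The slides do not give the graph schema or the snippet formats, so everything here is an assumption for illustration: the adjacency-list layout of `GRAPH`, the relation names (`league`, `home_team`, `scorer`, and so on), and the sentence templates in `synthesize`. The sample values are taken from the synthesis example.

```python
# Hypothetical entity graph: node -> list of (relation, child node).
GRAPH = {
    "match": [("league", "la liga"), ("home_team", "Villarreal"),
              ("away_team", "Barcelona"), ("half_time", "0-0"),
              ("full_time", "1-1")],
    "Villarreal": [("scorer", "Sansone")],
    "Barcelona": [("scorer", "Messi")],
    "Sansone": [("goal_minute", "50")],
    "Messi": [("goal_minute", "90")],
}


def dfs_edges(graph, node, visited=None):
    """Depth-first traversal yielding (parent, relation, child) edges."""
    if visited is None:
        visited = set()
    visited.add(node)
    for relation, child in graph.get(node, []):
        yield node, relation, child
        if child not in visited:
            yield from dfs_edges(graph, child, visited)


def collect_facts(graph, root):
    """Group traversed edges by relation: relation -> [(parent, child), ...]."""
    facts = {}
    for parent, relation, child in dfs_edges(graph, root):
        facts.setdefault(relation, []).append((parent, child))
    return facts


def synthesize(facts):
    """Fill only the sentence templates whose slots the graph can answer."""
    def first(relation):
        pairs = facts.get(relation)
        return pairs[0][1] if pairs else None

    home, away, league = first("home_team"), first("away_team"), first("league")
    sentences = []
    if home and away and league:
        sentences.append(f"{home} played {away} in the {league} league.")
    if home and away and first("half_time"):
        sentences.append(
            f"The score at half time was {home} {first('half_time')} {away}.")
    minutes = dict(facts.get("goal_minute", []))  # player -> minute
    for team, player in facts.get("scorer", []):
        when = f" at {minutes[player]} min" if player in minutes else ""
        sentences.append(f"{player} scored for {team}{when}.")
    if home and away and first("full_time"):
        sentences.append(
            f"The score at the end of the match was {home} {first('full_time')} {away}.")
    return " ".join(sentences)


snippet = synthesize(collect_facts(GRAPH, "match"))
```

Because each template is emitted only when its slots are filled, a graph with missing facts yields a shorter snippet instead of a sentence with holes.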
11. Constraints of the Model
• Requires sufficient domain knowledge
• The commentary should have answers to all the questions
• For missing data, the model will generate an incomplete
summary
• Variability in named entities is not considered
    • E.g. Ronaldo ↔ CR7
    • Barcelona ↔ Barca, etc.
• The entity graph is scripted; if the content is dynamic, graph
creation must be dynamic as well
12. EVALUATION & RESULTS
ROUGE Metric

MODEL          RECALL    PRECISION    F-SCORE
Graph Based    0.325     0.512        0.362
LexRank        0.451     0.371        0.398
Supervised     0.303     0.492        0.358

Human Evaluation

CRITERION             GOLD STANDARD    SUPERVISED / NER    ENTITY GRAPH
Human Readability     4.54             2.50                4.10
Syntax Correctness    4.07             1.77                3.85
Content               4.95             1.50                3.50
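For readers unfamiliar with the ROUGE columns, the sketch below computes unigram-overlap recall, precision and F-score for one candidate summary against one reference. The slides do not say which ROUGE variant or tokenization was used, so ROUGE-1 with lowercase whitespace tokens is an assumption here, and `rouge_1` is an illustrative name rather than a library call.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Return (recall, precision, f_score) from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return recall, precision, f_score


recall, precision, f_score = rouge_1(
    "messi scored for barcelona",
    "messi scored a late goal for barcelona",
)
```

When scores are averaged over many matches, the mean F-score generally differs from the harmonic mean of the mean recall and mean precision.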
13. Conclusion
1. We propose a graphical method to synthesize story snippets
from football match commentaries
2. Qualitative and quantitative analyses are shown
3. Snippets are coherent and score high on the human
readability test
4. Useful for video shot tagging and indexing (future scope)