Automatic Summarization of Multiple Travel Blog Entries Focusing on Travelers’ Behavior
1. ENTER 2018 Research Track Slide Number 1
Automatic Summarization of
Multiple Travel Blog Entries
Focusing on Travelers’ Behavior
Shumpei Iinuma
* Hidetugu Nanba
Toshiyuki Takezawa
Hiroshima City University, JAPAN
nanba@Hiroshima-cu.ac.jp
http://www.ls.info.Hiroshima-cu.ac.jp/~nanba
2. ENTER 2018 Research Track Slide Number 2
Purpose
• To generate a summary of multiple travel
blog entries.
• Our method identifies significant sentences
as well as significant images.
4. ENTER 2018 Research Track Slide Number 4
Related work
Travel Information Recommendation
•[Wu 08]: selects and presents the media suited to each
type of query ("What is the historical
background of Tian Tan?" -> history category
-> text information)
•[Hao 10]: shows representative tags and
snippets for a given destination
We generate a text summary with images
5. ENTER 2018 Research Track Slide Number 5
Our Summarization System
Input: geographical region and content type
1. Cluster blog entries
2. Calculate the importance of each sentence
and image for each cluster
3. Select three to five important sentences
and images for each cluster
6. ENTER 2018 Research Track Slide Number 6
Identification of Content Type of
Each Blog Entry (Fujii+2016)
Content type   Criterion
Watch          Sightseeing for the enjoyment of watching
Experience     Experiences (scuba diving, dance)
Buy            Shopping or souvenir stores
Dine           Drinking and dining
Stay           Accommodation
7. ENTER 2018 Research Track Slide Number 7
LexRank [Erkan 04]
PageRank-based text
summarization
8. ENTER 2018 Research Track Slide Number 8
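[Figure: graph with three nodes x1, x2, x3 and their scores P1, P2, P3]
\[
\begin{pmatrix} P_1 \\ P_2 \\ P_3 \end{pmatrix}
=
\begin{pmatrix} 0 & 1 & 1/2 \\ 0 & 0 & 1/2 \\ 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} P_1 \\ P_2 \\ P_3 \end{pmatrix},
\qquad P_1 + P_2 + P_3 = 1
\]
\[
\begin{pmatrix} P_1 \\ P_2 \\ P_3 \end{pmatrix}
=
\begin{pmatrix} 2/5 \\ 1/5 \\ 2/5 \end{pmatrix}
\]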
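The three-node example can be checked numerically. The following is a minimal sketch, assuming the link matrix with rows (0, 1, 1/2), (0, 0, 1/2), (1, 0, 0); the function name `power_iteration` and the iteration count are illustrative choices, not part of the slides.

```python
# Power iteration for P = M P with P1 + P2 + P3 = 1.
# Column j of M spreads the score of node j evenly over its outgoing links.
M = [
    [0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5],
    [1.0, 0.0, 0.0],
]

def power_iteration(matrix, iterations=200):
    n = len(matrix)
    p = [1.0 / n] * n  # start from a uniform distribution
    for _ in range(iterations):
        # One step: every node collects the scores sent along its incoming links.
        p = [sum(matrix[i][j] * p[j] for j in range(n)) for i in range(n)]
    return p

scores = power_iteration(M)
print([round(s, 3) for s in scores])  # -> [0.4, 0.2, 0.4]
```

Because every column of this matrix sums to 1, the total score stays at 1 without any extra normalization, and the iteration settles on (2/5, 1/5, 2/5).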
10. ENTER 2018 Research Track Slide Number 10
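\[
\begin{pmatrix} P_1 \\ P_2 \\ P_3 \end{pmatrix}
=
\left[
d \begin{pmatrix} 0 & 0 & 1/2 \\ 0 & 0 & 1/2 \\ 1 & 0 & 0 \end{pmatrix}
+ (1 - d) \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}
\right]
\begin{pmatrix} P_1 \\ P_2 \\ P_3 \end{pmatrix},
\qquad P_1 + P_2 + P_3 = 1
\]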
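The damped update can be sketched in the same way. This is an illustration under assumptions: d = 0.85 is a commonly used value rather than one stated on the slide, and the renormalization step is added here because the second column of the link matrix is all zeros (node 2 has no outgoing links), so some score would otherwise leak away at each step.

```python
# Damped iteration: P = [d*M + (1-d)*U] P, where every entry of U is 1/n.
M = [
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.5],
    [1.0, 0.0, 0.0],
]

def damped_pagerank(matrix, d=0.85, iterations=200):
    n = len(matrix)
    p = [1.0 / n] * n
    for _ in range(iterations):
        total = sum(p)
        # Link term plus the uniform (1-d) term from the 1/n matrix.
        q = [d * sum(matrix[i][j] * p[j] for j in range(n)) + (1.0 - d) * total / n
             for i in range(n)]
        norm = sum(q)
        p = [x / norm for x in q]  # keep P1 + P2 + P3 = 1
    return p

scores = damped_pagerank(M)
print([round(s, 3) for s in scores])
```

Rows 1 and 2 of the combined matrix are identical, so nodes 1 and 2 end up with the same score, while node 3 scores higher.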
11. ENTER 2018 Research Track Slide Number 11
LexRank [Erkan 04]
• Make a graph by
connecting sentences,
whose similarity scores are
higher than a threshold
value
• Then, apply PageRank
algorithm to this graph,
and select important
sentences.
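The two steps above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the similarity values, the threshold of 0.2, and d = 0.85 are hypothetical, and the function name `lexrank` is chosen only for this example.

```python
def lexrank(similarity, threshold=0.1, d=0.85, iterations=100):
    n = len(similarity)
    # Step 1: link sentences i and j when their similarity exceeds the threshold.
    adj = [[1.0 if i != j and similarity[i][j] > threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    # Step 2: damped PageRank on the resulting graph.
    p = [1.0 / n] * n
    for _ in range(iterations):
        q = []
        for i in range(n):
            # Each neighbour j passes on its score divided by its degree.
            received = sum(adj[j][i] * p[j] / degree[j]
                           for j in range(n) if degree[j] > 0)
            q.append((1.0 - d) / n + d * received)
        norm = sum(q)
        p = [x / norm for x in q]
    return p

# Hypothetical pairwise similarity scores for four sentences.
sim = [
    [1.0, 0.6, 0.5, 0.0],
    [0.6, 1.0, 0.4, 0.0],
    [0.5, 0.4, 1.0, 0.3],
    [0.0, 0.0, 0.3, 1.0],
]
scores = lexrank(sim, threshold=0.2)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

In this toy graph the third sentence (index 2) is linked to all the others, so it receives the highest score and would be selected first.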
12. ENTER 2018 Research Track Slide Number 12
• Make a graph by
connecting images,
whose similarity scores
are higher than a
threshold value
• Then, apply PageRank
algorithm to this graph,
and select important
images.
13. ENTER 2018 Research Track Slide Number 13
• Connect both text
and image graphs
• Then, apply PageRank
algorithm to this
graph, and select
important sentences
and images.
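One way to connect the two graphs is to place them in a single adjacency matrix and add links between sentences and images. The sketch below assumes such cross-links are given as (sentence index, image index) pairs, for example a sentence and an image taken from the same blog entry; the slides do not state how the links are chosen, so `cross_links` and `combine_graphs` are hypothetical.

```python
def combine_graphs(text_adj, image_adj, cross_links):
    n_text, n_img = len(text_adj), len(image_adj)
    n = n_text + n_img
    adj = [[0.0] * n for _ in range(n)]
    # Sentences occupy indices 0..n_text-1, images the indices after them.
    for i in range(n_text):
        for j in range(n_text):
            adj[i][j] = text_adj[i][j]
    for i in range(n_img):
        for j in range(n_img):
            adj[n_text + i][n_text + j] = image_adj[i][j]
    # Assumed cross-links between a sentence s and an image m (undirected).
    for s, m in cross_links:
        adj[s][n_text + m] = 1.0
        adj[n_text + m][s] = 1.0
    return adj

text_adj = [[0.0, 1.0], [1.0, 0.0]]   # two linked sentences
image_adj = [[0.0]]                   # one image
combined = combine_graphs(text_adj, image_adj, cross_links=[(0, 0)])
```

PageRank can then be run on `combined` exactly as on a text-only graph, which yields scores for sentences and images together.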
14. ENTER 2018 Research Track Slide Number 14
Similarity between items
• Similarity between sentences:
TF-IDF
Cosine similarity
• Similarity between images:
Color histogram (HSV)
Bag of Visual Words: SIFT
Cosine similarity
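The sentence-side measure can be sketched as below. This is a minimal illustration under assumptions: whitespace tokenization, a smoothed idf variant, and English example sentences are all choices made here, not details from the slides. The same `cosine` function would also apply to image feature vectors (color histograms or visual-word counts) stored as dictionaries.

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf (an assumed variant) keeps weights positive for common terms.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

sentences = [
    "the view from the park is beautiful",
    "the park has a beautiful view",
    "the noodle shop was tasty",
]
vecs = tfidf_vectors(sentences)
sim_01 = cosine(vecs[0], vecs[1])
sim_02 = cosine(vecs[0], vecs[2])
print(round(sim_01, 3), round(sim_02, 3))
```

The first two sentences share several content words and score much higher than the first and third, which share only "the".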
15. ENTER 2018 Research Track Slide Number 15
Summarization taking account of
content types
• Watch: view, beautiful, park…
• Dine: tasty, noodle…
16. ENTER 2018 Research Track Slide Number 16
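\[
\begin{pmatrix} P_1 \\ P_2 \\ P_3 \end{pmatrix}
=
\left[
d \begin{pmatrix} 0 & 0 & 1/2 \\ 0 & 0 & 1/2 \\ 1 & 0 & 0 \end{pmatrix}
+ (1 - d) \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}
\right]
\begin{pmatrix} P_1 \\ P_2 \\ P_3 \end{pmatrix},
\qquad P_1 + P_2 + P_3 = 1
\]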
17. ENTER 2018 Research Track Slide Number 17
• a:
A sentence that has a strong
relationship with a given
content type
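One plausible way to favor such sentences is to replace the uniform 1/3 term with a distribution that gives more weight to sentences containing words typical of the content type. The slides do not spell out the exact formulation, so the sketch below is an assumption in the style of topic-sensitive PageRank; the keyword set, the bias formula, and the name `biased_rank` are hypothetical.

```python
def biased_rank(adj, bias, d=0.85, iterations=100):
    n = len(adj)
    degree = [sum(row) for row in adj]
    total_bias = sum(bias)
    jump = [b / total_bias for b in bias]  # replaces the uniform 1/n term
    p = [1.0 / n] * n
    for _ in range(iterations):
        q = [(1.0 - d) * jump[i]
             + d * sum(adj[j][i] * p[j] / degree[j]
                       for j in range(n) if degree[j] > 0)
             for i in range(n)]
        norm = sum(q)
        p = [x / norm for x in q]
    return p

# Hypothetical keyword set for the "Dine" type, using the slide's example words.
DINE_WORDS = {"tasty", "noodle"}
sentences = [
    "the park has a beautiful view",
    "we walked along the river",
    "the noodle soup was tasty",
]
# Assumed bias: 1 plus the number of content-type words in the sentence.
bias = [1.0 + sum(w in DINE_WORDS for w in s.lower().split()) for s in sentences]

# Fully connected toy graph, so the link structure alone ranks all sentences equally.
adj = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
uniform = biased_rank(adj, [1.0, 1.0, 1.0])
biased = biased_rank(adj, bias)
print([round(s, 3) for s in uniform], [round(s, 3) for s in biased])
```

With the uniform term all three sentences tie; with the Dine bias the noodle sentence moves to the top.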
18. ENTER 2018 Research Track Slide Number 18
Experiments
Data
•Manually created summaries for 20 spots.
Evaluation measure
•ROUGE-N
•Ranking (MANUAL-TEXT, MANUAL-IMAGE-TEXT)
Alternatives
•Lead (baseline)
•LexRank (baseline)
•LR+IMG
•LR+TYPE
•LR+IMG+TYPE
19. ENTER 2018 Research Track Slide Number 19
Evaluation by ROUGE-N
(automatic evaluation)
Method               ROUGE-1   ROUGE-2
LexRank (baseline)   0.316     0.207
LR+IMG               0.331     0.227
LR+TYPE              0.345     0.240
LR+IMG+TYPE          0.340     0.237
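For reference, ROUGE-N measures how many of the n-grams in a human-written summary also appear in the system summary. The sketch below is a simplified recall-only version with whitespace tokenization and a single reference; both are assumptions made for illustration, and the example sentences are not from the evaluation data.

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clipped overlap: an n-gram counts at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total

reference = "the park has a beautiful view"
candidate = "the view from the park is beautiful"
r1 = rouge_n(candidate, reference, 1)
r2 = rouge_n(candidate, reference, 2)
print(round(r1, 3), round(r2, 3))  # -> 0.667 0.2
```

Here four of the six reference unigrams and one of the five reference bigrams are matched.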
20. ENTER 2018 Research Track Slide Number 20
Manual Evaluation
Method               MANUAL-TEXT
Human-produced       1.28
Lead (baseline)      4.01
LexRank (baseline)   3.09
LR+IMG               2.85
LR+TYPE              3.22
LR+IMG+TYPE          2.99
21. ENTER 2018 Research Track Slide Number 24
Conclusions
• We propose a method of summarizing multiple
travel blog entries.
• Connecting the text graph and the image graph ->
the content+image summary was improved
• Taking the content type into account -> a more accurate
content type-biased summary
• We also constructed a summarization system
• http://www.ls.info.Hiroshima-cu.ac.jp/blogMap/