Active Learning in Collaborative Filtering Recommender Systems: A Survey (University of Bergen)
In collaborative filtering recommender systems, users' preferences are expressed as ratings for items, and each additional rating extends the knowledge of the system and affects the system's recommendation accuracy. In general, the more ratings are elicited from the users, the more effective the recommendations are. However, the usefulness of each rating may vary significantly, i.e., different ratings may bring a different amount and type of information about the user's tastes. Hence, specific techniques, defined as "active learning strategies", can be used to selectively choose the items to be presented to the user for rating. In fact, an active learning strategy identifies and adopts criteria for obtaining data that better reflects users' preferences and enables the system to generate better recommendations.
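As a toy illustration of one such elicitation criterion (the scoring rule and all data here are invented for illustration, not taken from the survey), a system might ask users to rate items that are both frequently rated and divisive, since those ratings tend to be the most informative about taste:

```python
import statistics

def elicitation_scores(ratings_by_item):
    """Score each item for rating elicitation: items that are frequently
    rated AND divisive (high rating variance) tend to be informative."""
    scores = {}
    for item, ratings in ratings_by_item.items():
        if len(ratings) < 2:
            continue  # variance is undefined for a single rating
        scores[item] = len(ratings) * statistics.pvariance(ratings)
    return scores

# Hypothetical rating data on a 1-5 scale.
ratings = {
    "A": [5, 5, 5, 5],   # popular but uncontroversial -> score 0
    "B": [1, 5, 2, 5],   # popular and divisive -> high score
    "C": [3],            # too few ratings to judge -> skipped
}
scores = elicitation_scores(ratings)
best = max(scores, key=scores.get)  # item to present for rating first
```

Real strategies surveyed in the paper are more sophisticated (entropy-based, model-change-based, and so on), but they share this shape: rank unrated items by an informativeness score and ask about the top ones.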
Novel Algorithms for Ranking and Suggesting True Popular Items (IJMER)
International Journal of Modern Engineering Research (IJMER) is a peer-reviewed online journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment, among many others.
Recommender systems have grown into a critical research subject since the first paper on collaborative filtering appeared in the 1990s. Although academic research on recommender systems has expanded extensively over the last ten years, the literature still lacks a comprehensive review and classification of that work. For this reason, we reviewed articles on recommender systems and classified them with respect to sentiment analysis. The articles are categorized into three recommender-system techniques: collaborative filtering (CF), content-based, and context-based. We identified the research papers related to sentiment-analysis-based recommender systems and, to classify the work done in this field, present the different approaches in tables. Our study provides statistics on trends in recommender-systems research, and gives practitioners and researchers insight into, and future directions for, recommender systems that use sentiment analysis. We hope this paper provides anyone interested in recommender-systems research with useful insight for the future.
A Hybrid Approach for Personalized Recommender System Using Weighted TFIDF on... (Editor IJCATR)
Recommender systems are gaining great popularity with the emergence of e-commerce and social media on the internet. These recommender systems enable users to access products or services that they would otherwise not be aware of, given the wealth of information on the internet. Two traditional methods used to develop recommender systems are content-based and collaborative filtering. While both methods have their strengths, they also have weaknesses, such as sparsity and the new-item and new-user problems, which lead to poor recommendation quality. Some of these weaknesses can be overcome by combining two or more methods to form a hybrid recommender system. This paper deals with issues related to the design and evaluation of a personalized hybrid recommender system that combines content-based and collaborative filtering methods to improve the precision of recommendation. Experiments on the MovieLens dataset show that the personalized hybrid recommender system outperforms the two traditional methods implemented separately.
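The simplest way to combine the two methods is a weighted blend of the content-based and collaborative scores. A minimal sketch of that idea follows; the item names, scores, and the choice of a linear blend with weight `alpha` are illustrative assumptions, not the paper's actual weighted-TFIDF formulation:

```python
def hybrid_score(content_score, cf_score, alpha=0.5):
    """Weighted linear blend of a content-based score and a collaborative
    score; alpha controls how much the content side contributes."""
    return alpha * content_score + (1 - alpha) * cf_score

# Hypothetical candidates: item -> (content score, collaborative score).
candidates = {"m1": (0.9, 0.2), "m2": (0.4, 0.8), "m3": (0.6, 0.4)}
ranked = sorted(candidates,
                key=lambda m: hybrid_score(*candidates[m], alpha=0.5),
                reverse=True)  # best blended candidate first
```

A blend like this lets the content side compensate when collaborative data is sparse (the new-item case) and vice versa, which is the motivation the abstract gives for hybridization.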
Recommender systems give suggestions according to user preferences. The number of items and books in a university-sized library is enormous and growing faster than ever, and readers find it extremely difficult to locate their favorite books. Even when a reader manages to find a preferred book, finding another book similar to that first preferred book can feel like finding a needle in the ocean, because the second preferred book might sit at the very end of the long tail. A recommender system is therefore often a requirement in a library and should be considered in order to make such similarity-based discovery possible. Recommender systems have become fundamental applications in electronic commerce and information retrieval, providing suggestions that effectively prune large information spaces so that users are directed toward those items that best meet their needs and preferences. A variety of techniques have been suggested for performing recommendation, including the collaborative technique and its three methods: Slope One, used for rating prediction; Pearson's correlation, used for finding the similarity between users; and item-to-item similarity. To improve performance, these methods have sometimes been combined in a hybrid recommendation technique.
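Of the three methods listed above, Slope One is the easiest to sketch. A minimal weighted Slope One predictor looks like the following; the user names and ratings are invented toy data, not from the paper:

```python
from collections import defaultdict

def slope_one_predict(ratings, user, target):
    """Weighted Slope One: predict a user's rating for `target` from the
    average rating deviations between `target` and each item the user rated."""
    # diffs[i] / counts[i] = average of (r_u,target - r_u,i) over users
    # who rated both the target and item i.
    diffs, counts = defaultdict(float), defaultdict(int)
    for r in ratings.values():
        if target in r:
            for i, v in r.items():
                if i != target:
                    diffs[i] += r[target] - v
                    counts[i] += 1
    num = den = 0.0
    for i, v in ratings[user].items():
        if i != target and counts[i]:
            num += (v + diffs[i] / counts[i]) * counts[i]  # weight by support
            den += counts[i]
    return num / den if den else None

# Toy rating data: user -> {book: rating}.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 2},
    "bob":   {"A": 3, "B": 4},
    "carol": {"B": 2, "C": 5},
}
pred = slope_one_predict(ratings, "bob", "C")
```

The prediction for bob on book C combines each co-rated item's average deviation, weighted by how many users support it, which is what makes Slope One cheap yet reasonably robust on sparse library data.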
A Study of Neural Network Learning-Based Recommender System (theijes)
A recommender system sorts and recommends the information that meets personal preferences from among the huge amount of data provided by e-commerce. In particular, collaborative filtering (CF) is the most widely used technique in these recommendation systems. This method finds neighboring users who have preferences similar to a given user's and recommends the items preferred by those neighbors. This study proposes a neural network learning model as a new technique for finding neighboring users within the collaborative filtering method. The neural network learning model mitigates the sparseness problem during the analysis of users related to the target user. The proposed method was tested on the MovieLens data sets, and the results showed that precision improved by 6.7%.
COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS... (IJCSEA Journal)
Collaborative filtering is generally used as a recommender system. There is enormous growth in the amount of data on the web, and these recommender systems help users select the products on the web that are most suitable for them. Collaborative filtering systems collect users' previous information about items such as movies, music, ideas, and so on. For recommending the best item there are many algorithms based on different approaches; the best known are user-based and item-based algorithms, and experiments show that item-based algorithms give better results than user-based algorithms. The aim of this paper is to compare user-based and item-based collaborative filtering algorithms, with many different similarity indexes, in terms of their accuracy and performance. We provide an approach to determine the best algorithm, i.e., the one that gives the most accurate recommendations according to statistical accuracy metrics. The user-based and item-based algorithms are compared on a movie-recommendation data set.
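To make the comparison concrete, here is a minimal user-based predictor with cosine similarity, one of the similarity indexes such papers typically evaluate (the ratings below are invented toy data, and this is only the user-based half of the comparison):

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts; the dot product
    runs over co-rated items, the norms over each user's full profile."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def user_based_predict(ratings, user, item):
    """Predict a rating as the similarity-weighted average of the ratings
    given to `item` by the other users."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        s = cosine(ratings[user], r)
        num += s * r[item]
        den += abs(s)
    return num / den if den else None

ratings = {
    "u1": {"A": 4, "B": 2},
    "u2": {"A": 5, "B": 1, "C": 4},
    "u3": {"A": 1, "B": 5, "C": 2},
}
pred = user_based_predict(ratings, "u1", "C")
```

The item-based variant is symmetric: it computes similarities between item columns instead of user rows, then averages the target user's own ratings of similar items, which is why it tends to be more stable when there are far more users than items.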
Political prediction analysis using text mining and deep learning (Vishwambhar Deshpande)
We have proposed a system to determine current sentiment on Twitter, using the open-access Twitter API, from opinions in different content structures such as the latest news, audits, articles, and social media posts, together with a deep learning method that studies historic data to predict future results. We utilized Naive Bayes and dictionary-based algorithms to predict the sentiment of live Twitter data.
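A dictionary-based sentiment algorithm, in its simplest form, sums per-word scores from a lexicon and labels the text by the sign of the total. The tiny lexicon and example tweet below are invented for illustration; real systems use large curated lexicons and handle negation:

```python
# Toy lexicon: word -> sentiment weight (hypothetical values).
LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}

def dictionary_sentiment(text):
    """Sum the lexicon scores of the words in a tweet; the sign of the
    total decides the sentiment label."""
    score = sum(LEXICON.get(w, 0) for w in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

label = dictionary_sentiment("great speech but terrible turnout good vibes")
```

The Naive Bayes side of such a system instead learns word-given-label probabilities from labeled tweets, so the two approaches can cross-check each other on live data.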
WEB-BASED DATA MINING TOOLS: PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RU... (IJDKP)
This paper explains web-enabled tools for educational data mining. The proposed web-based tools, developed using the ASP.NET framework and PHP, can help universities or institutions that provide students with elective courses, as well as improve academic activities based on feedback collected from students. The ASP.NET tool performs association rule mining using the Apriori algorithm, whereas the PHP-based Feedback Analytical Tool collects feedback about faculty and institutional infrastructure from students and, based on that feedback, reports the performance of the faculty and the institution. Using that data, it helps management improve in-house training skills and gain knowledge about the educational trends faculty should follow to improve the effectiveness of courses and teaching skills.
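The core of Apriori is counting candidate itemsets against a minimum support threshold. A sketch of that counting step follows; the course-feedback transactions and the threshold are hypothetical, and a full Apriori would additionally prune candidates level by level:

```python
from itertools import combinations
from collections import Counter

def frequent_itemsets(transactions, min_support, k):
    """Count k-item combinations across transactions and keep those meeting
    min_support: the candidate-counting step at the heart of Apriori."""
    counts = Counter()
    for t in transactions:
        for combo in combinations(sorted(t), k):
            counts[combo] += 1
    return {c: n for c, n in counts.items() if n >= min_support}

# Hypothetical elective-course selections, one set per student.
feedback = [{"math", "physics"}, {"math", "physics", "cs"}, {"math", "cs"}]
pairs = frequent_itemsets(feedback, min_support=2, k=2)
```

Frequent pairs like these are then turned into association rules ("students who take math often take physics"), which is the kind of insight the tool surfaces for course planning.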
Detection and Analysis of Twitter Trending Topics via Link-Anomaly Detection (IJERA Editor)
This paper involves two approaches for finding trending topics in social networks: a keyword-based approach and a link-based approach. Conventional keyword-based approaches to topic detection mainly focus on the frequencies of (textual) words. We propose a link-based approach that focuses on posts as reflected in the mentioning behavior of hundreds of users. Anomaly detection on the Twitter data set is carried out by retrieving trending topics from Twitter sequentially through an API, together with the corresponding users for training; the computed anomaly scores are then aggregated across different users. The aggregated anomaly score is fed into change-point analysis or burst detection in order to pinpoint emerging topics. We used a real-time Twitter account, so results vary according to current tweet trends. The experiment shows that the proposed link-based approach performs even better than the keyword-based approach.
Measuring information credibility in social media using combination of user p... (IJECEIAES)
Information credibility in social media is becoming the most important part of information sharing in society. The literature shows that there is no labeling of information credibility based on user competencies and their posted topics. This paper improves information credibility measurement by adding 17 new features for Twitter and 49 for Facebook. In the first step, we perform a labeling process based on user competencies and their posted topics to classify users into two groups, credible and not credible, with respect to their posted topics. These approaches are evaluated on ten thousand samples of real-field data obtained from the Twitter and Facebook networks using Naive Bayes (NB), Support Vector Machine (SVM), Logistic Regression (Logit), and J48 classifiers. With the proposed new features, the credibility of information provided in social media increases significantly, as indicated by better accuracy compared to the existing technique for all classifiers.
International Journal of Engineering Research and Applications (IJERA) is an open-access, online, peer-reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nanotechnology & Science, Power Electronics, Electronics & Communication Engineering, Computational Mathematics, Image Processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design, etc.
Cold-Start Management with Cross-Domain Collaborative Filtering and Tags (Matthias Braunhofer)
Recommender systems suffer from the new-user problem, i.e., the difficulty of making accurate predictions for users who have rated only a few items. Moreover, they usually compute recommendations for items in just one domain, such as movies, music, or books. In this paper we deal with such a cold-start situation by exploiting cross-domain recommendation techniques, i.e., we suggest items to a user in one target domain by using ratings of other users in a completely disjoint auxiliary domain. We present three rating prediction models that make use of information about how users tag items in an auxiliary domain, and about how these tags correlate with the ratings, to improve the rating prediction task in a different target domain. We show that the proposed techniques can effectively deal with the considered cold-start situation, given that the tags used in the two domains overlap.
video link => http://youtu.be/D9PBX8FmtpQ
A tweets classifier that categorises tweets into these six categories:
Business
Politics
Music
Health
Sports
Technology
SIMILARITY MEASURES FOR RECOMMENDER SYSTEMS: A COMPARATIVE STUDY (Journal For Research)
Recommender systems have the ability to guide users in a personalized way to interesting items in a large space of possible options. They have fundamental applications in e-commerce and information retrieval, providing suggestions that prune large information spaces so that users are directed towards those items that best meet their needs and preferences. A variety of approaches have been proposed, but collaborative filtering has been the most popular and widely used. Collaborative filtering takes user feedback in the form of ratings in an application area and uses various similarity measures to find similarities and differences between user profiles in order to generate recommendations. This paper provides an overview of a few important similarity measures that are currently in use. Different similarity measures produce different results for the same input parameters, so, to understand how various similarity measures behave when put in different contexts with the same input, a few observations are made. The paper also provides a comparison graph to help interpret the results of the different similarity measures.
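The point that different measures disagree on the same input is easy to demonstrate. The sketch below computes three commonly compared measures on one pair of toy rating vectors (the vectors are invented; zero means "not rated" for the Jaccard set view):

```python
import math

def pearson(a, b):
    """Pearson correlation: cosine of the mean-centered vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def cosine(a, b):
    """Cosine similarity on the raw rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def jaccard(a, b):
    """Jaccard similarity over the sets of rated (nonzero) positions;
    it ignores the rating values entirely."""
    sa = {i for i, x in enumerate(a) if x}
    sb = {i for i, y in enumerate(b) if y}
    return len(sa & sb) / len(sa | sb)

u, v = [5, 3, 0, 1], [4, 0, 0, 1]  # two users' ratings, 0 = unrated
results = {"pearson": pearson(u, v),
           "cosine": cosine(u, v),
           "jaccard": jaccard(u, v)}
```

On this single input the three measures already yield three different values, which is exactly the behavior the comparative study examines across contexts.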
In recent years a huge amount of structured and unstructured data, big data, has been generated from social networks, and the valuable information in this social big data needs to be extracted. Traditional analytic platforms must be scaled up to analyze social big data efficiently and in a timely manner. Sentiment analysis of social big data helps organizations by providing business insights into public opinion. Sentiment analysis based on a multi-class classification scheme aims to classify text into more detailed sentiment labels. Multi-class classification with a single-tier architecture, where a single model is developed and trained on the entire labeled data, can increase classification complexity. In this paper, a multi-tier sentiment analysis system on a big data analytics platform (MSABDP) is proposed to reduce multi-class classification complexity and efficiently analyze large-scale data sets. Hadoop is built for big data analytics and is a good platform for managing large data at scale; it improves scalability and efficiency through a distributed processing environment, being implemented on the MapReduce framework and the Hadoop distributed storage (HDFS). MSABDP combines the SentiStrength lexicon and a learning-based classification scheme in a multi-tier architecture and runs on the big data analytics platform in order to manage large data at scale. The proposed system collected a large amount of real Twitter data using Apache Flume, and this data was used for evaluation. The evaluation results show that the proposed multi-class classification system with a multi-tier architecture improves classification accuracy by 7% over multi-class classification based on a single-tier architecture.
In this talk at the GDG DataFest event, I presented a practical introduction to the main recommender-system techniques, including recent Deep Learning-based architectures. Examples using Python, TensorFlow, and Google ML Engine were presented, and datasets were provided so we could work through an article- and news-recommendation scenario.
COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS...IJCSEA Journal
Collaborative Filtering is generally used as a recommender system. There is enormous growth in the
amount of data in web. These recommender systems help users to select products on the web, which is the
most suitable for them. Collaborative filtering-systems collect user’s previous information about an item
such as movies, music, ideas, and so on. For recommending the best item, there are many algorithms,
which are based on different approaches. The most known algorithms are User-based and Item-based
algorithms. Experiments show that Item-based algorithms give better results than User-based algorithms.
The aim of this paper isto compare User-based and Item-based Collaborative Filtering Algorithms with
many different similarity indexes with their accuracy and performance. We provide an approach to
determine the best algorithm, which give the most accurate recommendation by using statistical accuracy
metrics. The results are compared the User-based and Item-based algorithms with movie recommendation
data set.
Political prediction analysis using text mining and deep learningVishwambhar Deshpande
We have proposed a system to determine current sentiment on twitter using Twit-
ter API for open access which includes opinions from dierent content structures like
latest news, audits, articles and social media posts. and Deep Learning method to
study Historic Data for predicting future results. we utilized Naive Bayes and dictio-
nary based algorithms to predict the sentiment on Live Twitter Data.
WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RU...IJDKP
This paper aims to explain the web-enabled tools for educational data mining. The proposed web-based
tool developed using Asp.Net framework and php can be helpful for universities or institutions providing
the students with elective courses as well improving academic activities based on feedback collected from
students. In Asp.Net tool, association rule mining using Apriori algorithm is used whereas in php based
Feedback Analytical Tool, feedback related to faculty and institutional infrastructure is collected from
students and based on that Feedback it shows performance of faculty and institution. Using that data, it
helps management to improve in-house training skills and gains knowledge about educational trends which
is to be followed by faculty to improve the effectiveness of the course and teaching skills.
Detection and Analysis of Twitter Trending Topics via Link-Anomaly DetectionIJERA Editor
This paper involves two approaches for finding the trending topics in social networks that is key-based approach and link-based approach. In conventional key-based approach for topics detection have mainly focus on frequencies of (textual) words. We propose a link-based approach which focuses on posts reflected in the mentioning behavior of hundreds users. The anomaly detection in the twitter data set is carried out by retrieving the trend topics from the twitter in a sequential manner by using some API and corresponding user for training, then computed anomaly score is aggregated from different users. Further the aggregated anomaly score will be feed into change-point analysis or burst detection at the pinpoint, in order to detect the emerging topics. We have used the real time twitter account, so results are vary according to the tweet trends made. The experiment shows that proposed link-based approach performs even better than the keyword-based approach.
Measuring information credibility in social media using combination of user p...IJECEIAES
Information credibility in social media is becoming the most important part of information sharing in the society. The literatures have shown that there is no labeling information credibility based on user competencies and their posted topics. This paper increases the information credibility by adding new 17 features for Twitter and 49 features for Facebook. In the first step, we perform a labeling process based on user competencies and their posted topic to classify the users into two groups, credible and not credible users, regarding their posted topics. These approaches are evaluated over ten thousand samples of real-field data obtained from Twitter and Facebook networks using classification of Naive Bayes (NB), Support Vector Machine (SVM), Logistic Regression (Logit) and J48 Algorithm (J48). With the proposed new features, the credibility of information provided in social media is increasing significantly indicated by better accuracy compared to the existing technique for all classifiers.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Cold-Start Management with Cross-Domain Collaborative Filtering and TagsMatthias Braunhofer
Recommender systems suffer from the new user problem, i.e., the difficulty to make accurate predictions for users that have rated only few items. Moreover, they usually compute recommendations for items just in one domain, such as movies, music, or books. In this paper we deal with such a cold-start situation exploiting cross-domain recommendation techniques, i.e., we suggest items to a user in one target domain by using ratings of other users in a, completely disjoint, auxiliary domain. We present three rating prediction models that make use of information about how users tag items in an auxiliary domain, and how these tags correlate with the ratings to improve the rating prediction task in a different target domain. We show that the proposed techniques can effectively deal with the considered cold-start situation, given that the tags used in the two domains overlap.
video link => http://youtu.be/D9PBX8FmtpQ
Tweets Classifier which categorises tweets into these 6 categories:
Business
Politics
Music
Health
Sports
Technology
SIMILARITY MEASURES FOR RECOMMENDER SYSTEMS: A COMPARATIVE STUDYJournal For Research
Recommender Systems have the ability to guide the users in a personalized way to interesting items in a large space of possible options. They have fundamental applications in e-commerce and information retrieval, providing suggestion that prune large information spaces so that users are directed towards those items that best meets the needs and preferences. A variety of approaches have been proposed but collaborative filtering has been the most popular and widely used which makes use of various similarity measures to calculate the similarity. Collaborative Filtering takes the user feedback in the form of ratings in an application area and uses it to find similarities and differences between user profiles to generate recommendations. Collaborative Filtering makes use of various similarity measures to calculate the similarity or difference between the users. This paper provides an overview on few important similarity measures that are currently being used. Different similarity measures provide different results against same input parameters. So, to understand how various similarity measures behave when they are put in different contexts but with same input, few observations are made. This paper also provides a comparison graph to help understand the results of different similarity measures.
Over recent years, big data, a huge amount of structured and unstructured data is generated from social Network. There needs to extract the valulable information from the social big data. The traditional analytic platform needs to be scaled up for analyzing social big data in an efficient and timely manner. Sentiment Analysis of social big data helps the organizations by providing business insights with public opinion. Sentiment analysis based on multi-class classification scheme is oriented towards classification of text into more detailed sentiment labels. Multi-class classification with single tier architecture where single model is developed and entire labeled data is trained may increase the classification complexity. In this paper, multi-tier sentiment analysis system on big data analytics platform (MSABDP) is proposed to reduce the multi class classification complexity and efficiently analyze large scale data set. Hadoop is built for big data analytics and it is a good platform for being able to manage large data at scale and which can improve scalability and efficiency by adopting distributed processing environment since they have been implemented using a MapReduce framework and a Hadoop distributed storage (HDFS). The MSABDP is implemented by combining SentiStrength lexicon and learning based classification scheme with multi-tier architecture and run on big data analytics platform for being able to manage large data at scale. The proposed system collects a large amount of real Twitter data by using Apache Flume and the data was used for evaluation. The evaluation results have shown that the proposed multi class classification system with multi-tier architecture is able to significantly improve the classification accuracy over multi class classification based on single-tier architecture by 7%.
In this talk at the GDG DataFest event, I gave a practical introduction to the main recommender system techniques, including recent Deep Learning-based architectures. Examples using Python, TensorFlow and Google ML Engine were presented, and datasets were provided so we could work through an article and news recommendation scenario.
Prediction of Reaction towards Textual Posts in Social NetworksMohamed El-Geish
Posting on social networks can be a gratifying or a terrifying experience, depending on the reaction the post and, by association, its author receive from readers. To better understand what makes a post popular, this project inquires into the factors that determine the number of likes, comments, and shares a textual post gets on LinkedIn, and finds a predictor function that can estimate those quantitative social gestures.
Advances in Exploratory Data Analysis, Visualisation and Quality for Data Cen...Hima Patel
It is widely accepted that data preparation is one of the most time-consuming steps of the machine learning (ML) lifecycle. It is also one of the most important steps, as the quality of data directly influences the quality of a model. In this session, we will discuss the importance and role of exploratory data analysis (EDA) and data visualisation techniques in finding data quality issues and in data preparation, relevant to building ML pipelines. We will also discuss the latest advances in these fields and highlight areas that need innovation. Finally, we will discuss the challenges posed by industry workloads and the gaps to be addressed to make data-centric AI real in industry settings.
Question 1 Some agile and incremental methods- like extreme programmi.pdfPhilzIGHudsonl
Question 1
Some agile and incremental methods, like extreme programming, claim that they don't need
high-level designs.
Group of answer choices
True
False
Question 2
It is impossible, and not practical, to remove all bugs from a reasonably large system.
Group of answer choices
True
False
Question 3
A successful software system will require zero maintenance.
Group of answer choices
True
False
Question 4
Git belongs in which category of version control systems (VCS)?
Group of answer choices
Local VCS
Centralized VCS
Distributed VCS
Locking VCS
Question 5
Git thinks about its data more like a stream of snapshots as opposed to a collection of file
differences (or deltas).
Group of answer choices
True
False
Question 6
If two team members disagree about an estimate, which of the following can help find a
compromise?
Group of answer choices
Wideband Delphi
discussing assumptions
WBS
a Scrum meeting
Question 7
Which planning method is used by eXtreme Programming (XP)?
Group of answer choices
PROBE
COCOMO II
The Planning Game
Wideband Delphi
Question 8
In the Wideband Delphi process, the project manager would make a good moderator.
Group of answer choices
True
False
Question 9
Which of the following are true of Wideband Delphi? (select all that apply)
Group of answer choices
requires the entire team to correct one another
requires the creation of a WBS
requires a daily stand-up meeting
was developed at the Rand Corporation in the 1940s
involves an estimation team with 3 to 7 members
Flag question: Question 10
Question 10
Which of the following should you have before you begin the Wideband Delphi process? (select
all that apply)
Group of answer choices
WBS
Vision document
Scope document
a Scrum meeting
Flag question: Question 11
Question 11
Which of the following relationships is also known as "generalization"?
Group of answer choices
Has-a
Creates
Is-a
Knows about
Flag question: Question 12
Question 12
The domain model is a dynamic model that captures the behavior of the system.
Group of answer choices
True
False
Flag question: Question 13
Question 13
External systems are always modeled as actors.
Group of answer choices
True
False
Flag question: Question 14
Question 14
Matching
Group of answer choices
validation
[ Choose ] Are we doing the things right? Are we doing the right things?
verification
[ Choose ] Are we doing the things right? Are we doing the right things?
Flag question: Question 15
Question 15
Which of the following rules are part of Osborn's method?
Group of answer choices
solicit user stories
focus on quantity
withhold criticism
break rules
encourage unusual ideas
combine and improve ideas
Flag question: Question 16
Question 16
Which of the following are parts of a use case?
Group of answer choices
title
main success scenario
extensions
user stories
Flag question: Question 17
Question 17
A domain model is: (check all that apply)
Group of answer choices
a use case
a graphic that shows relationships
a project glossary
a dictionary of terms
A Robust Cybersecurity Topic Classification ToolIJNSA Journal
In this research, we use user-defined labels from three internet text sources (Reddit, StackExchange, Arxiv) to train 21 different machine learning models for the topic classification task of detecting cybersecurity discussions in natural English text. We analyze the false positive and false negative rates of each of the 21 models in cross-validation experiments. Then we present a Cybersecurity Topic Classification (CTC) tool, which takes the majority vote of the 21 trained machine learning models as the decision mechanism for detecting cybersecurity-related text. We also show that the majority-vote mechanism of the CTC tool yields lower false negative and false positive rates on average than any of the 21 individual models. We show that the CTC tool scales to hundreds of thousands of documents with a wall-clock time on the order of hours.
Recommendation based on Clustering and Association RulesIJARIIE JOURNAL
Recommender systems play an important role in filtering and customizing desired information. Recommender systems are divided into three categories, i.e., collaborative filtering, content-based filtering, and hybrid filtering, which are the most widely adopted techniques in recommender systems. This paper mainly describes the issues of recommendation systems. Its main aim is to recommend suitable items to the user; recommending suitable items requires better rule extraction, for which association mining is applied. A clustering method is also applied to cluster the data by similar characteristics. The proposed methods try to eliminate certain problems such as sparsity and the cold-start problem, so association mining over clustering is used to overcome them.
A Proposed Method to Develop Shared Papers for Researchers at Conferenceiosrjce
In conferences, the topics of interest for papers cover a variety of subjects. If a researcher wants to write a shared research paper on a specific subject with another researcher who is interested in the same subject and wants to participate in the same conference, a problem arises, especially when the number of topics of interest and researchers becomes large. The aim of this paper is to solve that problem by finding a suitable representation of researchers' topics of interest that can be easily encoded and then used to find researchers who share the same topics of interest. Two proposed system algorithms are implemented to find the shared researchers at a conference, giving an easy and efficient implementation.
Similar to Tag-based Approaches to Sharing Background Information regarding Social Problems towards Facilitating Public Collaboration (20)
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
Tag-based Approaches to Sharing Background Information regarding Social Problems towards Facilitating Public Collaboration
1. Tag-based Approaches to Sharing Background Information regarding Social Problems towards Facilitating Public Collaboration
Masaru Watanabe,
Shun Shiramatsu, Yasuaki Goto
Nagoya Institute of Technology
2. Outline
1. Background and Goal
2. Automatic annotation
1. Generate tags
1. Filtering
2. SVM
2. Annotate tags
1. TF-IDF
2. Paragraph Vector
3. Prototype of API
3. Systems for sharing collaborative activities
4. Conclusion
3. Background
CivicTech: citizens and IT engineers cooperate to solve social problems.
Hackathons are frequently held.
When participants discuss solutions to social problems, they need to share background information regarding the problems.
Goal: Sharing background information about social problems
4. Our Approaches
1. Automatic annotation of web articles with social problem tags
If articles have social problem tags, these articles can be found easily as background information on the problems.
2. Systems for sharing collaborative activities
By publishing the activities within an organization as open data, citizen collaboration across organizations is promoted.
Goal: Sharing background information about social problems
5. Outline
1. Background and Goal
2. Automatic annotation
1. Generate tags
1. Filtering
2. SVM
2. Annotate tags
1. TF-IDF
2. Paragraph Vector
3. Prototype of API
3. Systems for sharing collaborative activities
4. Conclusion
7. Knowledge Connector
Site where users can share works such as ideas, applications, and data.
Tag-based search is supported.
Problems: users often forget to annotate or do not understand the necessity of annotation, and orthographical variants of tags occur.
http://idea.linkdata.org/
8. Our Solution
Problem: users often forget to annotate or do not understand the necessity of annotation. Solution: automatic annotation with social problem tags.
Problem: orthographic variants of tags. Solution: automatic generation of a tagset in advance.
11. Generate Tags
Requirements for generating tags:
A hierarchical structure, because exploratory browsing of related problems promotes understanding of background information.
A sufficient amount of social problem tags.
Source: the "Social problem" category of DBpedia Japanese, a well-known linked open dataset converted from Wikipedia.
However, some articles in the category are unrelated to "social problem".
12. Filtering to exclude inappropriate tags
Extract page titles from the "Social Problem" category and its subcategories (within n hierarchical levels).
Filter out noisy resources by tracing other particular categories.
13. Categories used for filtering
Filter A: Stub Category, Computer Science, Judgment, Work, Social Movement Organization, People, Biology Field, Criminal Studies, Crime Type, Peace Studies, Logic
Filter B: almost the same as Filter A, except that "Biology Field" is excluded.
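The two-step tag generation above (collect page titles under the "Social problem" category and its subcategories, then drop pages also reachable from a filter category) can be sketched as a breadth-first traversal of the category graph. This is a minimal sketch with an invented toy graph, not the actual DBpedia Japanese data:

```python
from collections import deque

def collect_pages(graph, pages, root, max_depth):
    """BFS the subcategory graph from `root` down to `max_depth` levels,
    returning the titles of all pages found in those categories."""
    titles, seen = set(), {root}
    queue = deque([(root, 0)])
    while queue:
        cat, depth = queue.popleft()
        titles.update(pages.get(cat, []))
        if depth < max_depth:
            for sub in graph.get(cat, []):
                if sub not in seen:
                    seen.add(sub)
                    queue.append((sub, depth + 1))
    return titles

def filter_tags(candidates, graph, pages, filter_cats, max_depth):
    """Remove candidate tags that also appear under any filter category."""
    noisy = set()
    for cat in filter_cats:
        noisy |= collect_pages(graph, pages, cat, max_depth)
    return candidates - noisy

# Toy data (illustrative only): subcategory links and category -> page titles.
graph = {"Social problem": ["Poverty issues", "Crime type"],
         "Crime type": []}
pages = {"Social problem": ["Hunger"],
         "Poverty issues": ["Homelessness"],
         "Crime type": ["Burglary"]}

candidates = collect_pages(graph, pages, "Social problem", max_depth=2)
tags = filter_tags(candidates, graph, pages, {"Crime type"}, max_depth=2)
# "Burglary" is dropped because it is reachable from the filter category.
```

The same traversal serves both steps; only the starting category and the direction of use (collect vs. exclude) differ.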
14. Evaluation method (Filtering)
Recall: six participants selected 102 pages related to social problems from Japanese Wikipedia; we calculated the percentage of these items that were included in the tag list.
Precision: 100 tags were selected randomly from the tag list; 25 participants evaluated whether each was a social problem on a five-point scale, and we calculated the percentage of tags regarded as social problems (rated more than three on the scale).
15. Evaluation (Tag Generation by Filtering)
The method with Filter B and 2 hierarchical levels has the best balance.
Recall: 43%
Precision: 49%
16. Filter based on SVM
Dataset: pages belonging to a lower category within three hierarchical levels of "Category: Social problem"
Feature vectors used:
a. Category pages reachable within 5 hierarchical levels from any of the acquired pages, with an occurrence frequency of 9 or more
b. Sum of the distributed representation vectors of the words (word2vec) in each page title
c. Distributed representation vector of the full text of each page (doc2vec)
d. Combination of a. and c.
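Feature vector a. can be read, following the editor's notes, as one value per frequent category: the number of hierarchy levels from the page up to that category, with 6 as a sentinel for categories not reachable within 5 levels. A minimal sketch with an invented toy graph of parent-category links:

```python
from collections import deque

UNREACHABLE = 6  # sentinel for categories not reachable within 5 levels

def category_depths(parents, page, feature_cats, max_depth=5):
    """Breadth-first search upward through parent categories; the feature
    value for each category is the number of levels needed to reach it."""
    depth_of = {page: 0}
    queue = deque([page])
    while queue:
        node = queue.popleft()
        if depth_of[node] >= max_depth:
            continue
        for parent in parents.get(node, []):
            if parent not in depth_of:
                depth_of[parent] = depth_of[node] + 1
                queue.append(parent)
    return [depth_of.get(cat, UNREACHABLE) for cat in feature_cats]

# Toy parent-category links (illustrative, not real DBpedia data).
parents = {"Hunger": ["Food security"],
           "Food security": ["Social problem"]}
features = category_depths(parents, "Hunger",
                           ["Food security", "Social problem", "Logic"])
# "Logic" is not reachable from "Hunger", so it gets the sentinel value 6.
```

Vectors like this, one slot per frequent category, would then be fed to the SVM as input.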
17. Evaluation method (SVM)
10-fold cross-validation test.
Both positive and negative examples used 120 cases, taken from the results obtained when evaluating the precision of the filtering method.
Recall: percentage of positive examples that were categorized into the positive class.
Precision: percentage of examples categorized into the positive class that were positive examples.
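The recall and precision defined on this slide can be computed directly from the sets of predicted and true positives. A minimal sketch with hypothetical labels (the page names are invented for illustration):

```python
def precision_recall(predicted, actual_positive):
    """Precision: share of items predicted positive that are truly positive.
    Recall: share of truly positive items that were predicted positive."""
    tp = len(predicted & actual_positive)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual_positive) if actual_positive else 0.0
    return precision, recall

# Hypothetical labels: 4 pages judged social problems, 5 predicted positive.
actual = {"Hunger", "Bullying", "Poverty", "Pollution"}
predicted = {"Hunger", "Bullying", "Poverty", "Logic", "Chess"}
p, r = precision_recall(predicted, actual)
# precision = 3/5 = 0.6, recall = 3/4 = 0.75
```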
20. Annotate Tags
Calculate the cosine similarity between the target article and every Wikipedia article whose title is a tag name.
When the similarity is equal to or higher than the threshold, the title is attached as a tag.
Two methods are used for vector generation:
1. TF-IDF
2. Paragraph Vector
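The TF-IDF variant of this annotation step can be sketched in plain Python: vectorize the target article and the tag-name articles, then keep the tags whose cosine similarity clears the threshold. The toy corpus below is invented for illustration; the threshold of 0.2 is the one the evaluation later found workable for TF-IDF.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a TF-IDF vector (term -> weight) for each tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Toy corpus: the target article plus Wikipedia articles named after tags.
target = "rising food prices cause hunger".split()
tag_articles = {"Hunger": "hunger follows rising food prices".split(),
                "Logic": "formal logic studies valid inference".split()}

THRESHOLD = 0.2
vecs = tfidf_vectors([target] + list(tag_articles.values()))
tags = [name for name, v in zip(tag_articles, vecs[1:])
        if cosine(vecs[0], v) >= THRESHOLD]
```

The Paragraph Vector variant only swaps the vectorizer; the threshold comparison is the same.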
21. Evaluation method (Annotation)
Measure the cosine similarity with each method for 10 articles on social problems collected in advance.
Show 25 participants up to ten tags annotated to each article, plus three randomly extracted tags, and have them evaluate the validity of the tags on a seven-point scale.
Calculate the correlation coefficient and accuracy based on these evaluations.
22. Evaluation (Tags Annotated by TF-IDF)
Correlation coefficient: 0.732
Accuracy rate at threshold 0.2: 0.812
Tags with similarity of 0.2 or more: 37/85
23. Examples of false recognition: the evaluation value given by the system differs from the evaluation value given by humans
In the article on Hunger, "Food crisis" received Human: 7 (very high), System: 0.154 (low).
In the article on Bullying, "Social isolation" received Human: 5 (high), System: 0.152 (low).
Similarity assessment via related terms could not be taken into account.
Note: these tags are translated from Japanese.
24. Evaluation (Tags Annotated by Paragraph Vector)
Correlation coefficient: 0.346
Accuracy rate at threshold 0.35: 0.824
Tags with similarity of 0.35 or more: 8/102
26. Prototype of API
Input: http://foo-bar.net/tag-recom/[Target page URL]
Output: JSON containing the annotated tags together with their similarity scores.
Note: these tags are translated from Japanese.
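The exact JSON schema of the prototype's output appears only as a figure in the original slides, so the field names below are assumptions; this sketch merely illustrates returning annotated tags with similarity scores as JSON:

```python
import json

def annotate(url):
    """Placeholder for the tag-annotation pipeline; the real service would
    fetch the page at `url` and score it against the tag articles."""
    return {"Food crisis": 0.35, "Hunger": 0.27}

def tag_recom(target_url):
    # Assumed response shape: the target URL plus a similarity-sorted tag list.
    scores = annotate(target_url)
    return json.dumps({"url": target_url,
                       "tags": [{"tag": t, "similarity": s}
                                for t, s in sorted(scores.items(),
                                                   key=lambda x: -x[1])]},
                      ensure_ascii=False)

response = tag_recom("http://example.org/article-on-hunger")
```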
27. Outline
1. Background and Goal
2. Automatic annotation
1. Generate tags
1. Filtering
2. SVM
2. Annotate tags
1. TF-IDF
2. Paragraph Vector
3. Prototype of API
3. Systems for sharing collaborative activities
4. Conclusion
28. Knowledge Connector (repeated)
Site where users can share works such as ideas, applications, and data.
We aim to solve these problems by developing MissionForest:
Users often forget to annotate or do not understand the necessity of annotation.
Orthographic variants of tags.
Lack of a task management function.
29. MissionForest
Web system for sharing social activities and research activities.
Manages tasks in a tree structure, like a Work Breakdown Structure.
Activity data is published as linked open data.
30. Benefits of linked open data
You can discover information about social problems from tags.
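Publishing task data as linked open data amounts to emitting triples that link each task to its social problem tags. A minimal N-Triples sketch; the URIs are invented for illustration (only the Dublin Core property names are real), since the slides do not describe MissionForest's actual vocabulary:

```python
# Emit a task and its social problem tags as N-Triples.
# All subject/object URIs here are hypothetical examples.
def task_triples(task_uri, title, tag_uris):
    triples = [f'<{task_uri}> <http://purl.org/dc/terms/title> "{title}" .']
    for tag in tag_uris:
        triples.append(
            f'<{task_uri}> <http://purl.org/dc/terms/subject> <{tag}> .')
    return "\n".join(triples)

nt = task_triples("http://example.org/task/1",
                  "Survey on food support",
                  ["http://ja.dbpedia.org/resource/Hunger"])
```

Because the tags are shared URIs, a consumer can follow the `subject` links to find articles and other organizations' tasks about the same social problem.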
31. Future work for MissionForest
• Annotate each task with social problem tags that can be used for exploratory browsing of social activities; browsing other organizations' solutions is helpful for discussing one's own problems.
(Example tags: Environmental destruction, Global warming)
32. Outline
1. Background and Goal
2. Automatic annotation
1. Generate tags
1. Filtering
2. SVM
2. Annotate tags
1. TF-IDF
2. Paragraph Vector
3. Prototype of API
3. Systems for sharing collaborative activities
4. Conclusion
33. Conclusion
Automatic annotation
• The filtering method based on SVM can generate a sufficient tag set from DBpedia Japanese.
• The TF-IDF method can tag articles with reasonable precision.
Systems for sharing collaborative activities
• We are developing MissionForest to connect collaboration within the university laboratory with cross-organization collaboration.
Editor's Notes
Thank you chair.
I'll talk about "Tag-based Approaches to Sharing Background Information regarding Social Problems towards Facilitating Public Collaboration."
This is a brief outline of our presentation.
Firstly, Background and Goal; secondly, Automatic Annotation; thirdly, Systems for Sharing Collaborative Activities; and finally, we conclude our presentation.
In recent years, CivicTech has become active.
CivicTech refers to activities to solve social problems through collaboration between citizens and IT engineers.
Many CivicTech hackathons are being held.
In Civictech, participants discuss the solutions of social problems.
In order to do that, it is necessary to share background knowledge about the problem.
Therefore, we set our goal as "Sharing background information about social problems."
We chose two approaches to achieve our goal.
The first approach is to automatically annotate web articles with social problem tags.
If articles have tags of social problems, these articles can be found easily as background information of the problems.
The second approach is to develop "systems for sharing collaborative activities."
By making the activities in the organization open data, citizen collaboration across the organization is promoted.
Next, I'll talk about Automatic annotation.
If tags are attached to articles on social problems, you can easily investigate social problems.
This is an example of using tags when discussing "global warming".
In this discussion, let's assume that an article with the tag "global warming" is shared.
By clicking the tag "global warming", you can get a list of other articles on "global warming".
An example of a site that actually uses such tag-based search is Knowledge Connector.
Knowledge Connector is a site where users can share works such as ideas, applications and data.
In Knowledge Connector, tags are annotated by the users.
Therefore, some problems arise from this.
For example, users often forget to annotate the tags, some users do not understand the necessity of annotation, and orthographical variants occur.
These are our solution for the problems.
The solution to forget to annotate tags is to annotate tags automatically.
The solution to orthographic variants is to generate a tag set automatically.
This is architecture of our automatic tag annotation systems.
First, we generate tags and use the results to annotate articles on social problems on the web.
First, we will talk about how to generate the tag set automatically.
The purpose of the tags here is that exploratory browsing of related problems promotes understanding of background information.
Therefore, there should be a hierarchical structure between the tags.
Furthermore, it is also important to have a sufficient amount of social problem tags.
In this research, we selected the "social problem" category of the Japanese version of DBpedia as the source for tag extraction.
DBpedia is a project to extract information from Wikipedia and publish it as linked open data.
We tried to extract tag candidates from DBpedia.
However, if the category is extracted as-is, some of its articles are unrelated to "social problem."
To solve this problem, we decided to filter pages by which parent categories they have.
First, we obtain the pages belonging to the social problem category and its lower categories, and extract the page titles as tag names.
Next, we remove pages belonging to other specific categories and their lower categories from the list of extracted pages.
In this research, two kinds of filters are prepared.
Filter A was designed based on common points observed among tags in the unfiltered tag list that clearly have no relation to social problems.
Filter B is the same as Filter A but does not use the category "Biology Field."
That category was filtering out not only unrelated tags but also many related tags.
In the evaluation of filtering, the recall and precision were calculated.
This is how calculate recall.
First, six participants selected 102 pages related to social problems from Japanese Wikipedia.
Then, we calculated the percentage of these items that were included in the tag list.
This is how calculate precision.
First, select 100 tags randomly from the tag list.
Then, we asked 25 participants to evaluate whether these data were social problems on a five-point scale.
Finally, we calculated the percentage of tags rated more than three on the scale.
This is evaluation result.
Precision drops considerably at 3 levels, regardless of whether a filter is used.
This shows that elements unrelated to social problems increase explosively at 3 levels or deeper.
In addition, we found that the recall at 1 level without a filter, i.e., the list obtained from pages belonging directly to the social problem category, is quite low.
We should also review the point of using only DBpedia for tag generation.
Therefore, we remove items that are not social problems from the list by binary classification using a support vector machine.
We used Wikipedia pages belonging to a lower category within 3 levels of "Category: Social problem" as the dataset.
Four kinds of feature vectors are prepared as support vector machine input vectors.
The first uses a corpus of the category pages reachable within 5 levels from any of the acquired pages.
Among them, we decided to use those with an appearance frequency of 9 or more overall.
The number of hierarchy levels from the page to the category page is taken as the feature value.
However, if the category page cannot be reached within 5 levels, the value is 6.
The second one is a corpus of the whole wikipedia sentence, and the total value of the distributed representation vectors of words constituting each page title is taken as a vector.
The third one is a corpus of the whole wikipedia sentence, and the total value of the distributed representation vectors of words constituting each page article is taken as a vector.
The fourth one is combination of the first and third vectors.
Based on the above vectors, we performed a 10-fold cross validation test on the support vector machine to calculate the recall and precision.
We used the questionnaire results gathered when measuring the precision of filtering as the data for the positive examples and the negative examples used for the cross validation test.
We calculate recall and precision.
In this evaluation, recall means the proportion of positive examples that are categorized into the positive class, and precision means the proportion of examples categorized into the positive class that are actually positive.
This is evaluation result.
In the method using word vectors, recall is high and precision is low.
This seems to be because most elements were judged to be positive.
Except for recall of word vectors, the method using only category pages shows better performance than others.
So, when using support vector machine, it is effective to use only category information.
Because the denominators of the ratios differ from those used to evaluate the filtering methods, the values cannot be compared directly.
But these results indicate a great improvement over the filtering methods.
Next, we'll talk about automatic tag annotation.
For automatic annotation, we use the Wikipedia article whose title is the same as the tag candidate.
We calculate the cosine similarity between the article to be annotated and the Wikipedia article with the same name as the tag candidate, and attach the tags whose similarity is equal to or higher than the threshold.
To create the vectors for calculating cosine similarity, we used two methods: TF-IDF and Paragraph Vector.
In the evaluation of automatic annotation, we calculated correlation coefficient and accuracy.
We gave ten articles to the tag annotation system built with each method and calculated the cosine similarities.
Tags whose cosine similarity was within the top 10 and equal to or above the threshold, plus 3 randomly selected tags, were shown to 25 participants and evaluated on a seven-point scale.
We then calculated the correlation coefficient and accuracy based on these evaluations.
This is evaluation result.
The correlation coefficient shows a strong correlation.
The system evaluation values of elements whose questionnaire evaluation value is 7 are widely dispersed.
We think this is due to a characteristic of TF-IDF: it cannot handle related words.
When the threshold was set to 0.2, the accuracy rate and the number of tags given were sufficient.
We think this is useful for supporting semi-automatic annotation in actual use.
These are examples where the system evaluation value differs from the questionnaire evaluation value.
In the article on hunger, the tag "food crisis" was judged an appropriate tag by humans but an inappropriate tag by the system.
Also, in the article on bullying, the tag "social isolation" was judged an appropriate tag by humans but an inappropriate tag by the system.
When I looked at these articles, these tags were not mentioned in the articles themselves.
It seems that the questionnaire evaluation values were high due to the strong relevance between the words themselves: "hunger" and "food crisis," "bullying" and "social isolation."
We think that these problems can be solved by introducing a method that can handle related words.
This is the evaluation of the Paragraph Vector method.
The correlation coefficient shows a weak correlation.
In this research, we used only the 102 Wikipedia articles that are tag candidates as the corpus for the paragraph vectors.
We think a correlation was not achieved because the number of documents in the corpus was insufficient.
Also, since the Paragraph Vector algorithm considers word order, stylistic differences between documents may have had an effect.
When the threshold was set to 0.35, the accuracy rate was sufficient, but the number of tags given was not.
These results indicate that TF-IDF is more suitable for actual use than Paragraph Vector.
We provide the system for automatic tag generation and automatic annotation as an API.
By passing a URL containing an article on a social problem to the API, the tags the system judges appropriate for the article are returned together with their similarities.
Currently, JSON in the format shown in the figure is output.
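For illustration, a response of the kind shown in the figure might look like this (the field names and values are invented placeholders, not the API's actual schema):

```python
import json

# Hypothetical response shape; the actual field names of the API may differ.
response = {
    "url": "http://example.org/articles/hunger",
    "tags": [
        {"tag": "food crisis", "similarity": 0.42},
        {"tag": "poverty", "similarity": 0.27},
    ],
}
print(json.dumps(response, ensure_ascii=False, indent=2))
```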
Now let's talk about systems for sharing collaborative activities.
An example of a system that shares collaborative work is the Knowledge Connector mentioned earlier.
However, in addition to the problem above, the Knowledge Connector lacks a task management function.
We aim to solve these problems by developing MissionForest.
This is the User Interface of MissionForest.
MissionForest is a web system for sharing social activities and research activities.
This system can manage tasks in a tree structure, like a Work Breakdown Structure, and publishes activity data as linked open data.
Linked open data is open data that expresses each piece of data, and the relationships between data, using URIs. By publishing data in this format, users can effectively utilize related information by following the connections between the data. If the tags described above are among those connections, they will lead to the discovery of articles and data on social problems related to a given mission or task.
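As a toy illustration of publishing a tagged task as linked open data (only the Dublin Core title predicate is a real vocabulary term; every other URI below is a hypothetical placeholder, and MissionForest's real vocabulary may differ):

```python
# A task described as subject-predicate-object triples.
# Only http://purl.org/dc/terms/title is a real vocabulary term;
# every other URI here is a made-up placeholder.
TASK = "http://example.org/missionforest/task/42"
triples = [
    (TASK, "http://purl.org/dc/terms/title", "Survey local food banks"),
    (TASK, "http://example.org/vocab/partOf",
     "http://example.org/missionforest/mission/7"),
    (TASK, "http://example.org/vocab/socialProblemTag",
     "http://example.org/tag/food-crisis"),
]
for s, p, o in triples:
    print(s, p, o)
```

The third triple is where a social problem tag would link a task to related articles and data.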
As future work, we are thinking of incorporating the tag system described above.
Browsing other organizations' solutions is helpful when discussing one's own problems.
So we think that annotating each task with social problem tags can enable exploratory browsing of social activities.
Conclusion.
First, we talked about automatic tag annotation.
The SVM-based filtering method can generate a sufficient tag set from DBpedia Japanese.
The TF-IDF method can tag articles with reasonable precision.
Next, we talked about systems for sharing collaborative activities.
We have been developing MissionForest to connect collaboration within the university laboratory with cross-organizational collaboration.
That's all. Thank you.
This is an example of a tag that could not be filtered out by Filter B.
The Goller family, a typical example of abusive intergenerational chaining, was judged a social problem by the system, but participants judged it to have no relation to social problems.
Definitions of key words to be used as social problem tags should be considered.
Keywords from many fields, such as myths and feelings, remained.
Eleven categories have already been selected for filtering.
Adding new categories to filter out these tags is not a realistic approach.