Learning Emergent Knowledge from Blog Postings

This Session
• Social Web (and Social Web search) is a great thing, but…
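– The experiences of countless people can be searched very easily
• Movies, books, travel destinations, music, …
– Still, much more remains on the Social Web (left unsearched).
– More than individual experiences: when a large number of diverse “experiences” are gathered
• Trends, hidden relationships, new knowledge, …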
• Social Web + Semantic Web Technology
– A prototype “Experience Search” system
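– New kinds of information needs
• Which MP3 players do female early adopters like?
• List the books that young people keep reading steadily, regardless of the times.
• I want to see the books recently read by people who like Paul Auster's books, along with their related postings.
• They say men read thrillers and women read romances. Is that really so?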

Overview
• A Semantic Search System on Social Web Content
• Social Web + Semantic Web
– Social Web Content
• Blog postings
• Experiences of Web users
– Semantic Web Technology
• Publishing portable-data
• Accessing web-based open knowledge

Overview
• By the term “Semantic Search”…
– Not by “text matching”
– But by satisfying the “conditions” given in the query.
• By “Experience Search”…
– On the “topics” of Bloggers
– Example queries
• “Which MP3 players are favored by people in their 20s?”
• “List the books that Paul Auster fans read these days.”
• “List the latest devices that Apple lovers are talking about these days.”
• “Men read thrillers, women read romances. Is that true in the Blogosphere?”

Overview
• Challenges!
– 1) Blog postings are free-text.
• No semantics
• No explicit/machine-readable topics
– 2) Database/Ontology does not have such information.
• For example, our book ontology does not know whether a book
is favored by some group or not.
• How can we draw out such “previously unknown”, “not recorded
in the DB” knowledge?

The Idea
• Answer for Challenge 1 : Semantic Blogs
– A little semantics from blog postings.
– Topic: what is the topic of this posting?
• Semantic Blog with Semantic Tags
– Converting conventional blogs to semantic blogs
– Blogger: who is the blogger?
• Basic information about each blogger
– Age group, gender, job
– Published in FOAF (Friend-of-a-Friend)
– Manually published + predicted by machine learning
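
The blogger description above could be sketched as a handful of subject-predicate-object triples. This is only an illustration: FOAF and its `gender` property are real, but the `EX` namespace, the `ageGroup`, `job`, and `profileSource` properties, and the `blogger_profile` helper are assumptions made for this example, not the vocabulary used by the system.

```python
# FOAF and rdf:type are real vocabularies; EX is a made-up example namespace.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/blog/"  # hypothetical


def blogger_profile(blogger_id, age_group, gender, job, predicted=False):
    """Return (subject, predicate, object) triples describing one blogger."""
    subject = EX + "blogger/" + blogger_id
    # Record whether the values were tagged by hand or predicted by a classifier.
    source = "predicted" if predicted else "manual"
    return [
        (subject, RDF_TYPE, FOAF + "Person"),
        (subject, FOAF + "gender", gender),
        (subject, EX + "ageGroup", age_group),       # hypothetical property
        (subject, EX + "job", job),                  # hypothetical property
        (subject, EX + "profileSource", source),     # hypothetical property
    ]


profile = blogger_profile("b1", "20s", "female", "student")
```

Keeping the source of each profile (manual or predicted) alongside the data makes it possible to weigh or filter predicted attributes later.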

The Idea
• Answer for Challenge 2 : Emergent Knowledge
– Connections make new information
– Some blog postings are about specific topic-items.
• They draw a new connection between the author (blogger)
and the topic item (book, IT device, movie, etc.)
• New tendencies/relationships can be found from these
connections,
• if a large number of such connections are available.
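
A minimal sketch of how such connections could yield a tendency that no single source records. The data is toy data: only the blogger (22, female, Daegu) and the book "The Vor Game" come from the deck's example; the second blogger, the second book, and all function and field names are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy data. "b2" and "toy-isbn-1" are invented for this example.
bloggers = {
    "b1": {"age": 22, "gender": "female", "city": "Daegu"},
    "b2": {"age": 31, "gender": "male", "city": "Seoul"},
}
books = {
    "9788989571506": {"title": "The Vor Game", "genre": "Sci-fi"},
    "toy-isbn-1": {"title": "Toy Thriller", "genre": "Thriller"},
}
# Each posting connects a blogger to a topic item.
postings = [
    {"blogger": "b1", "topic": "9788989571506", "date": "2010-03"},
    {"blogger": "b2", "topic": "toy-isbn-1", "date": "2010-04"},
    {"blogger": "b2", "topic": "9788989571506", "date": "2010-05"},
]


def genre_counts_by_gender(postings, bloggers, books):
    """Follow blogger <- posting -> book and count genres per gender."""
    counts = defaultdict(Counter)
    for posting in postings:
        gender = bloggers[posting["blogger"]]["gender"]
        genre = books[posting["topic"]]["genre"]
        counts[gender][genre] += 1
    return counts


counts = genre_counts_by_gender(postings, bloggers, books)
```

Neither the book data nor the blogger data holds a "genre by gender" fact; it only appears once postings join the two, and it only becomes meaningful with many postings.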

Emerging Information from
Connections
• Sci-fi Fan Example
[Figure: three linked data sources.
Personal Info (FOAF): age, gender, address.
Blog Postings (SemTag): Topic (->), Date/Time, Blogger (->).
Book Ontology: Book Title, ISBN, Book Author, Genre, Publisher.
Example instance: a blogger (22, Female, Daegu) wrote a posting in 2010.03 whose topic (uri) is the book “The Vor Game” (ISBN 9788989571506, Lois Bujold, Sci-fi, Baen Books).]

Connections
• Sci-fi Fan Example
[Figure: the same three sources, Personal Info (FOAF), Blog Postings (SemTag), and Book Ontology, linked as Blogger -> topic -> genre (Sci-Fi). Following this path identifies the blogger as a “Sci-Fi fan”.]

Connections
• Examples: Emerging information from connections
– Devices favored by people in their 20s
– Bloggers who have read the book “The Lord of the Rings”
– Top 50 bestsellers of this year
– Bloggers who have read many books by the author Paul Auster
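
Two of the examples above can be sketched as simple aggregations over posting connections. Everything here is toy data with hypothetical names (`player-a`, `book-1`, `favored_by_age_group`, `frequent_readers_of`); the actual system expresses such patterns as queries over an RDF store rather than Python loops.

```python
from collections import Counter, defaultdict

# Toy data; all identifiers are invented for illustration.
bloggers = {"b1": {"age": 22}, "b2": {"age": 25}, "b3": {"age": 34}}
items = {
    "player-a": {"domain": "device"},
    "player-b": {"domain": "device"},
    "book-1": {"domain": "book", "author": "Paul Auster"},
    "book-2": {"domain": "book", "author": "Paul Auster"},
}
postings = [
    {"blogger": "b1", "topic": "player-a"},
    {"blogger": "b2", "topic": "player-a"},
    {"blogger": "b2", "topic": "player-b"},
    {"blogger": "b3", "topic": "player-b"},
    {"blogger": "b1", "topic": "book-1"},
    {"blogger": "b1", "topic": "book-2"},
    {"blogger": "b3", "topic": "book-1"},
]


def favored_by_age_group(decade, domain, top_n=3):
    """Items of one domain, ranked by postings from bloggers in that decade."""
    counts = Counter(
        p["topic"]
        for p in postings
        if items[p["topic"]]["domain"] == domain
        and bloggers[p["blogger"]]["age"] // 10 * 10 == decade
    )
    return [topic for topic, _ in counts.most_common(top_n)]


def frequent_readers_of(author, min_books=2):
    """Bloggers who posted about at least min_books distinct books by author."""
    read = defaultdict(set)
    for p in postings:
        if items[p["topic"]].get("author") == author:
            read[p["blogger"]].add(p["topic"])
    return sorted(b for b, titles in read.items() if len(titles) >= min_books)
```

The `min_books` threshold is an assumption; what counts as "many" is a design choice the rule author would have to make.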

Implementation: Semantic Blog
• A Semantic Blog example

Implementation: Blog postings as an
Event
• Postings as Ontology Instances

Implementation: Converting
Conventional postings to SemBlogs
• Problem
– To acquire “emergent knowledge”, we need a lot of postings
with semantic tags.
– There aren’t many semantic blogs, yet.
• Answer
– There are a large number of “topic-known” blog postings.
– Let’s convert such postings to semantic blog postings.

• DB-links in conventional blogs
– DB-links: the ability to explicitly mark the topic by linking
to a database item of a portal service.
• Naver (DB-attachment), Daum (DB-link): movies, books
• Yes24/Aladin blogs: books, IT devices
– In essence, they are “semantic tags” in a limited domain
• Postings
– Collected nearly 100,000 Blog postings with DB-link
– Converted into Semantic Blog postings (event instances)
– Postings about “movies”, “books”, “IT devices”, “travel
locations”.
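
The conversion step could look roughly like the sketch below: find the DB-link in a posting and emit an event instance that records who posted about what, and when. The URL pattern `db.example.com/...`, the field names, and the `PostingEvent` type are all hypothetical; real DB-link formats differ per portal and are not described in the slides.

```python
import re

# Hypothetical DB-link shape; each real portal uses its own URL format.
DB_LINK = re.compile(
    r"https?://db\.example\.com/(?P<domain>movie|book|device|place)/(?P<item>[\w-]+)"
)


def to_event_instance(posting):
    """Convert a raw posting into an event instance, or None if it has no DB-link."""
    match = DB_LINK.search(posting["html"])
    if match is None:
        return None  # topic unknown: the posting cannot be converted
    return {
        "type": "PostingEvent",            # hypothetical class name
        "blogger": posting["author"],
        "date": posting["date"],
        "topic_domain": match.group("domain"),
        "topic": match.group("item"),
    }


raw = {
    "author": "b1",
    "date": "2010-03",
    "html": '<a href="http://db.example.com/book/9788989571506">The Vor Game</a>',
}
event = to_event_instance(raw)
```

Postings without a recognizable DB-link are skipped rather than guessed at, which keeps the converted set "topic-known".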

• Blogger information
– From the collected postings, 2000 bloggers were
selected.
• Those who posted more than 20 topic-known postings.
– FOAF info was manually tagged for these 2000 bloggers
• Age, Gender, Home location (city level), Occupation.
– Their blog texts then become the training data
– Classification methods have been applied to other bloggers
– In total 5000+ bloggers have been collected for search data
• The data
– 5000+ bloggers, 100,000+ postings, over 3 years.
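
The slides do not name the classification methods used. As one possible illustration, the sketch below trains a small multinomial Naive Bayes model (with add-one smoothing) on manually tagged bloggers' texts and predicts an age group for an untagged blogger. The technique, the toy texts, and the function names are assumptions for this example only.

```python
import math
from collections import Counter, defaultdict


def train(labeled_texts):
    """labeled_texts: list of (text, label) pairs from manually tagged bloggers."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_texts:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data; real training data would be the tagged bloggers' postings.
model = train([
    ("exam lecture campus dorm", "20s"),
    ("campus club exam", "20s"),
    ("mortgage kids school commute", "30s"),
    ("kids commute office", "30s"),
])
```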

Implementation: Selecting Domain
Ontologies
• Domain ontologies are needed
– DBpedia could provide a good topic vocabulary…
• However, DBpedia does not contain enough Korean books
and locations.
– Domain ontologies are prepared separately for the search
system
– Travel locations
• GeoNames ontology (geonames.org)
– Books
• Book ontology (Bizer et al.)
– IT devices
• IT ontology (KAIST CoreOnto Ontology)

Implementation: The Main Idea (again),
and Semantic Labels
• “Simple and large (instances)” is better than “rich and few”
• Simple semantics from texts/blog postings
– Relatively easy to achieve in large numbers
• From a large number of instances
– A large number of “connections” can be found
– Knowledge that is not described in the ontology can be
found from the connections
• How can ordinary users explicitly use/find such connections?
– Name the patterns: Semantic Labels

Implementation: Semantic Labels
• Semantic Labels
– Connect human concepts to graph-patterns
– Graph patterns are described in SPARQL
• SPARQL is a query language, which can also be used as a
rule language
– With the addition of aggregation functions, etc.
– Name the “Findings”
• In the implementation, new findings are attached to
instances as a label
• This label can be used in the semantic search.
• Rule-based findings of meaningful patterns
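
A semantic label can be pictured as a named rule: a pattern over the connections plus an aggregation threshold, whose matches get the label attached. The system writes such rules in SPARQL with additional functions; the Python below is only a sketch of the same idea, and the names, toy data, and the threshold of two postings are assumptions.

```python
from collections import Counter, defaultdict

# Toy data; identifiers are invented for illustration.
items = {
    "book-1": {"genre": "Sci-fi"},
    "book-2": {"genre": "Sci-fi"},
    "book-3": {"genre": "Romance"},
}
postings = [
    {"blogger": "b1", "topic": "book-1"},
    {"blogger": "b1", "topic": "book-2"},
    {"blogger": "b2", "topic": "book-1"},
    {"blogger": "b2", "topic": "book-3"},
]


def apply_semantic_label(label, matches, min_count, labels):
    """Attach label to every blogger with at least min_count matching postings."""
    counts = Counter(
        p["blogger"] for p in postings if matches(items[p["topic"]])
    )
    for blogger, n in counts.items():
        if n >= min_count:
            labels[blogger].add(label)
    return labels


def search_by_label(label, labels):
    """Find instances carrying a label, as a semantic search would."""
    return sorted(b for b, attached in labels.items() if label in attached)


labels = defaultdict(set)
apply_semantic_label(
    "Sci-Fi fan", lambda item: item["genre"] == "Sci-fi", 2, labels
)
```

Once attached, the label behaves like any other property of the instance, so a search for "Sci-Fi fan" needs no knowledge of the rule behind it.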

Semantic Label Examples
[Figure: example semantic label rules. Antecedents are described in a rule language (SPARQL + additional functions).]

Search System Architecture
[Figure: system architecture. Components shown: User Interface (users send keyword queries; search and analysis results are returned in XML), Query Process Module, Search Module (keyword search; inference and modification of SPARQL queries), Analysis Module (analysis requests), Rule Process Module with Semantic Label Definitions (rule authoring by advanced users), and an RDF Store holding People (FOAF instances), Event (Event Ontology instances), and Domain Ontologies.]

Semantic Search Demonstrations

Conclusion
• Emergent Knowledge found in the blogosphere
– Blog postings as “Connections”
– New knowledge can be discovered
• “Simple Semantic goes a long way”
– Simple semantics (data), diverse cases (instances)
• Social Web + Semantic Web

Additional Information
about the system
• Detailed information about the system and its evaluation can
be found in the paper, doi:10.1016/j.websem.2010.05.001
TG Noh et al., Learning the emergent knowledge from annotated blog postings,
Web Semantics: Science, Services and Agents on the World Wide Web, 2010
• The paper, data, prototype demo, and its video are
available at
– http://nweb.knu.ac.kr/
