Learning Emergent Knowledge from Blog Postings



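  1. 1. This session • The Social Web (and Social Web search) is a great thing, but… – The experiences of countless people can be searched very easily • Movies, books, travel destinations, music, … – Even so, much more remains on the Social Web (unsearched). – Beyond individual experiences: what appears when a large number of diverse “experiences” come together • Trends, hidden relationships, new knowledge, … • Social Web + Semantic Web Technology – A prototype “Experience Search” system – New kinds of information needs • Which MP3 players do female early adopters like? • List the books that young people keep reading steadily, regardless of the times. • I want to see the books recently read by people who like Paul Auster’s books, along with their related postings. • They say men read thrillers and women read romances; is that really so?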
  2. 2. Overview • A Semantic Search System on Social Web Content • Social Web + Semantic Web – Social Web Content • Blog postings • Experiences of Web users – Semantic Web Technology • Publishing portable data • Accessing web-based open knowledge
  3. 3. Overview • By the term “Semantic Search”… – Not by “text matching” – But by satisfying the “conditions” given in the query. • By “Experience Search”… – On the “topics” of bloggers – Example queries • “Which MP3 players are favored by people in their 20s?” • “Which books are Paul Auster fans reading these days?” • “Which new electronic devices are Apple enthusiasts talking about these days?” • “They say men read thrillers and women read romances. Is that true in the blogosphere?”
  4. 4. Overview • Challenges! – 1) Blog postings are free text. • No semantics • No explicit/machine-readable topics – 2) The database/ontology does not contain such information. • For example, our book ontology does not know whether a book is favored by some group. • How can we derive knowledge that is “previously unknown” and “not recorded in the DB”?
  5. 5. The Idea • Answer for Challenge 1: Semantic Blogs – A little semantics from blog postings. – Topic: what is the topic of this posting? • Semantic Blog with Semantic Tags – Converting conventional blogs to semantic blogs – Blogger: who is the blogger? • Basic information about each blogger – Age group, gender, job – Published in FOAF (Friend-of-a-Friend) – Manually published + predicted by machine learning
  6. 6. The Idea • Answer for Challenge 2: Emergent Knowledge – Connections make new information – Some blog postings are about specific topic items. • They draw a new connection between the author (blogger) and the topic item (book, IT device, movie, etc.) • New tendencies/relationships can be found from these connections, • if a large number of such connections is available.
  7. 7. Emerging Information from Connections • Sci-fi Fan Example • Schema – Personal Info (FOAF): age, gender, address – Blog Postings (SemTag): Topic (->), Date/Time, Blogger (->) – Book Ontology: Book Title, ISBN, Book Author, Genre, Publisher • Instance – Blogger: 22, Female, Daegu – Posting: topic -> (uri), 2010.03., blogger <- (uri) – Book: The Vor Game, 9788989571506, Lois Bujold, Sci-fi, Baen Books
  8. 8. Emerging Information from Connections • Sci-fi Fan Example • Schema – Personal Info (FOAF): age, gender, address – Blog Postings (SemTag): Topic (->), Date/Time, Blogger (->) – Book Ontology: Book Title, ISBN, Book Author, Genre, Publisher • Highlighted path – Blogger, topic, genre (Sci-Fi) • Emerging label – Sci-Fi fan
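The Sci-fi fan example can be sketched in a few lines of Python. This is only an illustration under assumptions: the dictionaries stand in for the FOAF data, the semantic-tagged postings, and the book ontology, and the identifiers `blogger:1` and `book:9788989571506` are hypothetical URIs made up for the sketch. The actual system stores these as RDF instances.

```python
# Hypothetical toy data mirroring the three sources on the slide.
bloggers = {"blogger:1": {"age": 22, "gender": "Female", "address": "Daegu"}}
books = {
    "book:9788989571506": {
        "title": "The Vor Game",
        "author": "Lois Bujold",
        "genre": "Sci-fi",
        "publisher": "Baen Books",
    }
}
postings = [
    {"blogger": "blogger:1", "topic": "book:9788989571506", "date": "2010.03."},
]


def genres_read_by(blogger_uri):
    """Follow the path blogger -> posting -> book -> genre."""
    return {
        books[p["topic"]]["genre"]
        for p in postings
        if p["blogger"] == blogger_uri and p["topic"] in books
    }


# Neither the FOAF record nor the book ontology states this on its own;
# it only appears once the posting connects the two.
fan_genres = genres_read_by("blogger:1")
emergent = {"blogger": bloggers["blogger:1"], "fan_of": fan_genres}
```

The point of the sketch is that `fan_of` is not stored anywhere in the inputs; it is read off the connection that the posting creates.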
  9. 9. Emerging Information from Connections • Examples: Emerging information from connections – Devices favored by the 20s age group – Bloggers who have read the book “The Lord of the Rings” – Top 50 bestsellers of this year – Bloggers who have read many books by the author Paul Auster
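A query such as "devices favored by the 20s age group" combines a condition on the blogger with an aggregation over postings. The following is a minimal sketch, assuming made-up bloggers, ages, and device names; counting distinct bloggers per device is one possible reading of "favored" and is an assumption, not the measure used by the system.

```python
from collections import defaultdict

# Hypothetical data: blogger -> age, and (blogger, device) pairs from postings.
blogger_age = {"b1": 24, "b2": 27, "b3": 35, "b4": 21}
postings = [
    ("b1", "player-A"),
    ("b2", "player-A"),
    ("b3", "player-B"),
    ("b4", "player-B"),
    ("b4", "player-A"),
]


def age_group(age):
    """Map an age to its decade, e.g. 24 -> 20."""
    return (age // 10) * 10


def favored_by(group, top_n=3):
    """Rank devices by the number of distinct bloggers in the age group."""
    fans = defaultdict(set)
    for blogger, device in postings:
        if age_group(blogger_age[blogger]) == group:
            fans[device].add(blogger)
    ranked = sorted(fans.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(device, len(who)) for device, who in ranked[:top_n]]


top_for_20s = favored_by(20)
```

Counting distinct bloggers rather than raw postings keeps one prolific blogger from dominating the ranking.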
  10. 10. Implementation: Semantic Blog • A Semantic Blog example
  11. 11. Implementation: Blog postings as an Event • Postings as Ontology Instances
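Treating a posting as an event instance can be pictured as turning each posting into a small set of subject-predicate-object triples. The sketch below assumes a hypothetical `ex:` vocabulary (`ex:PostingEvent`, `ex:blogger`, `ex:topic`, `ex:date`) chosen for illustration; the property names of the actual event ontology are not given in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostingEvent:
    """One blog posting, viewed as an event linking a blogger to a topic."""

    uri: str
    blogger: str
    topic: str
    date: str

    def to_triples(self):
        # Property names are hypothetical stand-ins for the event ontology.
        return [
            (self.uri, "rdf:type", "ex:PostingEvent"),
            (self.uri, "ex:blogger", self.blogger),
            (self.uri, "ex:topic", self.topic),
            (self.uri, "ex:date", self.date),
        ]


event = PostingEvent(
    uri="ex:posting/1",
    blogger="ex:blogger/1",
    topic="ex:book/9788989571506",
    date="2010.03.",
)
triples = event.to_triples()
```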
  12. 12. Implementation: Converting Conventional postings to SemBlogs • Problem – To acquire “emergent knowledge”, we need a lot of postings with semantic tags. – There aren’t many semantic blogs, yet. • Answer – There are a large number of “topic-known” blog postings. – Let’s convert such postings to semantic blog postings
  13. 13. Implementation: Converting Conventional postings to SemBlogs • DB-links in conventional blogs – DB-links: the ability to explicitly mark a posting’s topic by linking to a database item of a portal service. • Naver (DB-attachment), Daum (DB-link): movies, books • Yes24/Aladin blogs: books, IT devices – In essence, they are “semantic tags” in a limited domain • Postings – Collected nearly 100,000 blog postings with DB-links – Converted them into Semantic Blog postings (event instances) – Postings about “movies”, “books”, “IT devices”, “travel locations”.
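The conversion step can be sketched as: find the DB-link inside a conventional posting and reuse it as the posting's topic. Everything concrete here is an assumption made for illustration: the `portal.example` URL pattern, the posting fields, and the `ex:` topic URIs are hypothetical, and the real link formats of Naver, Daum, Yes24, or Aladin are not described in the slides.

```python
import re

# Hypothetical DB-link pattern; real portal URL formats will differ.
DB_LINK = re.compile(
    r"https?://portal\.example/db/(?P<domain>book|movie|device|place)/(?P<item>\w+)"
)


def to_semantic_posting(posting):
    """Return a topic-tagged posting, or None if no DB-link is present."""
    match = DB_LINK.search(posting["html"])
    if match is None:
        return None
    return {
        "blogger": posting["blogger"],
        "date": posting["date"],
        "topic": f"ex:{match['domain']}/{match['item']}",
    }


tagged = to_semantic_posting(
    {
        "blogger": "ex:blogger/1",
        "date": "2010.03.",
        "html": '<a href="http://portal.example/db/book/9788989571506">The Vor Game</a>',
    }
)
untagged = to_semantic_posting(
    {"blogger": "ex:blogger/2", "date": "2010.04.", "html": "<p>No link here.</p>"}
)
```

Postings without a DB-link are simply skipped, which matches the idea of only converting "topic-known" postings.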
  14. 14. Implementation: Converting Conventional postings to SemBlogs • Blogger information – From the collected postings, 2000 bloggers were selected. • Those who posted more than 20 topic-known postings. – FOAF info was manually tagged for these 2000 bloggers • Age, Gender, Home location (city level), Occupation. – Their blog texts then become the training data – Classification methods were applied to the other bloggers – In total, 5000+ bloggers were collected as search data • The data – 5000+ bloggers, 100,000+ postings, over 3 years.
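The slides do not say which classification methods were used. As a stand-in, the sketch below uses a small multinomial naive Bayes classifier with add-one smoothing to show the general idea: manually tagged bloggers supply training text, and the trained model predicts an attribute for untagged bloggers. The tokens and labels are invented toy data.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: list of (tokens, label) pairs from manually tagged bloggers."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy training data: blog tokens -> occupation label.
tagged_bloggers = [
    (["exam", "lecture", "campus"], "student"),
    (["lecture", "homework", "exam"], "student"),
    (["meeting", "commute", "overtime"], "office worker"),
    (["commute", "meeting", "deadline"], "office worker"),
]
model = train(tagged_bloggers)
guess_a = predict(model, ["exam", "homework"])
guess_b = predict(model, ["overtime", "meeting"])
```

A real setup would need far more text per blogger and a held-out evaluation; the cited paper is the place to look for the methods and accuracy actually reported.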
  15. 15. Implementation: Selecting Domain Ontologies • Domain ontologies are needed – DBpedia could provide a good topic vocabulary… • However, DBpedia does not cover enough Korean books and locations. – Domain ontologies are prepared separately for the search system – Travel locations • GeoNames ontology (geonames.org) – Books • Book ontology (Bizer et al.) – IT devices • IT ontology (KAIST CoreOnto Ontology)
  16. 16. Implementation: The Main Idea (again), and Semantic Labels • “Simple and large (instances)” is better than “rich and few” • Simple semantics from texts/blog postings – Relatively easy to achieve in large numbers • From a large number of instances – A large number of “connections” can be found – Knowledge that is not described in the ontology can be found from the connections • How can ordinary users explicitly use/find such connections? – Name the patterns: Semantic Labels
  17. 17. Implementation: Semantic Labels • Semantic Labels – Connect human concepts to graph patterns – Graph patterns are described in SPARQL • SPARQL is a query language, which can also be used as a rule language – With the addition of aggregation functions, etc. – Name the “Findings” • In the implementation, new findings are attached to instances as a label • This label can be used in the semantic search. • Rule-based findings of meaningful patterns
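A semantic label can be pictured as a named rule: a graph pattern plus an aggregation threshold, whose matches receive a label that later searches can use. The system expresses such rules in SPARQL with additional functions; the sketch below is a plain-Python stand-in over an in-memory triple set, and the predicate names, the `min_count` threshold, and the data are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical triples: postings link bloggers to books, books have genres.
triples = {
    ("post:1", "blogger", "blogger:1"), ("post:1", "topic", "book:1"),
    ("post:2", "blogger", "blogger:1"), ("post:2", "topic", "book:2"),
    ("post:3", "blogger", "blogger:2"), ("post:3", "topic", "book:1"),
    ("book:1", "genre", "Sci-fi"), ("book:2", "genre", "Sci-fi"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def apply_genre_fan_label(genre, label, min_count):
    """Label bloggers with at least min_count postings on books of a genre."""
    counts = Counter()
    posts = {s for s, p, _ in triples if p == "topic"}
    for post in posts:
        for topic in objects(post, "topic"):
            if genre in objects(topic, "genre"):
                for blogger in objects(post, "blogger"):
                    counts[blogger] += 1
    new_labels = {(b, "label", label) for b, n in counts.items() if n >= min_count}
    triples.update(new_labels)  # attach the finding to the instances
    return new_labels


labeled = apply_genre_fan_label("Sci-fi", "Sci-fi fan", min_count=2)
```

Once the label triple is in the store, a later query only needs to match `(?blogger, "label", "Sci-fi fan")` instead of repeating the full pattern and count.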
  18. 18. Semantic Label Examples • Antecedents are described in a rule language (SPARQL + additional functions)
  19. 19. Search System Architecture • [Architecture diagram] – Actors: users, advanced users – Components: User Interface, Rule authoring, Semantic Label Definitions, Rule Process Module, Query Process Module, Search Module, Analysis Module – RDF Store: People (FOAF Instances), Event (Event Ontology), Domain Ontologies – Flows: keyword queries, search result in XML, analysis request, analysis result in XML, inference and modification of SPARQL queries
  20. 20. Semantic Search Demonstrations
  21. 21. Semantic Search Demonstrations
  22. 22. Semantic Search Demonstrations
  23. 23. Semantic Search Demonstrations
  24. 24. Conclusion • Emergent knowledge found in the blogosphere – Blog postings serve as “connections” – New knowledge can be discovered • “Simple semantics goes a long way” – Simple semantics (data), diverse instances • Social Web + Semantic Web
  25. 25. Additional Information about the system • Detailed information about the system and its evaluation can be found in the paper: TG Noh et al., Learning the emergent knowledge from annotated blog postings, Web Semantics: Science, Services and Agents on the World Wide Web, 2010, doi:10.1016/j.websem.2010.05.001 • The paper, the data, the prototype demo, and its video are available at – http://nweb.knu.ac.kr/