Social Relation Based Scalable Semantic Search Refinement
This is a talk presented at the 2009 Asian Scalable Semantic Data Processing Workshop co-located with the 2009 Asian Semantic Web Conference.

Published in: Technology, Education

  1. Social Relation Based Scalable Semantic Search Refinement
     Yi Zeng(1), Xu Ren(1), Yulin Qin(1,2), Ning Zhong(1,3), Zhisheng Huang(4), Yan Wang(1)
     (1) International WIC Institute, Beijing University of Technology, China
     (2) Carnegie Mellon University, USA
     (3) Maebashi Institute of Technology, Japan
     (4) Vrije University Amsterdam, the Netherlands
  2. Motivation
     • Vague or incomplete queries over large-scale semantic data: how can queries be refined to reduce the size of the result set?
     • Large-scale semantic data vs. the most relevant data for a specific user: diversity for different users in the context of large-scale semantic data.
     • Sources of context: user interests; the network of friends, collaborators, etc.
     • Refinement approaches: interest-based search refinement; search refinement through social relationship; group-interest-based search refinement.
  3. Social Relations and Social Networks
     • Most social networks follow a power law distribution.
     • The DBLP coauthor network is created using the FOAF vocabulary.
     • Fig. 1: Coauthor number distribution in the SwetoDBLP dataset. Fig. 2: Log-log diagram of Figure 1.
     • The distribution is approximately a power law: few authors have many coauthors, and most authors have very few.
     • Regarding scalability: when the number of authors expands rapidly, it will not be hard to rebuild the coauthor network, since most authors will have only a few links.
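The power-law observation on the coauthor network can be checked with a short sketch. The coauthor pairs below are made-up toy data (not SwetoDBLP), and all function names are hypothetical. The exponent is estimated with an ordinary least-squares line on the log-log histogram, which is only a rough diagnostic and not a rigorous power-law fit.

```python
import math
from collections import Counter, defaultdict


def coauthor_counts(pairs):
    """Map each author to the number of distinct coauthors."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {author: len(others) for author, others in neighbours.items()}


def degree_histogram(counts):
    """Map k -> number of authors who have exactly k coauthors."""
    return dict(Counter(counts.values()))


def loglog_slope(histogram):
    """Least-squares slope of log(frequency) against log(k)."""
    points = [(math.log(k), math.log(f)) for k, f in histogram.items() if k > 0 and f > 0]
    if len(points) < 2:
        raise ValueError("need at least two distinct coauthor counts")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Toy data (assumption): one hub author "h" with many coauthors, everyone else with few.
toy_pairs = [("h", f"a{i}") for i in range(8)] + [("a0", "a1"), ("a2", "a3")]
hist = degree_histogram(coauthor_counts(toy_pairs))
slope = loglog_slope(hist)  # negative when high coauthor counts are rare
```

A clearly negative slope on the log-log histogram is consistent with a heavy-tailed distribution, but a real analysis would need far more data than this toy example.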
  4. Search Refinement through Social Relationship
     Table 1: A partial result of the expert finding search task "Artificial Intelligence authors" (User name: John McCarthy).

     Satisfied Authors without          | Satisfied Authors with
     social relation refinement         | social relation refinement
     -----------------------------------+-----------------------------------
     Carl Kesselman (312)               | Hans W. Guesgen (117) *
     Thomas S. Huang (271)              | Virginia Dignum (69) *
     Edward A. Fox (269)                | John McCarthy (65) *
     Lei Wang (250)                     | Aaron Sloman (36) *
     John Mylopoulos (245)              | Carl Kesselman (312)
     Ewa Deelman (237)                  | Thomas S. Huang (271)
     ...                                | ...

     • Inputs: the domain experts dataset, user URIs, and the coauthor network dataset.
     • Bridging two separate datasets helps to refine the expert finding task.
     • In an enterprise setting, if the found experts have some previous relationship with the employer, the cooperation may be smoother.
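The effect shown in Table 1 can be sketched as a stable re-ranking. This is an illustration under stated assumptions, not the authors' implementation: the starred authors are assumed to be the ones related to the user in the coauthor network, the numbers in parentheses are treated as the original ranking score, and the function name `refine_by_social_relation` is hypothetical.

```python
def refine_by_social_relation(ranked_experts, related_authors):
    """Move experts related to the user to the front, keeping the original order within each group."""
    related = [e for e in ranked_experts if e[0] in related_authors]
    others = [e for e in ranked_experts if e[0] not in related_authors]
    return related + others


# Candidates from Table 1, ordered by the number shown in parentheses.
experts = [
    ("Carl Kesselman", 312),
    ("Thomas S. Huang", 271),
    ("Edward A. Fox", 269),
    ("Lei Wang", 250),
    ("John Mylopoulos", 245),
    ("Ewa Deelman", 237),
    ("Hans W. Guesgen", 117),
    ("Virginia Dignum", 69),
    ("John McCarthy", 65),
    ("Aaron Sloman", 36),
]

# Assumption: the starred entries in Table 1 are the socially related authors.
related_to_user = {"Hans W. Guesgen", "Virginia Dignum", "John McCarthy", "Aaron Sloman"}

refined = refine_by_social_relation(experts, related_to_user)
refined_names = [name for name, _ in refined]
```

With these inputs the first six names of `refined_names` follow the right-hand column of Table 1.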
  5. Social Network based Interest Retention Models for Search Refinement
  6. Obtaining the Retained Interests
     • Do retained interests appear more frequently than others?
       (Frequency) Total Interest: $TI(i) = \sum_{j=1}^{n} m(i, j)$
     • Besides frequency, what else is important for correctly obtaining retained interests?
       The forgetting mechanism in cognitive memory retention (exponential function model, power function model) [Anderson, Schooler 1991].
       (Frequency and Recency) Memory Retention: $P = A e^{-bT}$; $P = A T^{-b}$
     Pictures from: [Schooler 1993] Schooler, L. J. & Anderson, J. R.: Recency and Context: An Environmental Analysis of Memory. In: Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society, pp. 889-894, 1993.
  7. Obtaining the Retained Interests
     • (Frequency and Recency) Exponential Model for Interest Retention: $EIR(i) = \sum_{j=1}^{n} m(i, j) \times A e^{-b T_{i,j}}$
     • (Frequency and Recency) Power Model for Interest Retention: $PIR(i) = \sum_{j=1}^{n} m(i, j) \times A T_{i,j}^{-b}$
     [Zeng 2009a] Yi Zeng, Haiyan Zhou, Ning Zhong, Yulin Qin, Shengfu Lu, Yiyu Yao, Yang Gao. Cognitive Memory Retention Based Starting Point for Query Extension and Granular Selection. In: Cognitive Memory Component (v1), LarKC deliverable 2-3-1, coordinated by Jose Quesada and Yi Zeng, March 30, 2009.
     [Zeng 2009b] Yi Zeng, Yiyu Yao, Ning Zhong. DBLP-SSE: A DBLP Search Support Engine. In: Proceedings of the 2009 IEEE/WIC/ACM International Conference on Web Intelligence, IEEE Computer Society, Milan, Italy, September 15-18, 2009.
     [Maanen 2009] Leendert van Maanen, Julian N. Marewski. Recommender Systems for Literature Selection: A Competition between Decision Making and Memory Models. CogSci 2009, July 31-August 1, 2009.
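The three measures TI, EIR and PIR can be compared in a minimal sketch. Several things here are assumptions, since the slides do not define them: m(i, j) is read as the number of times interest i appears in year j, T_{i,j} as the number of years elapsed since year j, and the function names are hypothetical. The values A = 0.855 and b = 1.295 are the ones the talk reports for the power law model; reusing them for the exponential model is purely illustrative.

```python
import math


def total_interest(m):
    """TI(i): sum of m(i, j) over all years j."""
    return sum(m.values())


def exponential_retention(m, now, A, b):
    """EIR(i): each year's count weighted by A * exp(-b * T), with T = now - year."""
    return sum(count * A * math.exp(-b * (now - year)) for year, count in m.items())


def power_retention(m, now, A, b):
    """PIR(i): each year's count weighted by A * T ** (-b), with T = now - year >= 1."""
    total = 0.0
    for year, count in m.items():
        elapsed = now - year
        if elapsed < 1:
            raise ValueError("power model needs at least one elapsed year")
        total += count * A * elapsed ** (-b)
    return total


A, B, NOW = 0.855, 1.295, 2009

# Toy counts (assumption): an old but frequent interest vs. a recent, less frequent one.
old_interest = {2000: 5, 2001: 5}
recent_interest = {2007: 3, 2008: 3}

ti_old, ti_recent = total_interest(old_interest), total_interest(recent_interest)
pir_old = power_retention(old_interest, NOW, A, B)
pir_recent = power_retention(recent_interest, NOW, A, B)
eir_old = exponential_retention(old_interest, NOW, A, B)
eir_recent = exponential_retention(recent_interest, NOW, A, B)
```

In this toy case the old interest wins on total frequency, while both retention models rank the recent interest higher, which is the distinction between frequency alone and frequency combined with recency.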
  8. Obtaining the Retained Interests
     • To some extent, current interests are relevant to interest retention. Using the power law model with A = 0.855 and b = 1.295, we selected all the authors whose publication numbers are above 100 (1226 persons) and predicted their top 9 interests from 2000 to 2007 using interest retention. For 49.54% of these samples, the model predicts 3 out of 9 interests.
     • We analyzed research interest retention for all 615,124 computer scientists in the SwetoDBLP dataset, and released the "computer scientists' research interest" RDF dataset: http://www.iwici.org/dblp-sse and http://wiki.larkc.eu/csri-rdf
     Figure 7a: A comparative study of total research interests from 1990 to 2008 and retained interests in 2009 (based on both the power law and exponential law models).
     Figure 7b: Difference in the contribution values from papers published in different years.
     Figure 7c: A comparison of Total Interests and Interest Retentions of the author "Ricardo A. Baeza-Yates" (Nov. 2009, from DBLP).
  9. Retained Interests in a Social Environment
     [Figure: network of interest terms (Network, Link, Search, PageRank, Information, Retrieval, Web, Query, Content, Spam, Challenge, Engine, Mining, Analysis, Detection) around the authors Carlos Castillo and Ricardo A. Baeza-Yates.]
     • Group Retained Interests: diversity and consistency.
     • Group Retained Interest:
       $E(i, p) = \begin{cases} 1 & (i \in RI_p^{top 9}) \\ 0 & (i \notin RI_p^{top 9}) \end{cases}$, $GIR(i) = \sum_{p=1}^{n} E(i, p)$

     Top 9 Retained Interests | Top 9 Group Retained Interests
     -------------------------+-------------------------------
     Web          7.81        | Search       35
     Search       5.59        | Retrieval    30
     Retrieval    3.19        | Web          28
     Information  2.27        | Information  26
     Query        2.14        | System       19
     Engine       2.10        | Query        18
     Mining       1.26        | Analysis     14
     Challenge    …           | Text         …
     Analysis     …           | Model        …

     Top 9 interests retention of a user and his group interests retention (Ricardo A. Baeza-Yates, based on the May 2008 version of SwetoDBLP).
     • For the most prolific authors in DBLP (publication number > 50): 5161 persons. On average, 52.55% of an individual's retained interests are consistent with his/her group retained interests.
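The group retained interest GIR(i) counts how many group members p have interest i among their top retained interests. A minimal sketch follows, assuming each member's retained interests are already available as a mapping from interest to retention value; the member data is invented, the function names are hypothetical, and k = 2 is used instead of 9 only to keep the toy example small.

```python
from collections import Counter


def top_k(retention, k=9):
    """Set of the k interests with the highest retention (ties broken alphabetically)."""
    ranked = sorted(retention.items(), key=lambda kv: (-kv[1], kv[0]))
    return {interest for interest, _ in ranked[:k]}


def group_interest_retention(members_retention, k=9):
    """GIR(i) = sum over members p of E(i, p), where E(i, p) = 1 if i is in p's top-k set."""
    gir = Counter()
    for retention in members_retention.values():
        gir.update(top_k(retention, k))  # adds 1 for every interest in this member's top-k
    return gir


# Toy group (assumption): three coauthors with made-up retention values.
group = {
    "p1": {"web": 3.0, "search": 2.0, "query": 1.0},
    "p2": {"search": 5.0, "retrieval": 4.0, "web": 1.0},
    "p3": {"search": 2.0, "web": 2.0, "mining": 1.0},
}
gir = group_interest_retention(group, k=2)
```

Here `gir` holds the counts search: 3, web: 2, retrieval: 1, so "search" is the strongest group retained interest in the toy group.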
  10. Search Refinement by Interests from Different Perspectives
      • Vague or incomplete queries may produce so many results that users have to wade through them.
      • Research interests may be closely related to search tasks.
      • Research interests can be evaluated from various perspectives: (1) total interests; (2) retained interests; (3) co-author group retained interests.
  11. Refinement with Retained Interests and Group Retained Interests
      8 requests to DBLP authors were sent out; 7 replied.
      Participants: 7 DBLP authors.
      • Preference order, 100%: List 2, List 3 > List 1
      • Preference order, 100%: List 2 ≈ List 3
      • Preference order, 83.3%: List 2 > List 3 > List 1
      • Preference order, 16.7%: List 3 > List 2 > List 1
  12. Future Research
  13. Semantic Similarity: Obtaining More Accurate Interest Descriptions and Observations of Interest Dynamics

      Term pair              | Similarity
      -----------------------+-----------
      search, retrieval      | 0.645
      search, query          | 0.552
      search, pagerank       | 0.813
      retrieval, query       | 0.467
      retrieval, pagerank    | 0.293
      query, pagerank        | 0.098
      logic, reasoning       | 0.667
      logic, inference       | 0.606
      reasoning, inference   | 0.909
      ontology, OWL          | 0.805

      Table: Some examples of semantic similarities based on Normalized Google Distance.
      Figure 14: Consistent interests without consideration of semantic similarity (interest terms around Carlos Castillo and Ricardo A. Baeza-Yates).
      Figure 15: Consistent interests with consideration of semantic similarity (interest terms around Carlos Castillo and Ricardo A. Baeza-Yates).
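For reference, the Normalized Google Distance itself can be sketched as below, using the standard definition NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))). The hit counts and index size are hypothetical placeholders. The slide reports similarities, where larger values appear to mean more closely related terms, but it does not state how distances were converted into similarities, so the sketch stops at the distance.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """NGD from hit counts: fx and fy for each term, fxy for both together, n for the index size."""
    if min(fx, fy, fxy) <= 0:
        raise ValueError("hit counts must be positive")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Hypothetical counts: two terms that always co-occur are at distance 0.
same = normalized_google_distance(1000, 1000, 1000, 10**6)

# Hypothetical counts: terms that rarely co-occur are farther apart.
apart = normalized_google_distance(1000, 100, 10, 10**6)
```

Smaller distances indicate terms that tend to occur together; any mapping from this distance to the similarity scores in the table would be an additional assumption.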
  14. Thank You!