Ranking Objects by Following Paths in Entity-Relationship Graphs (PhD Workshop at CIKM)

1. Ranking Objects by Following Paths in Entity-Relationship Graphs
Minsuk Kahng, Sangkeun Lee and Sang-goo Lee
School of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea
4th Workshop for Ph.D. Students in Information and Knowledge Management (PIKM 2011), in conjunction with CIKM 2011
October 28th, 2011
2. Problem (Introduction)
• Objective
  – Given some query objects, rank objects of a specified type
• Examples
  – For a given user and current timestamp, which songs are most preferable?
  – For a given paper, which conferences are most relevant for submission?
[Figure: a query (user "Brian", timestamp 28 OCT 11:30 AM) is mapped to a ranked list.]
Ranking Objects by Following Paths in Entity-Relationship Graphs, © 2011 Minsuk Kahng, PIKM 2011
3. Outline
• Introduction
• Data Model
• Method
• Preliminary Experiments
• Related Work
• Challenging Issues
• Conclusion
4. Heterogeneous Data for Search (Introduction)
• Various types of data for search and recommendation
  – Traditionally, only documents and words
  – Now, various objects and relationships between them
    • Documents, products, music, users, clickthrough logs, maps, contextual data (e.g. time & place)
• Incorporating these heterogeneous data is key!
5. Utilize Graph-based Data Model (Introduction)
• Graph-based data models have gained popularity for dealing with these heterogeneous data.
  – Objects and relationships can be modeled as nodes and edges.
  – Such a model is called an “Entity-Relationship Graph”.
• Solve the problem with graph operations
[Figure: an original database (e.g. a clickthrough log) is converted into a graph data model (papers linked to titles, authors, years, and publishers by edges such as hasTitle and hasPublisher), which turns the task into a graph problem (degree, random walk, shortest path).]
Original database (clickthrough log):
User  Doc         Click  Time
#12   cikm10.org  0      11:00
#12   cikm11.org  1      11:01
#500  …           0      13:45
#500  …           0      13:46
#500  …           1      13:46
#904  …           1      13:51
6. Object Ranking with Graphs (Introduction)
• With graph-based data models
  – Problem definition
    • Given query objects, rank objects of a specified type
  – With graph models, it becomes:
    • Which target nodes are most similar to the query nodes?
• Challenges
  – How to define similarity between nodes in the graph?
    • Not just shortest distances
    • Capture semantics of structured data
7. Existing work, but… (Introduction)
• Homogeneous graphs
  – Focus on the structure of the graph; do not consider the ‘type’ of nodes/edges
  – PageRank: inlinks/outlinks of nodes, random walks, … [1]
• Considering the type of edges: edge-level features
  – Each edge has a ‘type’.
  – ObjectRank: each edge type has a different weight [2].
• But it is still hard to capture semantics
  – The importance of an edge type is always the same, regardless of the task.
[1] Brin and Page. … hypertextual web search engine. In WWW, 1998.
[2] Balmin et al. Objectrank: authority-based keyword search …. In VLDB, 2004.
8. Path-level feature instead of edge (Introduction)
• Recently, path-level features have been proposed as a replacement for edge-level features [3, 4].
  – Each path, a sequence of edge types, has a weight.
  – An edge has a different role depending on the task.
[Figure: two feature/weight tables. Edge-level features: hasAuthor 0.90, hasTitle 0.50, cite 0.35, pubWhere 0.15. Path-level features: four paths composed of edge types such as hasTitle⁻¹, cite, pubWhere, hasChair⁻¹, hasAuthor⁻¹, and hasAuthor, with weights 0.90, 0.50, 0.35, 0.15.]
[3] Sun et al. PathSim: meta path-based top-k similarity …. In VLDB, 2011.
[4] Lao et al. … retrieval … path constrained random walks. In ECML-PKDD, 2010.
9. Proposed: Object Ranking with Paths (Introduction)
• We propose an object ranking method.
  – Transform the original data into a graph-based data model
  – Use paths in the graph for ranking
[Figure: example data graph in which paper nodes (Paper#051, Paper#052, Paper#053, Paper#061, Paper#065) are connected to their titles, authors, years, venues, and publishers by edges such as hasTitle and hasPublisher.]
10. Outline
• Introduction
• Data Model
• Method
11. Data Graph & Schema Graph (Data Model)
• Data graph
  – Instances (types on both nodes & edges)
• Schema graph
  – Schema of the data graph
• Using the DBLP dataset, we build a data graph that has types on both nodes & edges.
[Figure: example data graph. Paper nodes (Paper#051, Paper#052, Paper#053, Paper#061, Paper#065) are linked to titles, authors, years, proceedings (WWW’98, ICDE’09, VLDB’04, VLDB’02, ICDE’08), and publishers by typed edges (hasTitle, pubWhere, pubYear, hasPublisher). Schema graph node types: Publisher, Proceeding, Title, Author, Conference, Paper, Year.]
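A minimal Python sketch of the two graphs, assuming edges are stored as (source, edge type, target) triples. The node ids (`p1`, `t1`, ...) and the edge directions are hypothetical; only the type and edge-type names come from the slide.

```python
# Hypothetical mini data graph: every node has a type,
# every edge is a (source, edge_type, target) triple.
NODE_TYPES = {
    "p1": "Paper", "p2": "Paper",
    "t1": "Title", "t2": "Title",
    "c1": "Proceeding",
    "y1": "Year",
}
EDGES = [
    ("p1", "hasTitle", "t1"),
    ("p2", "hasTitle", "t2"),
    ("p1", "pubWhere", "c1"),
    ("p2", "pubWhere", "c1"),
    ("p1", "pubYear", "y1"),  # assumed direction: Paper -> Year
]


def derive_schema(node_types, edges):
    """Collapse instance-level edges into (source type, edge type, target type)."""
    return sorted({(node_types[s], e, node_types[t]) for s, e, t in edges})


schema = derive_schema(NODE_TYPES, EDGES)
for src_type, edge_type, dst_type in schema:
    print(f"{src_type} --{edge_type}--> {dst_type}")
```

The schema graph here is derived from the data graph; in practice it could equally be given up front (for example from the relational schema).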
12. Method Overview (Method)
• Problem
  – Given query nodes and a target type, which target nodes are most relevant to the query nodes?
• Steps
  1. Choose schema paths from the schema graph
  2. For each path, look into the data graph and score target nodes
  3. Combine the results of each path
13. 1-1) What is a Schema Path? (Method)
• A schema path p is a sequence of edge types.
• It represents how we traverse the graph.
[Figure: schema graph of the music dataset with node types ARTIST, ALBUM, TRACK, LISTEN_LOG, USER, GENRE, TRACK_GENRE and attribute nodes (USER: username, birthday, register_date, gender; LISTEN_LOG: listen_date; TRACK: title; ARTIST: name, gender; ALBUM: title, release_date; GENRE: name). The edge types are omitted.]
[Figure: example schema path USER –hasUser⁻¹→ LLOG –hasTrack→ TRACK –hasTitle→ TRACK:title –hasTerm→ term –hasTerm⁻¹→ TRACK:title –hasTitle⁻¹→ TRACK –hasAlbum→ ALBUM.]
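A minimal Python sketch of a schema path as a sequence of edge types, assuming an inverse edge is written with a `-1` suffix as on the slide. The `SCHEMA` dictionary and its edge directions are assumptions inferred from the example path, and `node_types_along` is a hypothetical helper.

```python
# Assumed schema: edge type -> (source node type, target node type).
SCHEMA = {
    "hasUser": ("LLOG", "USER"),
    "hasTrack": ("LLOG", "TRACK"),
    "hasTitle": ("TRACK", "TRACK:title"),
    "hasTerm": ("TRACK:title", "term"),
    "hasAlbum": ("TRACK", "ALBUM"),
}


def node_types_along(path, schema=SCHEMA):
    """Return the node types visited by a schema path.

    Raises ValueError if two consecutive edge types do not chain.
    """
    types = []
    for edge in path:
        inverse = edge.endswith("-1")
        src, dst = schema[edge[:-2] if inverse else edge]
        if inverse:
            src, dst = dst, src  # traverse the edge backwards
        if types and types[-1] != src:
            raise ValueError(f"{edge} cannot follow node type {types[-1]}")
        if not types:
            types.append(src)
        types.append(dst)
    return types


user_to_album = ["hasUser-1", "hasTrack", "hasTitle", "hasTerm",
                 "hasTerm-1", "hasTitle-1", "hasAlbum"]
print(" > ".join(node_types_along(user_to_album)))
```

Checking that consecutive edge types chain on the schema graph catches malformed paths before any data is touched.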
14. 1-2) Choose Schema Paths (Method)
• Discover candidate schema paths, and select some of them.
  – Simply, find all paths between two nodes on the schema graph
    • 1) User > Log > Song > Album
    • 2) User > Log > Song > Artist > Album
  – Symmetric paths
    • a) User > Log > Song > Log > User
    • b) Song > SongTitle > Term > SongTitle > Song
  – Expanding with symmetric paths
    • 1+a) User > Log > Song > Log > User > Log > Song > Album (CF, i.e. collaborative filtering)
    • 1+b) User > Log > Song > SongTitle > Term > SongTitle > Song > Album
[Figure: a path from User to Album expanded with a symmetric path.]
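A minimal Python sketch of path discovery, assuming the schema graph is small enough to enumerate all simple paths by depth-first search. The edge list `SCHEMA_EDGES` is an assumption reconstructed from the node types named above, and both function names are hypothetical.

```python
from collections import defaultdict

# Assumed (undirected) schema graph built from the node types on the slide.
SCHEMA_EDGES = [
    ("User", "Log"), ("Log", "Song"), ("Song", "Album"),
    ("Song", "Artist"), ("Artist", "Album"),
    ("Song", "SongTitle"), ("SongTitle", "Term"),
]


def build_adjacency(edges):
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def simple_paths(adj, start, goal, prefix=()):
    """Enumerate all paths from start to goal that visit no type twice."""
    path = prefix + (start,)
    if start == goal:
        return [list(path)]
    found = []
    for nxt in adj[start]:
        if nxt not in path:
            found.extend(simple_paths(adj, nxt, goal, path))
    return found


def expand_with_symmetric(path, symmetric):
    """Splice a symmetric path (same first and last type) into a path."""
    assert symmetric[0] == symmetric[-1]
    i = path.index(symmetric[0])
    return path[:i] + symmetric + path[i + 1:]


adj = build_adjacency(SCHEMA_EDGES)
candidates = simple_paths(adj, "User", "Album")
cf_path = expand_with_symmetric(
    candidates[0], ["User", "Log", "Song", "Log", "User"])
print(candidates)
print(" > ".join(cf_path))
```

Simple paths never revisit a node type, which is why the symmetric paths have to be spliced in as a separate step.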
15. 2-1) For each path, look into the data graph (Method)
• For each path, look into the data graph and score target objects.
• From query to target, follow the edges in the path.
• $r = M^{(m)} \times \cdots \times M^{(1)} \times q = A \times q$
[Figure: a schema path (extracted from the schema graph) over the types Term, SongTitle, Song, and Album is followed on the data graph from the query Q to nodes t1 to t4 of the target type.]
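A minimal Python sketch of the propagation r = M(m) × … × M(1) × q, assuming q is a vector over the query type, each M(k) is the matrix of the k-th edge type in the path (rows are targets, columns are sources), and r is the score vector over the target type. The 0/1 matrices below are hypothetical toy data for the path Term > SongTitle > Song > Album.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def follow_path(matrices, q):
    """Apply M(1) first and M(m) last, i.e. r = M(m) x ... x M(1) x q."""
    r = q
    for M in matrices:
        r = matvec(M, r)
    return r


# Hypothetical toy data: 2 terms, 3 song titles, 3 songs, 2 albums.
M1 = [[1, 0], [1, 1], [0, 1]]            # SongTitle x Term
M2 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # Song x SongTitle
M3 = [[1, 1, 0], [0, 0, 1]]              # Album x Song

q = [1, 0]                               # query: the first term
r = follow_path([M1, M2, M3], q)

# Equivalent form: precompute A once, then r = A x q.
A = matmul(M3, matmul(M2, M1))
print(r, matvec(A, q))
```

With 0/1 entries each score is the number of path instances from the query to that target; precomputing A trades memory for query time.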
16. 2-2) How to give scores to nodes? (Method)
• How do we assign scores to the nodes during propagation?
• Define scoring functions
  – Number of paths
  – Distribute the current state equally (random walk)
  – Distribute, but not equally (e.g. to diminish the popularity effect)
• $r = M^{(m)} \times \cdots \times M^{(1)} \times q = A \times q$
• Defining a scoring function amounts to defining how to fill the values of the matrix M.
[Figure: example node scores produced by the different scoring functions.]
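A minimal Python sketch of three ways to fill the propagation weights for one edge type, written over adjacency lists rather than dense matrices. The first two follow the slide (path counts, random walk); the third is only one assumed reading of "distribute, but not equally", in which each target's share shrinks with its in-degree and `beta` is a hypothetical parameter.

```python
from collections import Counter, defaultdict


def count_weights(adj):
    """Number of paths: every edge carries weight 1."""
    return {s: {t: 1.0 for t in ts} for s, ts in adj.items()}


def random_walk_weights(adj):
    """Distribute the current state equally among a node's neighbours."""
    return {s: {t: 1.0 / len(ts) for t in ts} for s, ts in adj.items() if ts}


def popularity_damped_weights(adj, beta=1.0):
    """Distribute unequally: popular (high in-degree) targets get a smaller share."""
    indeg = Counter(t for ts in adj.values() for t in ts)
    weights = {}
    for s, ts in adj.items():
        raw = {t: indeg[t] ** -beta for t in ts}
        total = sum(raw.values())
        weights[s] = {t: w / total for t, w in raw.items()}
    return weights


def step(weights, state):
    """Propagate a {node: score} state across one edge type."""
    nxt = defaultdict(float)
    for s, mass in state.items():
        for t, w in weights.get(s, {}).items():
            nxt[t] += mass * w
    return dict(nxt)


# Hypothetical listening data: user -> songs listened to.
adj = {"u1": ["s1", "s2"], "u2": ["s1"], "u3": ["s1", "s3"]}
state = {"u1": 1.0}
by_count = step(count_weights(adj), state)
by_walk = step(random_walk_weights(adj), state)
by_damped = step(popularity_damped_weights(adj), state)
print(by_count, by_walk, by_damped)
```

In this toy data `s1` is listened to by all three users, so the damped variant gives it a smaller share of `u1`'s score than the less popular `s2`.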
17. 3) Combine the results (Method)
• Now we have scores of nodes obtained from each path.
• Weighted combination of the per-path results; the weights can be set
  – Manually
  – By the end user
  – By learning (e.g. learning-to-rank)
• $R(x_i, q) = \sum_j w_j \times R_j(x_i, q)$
  – $x_i$: the i-th object of the target type
  – $w_j$: the weight of the j-th path
  – $R_j(x_i, q)$: the score of the i-th object obtained from the j-th path
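A minimal Python sketch of the weighted combination, assuming each path yields a {object: score} dictionary and an object missing from a path contributes a score of 0 for that path. The scores and weights below are hypothetical, and `combine` is an illustrative name.

```python
from collections import defaultdict


def combine(path_scores, weights):
    """Return objects ranked by the weighted sum of their per-path scores."""
    combined = defaultdict(float)
    for scores, w in zip(path_scores, weights):
        for obj, score in scores.items():
            combined[obj] += w * score
    # Highest combined score first; ties broken by object name.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical per-path scores for three target objects.
path1 = {"a": 0.6, "b": 0.4}
path2 = {"b": 0.7, "c": 0.3}
ranking = combine([path1, path2], weights=[0.5, 0.5])
print(ranking)
```

The weights are set manually here; a learning-to-rank method would instead fit them from relevance judgments.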
18. Outline
• Introduction
• Data Model
• Method
• Preliminary Experiments
  – Datasets
  – Tasks (Scenarios)
19. Two Real-world Datasets (Preliminary Experiments)
• Music streaming service (Bugs Music)
  – Music metadata (song, album, artist, genre)
  – Users’ listening log
    • e.g. “‘User #12’ listened to ‘Song #59’ at time t.”
  – Node types: song, album, artist, genre, user, date, log, etc.
• DBLP publication
  – All papers published in 20 major conferences
  – Node types: paper, author, conference, etc.
20. Task #1: Basic Scenario (Preliminary Experiments)
• Scenario: Recommend related conferences
  – Input: a set of terms
  – Output: Conferences
• Path
  – (Term) -> (PaperTitle) -> (Paper) -> (Conference)
• Top-5 results
  – Query #1: “twitter”: 1 WWW, 2 ICWSM, 3 CHI, 4 ECIR, 5 WSDM
  – Query #2: “graph clustering”: 1 KDD, 2 CIKM, 3 ICDE, 4 ICDM, 5 SDM
  – Query #3: “transaction lock protocol optimizing”: 1 ICDE, 2 SIGMOD, 3 VLDB, 4 DASFAA, 5 CIKM
  – Query #4: “language model smoothing”: 1 SIGIR, 2 ECIR, 3 CIKM, 4 AAAI, 5 WWW
21. Task #2: Different paths (Preliminary Experiments)
• Research question: how do two different paths produce different results?
• Scenario: Find similar papers
  – Input: a particular Paper (w/ title)
  – Output: Papers (w/ title)
• Query: “DISCOVER: Keyword Search in Relational Databases”
• Path #1: “(Paper) -> (PaperTitle) -> (Term) -> (PaperTitle) -> (Paper)”
  1 DISCOVER: Keyword Search in Relational Datab.
  2 Discover Relaxed Periodicity in Temporal Datab.
  3 Discover Relev. Env. Feature Using Concurrent Reinf. Learning
  4 Mining Propositional Knowl. Bases to Discover Multi-level Rules
  5 Mining Multivar. Time-Ser. Sensor Data to Discover Behav. Envel.
• Path #2: “(Paper) -> (PaperAuthorRelation) -> (Author) -> (PaperAuthorRelation) -> (Paper)”
  1 DISCOVER: Keyword Search in Relational Datab.
  2 ObjectRank: Authority-Based Keyword Search in Datab.
  3 ObjectRank: A System for Auth.-based Search on Datab.
  4 Keyword Proximity Search on XML Graphs
  5 Efficient IR-Style Keyword Search over Relational Datab.
22. Task #3: Combine Paths (Preliminary Experiments)
• Research question: combine the results from two different paths
• Scenario: Given a user and the current date, recommend artists based on the user's listening log and daily popularity
  – Input: a particular User, Current Date
  – Output: Artists
• Path #1: “(User) -> (ListenLog) -> (Song) -> (ListenLog) -> (User) -> (ListenLog) -> (Song) -> (Artist)”
• Path #2: “(Date) -> (ListenTimestamp) -> (ListenLog) -> (Song) -> (Artist)”
• Results
   Artist’s name     Path #1  Path #2
1  Black Eyed Peas   1        21
2  8Elight (local)   8        2
3  V.O.S. (local)    -        1
4  Eminem            4        37
5  Ciara             2        86
23. Outline
• Introduction
• Data Model
• Method
• Preliminary Experiments
• Related Work
  – Keyword search in DB
  – Recommender systems
  – Graph-based ranking
• Challenging Issues
  – Learning
  – Efficiency
  – Flexible querying framework
  – Result presentation
24. Related Work
• Keyword search in DB
  – Also deals with heterogeneous objects
  – More concerned with performance than with ranking
• Recommender systems
  – Focus more on semantic similarity
  – Only two types of entities: User & Item
• Graph-based ranking
  – Mostly edge-level
  – Recently, path-level methods have been introduced.
  – Some work with RDF & SPARQL
25. Challenging Issues (Future Work)
• Learning
  – Can improve the quality of ranked results
  – Weights of paths can be determined (e.g. learning-to-rank) [4]
• Efficiency
  – Performance issues as the data size grows
  – Materializing [3], filtering
• Flexible querying framework
  – Expressiveness of queries [5]
  – Provide operators and conditional clauses; let users choose score functions
• Result presentation
  – Explanation of results [6] (why an object is ranked 1st)
  – Schema paths would be helpful.
[5] Varadarajan et al. Flexible and efficient querying and ranking …. In EDBT, 2009.
[6] Yu et al. Recommendation diversification using explanations. In ICDE, 2009.
26. Conclusion
• We propose an object ranking method.
• Heterogeneous data
  – Utilize a graph-based data model to capture heterogeneity.
• Paths in the graph
  – Use schema paths in the graph, which carry the semantics.
• Ranking
  – By following these paths, objects are ranked.
• Discussion
  – Many challenging issues & much room for improvement
27. Thank you
minsuk@europa.snu.ac.kr