
CORE: co-author recommendation using network information and interest similarity

Description of CORE for the EPFL REACT team, given in Google Hangouts.


  1. CORE: CO-author REcommendation using network information and interest similarity
  2. Overview
     • Problem statement
     • Data collection & storage
     • Calculations
     • Technical infrastructure
     • Conclusion
  3-4. Problem statement
     • Researchers in search of future cooperation partners
     • writing a paper
     • writing a project proposal
     • finding people with similar interest
     Whom to choose / ask when you want to work together?
     Photo: http://www.flickr.com/photos/jaygooby/
  5. CORE
     • Similarity (of interest) / homophily (Ibarra, 1992; Lazarsfeld & Merton, 1954; McPherson, Smith-Lovin, & Cook, 2001; Stahl, 2005)
     • Influence / power over information and dissemination flow, similar to word-of-mouth (Money, Gilly, & Graham, 1998; Park & Suh, 2013)
  6. Data collection
     • data:
       • dspace.ou.nl (recommendation)
       • Google Scholar h-index (visualisation)
       • Mendeley hr-index (visualisation)
     • storage: MAMP
  7. dspace harvester response
     • identifier
     • timestamp
     • title
     • creator: authors
     • subject: keywords
     • description: APA ref, sponsors
     • language
     • type: conf. paper, article, book chapter

     <?xml version="1.0" encoding="UTF-8"?>
     <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
       <responseDate>2002-02-08T08:55:46Z</responseDate>
       <request verb="GetRecord" identifier="oai:arXiv.org:cs/0112017" metadataPrefix="oai_dc">http://arXiv.org/oai2</request>
       <GetRecord>
         <record>
           <header>
             <identifier>oai:arXiv.org:cs/0112017</identifier>
             <datestamp>2001-12-14</datestamp>
             <setSpec>cs</setSpec>
             <setSpec>math</setSpec>
           </header>
           <metadata>
             <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                        xmlns:dc="http://purl.org/dc/elements/1.1/"
                        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                        xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
               <dc:title>Using Structural Metadata to Localize Experience of Digital Content</dc:title>
               <dc:creator>Dushay, Naomi</dc:creator>
               <dc:subject>Digital Libraries</dc:subject>
               <dc:description>With the increasing technical sophistication of both information consumers and providers, there is increasing demand for more meaningful experiences of digital information. We present a framework that separates digital object experience, or rendering, from digital object storage and manipulation, so the rendering can be tailored to particular communities of users.</dc:description>
               <dc:description>Comment: 23 pages including 2 appendices, 8 figures</dc:description>
               <dc:date>2001-12-14</dc:date>
             </oai_dc:dc>
           </metadata>
         </record>
       </GetRecord>
     </OAI-PMH>
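A harvester response like the one on slide 7 can be reduced to the fields listed in the bullets with a few lines of Python. This is only a minimal sketch using the standard library; the function name `parse_record`, the output keys, and the shortened `SAMPLE` string are illustrative assumptions, not the code used in CORE.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes for OAI-PMH and Dublin Core, taken from the response above.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Shortened copy of the slide's example response (assumption: trimmed for brevity).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv.org:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Using Structural Metadata to Localize Experience of Digital Content</dc:title>
          <dc:creator>Dushay, Naomi</dc:creator>
          <dc:subject>Digital Libraries</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def parse_record(xml_text):
    """Extract identifier, timestamp, title, authors and keywords from one record."""
    root = ET.fromstring(xml_text)
    record = root.find(".//oai:record", NS)
    header = record.find("oai:header", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "timestamp": header.findtext("oai:datestamp", namespaces=NS),
        "title": record.findtext(".//dc:title", namespaces=NS),
        "authors": [e.text for e in record.iterfind(".//dc:creator", NS)],
        "keywords": [e.text for e in record.iterfind(".//dc:subject", NS)],
    }


record = parse_record(SAMPLE)
print(record["authors"], record["keywords"])
```

The `creator` and `subject` elements may repeat, which is why they are collected as lists; these two lists are what the later interest-similarity and co-author network steps would consume.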
  8. Data storage
     4.3.2 New data collection: On top of the initial data we also collect data from two different sources.
     Illustration 10: Data structure database Sie; red objects are relevant to COCOON CORE project
  9. Additional data: h-index
     • For each article:
       • search Google Scholar
       • scrape citations
     • 1000 requests → Captcha
     • → switch to another server
     • total runtime: 1 hour
     • Compute h-index per year
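Once the citation counts are scraped, the h-index itself is a short computation: it is the largest h such that h articles have at least h citations each. The sketch below assumes the counts are already available as a plain list; the function name and the sample numbers are hypothetical.

```python
def h_index(citations):
    """Return the largest h such that h items have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for one author's articles.
print(h_index([10, 8, 5, 4, 3]))
```

The same function would cover the hr-index on the next slide if Mendeley reader counts are passed in place of citation counts.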
  10. Additional data: hr-index
      • h-index for Mendeley reads
      • Reader Meter
  11. Architecture
  12. Interest similarity
      • Vector space model (Salton, Wong, & Yang, 1975)
      • every author has a keyword vector
      • per keyword: TF-IDF = term frequency * inverse document frequency
        • boolean TF: 1 if author uses keyword, 0 otherwise
        • IDF: all authors / number of times keyword is used by an author
      • compute cosine similarity between vectors
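The bullets on slide 12 can be sketched in a few lines. This sketch reads "number of times keyword is used by an author" as the number of authors who use the keyword, and uses the plain ratio from the slide rather than the more common logarithmic IDF; both are assumptions. The author names, keywords, and function names are made up for illustration.

```python
import math


def tfidf_vectors(author_keywords):
    """Build one sparse keyword vector per author (boolean TF times ratio IDF)."""
    n_authors = len(author_keywords)
    authors_using = {}
    for keywords in author_keywords.values():
        for kw in set(keywords):
            authors_using[kw] = authors_using.get(kw, 0) + 1
    # Boolean TF is 1 for every keyword an author uses, so the weight is just the IDF.
    return {
        author: {kw: n_authors / authors_using[kw] for kw in set(keywords)}
        for author, keywords in author_keywords.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[kw] for kw, weight in u.items() if kw in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical authors and keywords.
vectors = tfidf_vectors({
    "author_a": ["recommender systems", "social networks"],
    "author_b": ["social networks", "learning analytics"],
    "author_c": ["creativity"],
})
print(cosine(vectors["author_a"], vectors["author_b"]))
```

Because rare keywords get a higher weight, two authors who share only a very common keyword end up with a low similarity, which is the point of using IDF here.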
  13-18. Betweenness centrality
      • requirement: a network
      • co-author network
  19. OUNL co-author network
      Fig. 1. Co-authorship network (Sie et al., accepted)
  20. Betweenness centrality
      • betweenness = number of times an author is on the shortest path between two other authors / total number of shortest paths
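  21. Betweenness centrality
      $g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$
      • $\sigma_{st}(v)$: number of times v is on the shortest path between s and t
      • $\sigma_{st}$: number of shortest paths between s and t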
  22-23. Betweenness centrality
      Nodes in the example network: Marlies, Peter, Rory, Hendrik, Erik, Denis
      $g(\text{Hendrik}) = \frac{\sigma_{\text{Marlies,Erik}}(\text{Hendrik})}{\sigma_{\text{Marlies,Erik}}} + \frac{\sigma_{\text{Marlies,Denis}}(\text{Hendrik})}{\sigma_{\text{Marlies,Denis}}} + \frac{\sigma_{\text{Peter,Erik}}(\text{Hendrik})}{\sigma_{\text{Peter,Erik}}} + \frac{\sigma_{\text{Peter,Denis}}(\text{Hendrik})}{\sigma_{\text{Peter,Denis}}} + \frac{\sigma_{\text{Rory,Erik}}(\text{Hendrik})}{\sigma_{\text{Rory,Erik}}} + \frac{\sigma_{\text{Rory,Denis}}(\text{Hendrik})}{\sigma_{\text{Rory,Denis}}}$
      $g(\text{Hendrik}) = 2/2 + 2/2 + 1/1 + 1/1 + 1/1 + 1/1 = 6$
      Normalization by $(N-1)(N-2)/2$ gives $C_b(\text{Hendrik}) = 0.6$
      Hendrik is on the edge of his network
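The worked example for Hendrik can be checked with a small brute-force script. The network drawing itself is not part of this transcript, so the edge list below is an assumption: it is one six-node graph that reproduces the terms 2/2 + 2/2 + 1/1 + 1/1 + 1/1 + 1/1 from the slide. The function names are illustrative, and this is a plain breadth-first-search sketch rather than the implementation used in CORE.

```python
from collections import deque
from itertools import combinations

# ASSUMED edge list (the slide's figure is not recoverable from the text).
EDGES = [
    ("Marlies", "Peter"), ("Marlies", "Rory"), ("Peter", "Rory"),
    ("Peter", "Hendrik"), ("Rory", "Hendrik"),
    ("Hendrik", "Erik"), ("Hendrik", "Denis"), ("Erik", "Denis"),
]


def build_graph(edges):
    """Turn an undirected edge list into an adjacency dict."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_counts(graph, source):
    """BFS from source: distance to, and number of shortest paths to, each node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(graph, v):
    """Sum over pairs (s, t) of the share of shortest s-t paths that pass through v."""
    info = {node: shortest_path_counts(graph, node) for node in graph}
    dist_v, sigma_v = info[v]
    total = 0.0
    for s, t in combinations([n for n in graph if n != v], 2):
        dist_s, sigma_s = info[s]
        if v not in dist_s or t not in dist_s:
            continue  # s cannot reach v or t, so no shortest path to count
        if dist_s[v] + dist_v[t] == dist_s[t]:
            total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total


graph = build_graph(EDGES)
n = len(graph)
g_hendrik = betweenness(graph, "Hendrik")
cb_hendrik = g_hendrik / ((n - 1) * (n - 2) / 2)
print(g_hendrik, cb_hendrik)
```

With N = 6 authors the normalization factor (N-1)(N-2)/2 equals 10, so a raw score of 6 corresponds to the 0.6 shown on the slide. For networks of realistic size, an algorithm such as Brandes' would be a more efficient choice than this all-pairs loop.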
  24. Problem
      • Find a new co-author with:
        • similar interest (vector similarity)
        • influence (betweenness centrality)
  25. GUI
  26. GUI: fill out additional keywords
  27. GUI: add keyword to user’s vector
  28. GUI: adjust sliders to your liking
  29. GUI: press the button
  30. Recommendation result
  31. Author page (1/2)
  32. Author page (2/2)
  33. Keyword page
  34. Usability
      • SUS (System Usability Scale): 67/100 points
      • Q4: no help from a technical person needed
      "... COCOON CORE (questions 4 and 10, Figures 7 and 8), for instance not needing a technical person to use COCOON CORE (question 4). Also, when looking at the proportions of responses (Figure 8), participants think that there are few inconsistencies in COCOON CORE (question 6) and that COCOON CORE is not unnecessarily complex (question 2)."
      Fig. 7. Median score for each question of the System Usability Scale (SUS). [Chart: x-axis Questions 1 to 10, y-axis Median 0 to 4]
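For readers unfamiliar with how a figure such as 67/100 arises, the standard SUS scoring rule for a single questionnaire can be sketched as follows: odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5. The function name and the sample answers are hypothetical, and the slides do not say how the scores of individual participants were aggregated.

```python
def sus_score(responses):
    """Score one SUS questionnaire: ten answers, each on a 1 to 5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    total = 0
    for item, answer in enumerate(responses, start=1):
        if not 1 <= answer <= 5:
            raise ValueError("each response must be between 1 and 5")
        # Odd items are positively worded, even items negatively worded.
        total += (answer - 1) if item % 2 == 1 else (5 - answer)
    return total * 2.5


# Hypothetical answers of one participant.
print(sus_score([4, 2, 4, 1, 4, 2, 4, 2, 3, 2]))
```

Each item contributes between 0 and 4 points, which matches the 0 to 4 range on the vertical axis of Fig. 7.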
  35. Considerations
      • Interest similarity: keyword vector or keyword network?
        • average distance between their keywords?
        • WordNet as keyword network
      • GUI:
        • KISS
        • Connect individuals directly
      • Performance and scalability:
        • graph search depth
        • smart indexing
        • PHP or Java?
  36. References
      • Ibarra, H. (1992). Homophily and Differential Returns: Sex Differences in Network Structure and Access in an Advertising Firm. Administrative Science Quarterly, 37(3), 422–447.
      • Lazarsfeld, P. F., & Merton, R. K. (1954). Friendship as a social process: A substantive and methodological analysis. In M. Berger, T. Abel, & C. H. Page (Eds.), Freedom and Control in Modern Society (Vol. 18, pp. 18–66). Van Nostrand. Retrieved from http://www.questia.com/PM.qst?a=o&docId=23415760
      • McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a Feather: Homophily in Social Networks. Annual Review of Sociology, 27(1), 415–444. doi:10.1146/annurev.soc.27.1.415
      • Money, R. B., Gilly, M. C., & Graham, J. L. (1998). Explorations of National Culture and Word-of-Mouth Referral Behavior in the Purchase of Industrial Services in the United States and Japan. Journal of Marketing, 62(October), 76–87.
      • Park, J. H., & Suh, B. (2013). The impact of influential’s betweenness centrality on the WOM effect under the online social networking service environment. In Pacific Asia Conference on Information Systems (PACIS 2013). Jeju Island, Korea: The Korea Society of Management Information Systems.
      • Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620.
      • Sie, R. L. L., Van Engelen, B. J., Bitter-Rijpkema, M., & Sloep, P. B. (accepted). COCOON CORE: CO-author Recommendation based on Betweenness Centrality and Interest Similarity. Springer Volume on Recommender Systems for Technology Enhanced Learning: Research Trends & Applications, pp.
      • Stahl, G. (2005). Group cognition in computer-assisted collaborative learning. Journal of Computer Assisted Learning, 21(2), 79–90. doi:10.1111/j.1365-2729.2005.00115.x
  37. Networks are everywhere
  38. Thank you for your attention!
      rory.sie@ou.nl
      http://www.open.ou.nl/rse
      openrory, maisonpoublon
      Rory Sie
      openrse
      http://nl.linkedin.com/in/rorysie
      thebigbangrory.blogspot.com
