Finding Co-solvers on Twitter, with a Little Help from Linked Data
Milan Stankovic, Hypios / Université Paris-Sorbonne, France
Matthew Rowe, KMi, Open University, UK
Philippe Laublet, Université Paris-Sorbonne, France
Outline
• Context
• Problem
• Our Approach
• Evaluation
• Example of use
• Conclusion and questions
Context: Innovation on the Web
• Innovation Seekers
• Solvers from academia, industry, research, etc.
Problem: Find Collaborators
[Diagram: an Innovation Seeker poses a problem; a problem solver needs collaborators]
Problem: Find Collaborators
• How to find collaborators that complement the solver's competence with regard to the problem?
• How to find collaborators that are compatible with the solver in terms of teamwork?
Problem: Find Collaborators
• Complementary Competence
• Interest Similarity
• Social Similarity
Inspired by social studies on team composition and the factors that influence good teamwork.
Our Approach
profiling >> profile extension >> calculation of similarities >> ranking
Implementation and tests performed using data from Twitter.
Our Approach: Profiling
[Diagram: the solver and candidate collaborators each get a conceptual profile and a social profile; the problem gets a conceptual profile]
Our Approach: Profiling
• Conceptual Profiles
  – users: Zemanta is used to extract DBpedia concepts from textual elements that the user created on Twitter (tweets, bio, etc.). Profiles contain concepts and the frequency of their occurrence.
  – problem: the text of the innovation problem is processed with Zemanta to extract concepts.
• Social Profiles
  – contain all the contacts of a given user on Twitter.
• Both types of profiles are in vector form.
• Profiling is deliberately simple in purpose: it aims to capture most topics a user mentions, not to single out the topics of highest expertise.
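The conceptual profile described above is just a concept-frequency vector. A minimal sketch, assuming a concept-extraction function standing in for Zemanta (the `toy_extractor` below and its keyword table are purely illustrative, not the real service API):

```python
from collections import Counter

def conceptual_profile(texts, extract_concepts):
    # Concept -> frequency vector built from a user's tweets, bio, etc.
    profile = Counter()
    for text in texts:
        profile.update(extract_concepts(text))
    return profile

# Hypothetical stand-in for a concept-extraction service such as Zemanta:
def toy_extractor(text):
    known = {"rdf": "dbpedia:Resource_Description_Framework",
             "sparql": "dbpedia:SPARQL"}
    return [uri for kw, uri in known.items() if kw in text.lower()]

tweets = ["Playing with RDF and SPARQL today", "SPARQL query optimisation"]
profile = conceptual_profile(tweets, toy_extractor)
# Each concept is counted once per text it appears in,
# e.g. dbpedia:SPARQL gets frequency 2 here.
```

A social profile fits the same vector form, with contacts in place of concepts.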
Our Approach: Profile Extension
• Why extend profiles:
  – imperfection of the source data (tweets)
  – incompleteness of coverage (due to differences in vocabulary, some concepts may go unnoticed)
  – to perform broader/lateral matching
Our Approach: Profile Extension
• How:
  – HPSR (hyProximity): a graph-based measure using Linked Data (tested on DBpedia)
  – DMSR: a distributional measure inspired by Normalized Google Distance
  – PRF: Pseudo Relevance Feedback
Our Approach: Profile Extension
• HPSR (hyProximity)

HPSR(c1, c2) = Σ_{K_i ∈ K(c1, c2)} ic(K_i) + Σ_{p ∈ P} link(p, c1, c2) · pond(p, c1)

The DBpedia graph is traversed over skos:broader and dct:subject links.
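One possible reading of the HPSR formula can be sketched as follows. All inputs are assumptions for illustration: `categories[c]` holds the categories of concept c (reached via dct:subject / skos:broader), ic of a category is derived from how many concepts it contains, `links[p]` holds concept pairs directly connected by property p, and `pond(p, c)` is the property weighting:

```python
import math

def hpsr(c1, c2, categories, category_size, total_concepts, links, pond):
    # Shared categories K(c1, c2) of the two concepts
    common = categories[c1] & categories[c2]
    # Informational content ic(K_i): rarer shared categories weigh more
    score = sum(-math.log(category_size[k] / total_concepts) for k in common)
    # Direct links between c1 and c2, weighted per property p by pond(p, c1)
    for p, pairs in links.items():
        if (c1, c2) in pairs or (c2, c1) in pairs:
            score += pond(p, c1)
    return score

# Toy DBpedia-like data (illustrative names, not real counts):
categories = {"dbp:Crowdsourcing": {"cat:Collaboration", "cat:Innovation"},
              "dbp:Open_innovation": {"cat:Innovation"}}
category_size = {"cat:Collaboration": 40, "cat:Innovation": 5}
links = {"dct:subject": set()}
score = hpsr("dbp:Crowdsourcing", "dbp:Open_innovation",
             categories, category_size, 1000, links, lambda p, c: 1.0)
# Only cat:Innovation is shared, so score = -log(5/1000)
```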
Our Approach: Profile Extension
• DMSR – Distributional Measure of Semantic Relatedness

DMSR(c1, c2) = occurrence(c1, c2) / (occurrence(c1) + occurrence(c2))

Example: if c1 co-occurs with c2 in more profiles than with c3, then c1 and c2 are more related than c1 and c3.
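The DMSR formula is direct to implement once occurrence and co-occurrence counts have been gathered from a corpus; the counts below are toy values mirroring the slide's example, not real data:

```python
def dmsr(c1, c2, cooccurrence, occurrence):
    # occurrence(c1, c2) / (occurrence(c1) + occurrence(c2))
    denom = occurrence[c1] + occurrence[c2]
    return cooccurrence.get(frozenset((c1, c2)), 0) / denom if denom else 0.0

# c1 appears with c2 twice but with c3 only once:
occurrence = {"c1": 3, "c2": 2, "c3": 1}
cooccurrence = {frozenset(("c1", "c2")): 2, frozenset(("c1", "c3")): 1}
# dmsr("c1", "c2", ...) == 0.4 > dmsr("c1", "c3", ...) == 0.25,
# i.e. c1 and c2 come out as more related than c1 and c3.
```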
Our Approach: Profile Extension
• PRF: Pseudo Relevance Feedback
  – a distributional measure based on the profiles appearing in the n best-ranked solutions
  – the same co-occurrence measure as DMSR, applied to the set of the first 10 suggestions
  – this method can be applied with any ranking technique
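The feedback loop itself can be sketched as below. This is a simplification: the slide's actual method reuses the DMSR co-occurrence measure over the top-ranked profiles, while here the pooled profiles simply contribute their most frequent concepts; the parameters `n` and `top_k` are illustrative:

```python
from collections import Counter

def prf_expand(problem_profile, ranked_candidate_profiles, n=10, top_k=5):
    # Pool the concept profiles of the n best-ranked candidates
    pooled = Counter()
    for profile in ranked_candidate_profiles[:n]:
        pooled.update(profile)
    # Add the pool's dominant concepts to the problem profile
    expanded = Counter(problem_profile)
    for concept, count in pooled.most_common(top_k):
        if concept not in expanded:
            expanded[concept] = 1  # feedback concepts enter with minimal weight
    return expanded

problem = {"a": 2}
candidates = [{"b": 3}, {"b": 1, "c": 1}]
expanded = prf_expand(problem, candidates)
# "b" and "c" are pulled in from the top-ranked candidate profiles
```

Because the expansion only consumes a ranked list, it composes with any of the ranking functions above.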
Our Approach: Similarities
• Complementarity (similarity with the difference topics)
• Conceptual Similarity (similarity of conceptual profiles)
• Social Similarity (similarity of social profiles)
Our Approach: Ranking
• By one similarity measure:
  – complementarity
  – conceptual similarity
  – social similarity
• By a linear combination of measures: a·Comp + b·ConcSim + c·SocSim
• By a product of measures: Comp·ConcSim·SocSim
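Both composite ranking schemes fit one small function. A sketch, assuming each similarity is given as a function from a candidate to a score (all names are illustrative):

```python
def rank(candidates, comp, concsim, socsim, weights=None):
    # weights == (a, b, c) -> linear combination a*Comp + b*ConcSim + c*SocSim
    # weights == None      -> product Comp * ConcSim * SocSim
    def score(u):
        s = (comp(u), concsim(u), socsim(u))
        if weights is None:
            return s[0] * s[1] * s[2]
        a, b, c = weights
        return a * s[0] + b * s[1] + c * s[2]
    return sorted(candidates, key=score, reverse=True)

scores = {"u1": (0.9, 0.2, 0.5), "u2": (0.6, 0.6, 0.6)}
ranked = rank(list(scores),
              lambda u: scores[u][0],
              lambda u: scores[u][1],
              lambda u: scores[u][2])
# Product scoring: u1 -> 0.09, u2 -> 0.216, so u2 ranks first
```

Note how the product penalizes a candidate who is weak on any single dimension, whereas a linear combination lets one strong dimension compensate.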
Evaluation
• Evaluation 1 – recommending a collaborator to a group of solvers
  – a group of 3 solvers (experts in the Semantic Web) tries to solve 3 cross-disciplinary problems
  – problems inspired by real challenges (workshops, calls for papers, etc.)
• Evaluation 2 – recommending collaborators to individual solvers
  – 12 Twitter users, experts in the Semantic Web, look for collaborators for the same 3 problems
Evaluation: Metrics
• Discounted Cumulative Gain – what is the value of considering the first 10 suggestions, and what is the quality of their ordering?

DCG = rating_1 + Σ_{i=2}^{10} rating_i / log2(i)

• Average Precision – what is the cumulative benefit of considering each next suggestion in a particular ranking?
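The DCG metric over the first 10 suggestions translates directly to code:

```python
import math

def dcg_at_10(ratings):
    # DCG = rating_1 + sum over i = 2..10 of rating_i / log2(i);
    # the log discount makes early positions in the ranking count more.
    top = ratings[:10]
    if not top:
        return 0.0
    return top[0] + sum(r / math.log2(i)
                        for i, r in enumerate(top[1:], start=2))

# log2(2) == 1, so the second item enters at full weight:
# dcg_at_10([3, 2]) == 3 + 2/1 == 5.0
```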
Evaluation 1
• Discounted Cumulative Gain
[Chart: DCG scores for the compatibility rankings]
Evaluation 2
• Composite Ranking Functions: Product
  – Comp·ConcSim·SocSim
  – PRF(Comp·ConcSim·SocSim): PRF problem-profile expansion with the composite similarity
  – HPSR(Comp)·ConcSim·SocSim: HPSR expansion performed on the difference topics prior to calculating the complementarity (similarity with the difference topics)
  – Comp·DMSR(ConcSim)·SocSim: DMSR expansion performed over the seed user profile prior to calculating interest similarity
  – HPSR(Comp)·DMSR(ConcSim)·SocSim: a composite function in which HPSR is used to expand profile topics and DMSR to expand the seed user's topic profile prior to calculating the similarities
Conclusions
• The Linked Data based concept-expansion technique (hyProximity) gives the best results when expanding topics for Compatibility measures. A distributional one works slightly better for Conceptual Similarity measures.
• In a composite ranking function, expanding profiles with hyProximity is beneficial if applied only to Compatibility. Expansion in both Compatibility and Conceptual Similarity has negative effects.
• All profile-expansion techniques, applied individually, have positive effects in comparison to direct similarity calculation with no expansion.
Take Away
• Compatibility (problem expansion): hyProximity, a Linked Data-based measure
• Conceptual Similarity: DMSR, a distributional measure
Example
Problem: Semantic Web representation of start-up history for start-up performance indicators
User: Milan Stankovic (@milstan)
Suggestions: davidsrose (angel investor specialized in technology startups), fundingpost, ECVentureCapita, BVCA, vc20 (investors and entrepreneurs, information technology), AndySack, CVCACanada, Austin_Startups, tgmtgm, davidblerner (entrepreneur, social networks (KLOUT), metrics)