Machine Learning at PeerIndex

Slides for a talk given at the London Machine Learning Meetup on 29 Feb about the machine learning behind measuring people's influence at PeerIndex.

Machine Learning at PeerIndex Presentation Transcript

  • 1. Machine Learning at PeerIndex. Ferenc Huszár (@fhuszar). Wednesday, 16 May 12
  • 2. PeerIndex.com: understand your influence
  • 3. PeerPerks.com: free stuff for influencers
  • 4. PeerPerks: free stuff for influencers
  • 5-14. Machine Learning @ PeerIndex
      • The usual stuff
          • topic modelling/classification of tweets/statuses/URLs
          • identity resolution across Twitter, Facebook, LinkedIn
          • spambot/fraud detection: identify people gaming the system
          • sentiment classification: happy/sad/neutral
      • The really exciting stuff
          • inferring networks of influence (more about this later)
          • visualising different aspects of influence in an engaging way
          • influence maximisation: submodular optimisation
  • 15-18. Inferring networks of influence
      • Social network
      • Propagation probabilities $p_{i,j}$
      • Information cascade logs, one per URL:

        http://www.pcworld.com/article/239719
        1079306   2011-08-25T00:03:06+01:00
        4549198   2011-08-25T04:32:25+01:00
        2662975   2011-08-25T00:35:11+01:00
        2333224   2011-08-25T01:43:18+01:00
        3141371   2011-08-25T01:52:06+01:00
        3482720   2011-08-25T07:18:24+01:00
        1403682   2011-08-25T03:52:58+01:00
        4679657   2011-08-25T01:07:48+01:00
        32460     2011-08-25T01:11:43+01:00

        http://techcrunch.com/2011/11/21/...
        259725    2011-10-24T03:32:19+01:00
        76539     2011-10-24T03:32:23+01:00
        1922351   2011-10-24T04:28:47+01:00
        9183      2011-10-24T03:30:57+01:00
        3335398   2011-10-24T03:34:01+01:00
        1616885   2011-10-24T03:48:16+01:00
        82198     2011-10-24T03:48:29+01:00
        906390    2011-10-24T23:13:51+01:00
        1051322   2011-10-24T03:40:02+01:00
  • 19-23. Heuristic approaches to estimate $p_{i,j}$
      • purely based on local network structure: $p_{i,j} \propto \frac{1}{d_{\mathrm{in}}(j)}$
      • trivalency “model”, my personal favourite :) $p_{i,j} \in \{0.1, 0.01, 0.001\}$, chosen randomly
      • data-driven heuristics: $p_{i,j} \propto \frac{\text{number of items shared by } j \text{ after } i \text{ shared it}}{\text{number of items shared by } i}$
      • How do you solve this with machine learning?
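
The three heuristics above can be sketched in a few lines of Python. This is a minimal illustration, not PeerIndex's implementation: the function names, the toy users `a`, `b`, `c`, and the data layout (`in_neighbours`, `cascades`) are all hypothetical, and the in-degree rule and the share ratio are used directly as probabilities (scaling constant 1, an assumption).

```python
import random

# Levels of the trivalency heuristic.
TRIVALENCY_LEVELS = (0.1, 0.01, 0.001)


def in_degree_heuristic(in_neighbours):
    """p[i, j] = 1 / d_in(j) for every edge i -> j.

    in_neighbours maps each user j to the set of users who can influence j.
    """
    return {(i, j): 1.0 / len(sources)
            for j, sources in in_neighbours.items()
            for i in sources}


def trivalency(edges, rng=random):
    """Assign each edge one of three fixed probabilities at random."""
    return {edge: rng.choice(TRIVALENCY_LEVELS) for edge in edges}


def data_driven(edges, cascades):
    """p[i, j] = (# items j shared after i shared them) / (# items i shared).

    cascades maps an item (e.g. a URL) to {user: share_time}; share times only
    need to be comparable with "<".
    """
    shared = {}    # user -> number of items the user shared
    followed = {}  # (i, j) -> number of items j shared after i
    for times in cascades.values():
        for user in times:
            shared[user] = shared.get(user, 0) + 1
        for i, j in edges:
            if i in times and j in times and times[i] < times[j]:
                followed[(i, j)] = followed.get((i, j), 0) + 1
    return {(i, j): followed.get((i, j), 0) / shared[i] if shared.get(i) else 0.0
            for i, j in edges}


# Toy data (hypothetical): a influences b and c, b influences c.
in_neighbours = {"b": {"a"}, "c": {"a", "b"}}
edges = [("a", "b"), ("a", "c"), ("b", "c")]
cascades = {
    "url1": {"a": 1, "b": 2, "c": 3},
    "url2": {"a": 1, "c": 2},
    "url3": {"b": 1},
}
print(in_degree_heuristic(in_neighbours))
print(trivalency(edges, random.Random(0)))
print(data_driven(edges, cascades))
```

None of these estimates is fitted to a likelihood; they are cheap to compute per edge, which is what makes them usable at scale.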
  • 24-35. The likelihood $P(D \mid \theta)$
      • $D$: an information cascade log, for example

        http://www.pcworld.com/article/239719
        1079306   2011-08-25T00:03:06+01:00
        4549198   2011-08-25T04:32:25+01:00
        2662975   2011-08-25T00:35:11+01:00
        2333224   2011-08-25T01:43:18+01:00
        3141371   2011-08-25T01:52:06+01:00
        3482720   2011-08-25T07:18:24+01:00
        1403682   2011-08-25T03:52:58+01:00
        4679657   2011-08-25T01:07:48+01:00
        32460     2011-08-25T01:11:43+01:00

      • $\theta$: the propagation probabilities $p_{i,j}$
      • what's the probability of the cascade $u_1, u_2, u_3, \ldots, u_n$?
      • for subsequent users in cascade:
        $p_{0,u_1} \bigl(1 - (1 - p_{0,u_2})(1 - p_{u_1,u_2})\bigr) \cdots = \prod_{i=1}^{n} \Bigl(1 - \prod_{j=0}^{i-1} (1 - p_{u_j,u_i})\Bigr)$, with $u_0 = 0$, the index that appears in $p_{0,u_1}$
      • for users that are not in cascade:
        $\prod_{u \in \{u_1 \ldots u_n\}} \prod_{v \notin \{u_1 \ldots u_n\}} (1 - p_{u,v})$
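
As a rough sketch of how the two factors of this likelihood could be evaluated for one cascade, the function below multiplies them in log space. The name `cascade_log_likelihood`, the dictionary layout of `p`, and the convention that a missing edge has probability 0 are assumptions made for the example; the index `0` plays the role it has in $p_{0,u_1}$ on the slide.

```python
import math


def _log(x):
    """log(x), returning -inf instead of raising when x is not positive."""
    return math.log(x) if x > 0.0 else float("-inf")


def cascade_log_likelihood(cascade, users, p, source=0):
    """Log-likelihood of one cascade under propagation probabilities p.

    cascade: users ordered by the time they shared the item (u_1 ... u_n)
    users:   every user in the network
    p:       dict mapping (i, j) to p_{i,j}; missing edges count as 0
    """
    ll = 0.0
    # Users in the cascade: at least one earlier node (or the source) got through.
    for i, u in enumerate(cascade):
        all_failed = 1.0
        for w in [source] + list(cascade[:i]):
            all_failed *= 1.0 - p.get((w, u), 0.0)
        ll += _log(1.0 - all_failed)
    # Users outside the cascade: every cascade member failed to reach them.
    in_cascade = set(cascade)
    for u in cascade:
        for v in users:
            if v not in in_cascade:
                ll += _log(1.0 - p.get((u, v), 0.0))
    return ll


# Toy example (hypothetical users and probabilities).
p = {(0, "a"): 0.5, (0, "b"): 0.1, ("a", "b"): 0.5, ("a", "c"): 0.2, ("b", "c"): 0.5}
print(cascade_log_likelihood(["a", "b"], ["a", "b", "c"], p))
```

For the toy numbers the three factors are 0.5 for `a`, 1 - 0.9 × 0.5 = 0.55 for `b`, and 0.8 × 0.5 = 0.4 for the non-sharer `c`, so the likelihood is 0.11.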
  • 36-41. Maximum likelihood at scale
      • data too sparse to learn one parameter per edge
      • large scale gradient-based optimisation is costly
      • Solution: combine ensemble of heuristics with ML
          • use heuristics to compute probabilities at scale
          • use ML to tune parameters on small-scale data
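
The slides do not say how the heuristics are combined, so the following is only one possible reading: blend two heuristic estimates with a single weight `w` and pick `w` by maximising the cascade likelihood on a small sample. The convex blend, the grid search, and all names (`mix`, `tune_weight`, the toy users) are assumptions for illustration.

```python
import math


def _log(x):
    """log(x), returning -inf instead of raising when x is not positive."""
    return math.log(x) if x > 0.0 else float("-inf")


def log_lik(cascade, users, p, source=0):
    """Cascade log-likelihood: sharers were reached, non-sharers were not."""
    ll = 0.0
    for i, u in enumerate(cascade):
        all_failed = 1.0
        for w in [source] + list(cascade[:i]):
            all_failed *= 1.0 - p.get((w, u), 0.0)
        ll += _log(1.0 - all_failed)
    members = set(cascade)
    for u in cascade:
        for v in users:
            if v not in members:
                ll += _log(1.0 - p.get((u, v), 0.0))
    return ll


def mix(h1, h2, w):
    """Convex combination of two heuristic edge-probability tables."""
    return {e: w * h1.get(e, 0.0) + (1.0 - w) * h2.get(e, 0.0)
            for e in set(h1) | set(h2)}


def tune_weight(h1, h2, cascades, users, grid):
    """Pick the blend weight with the highest total log-likelihood."""
    return max(grid, key=lambda w: sum(log_lik(c, users, mix(h1, h2, w))
                                       for c in cascades))


# Toy example: the two heuristics disagree only on the edge a -> b.
h1 = {(0, "a"): 0.5, (0, "b"): 0.5, ("a", "b"): 0.9}
h2 = {(0, "a"): 0.5, (0, "b"): 0.5, ("a", "b"): 0.1}
grid = [0.0, 0.5, 1.0]
print(tune_weight(h1, h2, [["a", "b"]], ["a", "b"], grid))
```

Only one parameter is fitted here, which is the point of the approach on the slide: the per-edge probabilities come from the heuristics, and the small labelled sample is spent on a handful of tuning parameters rather than one parameter per edge.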
  • 42-48. Influence maximisation
      • Select a set of users to maximise outreach
      • Influence of people combines non-linearly
      • In many models it combines sub-modularly:
        $A \subseteq B \implies f(A \cup \{x\}) - f(A) \ge f(B \cup \{x\}) - f(B)$
      • these functions are fun to optimise
      • pops up many times in machine learning
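
One reason submodular functions are pleasant to optimise is that a simple greedy rule already gives a (1 - 1/e) approximation for monotone submodular functions under a cardinality constraint. The sketch below uses a set-coverage function as a stand-in for influence; in the talk's setting `f` would be the expected outreach under a cascade model, which is not reproduced here. The `reach` table and the function names are hypothetical.

```python
def coverage(reach, chosen):
    """Number of distinct users reached by the chosen seeds.

    Coverage is monotone and submodular, so it serves as a simple stand-in
    for an influence function f.
    """
    covered = set()
    for u in chosen:
        covered |= reach[u]
    return len(covered)


def greedy_select(reach, k):
    """Greedily add the seed with the largest marginal gain, up to k seeds."""
    chosen = []
    for _ in range(k):
        base = coverage(reach, chosen)
        best, best_gain = None, 0
        for u in sorted(reach):  # sorted for deterministic tie-breaking
            if u in chosen:
                continue
            gain = coverage(reach, chosen + [u]) - base
            if gain > best_gain:
                best, best_gain = u, gain
        if best is None:  # no remaining seed adds anything
            break
        chosen.append(best)
    return chosen


# Toy example: which users each candidate seed would reach (hypothetical).
reach = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {1, 2}}
seeds = greedy_select(reach, 2)
print(seeds, coverage(reach, seeds))
```

On the toy data, `b` alone reaches three users, but once `a` is chosen it adds only one new user; that shrinking marginal gain is exactly the inequality on the slide.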
  • 49-56. Wrap up
      • two lines of ‘data’ products: PeerIndex, PeerPerks
      • lots of ‘standard’ machine learning tasks
      • some uniquely exciting problems
          • inferring propagation probabilities
          • computing the expected number of users one reaches out to
          • putting all aspects together into a single number, and visualising it
          • influence maximisation
  • 57. Thanks. We’re hiring ML scientists, interns and engineers... @fhuszar, fh@peerindex.com