Machine Learning at PeerIndex

Slides for talk given at London Machine Learning Meetup on 29 Feb about machine learning behind measuring people's influence at PeerIndex.

Published in: Technology, Education
  • Machine Learning at PeerIndex

    1. Machine Learning at PeerIndex, Ferenc Huszár (@fhuszar), Wednesday, 16 May 12
    2. PeerIndex.com: understand your influence
    3. PeerPerks.com: free stuff for influencers
    4. PeerPerks: free stuff for influencers
    5-14. Machine Learning @ PeerIndex
        • The usual stuff
            • topic modelling/classification of tweets/statuses/URLs
            • identity resolution across Twitter, Facebook, LinkedIn
            • spambot/fraud detection: identify people gaming the system
            • sentiment classification: happy/sad/neutral
        • The really exciting stuff
            • inferring networks of influence - more about this later
            • visualise different aspects of influence, in an engaging way
            • influence maximisation - submodular optimisation
    15-18. Inferring networks of influence
        • Social network
        • Propagation probabilities $p_{i,j}$
        • Information cascade logs (user id, timestamp)
            http://www.pcworld.com/article/239719
                1079306  2011-08-25T00:03:06+01:00
                4549198  2011-08-25T04:32:25+01:00
                2662975  2011-08-25T00:35:11+01:00
                2333224  2011-08-25T01:43:18+01:00
                3141371  2011-08-25T01:52:06+01:00
                3482720  2011-08-25T07:18:24+01:00
                1403682  2011-08-25T03:52:58+01:00
                4679657  2011-08-25T01:07:48+01:00
                32460    2011-08-25T01:11:43+01:00
            http://techcrunch.com/2011/11/21/...
                259725   2011-10-24T03:32:19+01:00
                76539    2011-10-24T03:32:23+01:00
                1922351  2011-10-24T04:28:47+01:00
                9183     2011-10-24T03:30:57+01:00
                3335398  2011-10-24T03:34:01+01:00
                1616885  2011-10-24T03:48:16+01:00
                82198    2011-10-24T03:48:29+01:00
                906390   2011-10-24T23:13:51+01:00
                1051322  2011-10-24T03:40:02+01:00
    19-23. Heuristic approaches to estimate $p_{i,j}$
        • purely based on local network structure:
            $p_{i,j} \propto \frac{1}{d_{\mathrm{in}}(j)}$
        • trivalency "model" (my personal favourite :)):
            $p_{i,j} \in \{0.1, 0.01, 0.001\}$, chosen randomly
        • data-driven heuristics:
            $p_{i,j} \propto \frac{\text{number of items shared by } j \text{ after } i \text{ shared it}}{\text{number of items shared by } i}$
        How do you solve this with machine learning?
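The three heuristics on slides 19-23 can be sketched in a few lines of Python. This is only an illustration, not PeerIndex's code: the function names, the edge-list and cascade formats, and the choice to treat the first and third heuristics as equalities rather than proportionalities are all assumptions.

```python
import random
from collections import defaultdict


def weighted_cascade(edges):
    """Local-structure heuristic: p[i, j] = 1 / d_in(j) for each edge i -> j."""
    in_degree = defaultdict(int)
    for _, j in edges:
        in_degree[j] += 1
    return {(i, j): 1.0 / in_degree[j] for i, j in edges}


def trivalency(edges, rng, levels=(0.1, 0.01, 0.001)):
    """Trivalency heuristic: draw each edge probability at random from three levels."""
    return {edge: rng.choice(levels) for edge in edges}


def data_driven(edges, cascades):
    """Data-driven heuristic.

    cascades maps an item (e.g. a URL) to a list of (user, time) share events.
    p[i, j] = (#items j shared after i shared them) / (#items i shared).
    """
    shared_by = defaultdict(int)
    followed = defaultdict(int)
    edge_set = set(edges)
    for events in cascades.values():
        # Earliest share time of each user for this item.
        first_time = {}
        for user, t in events:
            first_time[user] = min(t, first_time.get(user, t))
        for user in first_time:
            shared_by[user] += 1
        for i, j in edge_set:
            if i in first_time and j in first_time and first_time[i] < first_time[j]:
                followed[(i, j)] += 1
    return {
        (i, j): (followed[(i, j)] / shared_by[i]) if shared_by[i] else 0.0
        for i, j in edge_set
    }
```

The data-driven variant loops over every edge for every cascade, which is fine for a toy graph but would need an index from users to out-edges at realistic scale.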
    24-35. The likelihood $P(D \mid \theta)$, where the parameters $\theta$ are the propagation probabilities $p_{i,j}$
        D, an example cascade log for http://www.pcworld.com/article/239719:
            1079306  2011-08-25T00:03:06+01:00
            4549198  2011-08-25T04:32:25+01:00
            2662975  2011-08-25T00:35:11+01:00
            2333224  2011-08-25T01:43:18+01:00
            3141371  2011-08-25T01:52:06+01:00
            3482720  2011-08-25T07:18:24+01:00
            1403682  2011-08-25T03:52:58+01:00
            4679657  2011-08-25T01:07:48+01:00
            32460    2011-08-25T01:11:43+01:00
        What's the probability of the cascade $u_1, u_2, u_3, \ldots, u_n$?
        • for subsequent users in cascade:
            $$p_{0,u_1}\,\bigl(1 - (1 - p_{0,u_2})(1 - p_{u_1,u_2})\bigr)\cdots = \prod_{i=1}^{n}\Bigl(1 - \prod_{j=0}^{i-1}\bigl(1 - p_{u_j,u_i}\bigr)\Bigr)$$
            where $u_0 \equiv 0$ is the node 0 that appears in $p_{0,u_1}$
        • for users that are not in cascade:
            $$\prod_{u \in \{u_1 \ldots u_n\}}\ \prod_{v \in \text{users},\ v \notin \{u_1 \ldots u_n\}} \bigl(1 - p_{u,v}\bigr)$$
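As a rough illustration of the likelihood on slides 24-35, the sketch below evaluates the log-likelihood of one cascade. It assumes that the cascade is already sorted by share time, that node id `0` stands for the source appearing in the slides' `p0,u1` term, and that missing edges have probability zero. `SOURCE` and `cascade_log_likelihood` are hypothetical names, not anything from the talk.

```python
import math

SOURCE = 0  # assumed id of the node "0" that appears in p_{0,u1} on the slides


def _log_not(prob):
    """log(1 - prob), returning -inf instead of raising when prob == 1."""
    return math.log1p(-prob) if prob < 1.0 else -math.inf


def cascade_log_likelihood(cascade, users, p):
    """Log-likelihood of a single cascade.

    cascade: user ids ordered by the time they shared the item
    users:   all user ids in the network
    p:       dict mapping (i, j) to the propagation probability p_{i,j}
    """
    log_lik = 0.0
    # Users in the cascade: at least one earlier node (or the source) reached them.
    for i, u_i in enumerate(cascade):
        predecessors = [SOURCE] + list(cascade[:i])
        log_nobody = sum(_log_not(p.get((u_j, u_i), 0.0)) for u_j in predecessors)
        prob_reached = -math.expm1(log_nobody)  # 1 - prod_j (1 - p_{u_j,u_i})
        if prob_reached <= 0.0:
            return -math.inf
        log_lik += math.log(prob_reached)
    # Users outside the cascade: every cascade member failed to reach them.
    in_cascade = set(cascade)
    for v in users:
        if v in in_cascade:
            continue
        for u in cascade:
            log_lik += _log_not(p.get((u, v), 0.0))
    return log_lik
```

Working in log space avoids underflow when the product runs over many users; summing this quantity over all logged cascades gives the objective that maximum likelihood would optimise.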
    36-41. Maximum likelihood at scale
        • data too sparse to learn one parameter per edge
        • large scale gradient-based optimisation is costly
        • Solution: combine ensemble of heuristics with ML
            • use heuristics to compute probabilities at scale
            • use ML to tune parameters on small-scale data
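The slides do not say how the heuristics and the learned parameters are combined, so the following is only one possible reading, not the author's method: blend the heuristic estimates with a handful of mixing weights, then pick the weights that maximise the cascade likelihood on a small sample. The convex blend, the coarse grid search, and every name here are assumptions made for illustration.

```python
import itertools
import math

SOURCE = 0  # assumed id of the node that seeds every cascade


def log_likelihood(cascade, users, p):
    """Cascade log-likelihood under per-edge probabilities p[(i, j)]."""
    total = 0.0
    for i, u_i in enumerate(cascade):
        miss = 1.0  # probability that no earlier node reached u_i
        for u_j in [SOURCE] + list(cascade[:i]):
            miss *= 1.0 - p.get((u_j, u_i), 0.0)
        if miss >= 1.0:
            return -math.inf
        total += math.log(1.0 - miss)
    for v in set(users) - set(cascade):
        for u in cascade:
            q = 1.0 - p.get((u, v), 0.0)
            if q <= 0.0:
                return -math.inf
            total += math.log(q)
    return total


def blend(heuristics, weights):
    """Weighted sum of several heuristic estimates of p[(i, j)]."""
    edges = set().union(*heuristics)
    return {
        e: sum(w * h.get(e, 0.0) for w, h in zip(weights, heuristics))
        for e in edges
    }


def tune_weights(heuristics, cascades, users, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Grid-search convex weights that maximise the total log-likelihood."""
    best_weights, best_score = None, -math.inf
    for weights in itertools.product(grid, repeat=len(heuristics)):
        if abs(sum(weights) - 1.0) > 1e-9:
            continue  # keep only weights that sum to one
        p = blend(heuristics, weights)
        score = sum(log_likelihood(c, users, p) for c in cascades)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score
```

The point of the design is that only a few weights are learned instead of one parameter per edge, so a small sample of cascades is enough, while the heuristics themselves stay cheap to evaluate on the full graph.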
    42-48. Influence maximisation
        • Select a set of users to maximise outreach
        • Influence of people combines non-linearly
        • In many models it combines sub-modularly:
            $$A \subseteq B \implies f(A \cup \{x\}) - f(A) \ge f(B \cup \{x\}) - f(B)$$
        • these functions are fun to optimise
        • pops up many times in machine learning
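To make the diminishing-returns property on slides 42-48 concrete, here is a minimal greedy sketch. It uses a simple audience-coverage function as a stand-in for expected outreach, which is an assumption: the talk does not specify the objective, and `reach`, `coverage` and `greedy_select` are hypothetical names. For monotone submodular objectives such as this one, greedy selection under a size limit is known to reach at least a (1 - 1/e) fraction of the optimum.

```python
def coverage(selected, reach):
    """f(S): number of distinct people reached by the selected users."""
    covered = set()
    for user in selected:
        covered |= reach[user]
    return len(covered)


def greedy_select(reach, k):
    """Pick up to k users, each time adding the one with the largest marginal gain."""
    selected = []
    covered = set()
    for _ in range(min(k, len(reach))):
        best_user, best_gain = None, 0
        for user, audience in reach.items():
            if user in selected:
                continue
            gain = len(audience - covered)  # f(S + {user}) - f(S)
            if gain > best_gain:
                best_user, best_gain = user, gain
        if best_user is None:
            break  # nobody adds anything new
        selected.append(best_user)
        covered |= reach[best_user]
    return selected, len(covered)


# Toy audiences: who each candidate user reaches.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 2}}
print(greedy_select(reach, 2))  # (['c', 'a'], 7)
```

In this toy example, adding "b" to the set ["a"] gains one new person, but adding it to the larger set ["a", "c"] gains none, which is exactly the inequality on the slide.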
    49-56. Wrap up
        • two lines of 'data' products: PeerIndex, PeerPerks
        • lots of 'standard' machine learning tasks
        • some uniquely exciting problems
            • inferring propagation probabilities
            • compute expected number of users one reaches out to
            • putting all aspects together into a single number, and visualise
            • influence maximisation
    57. Thanks. We're hiring ML scientists, interns and engineers... @fhuszar, fh@peerindex.com