Pydata influencer validation

  1. Influencer Validation, 5th September 2017, Dr Ed Cannon
  2. Who's the one? (Copyright © Capgemini 2015. All Rights Reserved. Insights & Data: Data Science | Version 1.0)
  3. Objective
     • Create a plugin to validate social media influencers for marketing
     • Key advantages:
        • Can be used by analysts
        • Can be extended by developers and data scientists
        • Can be part of a workflow: simple plug and play
        • Works on top of Hadoop using PySpark, so it can scale to millions
        • Raw metrics are available for further analysis
        • Option to output to PowerPoint (ppt)
  4. Influencer Validation
     A way to measure the effectiveness of an influential social media entity before using them as a service.
     • Why do we need it?
        • To quickly identify potential influencers for a marketing campaign
        • To target the correct audience
        • To understand the potential value (ROI) of an influencer
     • Who needs influencers to be validated?
        • Brand managers
        • PR agencies
     • What does the service provide?
        • A breakdown of metrics across several social media channels
        • The age group the influencer targets
        • What the audience is interested in
        • The gender of the audience
        • The type of account
     • How is it run, and how often?
        • Written in Python and packaged as a plugin
        • On demand
  5. What data sources are used? Primary Data Sources
  6. Metrics that Matter
     • Engagement, e.g. YouTube comments/likes, Twitter retweets
     • Reach, e.g. YouTube views, Twitter followers
     • Target demographic and interest, e.g. males aged 25-34
     • Channel(s), e.g. YouTube for a video campaign
     • Indices: H-index, M-index, G-index
  7. Indices: H, M, G
  8. H-index
     Definition: “A scholar with an index of h has published h papers each of which has been cited in other papers at least h times. Thus, the h-index reflects both the number of publications and the number of citations per publication.”
     (Slide example: H-index = 116)
  9. The H-index is a means of measuring influence on Twitter
     An author-level metric that measures productivity (tweets) and engagement (retweets), computed over the user's last 200 tweets.
     H-index = the largest number H such that H tweets have each been retweeted H or more times.

        Tweet     Retweeted (example 1)     Retweeted (example 2)
        1         10                        25
        2         8                         8
        3         5                         5
        4         4                         3
        5         3                         3
        ...
        200
        H-index   4                         3
  10. Twitter H-index (examples shown: Twitter H-index = 10 and Twitter H-index = 5)
  11. M-Index
     • The m-index is defined as h/n, where h is the h-index and n is the time delta, in years, between the most recent tweet and the 200th most recent tweet
     • M-indices tend to be higher than H- or G-indices: the time taken to tweet 200 times can be just days in some circumstances, so n is a small fraction of a year and dividing by it inflates the value
  12. G-Index
     The g-index can be seen as the h-index for an averaged retweet count: the largest number n of highly engaged tweets for which the average number of retweets is at least n.
     It is calculated from the distribution of retweets received by a given author's tweets: given the tweets ranked in decreasing order of the number of retweets they received, the g-index is the unique largest number g such that the top g tweets together received at least g² retweets.
  13. Case Study: Foodies
  14. Objective: validate a set of influencers who can be used to market a food brand on YouTube or Twitter, targeting a mixed audience that is primarily female, aged 25-34 and interested in food and parenting.
  15. Profile: jamieoliver (London & Essex)
     • Followers: 6,527,902; Location: London & Essex; Created: Tue Jan 06 14:21:45 2009
     • Average reach: 728.26; Positive sentiment: 44.60%; Negative sentiment: 4.50%; Average impact: 38.24; H-index: 61
     • Audience information: 55% women, 45% men; 77% individual, 23% organisation accounts
     • Audience age: chart over bands 0-9, 10-17, 18-24, 25-34, 35-44, 45-54, 55-64, 65+
     • Audience interest: food & drinks, family & parenting, sports, beauty/health & fitness, music
     • Other channel metrics: Followers 5,765,451; Posts 5,090; Videos (last 6 months) 250; Average likes 6,414; Average dislikes 186; Average views 476,999; Average comments 421
     • Handles shown: jamieoliver, JamieOliver
  16. Profile: CandiceBrown (United Kingdom)
     • Followers: 79,178; Location: United Kingdom; Created: Tue Feb 22 22:19:13 2011
     • Average reach: 943.12; Positive sentiment: 54.17%; Negative sentiment: 0.00%; Average impact: 46.96; H-index: 15
     • Audience information: 38% women, 63% men; 54% individual, 46% organisation accounts
     • Audience age: chart over bands 0-9, 10-17, 18-24, 25-34, 35-44, 45-54, 55-64, 65+
     • Audience interest: animals & pets, TV, automotive, food & drinks, photo & video
     • Other channel metrics: Followers 139,316; Posts 332; Videos (last 6 months) 0; Average likes, dislikes, views and comments: not available
     • Handles shown: CandiceBrown, candicebrown
  17. H-index Foodie Distribution (sample of 1K)
  18. Foodie H-index by Gender (histogram: Twitter h-index vs. user percentage, female vs. male)
     • Set threshold acceptance criteria
     • Female foodie influencers have more engagement
  19. Foodie M-index by Gender (histogram: Twitter m-index vs. user percentage, female vs. male)
     • Highly influential female influencers who have tweeted a lot over a short period of time
     • Example: @DeniseCop1, 74K tweets, joined July 2013, H-index 159, M-index 58K
  20. Foodie G-index by Gender (histogram: Twitter g-index vs. user percentage, female vs. male)
     • Skewed towards female foodies, who have more sets of tweets with high retweet counts
  21. Comparing Foodies to Beauticians
  22. Beauty H-index by Gender (histograms of Twitter h-index vs. user percentage by gender; Foodie and Beauty panels)
     • H-indices are lower for beauticians and follow an exponential distribution
  23. Beauty M-index (histograms of Twitter m-index vs. user percentage by gender; Foodie and Beauty panels)
     • Both follow exponential distributions
  24. Beauty G-index (histograms of Twitter g-index vs. user percentage by gender; Foodie and Beauty panels)
     • Foodies, and females across both categories, have higher G-indices
  25. Conclusions
     • Developed software that can validate influencers for social media marketing campaigns across different channels
     • Introduced three novel indices to measure an influencer's engagement (H, M and G)
     • The indices are quick to calculate, can be incorporated into workflows, scale easily in a distributed fashion, and can be used by a variety of audiences and across categories
     • Analysis is automated to provide both metrics and a PowerPoint (ppt) deck
  26. APIs Leveraged
     • Twitter API
     • Instagram API
     • YouTube API
     • Faceplusplus
     • Pandas
     • Requests
     • Rpy2
     • Python-pptx
  27. Questions
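The three indices defined on slides 9 to 12 can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the plugin's actual code (which ran on PySpark): it assumes one user's retweet counts for their last 200 tweets are already available as a list of integers, and the function names `h_index`, `g_index` and `m_index` are hypothetical. The g-index is capped at the number of tweets supplied, and the m-index converts the time span to years using 365.25-day years, which is also an assumption.

```python
from datetime import datetime


def h_index(retweets: list[int]) -> int:
    """Largest h such that h tweets have each been retweeted at least h times."""
    ranked = sorted(retweets, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            # Counts only fall and ranks only rise, so no later tweet can qualify.
            break
    return h


def g_index(retweets: list[int]) -> int:
    """Largest g such that the top g tweets together have at least g**2 retweets.

    Capped at the number of tweets supplied (no zero padding).
    """
    ranked = sorted(retweets, reverse=True)
    g = 0
    total = 0
    for rank, count in enumerate(ranked, start=1):
        total += count  # cumulative retweets of the top `rank` tweets
        if total >= rank * rank:
            g = rank
    return g


def m_index(h: int, oldest: datetime, newest: datetime) -> float:
    """h divided by the span, in years, between the oldest and newest sampled tweets."""
    seconds_per_year = 365.25 * 24 * 3600  # assumption: Julian year
    years = (newest - oldest).total_seconds() / seconds_per_year
    if years <= 0:
        raise ValueError("newest must be later than oldest")
    return h / years
```

For example, `h_index([10, 8, 5, 4, 3])` returns 4 and `h_index([25, 8, 5, 3, 3])` returns 3, matching the two columns of the slide 9 table, and Stephen's ten retweet counts from the editor's notes give an H-index of 10 because he has only ten tweets. Sorting once per user keeps each index O(n log n) on at most 200 values, which is cheap enough to run per user in a distributed job.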

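The per-gender histograms on slides 18 to 24 plot the percentage of users falling into each index bin, and slide 18 calls for threshold acceptance criteria. A minimal sketch of both steps follows, assuming each user's index has already been computed. The median-of-a-previous-campaign threshold is a hypothetical acceptance rule, and the function names and sample values are illustrative only, not data from the talk.

```python
from collections import Counter
from statistics import median


def user_percentage_histogram(values: list[int], bin_width: int) -> dict[int, float]:
    """Percentage of users per bin, keyed by each bin's lower edge."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if not values:
        return {}
    counts = Counter((v // bin_width) * bin_width for v in values)
    total = len(values)
    return {edge: 100.0 * n / total for edge, n in sorted(counts.items())}


def shortlist(candidates: dict[str, int], past_campaign_indices: list[int]) -> list[str]:
    """Hypothetical acceptance rule: keep candidates whose index is at or above
    the median index of influencers used in a previous campaign."""
    threshold = median(past_campaign_indices)
    return sorted(name for name, index in candidates.items() if index >= threshold)


# Illustrative values only, not data from the talk.
h_by_gender = {
    "female": [12, 35, 41, 58, 130],
    "male": [4, 9, 22, 27, 60],
}
for gender, values in h_by_gender.items():
    print(gender, user_percentage_histogram(values, bin_width=20))
```

Using the same `bin_width` for every gender keeps the bars directly comparable, which is what makes the female/male overlays on the slides readable.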
Editor's Notes

  • I have worked on a number of engagements at Capgemini; the one I am going to talk about was for a fast-moving consumer goods (FMCG) company. They wanted to improve their social media analytics suite, with a particular focus on influencer marketing.
  • Identifying the most influential person can often be a difficult task: there are so many criteria to choose from. Do I love the person, do they make me laugh, are we compatible? Who is the one? Contrast this with digital marketing campaigns: we often need not just one but many influencers. These influencers are not for life, and the criteria are different: do the influencers reach and engage the number of people I'm looking for? Do they reach the target audience?
  • Create a piece of software, in this case a Dataiku plugin, that can be distributed
  • The initial data sources are boxed, but additional data sources are being added to enhance the service
  • A wide variety of metrics gives a true 360-degree picture of an influencer
  • Let's look at these indices in more depth: what they are and why they are valuable
  • The H-index originally came from academia, where it measures the influence of an academic's publications
  • On Twitter, I have defined the H-index as follows:

    In academic literature it is used to measure an academic's ability to publish papers and get them cited.
    In the user's last 200 tweets, it is the largest number H such that H tweets have each been retweeted at least H times.
    In the second example, the 4th tweet has only 3 retweets, so H-index = 3.
    Likewise, the H-index can be applied to Facebook by taking the last 200 posts and using the number of likes on each post in place of retweets.

    Only one API request is needed per user.
  • Stephen's retweet counts are [2341, 540, 249, 142, 152, 217, 222, 227, 66, 80]. Whilst Stephen has a large number of retweets, he has only tweeted 10 times, so his H-index cannot exceed 10. This illustrates that he has very influential tweets but is not very active on Twitter.

    Peter has retweeted thousands of times, but has rarely been retweeted himself.
  • The M-index reflects how bursty an influencer's tweets are and how influential those tweets are.

    Are they tweeting a lot over a short period and receiving a lot of engagement? Do they tweet a lot but receive little engagement? Or do they tweet very little and receive little engagement?
  • The G-index reflects not just one tweet but a group of tweets, and looks at the distribution of a tweeter's retweets
  • The ideal scenario is someone like Jamie Oliver, but we can't always afford such expensive chefs
  • Fine, maybe we have these two influencers in mind, but how well does their H-index (engagement) stack up relative to other foodies on Twitter?
    There is a chunk of people we can mine who could potentially be more influential and cost less.
  • Bin width = 20. The distribution is bimodal for women. Thresholds: create benchmarks from previous campaigns, etc.
  • Bin width = 2K. Most of Denise's last 200 tweets were posted within a single day.
  • Bin width = 5
  • Exponential curve: shows that people care more about eating than about beauty (beauty gets less engagement), and it drops off substantially
  • Bin width = 5. Females have more engagement than males, but for beauty there are fewer tweets that get retweeted a lot.
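The notes say only one API request is needed per user. A sketch of that request against the Twitter API v1.1 `statuses/user_timeline` endpoint, which returns at most 200 of a user's most recent tweets per call, might look like the following. Assumptions: app-only bearer-token authentication, dropping retweets of other users by checking for a `retweeted_status` field (a retweet carries the original tweet's counts), and hypothetical helper names. The talk used Requests; this sketch uses the standard library instead, and Twitter API access terms have changed since 2017, so the endpoint may not be available.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime

TIMELINE_URL = "https://api.twitter.com/1.1/statuses/user_timeline.json"


def build_timeline_request(screen_name: str, bearer_token: str) -> urllib.request.Request:
    """One request per user: up to 200 of the user's most recent tweets."""
    query = urllib.parse.urlencode({"screen_name": screen_name, "count": 200})
    return urllib.request.Request(
        f"{TIMELINE_URL}?{query}",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


def parse_timeline(tweets: list[dict]) -> tuple[list[int], list[datetime]]:
    """Extract retweet counts and timestamps, skipping retweets of other users."""
    counts: list[int] = []
    times: list[datetime] = []
    for tweet in tweets:
        if "retweeted_status" in tweet:
            continue  # a retweet reports the original tweet's retweet_count
        counts.append(tweet["retweet_count"])
        # v1.1 timestamps look like "Tue Jan 06 14:21:45 +0000 2009"
        times.append(datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y"))
    return counts, times


def fetch_timeline(screen_name: str, bearer_token: str) -> list[dict]:
    """Send the single request and decode the JSON list of tweet objects."""
    request = build_timeline_request(screen_name, bearer_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The retweet counts can feed an h-index or g-index calculation directly, and `min(times)` and `max(times)` give the time span needed for an m-index. Because skipped retweets still count towards the 200 returned, a heavy retweeter such as Peter may end up with far fewer than 200 original tweets.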

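As a back-of-the-envelope check on the note about Denise's burst of tweets, rearranging the slide 11 definition m = h/n with her H-index of 159 and her rounded M-index of 58K gives the time span n covered by her last 200 tweets:

```latex
n = \frac{h}{m} = \frac{159}{58\,000}\ \text{years}
  \approx 2.74 \times 10^{-3}\ \text{years}
  \approx 2.74 \times 10^{-3} \times 365.25\ \text{days}
  \approx 1.0\ \text{day}
```

This agrees with the note that those tweets were posted within roughly a single day, and shows why a short span inflates the M-index so strongly.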