What is Content Analytics - MeasureCamp London 2016


We often say "Content is king", yet in our analytics tools we mostly talk about traffic and conversion. Many Natural Language Processing metrics could give us a better understanding of the actual content. Let's dive into that.

Published in: Internet

  1. 1. What is Content Analytics?
  2. 2. Content is King
  3. 3. ...and yet, what content metrics and dimensions do you use?
  4. 4. In Google Analytics
      Some dimensions:
      ● Title
      ● URL
      ● Keywords (or what is left of them)
      No actual metrics directly related to content
  5. 5. What should we get?
  6. 6. NLP Data
      ● Natural Language Processing statistics
      New data:
      – How many times do the main keywords appear in my content?
      – How many times are these keywords the subject of a sentence?
      – How relevant are the words I am using?
  7. 7. Quick poll: Who has ever heard of the TF-IDF metric?
  8. 8. Metric: TF-IDF
      A numerical statistic intended to reflect how important a word is to a document in a corpus.
      TF: the frequency of a word (or series of words) in a document.
      IDF: that frequency is then weighed against how many documents in the corpus contain the word, so that words that are common everywhere do not dominate.
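To make the definition concrete, here is a minimal sketch of one common TF-IDF variant (term frequency times the log of the inverse document frequency). The function name `tf_idf` and the three-document toy corpus are assumptions made for illustration; the slides do not say which exact variant was used.

```python
import math

def tf_idf(term, doc, corpus):
    """TF-IDF of `term` in `doc`, where `corpus` is a list of tokenized documents.

    One common variant: tf = count / document length, idf = log(N / df).
    """
    if not doc:
        return 0.0
    tf = doc.count(term) / len(doc)
    # Number of documents in the corpus that contain the term at least once.
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)

# Toy corpus (illustrative only): three tiny tokenized "documents".
corpus = [
    ["measure", "camp", "london"],
    ["analytics", "conference", "london"],
    ["click", "here", "london"],
]

score_measure = tf_idf("measure", corpus[0], corpus)  # appears in 1 of 3 documents
score_london = tf_idf("london", corpus[0], corpus)    # appears in every document
```

With this variant, a word that appears in every document ("london" here) gets an IDF of log(1) = 0, while a word found in a single document keeps a positive score.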
  9. 9. Quick poll: Who knows what an n-gram is?
  10. 10. N-gram: What is an n-gram? An n-gram is a contiguous sequence of n items from a given sequence of text.
  11. 11. Example of 2-grams
      I am attending Measure Camp in London
      ● I am
      ● am attending
      ● attending Measure
      ● Measure Camp
      ● Camp in
      ● in London
  12. 12. If you remove the stop words (I, am, in)
      ● attending Measure
      ● Measure Camp
      ● Camp London
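The two slides above can be reproduced with a few lines of Python. This is only a sketch: the helper names `ngrams` and `content_tokens` and the three-word stop list are assumptions chosen to match the example sentence, not a real stop-word list.

```python
# Assumption: a tiny stop-word list, just enough for the example sentence.
STOP_WORDS = {"i", "am", "in"}

def ngrams(tokens, n=2):
    """Return every contiguous sequence of n tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def content_tokens(text, stop_words=STOP_WORDS):
    """Split on whitespace and drop stop words (case-insensitive)."""
    return [word for word in text.split() if word.lower() not in stop_words]

sentence = "I am attending Measure Camp in London"

all_bigrams = ngrams(sentence.split())
# ['I am', 'am attending', 'attending Measure', 'Measure Camp', 'Camp in', 'in London']

filtered_bigrams = ngrams(content_tokens(sentence))
# ['attending Measure', 'Measure Camp', 'Camp London']
```

Note that the stop words are removed before the 2-grams are built, which is why "Camp London" shows up even though the two words are not adjacent in the original sentence.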
  13. 13. Let's say you want to be as relevant as possible (and therefore rank on Google) for "Measure Camp"
  14. 14. 1st step: Analyse your content with an n-gram analysis
  15. 15. 2nd step: Topic corpus
      Now, create a topic corpus around your keyword (basically, the pages that rank in Google).
      Let's get the top 100 results for each of these keywords:
      ● Analytics event
      ● Analytics conference
      ● Measure Camp
      Extract the n-grams from all the documents (around 200 documents once duplicates are removed).
      Calculate TF-IDF for each n-gram.
  16. 16. YAY!!! My first relevant content metrics :)
      ● measure camp: 100 (very frequent)
      ● analytics conference: 60 (quite frequent)
      ● Peter O'Neill: 50 (quite frequent)
      ● Stay (in) London: 30 (somewhat frequent)
      * Not actual data. Simplified version of TF-IDF.
  17. 17. 3rd step: Topic-neutral corpus
      Now, create a topic-neutral corpus (basically, take thousands and thousands of random web pages and build a corpus from them).
      Get the n-grams out of it. Extract:
      ● Click here (very frequent)
      ● Stay London (appears a few times)
      ● Peter O'Neill (nowhere to be found)
      ● Measure Camp (1 time in the corpus)
  18. 18. 4th step: Now let's compare
      ● Stay London: somewhat frequent in both corpora, so not so relevant for your content
      ● Peter O'Neill: Yay!
      ● Measure Camp: not so frequent in general English, very frequent in our topic corpus, so I shall use it
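One possible way to turn this comparison into a number is to divide how often an n-gram shows up in the topic corpus by how often it shows up in the neutral corpus. The sketch below is an assumption, not the method used in the talk: the names `doc_share` and `relevance`, the smoothing constant, and both toy corpora are invented for illustration.

```python
def doc_share(ngram, corpus):
    """Fraction of documents in `corpus` (a list of sets of n-grams) containing `ngram`."""
    if not corpus:
        return 0.0
    return sum(1 for doc in corpus if ngram in doc) / len(corpus)

def relevance(ngram, topic_corpus, neutral_corpus, smoothing=0.01):
    """Share in the topic corpus divided by (smoothed) share in the neutral corpus."""
    return doc_share(ngram, topic_corpus) / (doc_share(ngram, neutral_corpus) + smoothing)

# Toy data (not real measurements): each document is the set of n-grams it contains.
topic = [
    {"measure camp", "peter o'neill"},
    {"measure camp", "stay london"},
    {"measure camp", "click here"},
    {"analytics conference", "stay london"},
]
neutral = [
    {"click here", "stay london"},
    {"click here"},
    {"click here", "stay london"},
    {"click here", "measure camp"},
]

candidates = ["stay london", "peter o'neill", "measure camp", "click here"]
ranked = sorted(candidates, key=lambda g: relevance(g, topic, neutral), reverse=True)
# ranked: ["peter o'neill", 'measure camp', 'stay london', 'click here']
```

The smoothing term keeps the ratio finite for n-grams such as "Peter O'Neill" that never appear in the neutral corpus; its value changes the scores but, on this toy data, not the ordering.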
  19. 19. ● Big data: very frequent in the topic corpus, not so frequent → Oh, sounds like something people want to hear about. Let's write content about it.
  20. 20. 5th step: Optimize your content
      Proofread your content with these new relevant expressions in mind. Can I add more value for the user? Can it help improve my organic ranking?
  21. 21. Let's discuss: What other kinds of content metrics or dimensions could we use?