
Entity Disambiguation - the Semantic XRay



In today's Semantic Web, generic copy has no place.
Using data extraction, we can view our content as Googlebot sees it.
Then, by implementing entity disambiguation, we can help Google rank our content with confidence.
This is how I do it.
Got copy you need disambiguating? Need more semantic layers? You can order yours, here:

Published in: Internet


  2. Overview: In this presentation, we’ll use your example, Maria’s Florist, Monterotondo. Before we rip it to shreds, we’d like to put on record that, compared to much of the content we view online, the original copy was fair! ☺ By applying our Web Copywriting guidelines and then subjecting the improved text to further analysis, we’ll build a template that your copywriters can follow for future articles. Over time, these best practices become habit rather than the chore they may seem today. If you find any of the following training ambiguous, please, please ask for clarification. Remember, the goal is to produce copy that leaves the web’s search indexers in no doubt about the topic of your content, yet presents it so that it reads naturally and fluently to the untrained eye. Improbable without training? Yes. Impossible? Absolutely not. Buckle up – it’s a bumpy ride »
  3. Rip it up and start again: DISSECTING MARIA’S FLORIST, MONTEROTONDO
  4. Readability: This is the original analysis result, testing your article’s readability. We have: » 6 x hard-to-read sentences » 4 adverbs » 3 overly complicated phrases » 4 uses of the passive voice. The readability score of Grade 8 owes much to short sentences (naming universities, the hospital, funeral homes, etc.). From this original content, we can discern the following »
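The slides don't name the readability checker that produced the adverb and passive-voice counts above. The Python below is a rough, hypothetical sketch of how such counts can be produced. It makes two assumptions:

- It flags words ending in "-ly" as adverbs, minus a small exception list.
- It treats a form of "to be" followed by a word ending in "-ed" or "-en" as passive.

Both rules are crude. They miss irregular participles such as "known" and will produce some false positives.

```python
import re

# Hypothetical heuristics, not the checker used in the slides.
# Words ending in "-ly" that are usually NOT adverbs:
NOT_ADVERBS = {"only", "family", "early", "reply", "italy", "lily", "holy", "supply"}

# A form of "to be" followed by a word ending in -ed/-en (e.g. "were delivered", "is chosen").
PASSIVE = re.compile(r"\b(?:am|is|are|was|were|be|been|being)\s+\w+(?:ed|en)\b", re.IGNORECASE)


def find_adverbs(text: str) -> list[str]:
    """Return words ending in '-ly', minus the exception list."""
    words = re.findall(r"[A-Za-z']+", text)
    return [w for w in words if w.lower().endswith("ly") and w.lower() not in NOT_ADVERBS]


def find_passives(text: str) -> list[str]:
    """Return 'to be' + participle phrases that look like passive voice."""
    return [m.group(0) for m in PASSIVE.finditer(text)]


sample = ("The flowers were delivered quickly. Maria carefully arranges each bouquet. "
          "Each bouquet is chosen by hand.")
print(find_adverbs(sample))   # ['quickly', 'carefully']
print(find_passives(sample))  # ['were delivered', 'is chosen']
```

Rewriting "The flowers were delivered quickly" as "We deliver your flowers fast" clears both flags at once.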
  5. What the document tells Googlebot: From the extracted entities, one could assume that all of the key figures were visible to the search engine. Woo-hoo! Well, maybe. But that’s only half of the story. What does your content say about those ‘entities’? »
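The slide doesn't name the extraction tool. Purely as an illustration, the simplest form of entity extraction is a gazetteer lookup: match known names in the text and count them. The `GAZETTEER` contents and type labels below are hypothetical. Real systems use trained named-entity recognisers backed by knowledge bases such as DBpedia.

```python
import re
from collections import Counter

# Hypothetical gazetteer: known names mapped to an entity type.
GAZETTEER = {
    "Maria's Florist": "Organization",
    "Monterotondo": "City",
    "Rome": "City",
    "Italy": "Country",
}


def spot_entities(text: str) -> Counter:
    """Count whole-phrase matches of each known name, keyed by (name, type)."""
    counts = Counter()
    for name, etype in GAZETTEER.items():
        hits = re.findall(rf"\b{re.escape(name)}\b", text)
        if hits:
            counts[(name, etype)] = len(hits)
    return counts


sample = ("Maria's Florist delivers across Monterotondo and Rome. "
          "Monterotondo families trust Maria's Florist.")
print(spot_entities(sample))
```

Note what this does not tell you: whether the text says anything useful about those entities. That is the half of the story the following slides address.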
  6. Concepts – what’s your content about? Now we’re seeing the bigger picture. Because content about universities dominates (by percentage), the search engine thinks that we’re writing about Graduation. Flower appears, which is great, although it could be more relevant (61% is not great). However, some references that seem to fit the context at first glance turn out not to when we dig into DBpedia: » A Great Way to Care refers to a Hong Kong medical drama; » Arrangement is musical, rather than a display of flora; » Pomp and Circumstance Marches is also musical.
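The ‘Arrangement’ mix-up is a word-sense disambiguation problem. The slides don't say how the tool resolves senses. One classic technique is the simplified Lesk algorithm, which picks the sense whose dictionary gloss shares the most words with the surrounding text.

The sketch below uses a tiny, hypothetical sense inventory (`SENSES`) in place of the DBpedia or WordNet glosses a real system would draw on. When two senses tie, the first one listed wins.

```python
import re

# Hypothetical sense inventory: word -> {sense label: gloss}.
SENSES = {
    "arrangement": {
        "floral": "a display of cut flowers, stems and foliage arranged in a vase or bouquet",
        "musical": "an adaptation of a musical composition for voices or instruments",
    },
}

STOPWORDS = {"a", "an", "the", "of", "for", "or", "in", "and", "to", "with", "our", "each"}


def tokens(text: str) -> set[str]:
    """Lower-cased content words of a text."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word: str, context: str) -> str:
    """Return the sense whose gloss overlaps most with the context words."""
    ctx = tokens(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(ctx & tokens(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("arrangement", "Order a bouquet arrangement of roses and lilies in a glass vase"))
print(simplified_lesk("arrangement", "The orchestra played a new arrangement of the composition for strings and voices"))
```

The same logic explains why surrounding vocabulary matters in the rewrite. Words like bouquet, vase and stems give the disambiguator the overlap it needs to pick the floral sense.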
  7. Taxonomies – fitting in with a hierarchy: This was the most pleasing aspect of the original document’s content. When categorising the topics found, the tool’s algorithms identified: » /shopping/gifts (although not confident) » /shopping/gifts/flowers » /travel/tourist destinations/Italy (the country was identified, though it’s troubling that the content is classified as ‘travel’, perhaps due to the tone of the ‘flowery’ description ☺)
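Classifiers like the one in the slides typically return hierarchical category paths, each with a score, and flag the weak ones as ‘not confident’. Below is a minimal sketch of that reporting step. It assumes a hypothetical 0.5 cut-off and uses made-up scores, since the slides publish neither the tool's threshold nor its numbers for these paths.

```python
from dataclasses import dataclass


@dataclass
class Category:
    path: str     # hierarchical path, e.g. "/shopping/gifts/flowers"
    score: float  # relevance in [0, 1]


# Assumption: illustrative cut-off; the real tool's threshold is unknown.
CONFIDENCE_THRESHOLD = 0.5


def label_categories(cats: list[Category]) -> list[str]:
    """Format categories by descending score, flagging weak ones."""
    lines = []
    for c in sorted(cats, key=lambda c: c.score, reverse=True):
        flag = "" if c.score >= CONFIDENCE_THRESHOLD else " (not confident)"
        lines.append(f"{c.path} {c.score:.2f}{flag}")
    return lines


def most_specific_confident(cats: list[Category]):
    """Deepest confident path; ties broken by score. None if nothing is confident."""
    confident = [c for c in cats if c.score >= CONFIDENCE_THRESHOLD]
    return max(confident, key=lambda c: (c.path.count("/"), c.score), default=None)


# Hypothetical scores, for illustration only.
cats = [
    Category("/shopping/gifts", 0.3),
    Category("/shopping/gifts/flowers", 0.64),
    Category("/travel/tourist destinations/italy", 0.55),
]
for line in label_categories(cats):
    print(line)
print(most_specific_confident(cats).path)
```

Preferring the deepest confident path mirrors the slide's reading: /shopping/gifts/flowers matters more than its vaguer parent.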
  9. Formatting for scanning on the web: The Internet has given birth to a new type of reader: the scanner. The way your copy and accompanying images are laid out on the page has never been more critical. As well as making the layout aesthetically pleasing to the reader, remember that search engines also understand HTML, the code that marks up text, images and hyperlinks and controls how on-page elements are displayed. HTML is also the foundation of the web’s semantic layer. For human readers, there are three key areas of formatting that help ensure they, well, read your content: 1. Headings and sub-headings 2. Sentence and paragraph structure 3. Bullet (unordered) or numbered (ordered) lists when summarising the numerous benefits and features of a product or service, or its unique elements.
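Search engines read the HTML rather than the rendered page. A quick way to audit scannability is therefore to pull out the headings and list items a scanner would see. The sketch below uses Python's standard `html.parser`. The sample markup and the `ScanOutline` class are hypothetical.

```python
from html.parser import HTMLParser


class ScanOutline(HTMLParser):
    """Collect headings (h1-h6) with their text, and count list items."""

    def __init__(self):
        super().__init__()
        self.headings = []    # list of (level, text)
        self.list_items = 0
        self._level = None    # level of the heading currently open, if any

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self.headings.append((self._level, ""))
        elif tag == "li":
            self.list_items += 1

    def handle_data(self, data):
        if self._level is not None:
            level, text = self.headings[-1]
            self.headings[-1] = (level, text + data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            level, text = self.headings[-1]
            self.headings[-1] = (level, " ".join(text.split()))  # normalise whitespace
            self._level = None


page = ("<h1>Maria's Florist, Monterotondo</h1><p>Fresh flowers daily.</p>"
        "<h2>Wedding <em>flowers</em></h2><ul><li>Bouquets</li><li>Table arrangements</li></ul>")
outline = ScanOutline()
outline.feed(page)
print(outline.headings, outline.list_items)
```

Read the extracted headings on their own. If they don't tell the story of the page, neither a scanner nor an indexer will get the point.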
  10. Stuffing keywords: Once upon a time, the more often a keyword appeared in your content, the greater the likelihood that it would rank well in SERPs. Add an in-linking template to focus the indexer on those keywords and you could appear in the top 3 results with ease, even in highly competitive niches. Nowadays, that’s just not so. Old keyword-stuffing and linking practices will see you penalised, with an indefinite recovery period. There are, however, hot-spots: places where keywords are appropriate. The difference is that relevance is based on the human factor, rather than on trying to game a search engine: 1. Keyword density: no greater than 2% (max. 14 appearances per 700-word article); 2. Your main keyword should appear in the main heading and one sub-heading; 3. Your keywords should appear naturally, as they would be spoken in conversation.
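The 2% ceiling and the 14-per-700-words figure are the same rule: 700 × 2 / 100 = 14. Below is a minimal sketch for checking a draft. It assumes density means keyword occurrences divided by total words, with a multi-word keyword counted once per match. Tools differ on this convention.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Keyword occurrences as a percentage of all words in the text.

    Assumption: a multi-word keyword counts as one occurrence per match.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    kw = keyword.lower().split()
    n = len(kw)
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == kw)
    return 100 * hits / len(words)


def max_occurrences(word_count: int, max_density_pct: float = 2.0) -> int:
    """Largest keyword count that stays within the density ceiling."""
    return int(word_count * max_density_pct / 100)


print(max_occurrences(700))  # 14
```

For example, a 700-word draft with 20 mentions of the main keyword (about 2.9%) would be over the ceiling.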
  11. Composition – handing over the baton: Promotional tone should be used sparingly and only when expressly relevant, for two main reasons: 1. Google has stated categorically that its search engine is an informational highway, not a sales channel for businesses with an Internet presence; 2. Customers do not react well to pressurised sales patter. They need to realise the benefits of using your service/product and feel that they are in control of the decision-making process. Your copy should also appeal to your entire target audience. This means making it comprehensible to all potential customers, irrespective of their level of education. In Maria’s Florist’s case, anyone can buy flowers, so the copy must be accessible to all. The Flesch Reading Ease scale will help you determine for whom your copy is suitable. With personalised search now so prevalent, it’s possible that Google Search tailors results to the reading level of the customer if they’re signed into their Google account. If your copy is deemed too academic, you could lose the audience segments that aren’t educated to the level your copy assumes.
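The Flesch Reading Ease score is 206.835 − 1.015 × (words ÷ sentences) − 84.6 × (syllables ÷ words). Higher scores mean easier text, and scores of roughly 60 to 70 are usually described as plain English.

Below is a minimal sketch. Its vowel-group syllable counter is a rough heuristic of our own, not part of the formula, so scores will differ slightly from dedicated tools.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, drop a silent final 'e', minimum one."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1  # e.g. "rose" -> 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


print(round(flesch_reading_ease("We deliver fresh roses across Monterotondo every day."), 1))
```

Both terms penalise length. Long sentences and long words each drag the score down, which is why splitting sentences and swapping in shorter words is the quickest way to widen your audience.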
  12. Like your content, but better: USING TOOLS TO BREAK DOWN THE COPY’S COMPONENTS
  13. De-fluffing – keep it relevant: Is all that content necessary? There’s a reason that fiction writers edit with a hatchet: the reader only wants copy that contributes to the story. For web copywriting, this practice has become even more critical. First, because the reader wants the information quickly. But there’s something else. The more fluff (content irrelevant to your message) you include, the more off-topic your article will be in the eyes of the search engine. In the GIF to the right, we look at how this impacts the original content, along with some other errors sampled from the opening paragraphs.
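One crude way to spot fluff is to list the terms your topic should be about, then flag sentences that mention none of them. The topic terms below are hypothetical, and real engines judge topicality with far richer models. The point is only that off-topic sentences, like the university and hospital references in the original copy, dilute the signal.

```python
import re

# Hypothetical topic vocabulary for a florist page.
TOPIC_TERMS = {"florist", "flowers", "bouquet", "wedding", "roses", "monterotondo"}


def flag_fluff(text: str, topic_terms: set[str], min_hits: int = 1) -> list[str]:
    """Return sentences mentioning fewer than `min_hits` topic terms."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    flagged = []
    for s in sentences:
        words = set(re.findall(r"[a-z']+", s.lower()))
        if len(words & topic_terms) < min_hits:
            flagged.append(s)
    return flagged


sample = ("Maria's Florist sells fresh flowers in Monterotondo. "
          "The town has a university and a hospital. "
          "Every bouquet is made to order.")
print(flag_fluff(sample, TOPIC_TERMS))
```

A flagged sentence isn't automatically wrong. It is a candidate for the hatchet, or for rewriting so that it ties back to the topic.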
  14. Comparing apples with core values [1]: YOUR ORIGINAL vs. RE-WRITE
  15. Comparing apples with core values [2]: YOUR ORIGINAL vs. RE-WRITE
  16. Let me entity tame you: Nothing pleases us more than a green screen. Maria’s Florist and Monterotondo are the stand-out entities, and both are classified correctly. All other entities are relevant, all nearby locations are incorporated and, more importantly, all are expressed in a positive light. »
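The ‘positive light’ on this screen is entity-level sentiment. The tool's model isn't described. As a rough illustration only, a lexicon approach scores each sentence that mentions an entity by counting positive and negative words. The word lists and the `entity_sentiment` helper are hypothetical and far smaller than any real lexicon.

```python
import re

# Tiny illustrative lexicons (assumption); real sentiment models are far richer.
POSITIVE = {"best", "beautiful", "fresh", "skilled", "trusted", "strong"}
NEGATIVE = {"late", "wilted", "poor", "rude", "expensive"}


def entity_sentiment(text: str, entity: str) -> int:
    """Sum +1/-1 lexicon hits over the sentences that mention `entity`."""
    score = 0
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if entity.lower() in sentence.lower():
            words = re.findall(r"[a-z']+", sentence.lower())
            score += sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return score


sample = "Maria's Florist has skilled florists and the best bouquets. Delivery to Mentana was late."
print(entity_sentiment(sample, "Maria's Florist"), entity_sentiment(sample, "Mentana"))
```

Even this toy version shows why placement matters. A negative word in the same sentence as a nearby town attaches to that town, not to the business.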
  17. This information is classified: The highest taxonomy is now clearly defined: /shopping/gifts/flowers. With a relevance score of 0.64, any doubt about what the content describes has been eliminated. Also pleasing is the inclusion of /marriage/weddings. Although it’s ‘not confident’, searches for “wedding flowers Monterotondo” (a big money-spinner) should help make it so.
  18. Copywriting is a conceptual art: As with the Taxonomy, the Concepts in the article are now crystal clear. By making the copywriting strong and diversifying with longer-tail keywords, we’ve brought more (relevant) concepts to the table. No, I’ve got no idea where ‘2005 singles’ fits into the mix, but compare the ‘Relevance’ score to the original document’s concepts: none score less than 0.52, showing strong alignment with the topic.
  19. Keywords – who gives a stuff? Not us! Similar to Taxonomies and Concepts, the lowest ‘Relevance’ keyword score is now markedly higher than in the original document. Even though the exact phrase “Florist Monterotondo” never appears, it’s listed as the top keyword with a 0.94 relevance factor (1.0 being optimum). We’ve also clarified that ‘arrangements’ refers to flowers, not music. Plus we see ‘best bouquets’, ‘strong reputation’, ‘skilled florists’ and many more two-word keywords becoming highly relevant. Yes, that’s pleasing.
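The tool's relevance scores can't be reproduced here. The emergence of two-word keyphrases like ‘best bouquets’ and ‘skilled florists’ can, however, be approximated by counting adjacent word pairs within each sentence and skipping pairs that contain a stopword. This is a frequency-only sketch, not the tool's method, and the stopword list is a hypothetical minimum.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "our", "with", "is", "are", "we", "have"}


def top_bigrams(text: str, k: int = 3) -> list[tuple[str, int]]:
    """Most frequent stopword-free word pairs, counted within sentences only."""
    pairs = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z']+", sentence.lower())
        pairs.update(
            f"{a} {b}" for a, b in zip(words, words[1:])
            if a not in STOPWORDS and b not in STOPWORDS
        )
    return pairs.most_common(k)


sample = ("Our skilled florists make the best bouquets. "
          "Skilled florists with a strong reputation make the best bouquets in Monterotondo.")
print(top_bigrams(sample, 2))
```

Counting within sentences avoids false pairs such as ‘bouquets skilled’ that would otherwise bridge a full stop.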
  20. Summary: Plan of action: ► Look for ways to add value to the topic you’re writing about, over and above what already exists online; ►► This will often mean researching the topic and competition before you write your first word; ► Identify your customer’s pain points and provide them with a solution; ► Write the content so that it is readable (accessible) by your entire audience; ► Remove the passive voice, overly complicated words and adverbs (as per slide 4); ►► “The road to Hell is paved with adverbs”, Stephen King; ► Pick out the main points in the article and craft them into suitable sub-headings; ►► It’s suggested that a reader should be able to grasp the article’s point by scanning the headings only; ► Structure sentences and paragraphs so that readers can scan them; ►► Don’t write huge blocks of text, and use bullet/numbered lists where appropriate; ► Ensure that the subject is always doing something to the object: ►► “When you need to send birthday wishes to your wife…”, not: ►► “When birthday wishes need to be sent by you to your wife…”; ► Make use of the Copywriting Guidelines and turn them into habits. So, there we have it: all of the steps you need to disambiguate entities and make Googlebot bow to your will. It’s not about gaming search engines. It’s not about sell, sell, selling to your human audience. It is about: » clarifying your product and service; » identifying your customer and their pain points; » and then providing solutions that: »»» work for the reader, and »»» Googlebot can associate with the reader’s query. Thank you for your interest.
  21. Get in touch: Thank you so much for seeing this presentation through to the end. Your host has been Jason Darrell. On social, you'll find him on: ◦ Google+ ◦ LinkedIn ◦ Twitter ◦ Pinterest. You can order your Semantic XRay through this PPH 'Hourlie'. And do check out his F+ daily ezine (free): Freelancer Plus ezine. For more in-depth background to NLP, please see Jason's "Disambiguate Entities" article.