SEO for the Semantic Web


A brief history of SEO from WWW to RDF, Microformats and SPARQL.
First presented at GeekMeet #2 in Cluj-Napoca on Mar 1st, 2008.

Published in: Technology, Education
  • Thank you for the nice comments Rick!
    The material is very old and SEO is changing radically or slowly disappearing, I think :)
  • Great presentation material, Mihai. Thanks for sharing this, as it definitely brought back some Internet memories from the last 15 years. I especially like how you laid out the history and distilled concepts like PageRank to their essence. It's amazing that, in the time since you developed this talk, the number of factors that determine page rank has gone beyond 200, from what I understand!
  • Thanks for the remark. But it's the past, not the future, the presentation is 2 yrs old. Things changed.
  • Thanks for posting this presentation. I needed a quick understanding of the future of SEO and you provided it.

SEO for the Semantic Web

  1. “How do the machines know what Tasty Wheat tasted like?” Mouse – The Matrix
  2. Short SEO History
     • Web1.0
     • Web2.0
     • Web3.0
  3. Genesis
     • A story of the Internet, by
     • Solving the most important problems
     • Greatly influenced by one man…
  4. Tim Berners-Lee
     “the World Wide Web is Berners-Lee's alone. He designed it. He loosed it on the world. And he more than anyone else has fought to keep it open, nonproprietary and free.” Time Magazine, 1999
  5. The Problem
     • Where can I find the information?
     “Our ineptitude in getting at the record is largely caused by the artificiality of the systems of indexing.” The Atlantic Monthly, 1945
  6. Archie, 1990
     • Indexed file names and
     • Returned results based on pattern matching
  7. Web1.0
  8. Web1.0
     • Means HTML
     • Is born in 1991, with the help of
     • Tim Berners-Lee (TBL), who also founded
     • WWW Consortium (W3C) at MIT, and also
     • Created WWW Virtual Library – the 1st catalog
  9. Yahoo Directory, 1994
     • Vertical = categories... is like
     • “Show me all the stuff and I’ll handle it”
     • Manually indexed stuff, which was
     • OK for starters, but…
     • Websites quickly grew in number and
     • Y! started charging money for one listing
     • Increasingly more money...
  10. WebCrawler, 1994
      • First SE to fully search text
      • Bought by AOL, then
      • Sold to Excite, which
      • Excite went bankrupt and
      • WebCrawler ends up bought by InfoSpace
  11. Other “Search Engines”
      • 1994, reaches 60mil pages in ’96
      • 1995, bought by Overture, bought by Y!
      • 1996, meta search, bought by Lycos
      • 1997, bought by IAC/InterActiveCorp
      • 1999, bought by Overture, meaning Y!
  12. Shopping fun, right?
  13. 1998
      • Open Directory Project
      • Each listing is checked and certified by a volunteer
      • The main source for Google Directory
  14. Current State of Search Industry
  15. Web1.0 Problems
      • SE couldn’t understand text, so
      • They said “why don’t you implement some meta tags (description & keywords) so we can get a glimpse of what you’re saying”
      • The relevancy of a page with respect to a keyword was determined by a few factors, so
      • It was very easy to abuse and spam, therefore
      • Search Results had poor quality
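
The meta-tag idea on this slide can be sketched with Python's standard `html.parser`. This is only an illustration of how an early crawler might have read a page's self-declared description and keywords; the sample page and the `MetaTagReader` class are made up for the example, not taken from any real engine.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            self.meta[name.lower()] = content


# Hypothetical page: nothing stops the author from stuffing the keywords tag.
PAGE = """
<html><head>
<title>Pizza in Cluj</title>
<meta name="description" content="The best pizza in town">
<meta name="keywords" content="pizza, cheap pizza, free pizza, pizza pizza">
</head><body>...</body></html>
"""

reader = MetaTagReader()
reader.feed(PAGE)
keywords = [k.strip() for k in reader.meta["keywords"].split(",")]
print(reader.meta["description"])
print(keywords)
```

The crawler has to take the page's word for it, which is why this signal was so easy to spam.
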
  16. Web2.0
  17. Web2.0
      • Is coined by... Tim O’Reilly, yet
      • TBL later said that “web2.0” is a stupid, meaningless term and that he thought of it first in ’96 anyway
  18. Web2.0 means
      • which grew apart because of
      • PageRank (1998) invented by
      • Larry & Sergey, who adapted the algo from
      • An MIT professor who had developed
      • A nasty mathematical formula for positioning keywords in a 3d space model based on the relevancy that one kw holds … whatever
  19. PageRank actually means
      • That a link is a vote and
      • Not all links are created equal, so
      • It matters who links to you
      • Just like in our real life society
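
The "a link is a vote, and it matters who votes" idea can be sketched as a small power iteration. This is the simplified textbook formulation with a damping factor of 0.85, not Google's production algorithm; the four-page graph and the function name `pagerank` are invented for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: "hub" is linked to by everyone, "c" by no one.
LINKS = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
}
ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph, "b" has a single inbound link but it comes from the well-linked "hub", so "b" still outranks "c", which nobody links to.
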
  20. • Read the content of pages really well, just that
      • Pages were crappy:
        – Non-standard coding
        – Ugly tech (like applets)
        – Senseless IA
      • So Google said: “don’t do evil and try to nicely format the info, according to W3C standards” (remember TBL)
  21. Enter the SEO
  22. SEO
      • Is a multitude of practices aimed at facilitating the indexing of pages by search engines
      • Evolves as the ranking algorithm changes, and
      • Of course, the algorithm is kept secret.
  23. SEO actually means (courtesy of Kelly Ishikawa)
  24. SEO actually means
      • An on-going battle between bots & SEO guys
      • Now 100+ factors influence ranking
      • And I’d like to take the time to talk about each one of them in the following…
  25. Just kidding
  26. My SEO Cheat Sheet
      • Consider:
        1. Page Titles
        2. URLs (mod_rewrite)
        3. Anchor Text
        4. Website Architecture (IA)
        5. Link Title & Alt Images
        6. Relevant content (text)
        7. Sitemap.xml
        8. Hosting
        9. Freshness
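
For item 7 of the cheat sheet, a minimal `sitemap.xml` can be sketched with the standard library. The namespace is the one defined by the sitemaps.org protocol; the URLs, dates and the helper name `build_sitemap` are placeholders for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified_iso_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in entries:
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = url
        ET.SubElement(node, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Placeholder pages; a real site would list its own URLs here.
PAGES = [
    ("https://example.com/", "2008-03-01"),
    ("https://example.com/seo-cheat-sheet", "2008-02-15"),
]
sitemap_xml = build_sitemap(PAGES)
print(sitemap_xml)
```

A file served to crawlers would normally also start with an XML declaration, which this sketch leaves out.
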
  27. Resources
      • Matt Cutts Blog
      • Mihai’s SEO Cheat Sheet :D
  28. Web2.0 Problems
      • © for pictures, articles, books, etc
      • PPC fraud
      • Privacy
      • Search Engine SPAM
      • Link bombing
      • Paid links
      • But more important...
  29. Web2.0 Problems
      • SE still don’t understand what the $#%@ you’re talking about
      • Crawling a website’s interface to extract info is almost insane
  30. Web3.0
  31. Web3.0
      • Means semantic web
      • Attention migrates from syntax/formatting to semantics and
      • Meta Data (data about the data) becomes...
  32. Web3.0: Resource Description Framework & Microformats
  33. Resource Description Framework
      • A kind of XML
      • RDF = Subject + Predicate + Object
      • S + P + O creates a Triple which
      • Can describe almost anything in the universe
      • Triples are connectable (eg: FOAF)
      • RDFa = XHTML + RDF (W3C compliant)
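
The Subject + Predicate + Object idea, and the way triples connect, can be sketched with plain Python tuples. `foaf:name` and `foaf:knows` are real FOAF terms; the `ex:` people and the helper functions are invented for the example, and a real application would more likely use an RDF library than hand-rolled lists.

```python
# Each triple is (subject, predicate, object).
TRIPLES = [
    ("ex:mihai", "foaf:name", "Mihai"),
    ("ex:mihai", "foaf:knows", "ex:ana"),
    ("ex:ana", "foaf:name", "Ana"),
    ("ex:ana", "foaf:knows", "ex:dan"),
    ("ex:dan", "foaf:name", "Dan"),
]


def objects(triples, subject, predicate):
    """Return every object stored for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def friends_of_friends(triples, person):
    """Follow foaf:knows twice: triples connect because objects can be subjects."""
    result = []
    for friend in objects(triples, person, "foaf:knows"):
        for fof in objects(triples, friend, "foaf:knows"):
            if fof != person and fof not in result:
                result.append(fof)
    return result


names = [
    objects(TRIPLES, person, "foaf:name")[0]
    for person in friends_of_friends(TRIPLES, "ex:mihai")
]
print(names)
```

Strictly speaking, RDF is a data model and XML is one of several ways to write it down, so the same triples could be serialized as RDF/XML or embedded in a page as RDFa.
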
  34. Microformats
      • hCalendar
      • hCard
      • rel-tag
      • VoteLinks
      • XFN
      • Geo
      • hResume
      • hReview
      • etc
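
As an example of how one of these formats carries meaning in ordinary markup, here is a rough sketch that pulls hCard properties out of HTML class names. `vcard`, `fn`, `org` and `tel` are real hCard class names; the contact details and the `HCardReader` class are made up, and the parser assumes flat properties without nested tags.

```python
from html.parser import HTMLParser

# A small subset of hCard property names.
HCARD_PROPERTIES = {"fn", "org", "tel"}


class HCardReader(HTMLParser):
    """Reads the text of elements whose class is an hCard property."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in HCARD_PROPERTIES:
                self._current = name
                self.card[name] = ""

    def handle_data(self, data):
        if self._current:
            self.card[self._current] += data

    def handle_endtag(self, tag):
        # Assumes properties are not nested inside one another.
        self._current = None


# Fictional contact marked up as an hCard.
SNIPPET = """
<div class="vcard">
  <span class="fn">Ana Pop</span>,
  <span class="org">Example Pizza</span>
  <span class="tel">+40 700 000 000</span>
</div>
"""

reader = HCardReader()
reader.feed(SNIPPET)
print(reader.card)
```

The page looks the same to a human visitor, but a machine can now tell which piece of text is the name, the organization and the phone number.
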
  35. Case Study
  36. SPARQL
      • SPARQL Protocol and RDF Query Language
      • Standardized on 15th Jan 08 (1 month ago) and
      • Endorsed by?... TBL
      “Trying to use the Semantic Web without SPARQL is like trying to use a relational database without SQL” TBL
  37. Potential
      • With SPARQL you skip the presentation layer
      • You can query ad-hoc any API, so
      • You don’t need to crawl in advance, therefore
      • Information will be as fresh as it gets
  38. And possibilities
      • Query: “I can has pizza?”
      • Returns:
        – A friend of yours (XFN - Facebook)
        – has a colleague (FOAF - LinkedIN) who
        – said that they make good pizza (hReview - yelp) at
        – a restaurant nearby (geo – Gmaps)
        – Tip: U2 in concert today (hCalendar - upcoming)
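
The pizza query can be mimicked in pure Python as a chain of triple patterns with variables, which is roughly what a SPARQL basic graph pattern does. This is a toy matcher, not SPARQL itself; the predicate names, the people and the restaurants are all invented, and a real setup would send a SPARQL query to an endpoint instead.

```python
def match(pattern, triple, bindings):
    """Extend bindings so that pattern fits triple, or return None on mismatch."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check it against an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(triples, patterns):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Invented data, standing in for facts published by different sites.
TRIPLES = [
    ("me", "friend", "ana"),                       # e.g. from XFN
    ("me", "friend", "bob"),
    ("ana", "colleague", "dan"),                   # e.g. from FOAF
    ("bob", "colleague", "eve"),
    ("dan", "recommends_pizza_at", "pizzeria_x"),  # e.g. from hReview
    ("eve", "recommends_pizza_at", "pizzeria_y"),
    ("pizzeria_x", "near", "me"),                  # e.g. from geo
]

PATTERNS = [
    ("me", "friend", "?friend"),
    ("?friend", "colleague", "?colleague"),
    ("?colleague", "recommends_pizza_at", "?restaurant"),
    ("?restaurant", "near", "me"),
]

solutions = query(TRIPLES, PATTERNS)
print(solutions)
```

Only the chain that ends at a nearby restaurant survives; the recommendation that goes through "bob" and "eve" is dropped because "pizzeria_y" is not near.
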
  39. Perhaps now we can see
      • Why Social Networking Communities are worth so much, even though most of them don’t have a revenue model
        – Facebook
        – LinkedIN
        – Meebo
        – Beebo
        – Pipu...
      • They/We are the databases of the future
  40. Thanks!
      “Most of the right choices in SEO come from asking: What’s the best thing for the user?” Matt Cutts
      Mihai Gheza
      Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.