Understanding SEO at a Glance

Transcript

  • 1. Understanding SEO at a Glance E-mail us: presales@mosaic-service.com
  • 2.  
  • 3.  
  • 4.  
  • 5.  
  • 6. Necessity of SEO…
    • Online advertising drives $6 in offline (in-store) sales for every $1 spent online.
    • Search marketing has a greater impact on in-store sales lift than display advertising: three times greater, in fact.
    • 74% of respondents used search engines to find local business information, versus 65% who turned to print Yellow Pages, 50% who used Internet Yellow Pages, and 44% who used traditional newspapers.
    • 86% of those surveyed said they have used the Internet to find a local business, a rise from the 70% reported the year before.
    • 80% reported researching a product or service online, then making that purchase offline from a local business.
  • 7. “iProspect and Jupiter” Research…
    • 62% of search engine users click on a search result within the first page of results, and 90% within the first three pages.
    • 41% of search engine users report changing their search term and/or search engine if they do not find what they are looking for on the first page of results; 88% report doing so after three pages.
    • 36% of users agree that “seeing a company listed among the top results on a search engine makes me think that the company is a top one within its field.”
  • 8. Searches Breakdown
  • 9. How do “Search Engines” work?
    • Defining “Search Engine”: a system that collects, organizes, and presents a way to select Web documents based on certain words, phrases, or patterns within those documents
      • Model the Web as a full-text DB
      • Index a portion of the Web docs
      • Search Web documents using user-specified words/patterns in a text
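The index-and-search idea above can be sketched as an inverted index: each word maps to the set of documents containing it, and an AND query intersects those sets. The pages and their contents below are invented for illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase word to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """AND query: return doc ids that contain every query word."""
    words = query.lower().split()
    if not words:
        return set()
    result = index.get(words[0], set()).copy()
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Invented mini-corpus
docs = {
    "page1": "dog is a loyal pet",
    "page2": "a canine is a dog",
    "page3": "pet care for cats",
}
index = build_inverted_index(docs)
```

Real engines add ranking, stemming, and positional data on top of this basic structure, but the core lookup works the same way.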
  • 10. Categories of “Search Engines”
      • general-purpose search engines, e.g., Yahoo!, AltaVista, and Google
      • special-purpose search engines (or Internet portals), e.g., LinuxStart (www.linuxstart.com)
  • 11. Components of “Search Engines”
    • Two main components:
    • a web crawler (spider), which collects Web pages at massive scale.
    • a large database, which stores and indexes the collected Web pages.
    • Ranking has to be performed without accessing the text, just the index
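The crawler component can be sketched as a breadth-first traversal of the link graph. To keep the example self-contained, the “web” here is a hypothetical in-memory dict of URL to outgoing links rather than real HTTP fetches.

```python
from collections import deque

# Hypothetical in-memory "web": url -> list of outgoing links
WEB = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": [],          # unreachable from the seed below
}

def crawl(seed):
    """Breadth-first crawl from a seed URL, collecting every reachable page."""
    seen = {seed}
    frontier = deque([seed])
    collected = []
    while frontier:
        url = frontier.popleft()
        collected.append(url)            # a real crawler would store/index here
        for link in WEB.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return collected
```

Note how page d.html is never collected: exactly the situation the slide on “crawlers and spiders” describes, where pages are only discovered through links.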
  • 12. “Search Engine” Models
    • Information Retrieval (IR) is the key to search engines and Web search.
    • The most commonly used models:
      • Boolean Model
      • Vector Space Model (VSM)
      • Probability Model
      • their variations
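As a minimal sketch of the Vector Space Model named above: each text becomes a term-frequency vector, and similarity is scored as the cosine of the angle between the vectors (1.0 means identical term distributions, 0.0 means no terms in common). The texts are invented for illustration.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Vector Space Model: term-frequency vectors compared by cosine."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Production systems weight terms (e.g., TF-IDF) rather than using raw counts, but the geometry is the same.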
  • 13. Google Obsession…
  • 14. Google “PageRank”…
    • The PageRank in Google is defined as follows:
    • Assume page A has pages P1…Pn which point to it. The parameter d is a damping factor which can be set between 0 and 1, and C(Pi) is defined as the number of links going out of page Pi. The PageRank of page A is then given by:
    • PR(A) = (1 − d) + d (PR(P1)/C(P1) + … + PR(Pn)/C(Pn))
    • Usually the parameter d is set to 0.85. PageRank, or PR(A), can be calculated using a simple iterative algorithm.
    • Other features: anchor text processing, location information management and various data structures, which fully make use of the features of the web.
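The “simple iterative algorithm” mentioned above can be sketched directly from the formula. The three-page link graph is invented for illustration; a real implementation would also handle dangling pages (pages with no outgoing links) and test for convergence instead of using a fixed iteration count.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(Pi) / C(Pi)) over all pages Pi
    linking to A, exactly as in the formula above (unnormalized form)."""
    pages = list(links)
    out_count = {p: len(links[p]) for p in pages}  # C(Pi)
    pr = {p: 1.0 for p in pages}                   # initial guess
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(
                pr[q] / out_count[q] for q in pages if page in links[q]
            )
            for page in pages
        }
    return pr

# Hypothetical three-page web: A -> B; B -> A, C; C -> A
links = {"A": ["B"], "B": ["A", "C"], "C": ["A"]}
ranks = pagerank(links)
```

Page A ends up ranked highest because it collects “votes” from both B and C, matching the backlink intuition on the following slides.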
  • 15.  
  • 16. Ranking result pages
    • Based on content
      • Based on content
      • Number of occurrences of the search terms
    • Based on link structure
      • Backlink count
      • PageRank
    • And more.
    • (http://www.cs.duke.edu/~junyang/courses/cps296.1-2002-spring/)
  • 17. Problems with content-based ranking
    • Many pages containing search terms may be of poor quality or irrelevant
      • Example: a page with just a line “search engine”.
    • Many high-quality or relevant pages do not even contain the search terms
      • Example: Google homepage
    • Pages containing more occurrences of the search terms are ranked higher; spamming is easy
      • Example: a page with the line “search engine” repeated many times
  • 18. Based on link structure
    • Hyperlinks among web pages provide new web search opportunities.
    • Our focus:
      • PageRank
      • HITS
  • 19. Backlink
    • A backlink of a page p is a link that points to p
    • A page with more backlinks is ranked higher
    • Each backlink is a “vote” for the page’s importance
    • Pages pointed to by high-ranking pages are ranked higher
    • Definition is recursive by design
  • 20. PageRank
    • Web can be viewed as a huge directed graph G(V, E)
      • where V is the set of web pages (vertices) and E is the set of hyperlinks (directed edges).
    • Each page may have a number of outgoing edges (forward links) and a number of incoming links (backlinks).
    • Each backlink of a page represents a citation to the page.
    • PageRank is a measure of global web page importance based on the backlinks of web pages.
  • 21. “Crawlers” or “Spiders” in the Web… The link structure of the Web serves to bind together all of the pages that were made public as a result of someone linking to them. Through links, search engines’ automated robots, called crawlers or spiders, can reach the many billions of interconnected documents.
  • 22.  
  • 23. Hindering “Spiders”…
    • Tools used:
    • Robots.txt: prevents search engines from crawling your pages.
    • NOINDEX: prevents content from appearing in search results via a "NOINDEX" robots meta tag.
    • .htaccess: password-protects directories.
    • Google Webmaster Tools: removes content that has already been crawled.
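For illustration, here is what the first two tools above might look like in practice; the /private/ path is hypothetical:

```
# robots.txt (served from the site root) -- asks compliant crawlers
# not to fetch anything under /private/
User-agent: *
Disallow: /private/
```

and, in the head of an individual page, the NOINDEX robots meta tag:

```
<meta name="robots" content="noindex">
```

The difference matters: robots.txt stops crawling, while noindex allows crawling but keeps the page out of the results.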
  • 24. Google “Crawl Budget”…
  • 25. Latest Update by “Matt Cutts”…
    • Factors affecting Crawl Budget:
    • 1) PageRank: “the number of pages that Google crawls is roughly proportional to the PageRank.” Pages that get linked to a lot tend to get discovered and crawled quite quickly.
    • 2) Host load: the maximum number of simultaneous connections that a particular web server can handle.
    • Low host load – allows only one page to be fetched at a time.
    • Social network sites like Facebook or Twitter have a very high host load because they can take a lot of simultaneous connections.
  • 26.
    • 3) Content : Crawlers discard web pages with duplicate content.
    • Use 301 Redirects for duplicate URLs to merge those together into one single URL.
    • Note : “301 Redirects may result in certain PageRank loss”.
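A 301 redirect of this kind can be declared, for example, in an Apache .htaccess file (the URLs here are hypothetical):

```
# Permanently redirect the duplicate URL to the canonical one
Redirect 301 /old-page.html http://www.example.com/new-page.html
```

The 301 status tells crawlers the move is permanent, so the two URLs are merged into one in the index.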
  • 27. How do Search Engines Rank Websites?
  • 28. How Search Engines evaluate “trust in a Website”…
    • Key factor: the click distance between your website and the most trusted websites.
    • (Diagram: “your website” → “most trusted website”, measured in click distance)
  • 29. Search Engine “Retrieval and Ranking” Aspects…
    • Relevance : Degree to which the content of the documents returned in a search matches the user’s query intention and terms.
    • Importance or popularity : Relative importance, measured via citation (the act of one work referencing another, as often occurs in academic and business documents) of a given document that matches the user’s query.
    • Relative authority of the site, and the trust the search engine places in it.
  • 30. How to determine “Relevancy and Importance ”
    • IR scientists realized that two critical components comprised the majority of search functionality: relevance and importance
    • Combination of relevance and importance determines the ranking order.
    • Popularity and relevance aren’t determined manually
    • These algorithmic inputs are known as “ranking factors” or “algorithmic ranking criteria”.
  • 31. Analyzing Relevancy and Importance
    • Document analysis (including semantic analysis of concepts across documents)
    • Link (or citation) analysis.
  • 32. Document Analysis
    • Theories/ Concepts Used :
    • Semantic Connectivity
    • Fuzzy Logic Theory
    • Latent Semantic Indexing (LSI)
  • 33. What does “Semantic Connectivity” refer to…
    • Semantic connectivity or Co-occurrence refers to words or phrases that are commonly associated with one another.
    • For example, if you see the word aloha you associate it with Hawaii, not Florida.
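Global co-occurrence of this kind can be estimated by counting how often word pairs appear in the same document. A minimal sketch, with invented documents:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Count how often each unordered word pair appears in the same document
    (a simple form of the global co-occurrence described above)."""
    counts = Counter()
    for text in documents:
        words = sorted(set(text.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts

# Invented mini-corpus: "aloha" keeps company with "hawaii", not "florida"
docs = [
    "aloha from hawaii",
    "hawaii says aloha",
    "florida beaches",
]
counts = cooccurrence_counts(docs)
```

The counts reproduce the intuition in the slide: aloha and hawaii co-occur repeatedly, while aloha and florida never do.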
  • 34. Why care about “Co-occurrence”?
    • Keyword-brand associations.
    • Brand visibility across search engines.
    • Co-citation of products and services.
    • Search volume co-occurrence (Co-Volume).
    • Positioning of documents in search results pages ( SERPs).
    • Keywords research and terms discovery.
    • Analysis of seasonal trends.
    • Design of thematic sites.
  • 35. Understanding Co-occurrence
    • Global; extracted from databases
    • Local; extracted from individual documents
    • Fractal; extracted from self-similar, scaled distributions
  • 36. What matters when working with co-occurrence data…
    • scope; i.e., whether the words behave as broader or narrower terms in a given context.
    • type; i.e., whether we are dealing with nouns, verbs, adjectives, stems, etc.
    • synonymity; i.e., whether we are dealing with synonyms.
    • architecture; i.e., whether the documents reside in a horizontal, topic-specific vertical, or regional directory.
    • seasonality; i.e., whether we are dealing with repositories containing seasonal trends and periodic fluctuations.
    • sequencing; i.e., the order in which terms are queried or appear in documents.
    • polysemy; i.e., whether we are dealing with terms with multiple meanings.
    • cognates; i.e., whether we are dealing with different terms with the same meaning in different languages.
    • query modes; i.e., the retrieval modes used.
  • 37. “Broader” and “Narrower” Terms…
    • For the search queries “dog pet” and “dog canine”:
    • scenario 1: k1 = dog, k2 = pet
    • scenario 2: k1 = dog, k2 = canine
    • As of 06/16/05, searches in Google for these terms return
    • 53,400,000 results for dog
    • 55,800,000 results for pet
    • 3,570,000 results for canine
  • 38.
    • Observations:
    • Dog and pet (broader terms) return more results than canine (narrower term)
    • Interpretation:
    • Canine is considered the narrower term because:
    • there is a synonymity relationship between "canine" and "dog" but not between "canine" and "pet" or "pet" and "dog".
    • "canine" has different meanings (polysemy). According to WordNet, "canine" can be used as a noun or adjective, each having different meanings.
    • "canine" is one of those terms that possesses a meaning within a meaning. The term behaves as having a scope within a scope (or a context within a context (fractality)), such as the canine of a canine.
  • 39. “Global” Co-occurrence…
    • In Google search engine, the default query mode is “AND”
    • As of 06/16/05 searches in Google for these terms return
    • scenario 1: 12,800,000 for the query, k12 = k1 + k2 = dog pet
    • scenario 2: 1,710,000 for the query, k12 = k1 + k2 = dog canine
  • 40.
    • Observations:
    • both queries return fewer documents than the single-term queries.
    • the new result set n12, containing both k1 and k2, must be a subset of n1 and n2; i.e., the sets containing k1 only or k2 only.
    • Interpretations :
    • The term "dog" is more frequently co-cited with "pet" than with "canine" since:
    • in scenario 1 we are combining two broader terms.
    • in scenario 1 the terms are not synonyms.
    • in scenario 2 we are combining a broader term with a narrower term.
    • in scenario 2 the terms are synonyms and synonyms rarely occur together but appear in similar contexts.
  • 41. “Normalized” Co-occurrence
    • Also known as the “Co-occurrence Index” or C-index.
    • For the co-citation frequency between two and only two terms k1 and k2, the C-index is given by:
    • c12 = n12 / (n1 + n2 − n12)
    • where
    • c12 = 0 when n12 = 0; i.e., k1 and k2 do not co-occur (terms are mutually exclusive).
    • c12 > 0 when n12 > 0; i.e., k1 and k2 co-occur (terms are non mutually exclusive).
    • c12 = 1 when n12 = n1 = n2; i.e., k1 and k2 co-occur whenever either term occurs.
  • 42. Applying it to the previous example…
    • scenario 1: (12,800,000 / (53,400,000 + 55,800,000 − 12,800,000)) × 1000 = 132.78 ≈ 133 ppt
    • scenario 2: (1,710,000 / (53,400,000 + 3,570,000 − 1,710,000)) × 1000 = 30.94 ≈ 31 ppt
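The two calculations above can be reproduced with a small helper implementing the C-index formula; the result counts are the ones quoted in the slides.

```python
def c_index(n1, n2, n12):
    """Normalized co-occurrence in parts per thousand (ppt):
    C-index = n12 / (n1 + n2 - n12), scaled by 1000.
    n1, n2 are result counts for each term alone; n12 for the combined query."""
    if n12 == 0:
        return 0.0  # terms are mutually exclusive
    return 1000 * n12 / (n1 + n2 - n12)

# Result counts from the slides (Google, as of 06/16/05)
scenario_1 = c_index(53_400_000, 55_800_000, 12_800_000)  # "dog pet"
scenario_2 = c_index(53_400_000, 3_570_000, 1_710_000)    # "dog canine"
```

The higher value for scenario 1 quantifies what the earlier slides argued: "dog" is co-cited with "pet" far more often than with its synonym "canine".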
  • 43. “Syntagmatic and Paradigmatic Association” theory
    • Syntagmatic associations are terms that frequently occur together.
    • Paradigmatic associations are terms with high semantic similarity.
    • These types of associations allow us to understand why synonyms tend not to co-occur. This has a lot to do with contextuality, or lexical neighborhoods.
  • 44. Fuzzy Set Theory…
    • Discovers the semantic connectivity between two words.
    • e.g., both oranges and bananas are fruits, but oranges are round and bananas are not.
    • A machine knows an orange is round and a banana is not by scanning thousands of occurrences of the words banana and orange in its index and noting that round and banana do not have great co-occurrence, while orange and round do.
  • 45. Latent Semantic Indexing (LSI)
    • LSI (Latent Semantic Indexing), based on fuzzy set theory, uses semantic analysis to identify related web pages.
    • e.g., the search engine may notice one page that talks about doctors and another that talks about physicians, and determine that there is a relationship between the pages based on the other words they have in common.
  • 46. Common types of searches in the IR field.
  • 47. Link Analysis…
    • Semantic Analysis
    • Identifying Authority of Links.
    • Identifying Relevancy of Links.
    • Link neighborhood: the concept of grouping sites based on their relevance.
    • Placement of Links
  • 48. Top elements of SEO
    • Content
    • Title tag
    • Meta keyword tag
    • Alt attribute for images: the alt attribute was originally intended to allow something to be rendered when viewing of the image is not possible.
    • Noscript tag: some users do not allow JavaScript to run when they load a web page. For those users, nothing would be shown where the JavaScript is on the web page, unless the page contains a noscript tag.
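As a minimal illustration of the noscript fallback described above (the text content is invented):

```
<script>
  document.write("Content injected by JavaScript");
</script>
<noscript>
  Fallback text shown to users with JavaScript disabled,
  and readable by search engine crawlers.
</noscript>
```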
  • 49. Evaluating Content…
    • Content: the material that defines what a page is about.
    • It acts as a navigational element for the search engines during the crawl and enables a detailed analysis of each web page.
    • The search engine performs a detailed analysis of all the words and phrases that appear on a web page, and then builds a map of that data to consider when showing your page in the results for a related search query. This map is referred to as a semantic map.
  • 50. Semantic Map
    • Defines the relationships between web pages so that the search engine can better understand how to match the right web pages with user search queries.
  • 51. Google working on new techniques
    • While search engines are able to detect that you are displaying an image, they have little idea what the image is a picture of, except for whatever information you provide in the alt attribute.
    • Search engines will not recognize any text rendered inside the image.
    • Optical Character Recognition (OCR): to extract text from images
    • Search engines are beginning to extract information from Flash
    • A third type of content that search engines cannot see is the pictorial aspects of anything contained in Flash.
  • 52.
    • when text is converted into a vector-based outline (i.e., rendered graphically), the textual information that search engines can read is lost.
    • Audio and video files are also not easy for search engines to read. There are a few exceptions where the search engines can extract some limited data, such as ID3 tags within MP3 files.
    • Search engines also cannot read any content contained within a program
  • 53. Moving Ahead with AJAX
    • A technology that can present significant human-readable content that the search engines cannot see is AJAX.
    • AJAX is a JavaScript-based method for dynamically rendering content on a web page after retrieving the data from a database, without having to refresh the entire page. This is often used in tools where a visitor to a site can provide some input and the AJAX tool then retrieves and renders the correct content.
  • 54. Positive Ranking Factors
    • Keyword use in title tag
    • Anchor text of inbound link
    • Global link authority of site
    • Age of site
    • Link popularity within the site’s internal link structure
    • Topical relevance of inbound links
    • Link popularity of site in topical community
    • Keyword use in body text
    • Global link popularity of sites that link to the site
  • 55. Negative Ranking Factors
    • Server is often inaccessible to crawlers
    • Search engines want their users to have good experiences. If your site is subject to frequent outages, by definition it is not providing a good user experience. So, if the search engine crawler frequently is unable to access your web pages, the search engine will assume that it is dealing with a low-quality site.
    • Content very similar to or duplicate of other web pages
    • External links to low-quality/spam sites
    • Participation in link schemes or actively selling links
    • Duplicate titles/meta tags on many pages
  • 56. Other Ranking Factors
    • Rate of acquisition of links
    • Usage Data
    • User Data
    • Google sandbox
  • 57. Have Some “Google Caffeine”
    • a next-generation architecture for Google’s web search
    • Focus :
    • A ranking system that heightens the importance of page load speeds
    • A more focused relevance on real-time search data
    • Stricter spam controls
  • 58.  
  • 59. Changes with Google Caffeine
    • Changes in how Google stores the massive amount of data gathered by their robots.
    • This is a direct response to the rise of new digital media such as streaming videos, blog posts, and social media content (Twitter, Facebook). The old Google infrastructure was built to handle data by way of Collection > Quality Ranking > Sandbox > Indexing. With the explosion of real-time content, however, search engines face the daunting task of filtering all this content to provide real-time search.
  • 60.
    • Changes in how Google collects its data
    • Google uses robots that crawl the web for data (googlebot); traditionally this is data that may not change or update in real time. The Caffeine update must include changes to the robot to cater for real-time content. The current theory is that Google has developed several types of robots that differ in indexing rate and crawl rate to cater for different media content.
  • 61. Google’s New Algorithm, “Caffeine”
    • an increased weighting on domain authority & some authoritative tag type pages ranking (like Technorati tag pages + Facebook tag pages), as well as pages on sites like Scribd ranking for some long tail queries based mostly on domain authority and sorta spammy on page text
    • perhaps slightly more weight on exact match domain names
    • perhaps a bit better understanding of related words / synonyms
    • tuning down some of the exposure for video & some universal search results
    • the new search engine improves the index size and the speed of queries and, most importantly, changes the value of search engine rankings.
  • 62. A search on the new infrastructure, for instance, returns video and news results midway down the page.
  • 63. A search on the existing infrastructure, however, returns news at the top, video in the middle, and images at the bottom of the page.
  • 64. Tools to evaluate the speed of a site
    • Page Speed: an open source Firefox/Firebug add-on that evaluates the performance of web pages and gives suggestions for improvement.
    • YSlow: a free tool from Yahoo! that suggests ways to improve website speed.
    • WebPagetest: shows a waterfall view of your pages’ load performance plus an optimization checklist.
    • In Webmaster Tools, Labs > Site Performance shows the speed of your website as experienced by users around the world.
    • We’ve also blogged about site performance.
  • 65. Contact details: Website: www.mosaic-service.com E-mail: info@mosaic-service.com Direct no: 0120-4626501, 0120-4626508