Understanding Seo At A Glance
SEO Tutorial Presentation

    Presentation Transcript

    • Understanding SEO at a Glance E-mail us: presales@mosaic-service.com
    • Necessity of SEO… Online advertising drives $6 offline (in stores) for every $1 spent online. Search marketing has a greater impact on in-store sales lift than display advertising: three times greater, in fact. 74% of respondents used search engines to find local business information, versus 65% who turned to print Yellow Pages, 50% who used Internet Yellow Pages, and 44% who used traditional newspapers. 86% of those surveyed said they had used the Internet to find a local business, up from 70% the year before. 80% reported researching a product or service online and then making the purchase offline from a local business.
    • iProspect and Jupiter Research… 62% of search engine users click on a search result within the first page of results, and 90% within the first three pages. 41% of search engine users who do not find what they are looking for on the first page of results report changing their search term and/or search engine; 88% report doing so after three pages. 36% of users agree that “seeing a company listed among the top results on a search engine makes me think that the company is a top one within its field.”
    • Searches Breakdown
    • How do “Search Engines” work?
      • Defining “Search Engine”: a system that collects, organizes and presents a way to select Web documents based on certain words, phrases, or patterns within documents
        • Model the Web as a full-text DB
        • Index a portion of the Web docs
        • Search Web documents using user-specified words/patterns in a text
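The three steps above (model the Web as a full-text database, index it, search it by words) can be sketched with an inverted index. This is a minimal illustration assuming a tiny in-memory collection with hypothetical page names; the AND-only matching mirrors the default query mode mentioned later in this presentation, and a real engine would add ranking on top.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and keep runs of letters/digits as terms.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    # Inverted index: term -> set of document IDs containing that term.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search_and(index, query):
    # Boolean AND: a document matches only if it contains every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical mini-collection standing in for crawled pages.
docs = {
    "a.html": "Search engines crawl and index web pages",
    "b.html": "A web crawler collects pages for the index",
    "c.html": "Yellow Pages list local businesses",
}
index = build_index(docs)
print(sorted(search_and(index, "web index")))  # ['a.html', 'b.html']
```

Only the index is consulted at query time, which is why ranking can work "without accessing the text, just the index".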
    • Categories of “Search Engines”
        • General-purpose search engines, e.g. Yahoo!, AltaVista and Google
        • Special-purpose search engines (or Internet portals), e.g. LinuxStart (www.linuxstart.com)
    • Components of “Search Engines”
      • Two main components:
      • Web crawler (spider), which collects massive numbers of Web pages.
      • Large database, which stores and indexes the collected Web pages.
      • Ranking has to be performed without accessing the text, just the index
    • “Search Engine” Models
      • Information Retrieval (IR) is key to search engines and Web search.
      • Most commonly used models:
        • Boolean Model
        • Vector Space Model (VSM)
        • Probability Model
        • Their variations
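A minimal sketch of the Vector Space Model: documents and the query become term-weight vectors and are ranked by cosine similarity. The TF-IDF weighting and the sample documents are assumptions for illustration; the slide does not say which weighting any search engine uses.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of letters/digits as terms.
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(u, v):
    # Cosine of the angle between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(docs, query):
    counts = {d: Counter(tokenize(t)) for d, t in docs.items()}
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    # Assumed weighting: IDF = log(N / df), so terms found in every document get weight 0.
    idf = {t: math.log(n / df[t]) for t in df}
    doc_vecs = {d: {t: tf * idf[t] for t, tf in c.items()} for d, c in counts.items()}
    # Query terms unseen in the collection get weight 0.
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scores = {d: cosine(q_vec, v) for d, v in doc_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical documents.
docs = {
    "d1": "dog pet food",
    "d2": "canine teeth of a dog",
    "d3": "pet insurance for cats",
}
for doc_id, score in rank(docs, "dog pet"):
    print(doc_id, round(score, 3))
```

Unlike the Boolean model, every document gets a graded score, so a page matching only one query term can still appear, just lower in the list.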
    • Google Obsession…
    • Google “PageRank”…
      • The PageRank in Google is defined as follows:
      • Assume page A has pages $P_1, \ldots, P_n$ which point to it. The parameter $d$ is a damping factor which can be set between 0 and 1. Also, $C(P_i)$ is defined as the number of links going out of page $P_i$. The PageRank of a page A is given as follows:
      • $PR(A) = (1 - d) + d\left(\frac{PR(P_1)}{C(P_1)} + \cdots + \frac{PR(P_n)}{C(P_n)}\right)$
      • Usually the parameter $d$ is set to 0.85. $PR(A)$ can be calculated using a simple iterative algorithm.
      • Other features: anchor text processing, location information management, and various data structures that make full use of the features of the Web.
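The formula above translates directly into the simple iterative algorithm the slide mentions. This sketch assumes a small hypothetical link graph given as a dictionary of forward links, starts every page at 1.0, and runs a fixed number of iterations. It follows this non-normalized form of the formula, so in a graph where every page has at least one outgoing link the ranks sum to the number of pages rather than to 1.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(P_i) / C(P_i)) over all backlinks P_i of A."""
    # Every page that appears as a source or a target of a link.
    pages = set(links) | {t for targets in links.values() for t in targets}
    # C(P): number of links going out of page P.
    out_count = {p: len(links.get(p, [])) for p in pages}
    # Backlinks of A: the pages P_1..P_n that point to A.
    backlinks = {p: [] for p in pages}
    for src, targets in links.items():
        for dst in targets:
            backlinks[dst].append(src)
    pr = {p: 1.0 for p in pages}  # initial guess
    for _ in range(iterations):
        # The new values are computed entirely from the previous iteration's values.
        pr = {a: (1 - d) + d * sum(pr[p] / out_count[p] for p in backlinks[a])
              for a in pages}
    return pr

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # hypothetical three-page web
pr = pagerank(links)
print({p: round(v, 2) for p, v in sorted(pr.items())})  # {'A': 1.16, 'B': 0.64, 'C': 1.19}
```

C ranks highest because it collects links from both A and B; A is next because it receives C's single, undivided vote.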
    • Ranking result pages
      • Based on content
        • Number of occurrences of the search terms
      • Based on link structure
        • Backlink count
        • PageRank
      • And more.
      • (http://www.cs.duke.edu/~junyang/courses/cps296.1-2002-spring/)
    • Problems with content-based ranking
      • Many pages containing search terms may be of poor quality or irrelevant
        • Example: a page with just a line “search engine”.
      • Many high-quality or relevant pages do not even contain the search terms
        • Example: Google homepage
      • Pages containing more occurrences of the search terms are ranked higher, so spamming is easy
        • Example: a page with the line “search engine” repeated many times
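To see why pure occurrence counting is easy to spam, here is a naive ranker that scores a page by how many times the query terms appear in it. The pages and the scoring function are hypothetical; no real engine is claimed to rank this way.

```python
import re

def occurrence_score(text, query):
    # Naive content-based score: total occurrences of each query term in the page.
    words = re.findall(r"[a-z0-9]+", text.lower())
    terms = re.findall(r"[a-z0-9]+", query.lower())
    return sum(words.count(t) for t in terms)

pages = {
    "useful.html": "How a search engine crawls, indexes and ranks web pages",
    "spam.html": "search engine " * 50,  # keyword stuffing
    "google.html": "Google",             # relevant, yet lacks the query terms
}
query = "search engine"
ranking = sorted(pages, key=lambda p: occurrence_score(pages[p], query), reverse=True)
print(ranking)  # ['spam.html', 'useful.html', 'google.html']
```

The stuffed page wins, and the relevant page without the terms comes last: both problems listed on this slide.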
    • Based on link structure
      • Hyperlinks among web pages provide new web search opportunities.
      • Our focus:
        • PageRank
        • HITS
    • Backlink
      • A backlink of a page p is a link that points to p
      • A page with more backlinks is ranked higher
      • Each backlink is a “vote” for the page’s importance
      • Pages pointed to by high-ranking pages are ranked higher
      • Definition is recursive by design
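Counting backlinks treats every link as one equal vote. The sketch below assumes a hypothetical link graph and, as an illustrative choice, counts at most one vote per linking page and ignores self-links; weighting each vote by the voter's own rank is the recursive step that PageRank adds on top.

```python
from collections import Counter

def backlink_counts(links):
    """links: page -> list of pages it links to. Returns page -> number of backlinks."""
    counts = Counter()
    for src, targets in links.items():
        for dst in set(targets):  # assumption: one vote per linking page
            if dst != src:        # assumption: a page cannot vote for itself
                counts[dst] += 1
    return counts

# Hypothetical link graph ("b" links to "c" twice; "c" links to itself).
links = {"a": ["c"], "b": ["c", "a", "c"], "c": ["a", "c"], "d": ["c"]}
print(backlink_counts(links).most_common())  # [('c', 3), ('a', 2)]
```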
    • PageRank
      • Web can be viewed as a huge directed graph G(V, E)
        • where V is the set of web pages (vertices) and E is the set of hyperlinks (directed edges).
      • Each page may have a number of outgoing edges (forward links) and a number of incoming links (backlinks).
      • Each backlink of a page represents a citation to the page.
      • PageRank is a measure of global web page importance based on the backlinks of web pages.
    • “Crawlers” or “Spiders” in the Web… The link structure of the Web serves to bind together all of the pages that were made public as a result of someone linking to them. Through links, search engines’ automated robots, called crawlers or spiders, can reach the many billions of interconnected documents.
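A crawler that follows links can be sketched as a breadth-first traversal with a visited set. Here a dictionary stands in for fetching and parsing real pages (`get_links` is a hypothetical hook), and `max_pages` is an assumed cap; note that a page nobody links to is never discovered.

```python
from collections import deque

def crawl(start, get_links, max_pages=100):
    """Breadth-first crawl from start; returns pages in the order they were fetched."""
    seen = {start}
    frontier = deque([start])
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)            # "fetch" the page
        for link in get_links(url):  # follow its outgoing links
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order

# Hypothetical site: "/orphan" links out, but nothing links to it.
web = {
    "/": ["/about", "/blog"],
    "/blog": ["/blog/post-1", "/"],
    "/about": [],
    "/orphan": ["/"],
}
print(crawl("/", lambda u: web.get(u, [])))  # ['/', '/about', '/blog', '/blog/post-1']
```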
    • Hindering “Spiders”…
      • Tools used:
      • robots.txt: prevents search engines from crawling your pages.
      • NOINDEX: prevents content from appearing in search results when "NOINDEX" is added to the robots meta tag.
      • .htaccess: password-protects directories.
      • Google Webmaster Tools: removes content that has already been crawled.
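Well-behaved crawlers check robots.txt before fetching a URL. Python's standard `urllib.robotparser` module evaluates the rules the way a polite crawler would; the rules and URLs below are made-up examples. This covers only robots.txt: NOINDEX is read from the page itself, and .htaccess protection is enforced by the web server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for example.com.
robots_txt = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())  # parse rules from text instead of fetching them with read()

print(rp.can_fetch("Googlebot", "https://example.com/blog/post.html"))       # True
print(rp.can_fetch("Googlebot", "https://example.com/private/report.html"))  # False
```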
    • Google “Crawl Budget”…
    • Latest Update by Matt Cutts…
      • Factors affecting crawl budget:
      • 1) PageRank: “the number of pages that Google crawls is roughly proportional to the PageRank.” Pages that get linked to a lot tend to get discovered and crawled quite quickly.
      • 2) Host load: the maximum number of simultaneous connections that a particular web server can handle.
      • A site with low host load may allow only one page to be fetched at a time.
      • Social network sites like Facebook or Twitter have a very high host load because they can take a lot of simultaneous connections.
      • 3) Content: crawlers discard web pages with duplicate content.
      • Use 301 redirects for duplicate URLs to merge them into one single URL.
      • Note: “301 redirects may result in some PageRank loss.”
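When several URLs serve the same content, 301 redirects point them at one canonical URL. This sketch follows redirect chains in a hypothetical redirect table and groups URLs by where they finally land; on a real site the 301s are configured in the web server rather than in Python, and the hop limit is an arbitrary assumption.

```python
def resolve(url, redirects, max_hops=10):
    """Follow 301 hops until reaching a URL that does not redirect."""
    seen = set()
    while url in redirects:
        if url in seen or len(seen) >= max_hops:
            raise ValueError(f"redirect loop or chain too long at {url}")
        seen.add(url)
        url = redirects[url]
    return url

def canonical_groups(urls, redirects):
    # Group duplicate URLs under the single URL they all resolve to.
    groups = {}
    for u in urls:
        groups.setdefault(resolve(u, redirects), []).append(u)
    return groups

# Hypothetical 301 table: three duplicate URLs merged into one.
redirects = {
    "http://example.com/shoes?sort=price": "http://example.com/shoes",
    "http://www.example.com/shoes": "http://example.com/shoes",
    "http://example.com/shoes/": "http://www.example.com/shoes",
}
print(resolve("http://example.com/shoes/", redirects))  # http://example.com/shoes
```

Chains such as the last entry (two hops) are worth flattening, since each extra hop is another fetch for the crawler.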
    • How Do Search Engines Rank Websites?
    • How Search Engines Evaluate “Trust in a Website”…
      • Key factor: the click distance between your website and the most trusted websites.
      (Diagram: “Your website” and “Most trusted website”, separated by the click distance between them.)
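The slide does not define how click distance is measured. Assuming it means the smallest number of link hops from any trusted seed site to yours, it can be computed with a multi-source breadth-first search; the seed list and link graph below are hypothetical.

```python
from collections import deque

def click_distance(links, trusted, target):
    """Fewest link hops from any trusted seed to target, or None if unreachable."""
    dist = {s: 0 for s in trusted}
    queue = deque(trusted)
    while queue:
        page = queue.popleft()
        if page == target:
            return dist[page]
        for nxt in links.get(page, []):
            if nxt not in dist:  # first visit is the shortest distance in BFS
                dist[nxt] = dist[page] + 1
                queue.append(nxt)
    return None

# Hypothetical link graph.
links = {
    "trusted.org": ["news.com"],
    "news.com": ["blog.net", "yoursite.com"],
    "blog.net": ["yoursite.com"],
    "spam.biz": ["yoursite.com"],
}
print(click_distance(links, ["trusted.org"], "yoursite.com"))  # 2
```

A link from spam.biz does not shorten the distance, because distance is measured outward from the trusted seeds.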
    • Search Engine “Retrieval and Ranking” Aspects…
      • Relevance : Degree to which the content of the documents returned in a search matches the user’s query intention and terms.
      • Importance or popularity : Relative importance, measured via citation (the act of one work referencing another, as often occurs in academic and business documents) of a given document that matches the user’s query.
      • Authority: the relative authority of the site, and the trust the search engine places in it.
    • How to Determine “Relevancy and Importance”
      • IR scientists realized that two critical components make up the majority of search functionality: relevance and importance.
      • The combination of relevance and importance determines the ranking order.
      • Popularity and relevance aren’t determined manually.
      • Instead, algorithms weigh them using “ranking factors”, also called “algorithmic ranking criteria”.
    • Analyzing Relevancy and Importance
      • Document analysis (including semantic analysis of concepts across documents)
      • Link (or citation) analysis.
    • Document Analysis
      • Theories/concepts used:
      • Semantic Connectivity
      • Fuzzy Logic Theory
      • Latent Semantic Indexing (LSI)
    • What does “Semantic Connectivity” refer to…
      • Semantic connectivity or Co-occurrence refers to words or phrases that are commonly associated with one another.
      • For example, if you see the word aloha you associate it with Hawaii, not Florida.
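Local co-occurrence can be measured by counting, for each pair of words, how many documents contain both. This sketch assumes a tiny hypothetical collection and a naive tokenizer; a real system would also handle stemming, stop words and phrases.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Count, for every unordered word pair, the number of documents containing both."""
    counts = Counter()
    for text in docs:
        # A set, so each word counts once per document; sorted so each pair has a fixed order.
        words = sorted(set(re.findall(r"[a-z]+", text.lower())))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts

def pair_count(counts, w1, w2):
    # Look a pair up regardless of argument order.
    return counts[tuple(sorted((w1, w2)))]

docs = [
    "Aloha from Hawaii, the island state",
    "Say aloha on your Hawaii vacation",
    "Florida beaches and Florida oranges",
]
counts = cooccurrence(docs)
print(pair_count(counts, "aloha", "hawaii"), pair_count(counts, "aloha", "florida"))  # 2 0
```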
    • Why care about “Co-occurrence”?
      • Keyword-brand associations.
      • Brand visibility across search engines.
      • Co-citation of products and services.
      • Search volume co-occurrence (Co-Volume).
      • Positioning of documents in search results pages (SERPs).
      • Keywords research and terms discovery.
      • Analysis of seasonal trends.
      • Design of thematic sites.
    • Understanding Co-occurrence
      • Global; extracted from databases
      • Local; extracted from individual documents
      • Fractal; extracted from self-similar, scaled distributions
    • What matters when working with co-occurrence data…
      • scope; i.e., whether the words behave as broader or narrower terms in a given context.
      • type; i.e., whether we are dealing with nouns, verbs, adjectives, stems, etc
      • synonymity; i.e., whether we are dealing with synonyms.
      • architecture; i.e., whether the documents reside in a horizontal, topic-specific vertical, or regional directory.
      • seasonality; i.e., whether we are dealing with repositories containing seasonal trends and periodic fluctuations.
      • sequencing; i.e., the order in which terms are queried or appear in documents.
      • polysemy; i.e., whether we are dealing with terms with multiple meanings.
      • cognates; i.e., whether we are dealing with different terms with the same meaning in different languages.
      • query modes; i.e., the retrieval modes used.
    • “Broader” and “Narrower” Terms…
      • For the search queries “dog pet” and “dog canine”:
      • scenario 1: k1 = dog, k2 = canine
      • scenario 2: k1 = dog, k2 = pet
      • As of 06/16/05, searches in Google for these terms return
      • 53,400,000 results for dog
      • 55,800,000 results for pet
      • 3,570,000 results for canine
      • Observations :
      • Dog and pet (broader terms) return more results than canine (a narrower term).
      • Interpretation :
      • Canine is considered as narrower term because :
      • there is a synonymity relationship between "canine" and "dog" but not between "canine" and "pet" or "pet" and "dog".
      • "canine" has different meanings ( polysemy ). According to WordNet, "canine" can be used as a noun or adjective, each having different meanings.
      • "canine" is one of those terms that possess a meaning within a meaning. It behaves as having a scope within a scope (or a context within a context, i.e., fractality), as in the canine (tooth) of a canine (dog).
    • “Global” Co-occurrence…
      • In the Google search engine, the default query mode is “AND”.
      • As of 06/16/05, searches in Google for these terms return:
      • scenario 1: 12,800,000 for the query, k12 = k1 + k2 = dog pet
      • scenario 2: 1,710,000 for the query, k12 = k1 + k2 = dog canine
      • Observations :
      • Both queries return fewer documents than either term alone.
      • The new result set n12, containing both k1 and k2, must be a subset of n1 and of n2, i.e., of the set containing k1 and of the set containing k2.
      • Interpretations :
      • The term "dog" is more frequently co-cited with "pet" than with "canine" since:
      • in scenario 1 we are combining two broader terms.
      • in scenario 1 the terms are not synonyms.
      • in scenario 2 we are combining a broader term with a narrower term.
      • in scenario 2 the terms are synonyms and synonyms rarely occur together but appear in similar contexts.
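    • “Normalized” Co-occurrence
      • Also known as the “Co-Occurrence Index” or C-index.
      • For the co-citation frequency between two and only two terms k1 and k2, the C-index is given by
      • $c_{12} = \frac{n_{12}}{n_1 + n_2 - n_{12}}$
      • where n1 and n2 are the numbers of documents containing k1 and k2 respectively, and n12 is the number of documents containing both: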
      • c12 = 0 when n12 = 0; i.e., k1 and k2 do not co-occur (terms are mutually exclusive).
      • c12 > 0 when n12 > 0; i.e., k1 and k2 co-occur (terms are non mutually exclusive).
      • c12 = 1 when n12 = n1 = n2; i.e., k1 and k2 co-occur whenever either term occurs.
    • Applying it to the previous example…
      • scenario 1: (12,800,000/(53,400,000 + 55,800,000 - 12,800,000))*1000 ≈ 132.78 ≈ 133 ppt (parts per thousand)
      • scenario 2: (1,710,000/(53,400,000 + 3,570,000 - 1,710,000))*1000 ≈ 30.94 ≈ 31 ppt
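The arithmetic above can be checked with a few lines implementing the C-index, n12 / (n1 + n2 - n12), on the 2005 Google result counts quoted on the earlier slides. The range check is an added safeguard: since the co-occurrence set is a subset of each term's set, n12 cannot exceed the smaller count.

```python
def c_index(n1, n2, n12):
    """C-index: documents with both terms divided by documents with either term."""
    # n12 counts documents with both terms, so it cannot exceed either set.
    if not 0 <= n12 <= min(n1, n2):
        raise ValueError("n12 must lie between 0 and min(n1, n2)")
    union = n1 + n2 - n12
    return n12 / union if union else 0.0

# Google result counts from the slides (as of 06/16/05).
dog, pet, canine = 53_400_000, 55_800_000, 3_570_000
print(round(c_index(dog, pet, 12_800_000) * 1000, 4))    # 132.7801 (ppt)
print(round(c_index(dog, canine, 1_710_000) * 1000, 4))  # 30.9446 (ppt)
```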
    • “Syntagmatic and Paradigmatic Association” theory
      • Syntagmatic associations are terms that frequently occur together.
      • Paradigmatic associations are terms with high semantic similarity.
      • These types of associations help explain why synonyms do not tend to co-occur. This has a lot to do with contextuality, or lexical neighborhoods.
    • Fuzzy Set Theory…
      • Discovers the semantic connectivity between two words.
      • e.g., oranges and bananas are both fruits, but only oranges are round.
      • A machine knows an orange is round and a banana is not by scanning thousands of occurrences of the words banana and orange in its index and noting that round and banana do not co-occur much, while orange and round do.
    • Latent Semantic Indexing (LSI)
      • LSI (Latent Semantic Indexing) applies a singular value decomposition to the term-document matrix to identify related web pages through semantic analysis.
      • e.g., the search engine may notice one page that talks about doctors and another that talks about physicians, and determine that there is a relationship between the pages based on the other words they have in common.
    • Common types of searches in the IR field.
    • Link Analysis…
      • Semantic Analysis
      • Identifying Authority of Links.
      • Identifying Relevancy of Links.
      • Link neighborhood: the grouping of sites based on their relevance is referred to as a link neighborhood.
      • Placement of Links
    • Top elements of SEO
      • Content
      • Title tag
      • Meta keyword tag
      • Alt attribute for images: the alt attribute was originally intended to provide something to render when the image cannot be viewed.
      • noscript tag: some users do not allow JavaScript to run when they load a web page. For those users, nothing is shown where the JavaScript is on the web page unless the page contains a noscript tag.
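To see which of these elements a crawler can read from raw HTML, here is a sketch using Python's built-in `html.parser` on a hypothetical page. Real search engines use their own parsers; this only shows that the title, meta tags, alt text and noscript fallback are plain text in the markup, whereas whatever the script would render is not.

```python
from html.parser import HTMLParser

class SeoElements(HTMLParser):
    """Collects the title, named meta tags, img alt text and noscript text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.alts = []
        self.noscript = []
        self._in_title = False
        self._in_noscript = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and a.get("name"):
            self.meta[a["name"].lower()] = a.get("content", "")
        elif tag == "img":
            self.alts.append(a.get("alt"))  # None when the alt attribute is missing
        elif tag == "noscript":
            self._in_noscript = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "noscript":
            self._in_noscript = False

    def handle_data(self, data):
        # Script bodies also arrive here, but are ignored because neither flag is set.
        if self._in_title:
            self.title += data
        elif self._in_noscript and data.strip():
            self.noscript.append(data.strip())

# Hypothetical page.
page = """<html><head>
<title>Blue Widgets | Example Store</title>
<meta name="keywords" content="blue widgets, widgets">
</head><body>
<img src="w.jpg" alt="A blue widget">
<img src="spacer.gif">
<script>renderPrices();</script>
<noscript>Prices: widgets from $10</noscript>
</body></html>"""

p = SeoElements()
p.feed(page)
p.close()
print(p.title)  # Blue Widgets | Example Store
print(p.alts)   # ['A blue widget', None]
```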
    • Evaluating Content…
      • Content: defines what a page is about.
      • Acts as a navigational element for search engines during the crawl and lets them perform a detailed analysis of each web page.
      • The search engine performs a detailed analysis of all the words and phrases that appear on a web page and then builds a map of that data, which it uses to decide whether to show your page when a user enters a related search query. This map is referred to as a semantic map.
    • Semantic Map
      • Defines the relationships between web pages so that the search engine can better understand how to match the right web pages with user search queries.
    • Google Working on New Techniques
      • Although search engines are able to detect that you are displaying an image, they have little idea what the image is a picture of, except for whatever information you provide in the alt attribute.
      • Search engines will not recognize any text rendered in the image.
      • Optical Character Recognition (OCR): to extract text from images
      • Search engines are beginning to extract information from Flash
      • A third type of content that search engines cannot see is the pictorial aspect of anything contained in Flash.
      • When text is converted into a vector-based outline (i.e., rendered graphically), the textual information that search engines can read is lost.
      • Audio and video files are also not easy for search engines to read. There are a few exceptions where search engines can extract some limited data, such as ID3 tags within MP3 files.
      • Search engines also cannot read any content contained within a program
    • Moving Ahead with AJAX
      • Another technology that can present significant human-readable content that search engines cannot see is AJAX.
      • AJAX is a JavaScript-based method for dynamically rendering content on a web page after retrieving the data from a database, without having to refresh the entire page. This is often used in tools where a visitor to a site can provide some input and the AJAX tool then retrieves and renders the correct content.
    • Positive Ranking Factors
      • Keyword use in title tag
      • Anchor text of inbound link
      • Global link authority of site
      • Age of site
      • Link popularity within the site’s internal link structure
      • Topical relevance of inbound links
      • Link popularity of site in topical community
      • Keyword use in body text
      • Global link popularity of sites that link to the site
    • Negative Ranking Factors
      • Server is often inaccessible to crawlers
      • Search engines want their users to have good experiences. If your site is subject to frequent outages, by definition it is not providing a good user experience. So, if the search engine crawler frequently is unable to access your web pages, the search engine will assume that it is dealing with a low-quality site.
      • Content very similar to or duplicate of other web pages
      • External links to low-quality/spam sites
      • Participation in link schemes or actively selling links
      • Duplicate titles/meta tags on many pages
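Duplicate titles are easy to audit once titles have been collected. A minimal sketch, assuming a hypothetical mapping of URLs to titles and treating titles as duplicates when they match after trimming and lowercasing (an assumed normalization); the same grouping works for meta descriptions.

```python
from collections import defaultdict

def duplicate_titles(pages):
    """Return {normalized title: [urls]} for titles used by more than one URL."""
    by_title = defaultdict(list)
    for url, title in pages.items():
        by_title[title.strip().lower()].append(url)
    return {t: sorted(urls) for t, urls in by_title.items() if len(urls) > 1}

# Hypothetical crawl results: URL -> <title> text.
pages = {
    "/widgets/blue": "Blue Widgets",
    "/widgets/blue?color=1": "blue widgets ",
    "/widgets/red": "Red Widgets",
}
print(duplicate_titles(pages))  # {'blue widgets': ['/widgets/blue', '/widgets/blue?color=1']}
```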
    • Other Ranking Factors
      • Rate of acquisition of links
      • Usage Data
      • User Data
      • Google sandbox
    • Have Some “Google Caffeine”
      • a next-generation architecture for Google’s web search
      • Focus :
      • A ranking system that heightens the importance of page load speeds
      • A more focused relevance on real-time search data
      • Stricter spam controls
    • Changes with Google Caffeine
      • Changes in how Google stores the massive amount of data gathered by their robots.
      • This is a direct response to the rise of new digital media such as streaming videos, blog posts, and social media content (Twitter, Facebook). The old Google infrastructure was built to handle data by way of Collection > Quality Ranking > Sandbox > Indexing. However, with the explosion of real-time content, search engines are faced with the daunting task of filtering all this content to provide real-time search.
      • Changes in how Google collects its data
      • Google uses robots (Googlebot) that crawl through the web for data; this is traditionally data that may not change or update in real time. The Caffeine update must include changes to the robot to cater for real-time content. The current theory is that Google has developed several types of robots that differ in their indexing rate and crawl rate to cater for different media content.
    • Google’s New Algorithm “Caffeine”
      • An increased weighting on domain authority and some authoritative tag-type pages ranking (like Technorati tag pages and Facebook tag pages), as well as pages on sites like Scribd ranking for some long-tail queries based mostly on domain authority and somewhat spammy on-page text
      • perhaps slightly more weight on exact match domain names
      • perhaps a bit better understanding of related words / synonyms
      • tuning down some of the exposure for video & some universal search results
      • the new search engine improves the index size, the speed of the queries and most importantly, changes the value of search engine rankings.
    • A search for on the new infrastructure, for instance, returns video and news results midway down the page .
    • A search on the existing infrastructure, however, returns news at the top, video in the middle, and images at the bottom of the page.
    • Tools to Evaluate Site Speed
      • Page Speed: an open source Firefox/Firebug add-on that evaluates the performance of web pages and gives suggestions for improvement.
      • YSlow: a free tool from Yahoo! that suggests ways to improve website speed.
      • WebPagetest: shows a waterfall view of your pages’ load performance plus an optimization checklist.
      • In Webmaster Tools, Labs > Site Performance shows the speed of your website as experienced by users around the world as in the chart below.
      • We’ve also blogged about site performance.
    • Contact details: Website: www.mosaic-service.com e-mail id: info.mosaic-service.com Direct no: 0120-4626501,0120-4626508