Understanding SEO at a Glance
Presentation Transcript

  • Understanding SEO at a Glance E-mail us: presales@mosaic-service.com
  • Necessity of SEO…
    • Online advertising drives $6 offline (in stores) for every $1 spent online.
    • Search marketing has a greater impact on in-store sales lift than display advertising; three times greater, in fact.
    • 74% of respondents used search engines to find local business information, versus 65% who turned to print Yellow Pages, 50% who used Internet Yellow Pages, and 44% who used traditional newspapers.
    • 86% of those surveyed said they have used the Internet to find a local business, a rise from the 70% figure reported the year before.
    • 80% reported researching a product or service online, then making that purchase offline from a local business.
  • “iProspect and Jupiter Research”…
    • 62% of search engine users click on a search result within the first page of results, and 90% within the first three pages.
    • 41% of search engine users who continue their search when not finding what they seek report changing their search term and/or search engine if they do not find what they’re looking for on the first page of results; 88% report doing so after three pages.
    • 36% of users agree that “seeing a company listed among the top results on a search engine makes me think that the company is a top one within its field.”
  • Searches Breakdown
  • How do “Search Engines” work?
    • Defining “Search Engine”: a system which collects, organizes, and presents a way to select Web documents based on certain words, phrases, or patterns within documents
      • Model the Web as a full-text DB
      • Index a portion of the Web docs
      • Search Web documents using user-specified words/patterns in a text
  • Categories of “Search Engines”
      • general-purpose search engines, e.g. Yahoo!, AltaVista, and Google
      • special-purpose search engines (or Internet Portals), e.g. LinuxStart (www.linuxstart.com)
  • Components of “Search Engines”
    • Two main components:
    • web crawler (spider), which collects Web pages at massive scale.
    • large database, which stores and indexes the collected Web pages.
    • Ranking has to be performed without accessing the text, just the index.
  • “Search Engine” Models
    • Information Retrieval (IR) is a key to search engine or Web Search.
    • Most commonly used models:
      • Boolean Model
      • Vector Space Model (VSM); see the sketch after this list
      • Probability Model
      • their variations
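
A minimal sketch of how the Vector Space Model scores documents against a query, using plain term-frequency vectors and cosine similarity. The toy corpus and all names here are illustrative, not from the presentation:

```python
# Vector Space Model sketch: documents and the query become term-frequency
# vectors; relevance is the cosine of the angle between them.
import math
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = {
    "d1": "search engine ranks web documents by relevance",
    "d2": "dogs and other pets are popular",
}
query = Counter("web search engine".split())

# Score the toy documents against the query; d1 should win.
for name, text in docs.items():
    print(name, round(cosine_similarity(Counter(text.split()), query), 3))
```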
  • Google Obsession…
  • Google “PageRank”…
    • The PageRank in Google is defined as follows:
    • Assume page A has pages P1...Pn which point to it. The parameter d is a damping factor which can be set between 0 and 1. C(Pi) is defined as the number of links going out of page Pi. The PageRank of page A is then given by:
    • PR(A) = (1 - d) + d * (PR(P1)/C(P1) + ... + PR(Pn)/C(Pn))
    • Usually the parameter d is set to 0.85. PageRank, or PR(A), can be calculated using a simple iterative algorithm, as sketched below.
    • Other features: anchor text processing, location information management, and various data structures that make full use of the features of the web.
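
A minimal sketch of the iterative PageRank computation described on this slide, in Python. The three-page link graph is an illustrative assumption; d = 0.85 as stated above:

```python
# Iterative PageRank: PR(A) = (1 - d) + d * sum(PR(Pi) / C(Pi)) over all
# pages Pi that link to A, repeated until the values settle.
def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links out to."""
    pages = set(links) | {p for outs in links.values() for p in outs}
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_pr = {p: 1 - d for p in pages}
        for page, outs in links.items():
            for target in outs:
                # Each outgoing link passes along PR(page) / C(page).
                new_pr[target] += d * pr[page] / len(outs)
        pr = new_pr
    return pr

toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(pagerank(toy_graph))  # C collects the most rank in this toy graph
```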
  • Ranking result pages
    • Based on content
      • Number of occurrences of the search terms
    • Based on link structure
      • Backlink count
      • PageRank
    • And more.
    • (http://www.cs.duke.edu/~junyang/courses/cps296.1-2002-spring/)
  • Problems with content-based ranking
    • Many pages containing search terms may be of poor quality or irrelevant
      • Example: a page with just a line “search engine”.
    • Many high-quality or relevant pages do not even contain the search terms
      • Example: Google homepage
    • Pages containing more occurrences of the search terms are ranked higher, so spamming is easy
      • Example: a page with the line “search engine” repeated many times
  • Based on link structure
    • Hyperlinks among web pages provide new web search opportunities.
    • Our focus:
      • PageRank
      • HITS
  • Backlink
    • A backlink of a page p is a link that points to p.
    • A page with more backlinks is ranked higher.
    • Each backlink is a “vote” for the page’s importance.
    • Pages pointed to by high-ranking pages are ranked higher.
    • The definition is recursive by design.
  • PageRank
    • Web can be viewed as a huge directed graph G(V, E)
      • where V is the set of web pages (vertices) and E is the set of hyperlinks (directed edges).
    • Each page may have a number of outgoing edges (forward links) and a number of incoming links (backlinks).
    • Each backlink of a page represents a citation to the page.
    • PageRank is a measure of global web page importance based on the backlinks of web pages.
  • “Crawlers” or “Spiders” in the Web… The link structure of the Web serves to bind together all of the pages that were made public as a result of someone linking to them. Through links, search engines’ automated robots, called crawlers or spiders, can reach the many billions of interconnected documents.
  • Hindering “Spiders”…
    • Tools used:
    • Robots.txt: prevents search engines from crawling your pages (a compliance check is sketched below).
    • NOINDEX: prevents content from appearing in search results when "NOINDEX" is added to the robots meta tag.
    • .htaccess: password-protects directories.
    • Google Webmaster Tools: removes content that has already been crawled.
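
A minimal sketch of the robots.txt check a well-behaved spider performs before fetching a page, using Python's standard-library robotparser. The domain and user agent are placeholder assumptions:

```python
# Check robots.txt before crawling, the convention this slide refers to.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()  # fetch and parse the live robots.txt file

page = "https://www.example.com/private/page.html"
if rp.can_fetch("Googlebot", page):
    print("allowed to crawl", page)
else:
    print("disallowed by robots.txt:", page)
```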
  • Google “Crawl Budget”…
  • Latest Update by “Matt Cutts”…
    • Factors affecting Crawl Budget:
    • 1) PageRank: the number of pages that Google crawls is roughly proportional to PageRank. Pages that get linked to a lot tend to get discovered and crawled quite quickly.
    • 2) Host load: refers to the maximum number of simultaneous connections that a particular web server can handle.
    • Low host load allows only one page to be fetched at a time.
    • Social network sites like Facebook or Twitter have a very high host load because they can take a lot of simultaneous connections.
    • 3) Content: crawlers discard web pages with duplicate content.
    • Use 301 redirects for duplicate URLs to merge them into one single URL (see the sketch after this list).
    • Note: “301 redirects may result in certain PageRank loss”.
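
A minimal sketch of merging a duplicate URL into a canonical one with a 301 (permanent) redirect. Flask is assumed here purely as one possible server framework; the routes are hypothetical:

```python
# 301-redirect a duplicate URL to the canonical URL so search engines
# consolidate the two pages into one index entry.
from flask import Flask, redirect

app = Flask(__name__)

@app.route("/old-product-page")
def old_product_page():
    # code=301 marks the move as permanent, unlike the default 302.
    return redirect("/product-page", code=301)

@app.route("/product-page")
def product_page():
    return "Canonical product page"

if __name__ == "__main__":
    app.run()
```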
  • How do Search Engines Rank Websites?
  • How Search Engines evaluate “trust in a website”…
    • Key factor: click distance between your website and the most trusted websites (a BFS sketch follows below).
    [Diagram: “your website” connected to the “most trusted website” by a click distance]
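
A minimal sketch of computing click distance as a breadth-first search from a trusted seed page over a toy link graph. The graph and page names are illustrative assumptions:

```python
# Click distance: the fewest links a user must follow to get from a trusted
# page to yours. Fewer clicks from trusted hubs suggests more trust.
from collections import deque

def click_distance(links, start, target):
    """Shortest link-path length from start to target, or None if unreachable."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        if page == target:
            return dist
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

toy_links = {
    "trusted-hub": ["news-site", "directory"],
    "directory": ["your-site"],
}
print(click_distance(toy_links, "trusted-hub", "your-site"))  # 2 clicks
```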
  • Search Engine “Retrieval and Ranking” Aspects…
    • Relevance : Degree to which the content of the documents returned in a search matches the user’s query intention and terms.
    • Importance or popularity : Relative importance, measured via citation (the act of one work referencing another, as often occurs in academic and business documents) of a given document that matches the user’s query.
    • Relative authority of the site, and the trust the search engine places in it.
  • How to determine “Relevancy and Importance”
    • IR scientists realized that two critical components comprised the majority of search functionality: relevance and importance
    • Combination of relevance and importance determines the ranking order.
    • Popularity and relevance aren’t determined manually
    • Algorithms used: “ranking factors” or “algorithmic ranking criteria”.
  • Analyzing Relevancy and Importance
    • Document analysis (including semantic analysis of concepts across documents)
    • Link (or citation) analysis.
  • Document Analysis
    • Theories/Concepts Used:
    • Semantic Connectivity
    • Fuzzy Logic Theory
    • Latent Semantic Indexing (LSI)
  • What does “Semantic Connectivity” refer to…
    • Semantic connectivity or Co-occurrence refers to words or phrases that are commonly associated with one another.
    • For example, if you see the word aloha you associate it with Hawaii, not Florida.
  • Why care about “Co-occurrence”?
    • Keyword-brand associations.
    • Brand visibility across search engines.
    • Co-citation of products and services.
    • Search volume co-occurrence (Co-Volume).
    • Positioning of documents in search results pages ( SERPs).
    • Keyword research and term discovery.
    • Analysis of seasonal trends.
    • Design of thematic sites.
  • Understanding Co-occurrence
    • Global; extracted from databases
    • Local; extracted from individual documents
    • Fractal; extracted from self-similar, scaled distributions
  • What matters when working with co-occurrence data…
    • scope; i.e., whether the words behave as broader or narrower terms in a given context.
    • type; i.e., whether we are dealing with nouns, verbs, adjectives, stems, etc.
    • synonymity; i.e., whether we are dealing with synonyms.
    • architecture; i.e., whether the documents reside in a horizontal, topic-specific vertical, or regional directory.
    • seasonality; i.e., whether we are dealing with repositories containing seasonal trends and periodic fluctuations.
    • sequencing; i.e., the order in which terms are queried or appear in documents.
    • polysemy; i.e., whether we are dealing with terms with multiple meanings.
    • cognates; i.e., whether we are dealing with different terms with the same meaning in different languages.
    • query modes; i.e., the retrieval modes used.
  • “Broader” and “Narrower” Terms…
    • For the search query “dog pet” or “dog canine”:
    • scenario 1: k1 = dog, k2 = pet
    • scenario 2: k1 = dog, k2 = canine
    • As of 06/16/05, searches in Google for these terms return
    • 53,400,000 results for dog
    • 55,800,000 results for pet
    • 3,570,000 results for canine
    • Observations:
    • dog and pet (broader terms) return more results than canine (a narrower term).
    • Interpretation:
    • canine is considered a narrower term because:
    • there is a synonymity relationship between "canine" and "dog" but not between "canine" and "pet" or "pet" and "dog".
    • "canine" has different meanings (polysemy). According to WordNet, "canine" can be used as a noun or adjective, each having different meanings.
    • "canine" is one of those terms that possesses a meaning within a meaning. The term behaves as having a scope within a scope (or a context within a context, i.e., fractality), such as the canine (tooth) of a canine (dog).
  • “Global” Co-occurrence…
    • In the Google search engine, the default query mode is “AND”.
    • As of 06/16/05 searches in Google for these terms return
    • scenario 1: 12,800,000 for the query, k12 = k1 + k2 = dog pet
    • scenario 2: 1,710,000 for the query, k12 = k1 + k2 = dog canine
    • Observations :
    • both queries return fewer documents.
    • the new result set n12, containing both k1 and k2, must be a subset of both n1 and n2; i.e., the sets containing k1 only or k2 only.
    • Interpretations :
    • The term "dog" is more frequently co-cited with "pet" than with "canine" since:
    • in scenario 1 we are combining two broader terms.
    • in scenario 1 the terms are not synonyms.
    • in scenario 2 we are combining a broader term with a narrower term.
    • in scenario 2 the terms are synonyms and synonyms rarely occur together but appear in similar contexts.
  • “Normalized” Co-occurrence
    • Also known as the “Co-Occurrence Index” or C-index.
    • For the co-citation frequency between two and only two terms k1 and k2, the C-index is given by:
    • c12 = n12 / (n1 + n2 - n12)
    • where
    • c12 = 0 when n12 = 0; i.e., k1 and k2 do not co-occur (the terms are mutually exclusive).
    • c12 > 0 when n12 > 0; i.e., k1 and k2 co-occur (the terms are not mutually exclusive).
    • c12 = 1 when n12 = n1 = n2; i.e., k1 and k2 co-occur whenever either term occurs.
  • Applying it to the previous example (reproduced in code below)…
    • scenario 1: (12,800,000/(53,400,000 + 55,800,000 - 12,800,000))*1000 = 132.7801 = 133 ppt
    • scenario 2: (1,710,000/(53,400,000 + 3,570,000 - 1,710,000))*1000 = 30.9446 = 31 ppt
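
A minimal sketch reproducing the C-index arithmetic above in Python; the counts are the 06/16/05 Google result totals quoted on the earlier slide:

```python
# C-index: normalized co-occurrence of two terms, scaled to parts per
# thousand (ppt), exactly as in the worked example above.
def c_index_ppt(n1, n2, n12):
    """n1, n2: result counts for each term alone; n12: count for both."""
    if n12 == 0:
        return 0.0  # the terms are mutually exclusive
    return n12 / (n1 + n2 - n12) * 1000

print(round(c_index_ppt(53_400_000, 55_800_000, 12_800_000)))  # 133 ppt (dog, pet)
print(round(c_index_ppt(53_400_000, 3_570_000, 1_710_000)))    # 31 ppt (dog, canine)
```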
  • “Syntagmatic and Paradigmatic Association” theory
    • Syntagmatic associations are terms that frequently occur together.
    • Paradigmatic associations are terms with high semantic similarity.
    • These types of associations help explain why synonyms tend not to co-occur. This has a lot to do with contextuality, or lexical neighborhoods.
  • Fuzzy Set Theory…
    • Discovers the semantic connectivity between two words.
    • e.g., both oranges and bananas are fruits, but both oranges and bananas are not round.
    • A machine knows an orange is round and a banana is not by scanning thousands of occurrences of the words banana and orange in its index and noting that round and banana do not co-occur strongly, while orange and round do.
  • Latent Semantic Indexing (LSI)
    • LSI (Latent Semantic Indexing), based on fuzzy logic theory, uses semantic analysis to identify related web pages (a small sketch follows this list).
    • e.g., the search engine may notice one page that talks about doctors and another one that talks about physicians, and determine that there is a relationship between the pages based on the other words the pages have in common.
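
A minimal sketch of LSI-style analysis using TF-IDF plus truncated SVD; scikit-learn is assumed to be installed, and the three toy documents are illustrative. The doctor and physician pages should come out far closer in the latent space than either does to the fruit page:

```python
# LSI sketch: project TF-IDF vectors into a low-dimensional latent space
# where pages sharing context (doctors/physicians) land near each other.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "doctors treat patients at the hospital clinic",
    "physicians treat patients at the hospital clinic",
    "bananas and oranges are fruits sold at the market",
]
tfidf = TfidfVectorizer().fit_transform(docs)
lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

print(cosine_similarity(lsi[:1], lsi[1:2]))  # doctors vs. physicians: high
print(cosine_similarity(lsi[:1], lsi[2:3]))  # doctors vs. fruit page: low
```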
  • Common types of searches in the IR field.
  • Link Analysis…
    • Semantic Analysis
    • Identifying Authority of Links.
    • Identifying Relevancy of Links.
    • Link neighborhood: the concept of grouping sites based on their relevance is referred to as a link neighborhood.
    • Placement of Links
  • Top elements of SEO
    • Content
    • Title tag
    • Meta keyword tag
    • Alt attribute for images: the alt attribute was originally intended to allow something to be rendered when viewing the image is not possible.
    • noscript tag: some users do not allow JavaScript to run when they load a web page. For those users, nothing would be shown where the JavaScript is on the web page unless the page contains a noscript tag. (A sketch of inspecting these elements follows this list.)
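
A minimal sketch of inspecting these on-page elements with BeautifulSoup (assumed installed); the HTML snippet is an illustrative assumption:

```python
# Pull out the title tag, meta keywords, image alt text, and noscript
# fallback, the elements listed above.
from bs4 import BeautifulSoup

html = """
<html><head>
  <title>Understanding SEO at a Glance</title>
  <meta name="keywords" content="seo, search engines, ranking">
</head><body>
  <img src="logo.png" alt="Company logo">
  <noscript>Fallback text shown when JavaScript is disabled.</noscript>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

print(soup.title.string)                                         # title tag
print(soup.find("meta", attrs={"name": "keywords"})["content"])  # meta keywords
print([img.get("alt") for img in soup.find_all("img")])          # alt attributes
print(soup.noscript.get_text(strip=True))                        # noscript content
```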
  • Evaluating Content…
    • Content: what defines what a page is about.
    • Acts as a navigational element for the search engines during the crawl and enables a detailed analysis of each web page.
    • The search engine performs a detailed analysis of all the words and phrases that appear on a web page, then builds a map of that data to consider when deciding whether to show your page in the results for a related search query. This map is referred to as a semantic map.
  • Semantic Map
    • Defines the relationships between web pages so that the search engine can better understand how to match the right web pages with user search queries.
  • Google working on new techniques
    • While search engines are able to detect that you are displaying an image, they have little idea what the image is a picture of, except for whatever information you provide them in the alt attribute.
    • Search engines will not recognize any text rendered in the image.
    • Optical Character Recognition (OCR): to extract text from images
    • Search engines are beginning to extract information from Flash
    • A third type of content that search engines cannot see is the pictorial aspects of anything contained in Flash.
    • when text is converted into a vector-based outline (i.e., rendered graphically), the textual information that search engines can read is lost.
    • Audio and video files are also not easy for search engines to read. There are a few exceptions where the search engines can extract some limited data, such as ID3 tags within MP3 files.
    • Search engines also cannot read any content contained within a program
  • Moving Ahead with AJAX
    • One technology that can present significant human-readable content that the search engines cannot see is AJAX.
    • AJAX is a JavaScript-based method for dynamically rendering content on a web page after retrieving the data from a database, without having to refresh the entire page. This is often used in tools where a visitor to a site can provide some input and the AJAX tool then retrieves and renders the correct content.
  • Positive Ranking Factors
    • Keyword use in title tag
    • Anchor text of inbound link
    • Global link authority of site
    • Age of site
    • Link popularity within the site’s internal link structure
    • Topical relevance of inbound links
    • Link popularity of site in topical community
    • Keyword use in body text
    • Global link popularity of sites that link to the site
  • Negative Ranking Factors
    • Server is often inaccessible to crawlers
    • Search engines want their users to have good experiences. If your site is subject to frequent outages, by definition it is not providing a good user experience. So, if the search engine crawler is frequently unable to access your web pages, the search engine will assume that it is dealing with a low-quality site.
    • Content very similar to or duplicate of other web pages
    • External links to low-quality/spam sites
    • Participation in link schemes or actively selling links
    • Duplicate titles/meta tags on many pages
  • Other Ranking Factors
    • Rate of acquisition of links
    • Usage Data
    • User Data
    • Google sandbox
  • Have Some “Google Caffeine”
    • a next-generation architecture for Google’s web search
    • Focus:
    • A ranking system that heightens the importance of page load speeds
    • A stronger relevance focus on real-time search data
    • Stricter spam controls
  • Changes with Google Caffeine
    • Changes in how Google stores the massive amount of data gathered by their robots.
    • This is a direct response to the rise in new digital media such as streaming videos, blog posts, and social media content (Twitter, Facebook). The old Google infrastructure was built to handle data by way of Collection > Quality Ranking > Sandbox > Indexing. However, with the explosion of real-time content, search engines are faced with the daunting task of filtering all this content to provide real-time search.
    • Changes in how Google collects its data
    • Google uses robots (Googlebot) that crawl the web for data; traditionally this is data that may not change or update in real time. The Caffeine update must include changes to the robots to cater for real-time content. The current theory is that Google has developed several types of robots that differ in their indexing rate and crawl rate to cater for different media content.
  • Google’s New Algorithm: “Caffeine”
    • An increased weighting on domain authority, with some authoritative tag-type pages ranking (like Technorati tag pages and Facebook tag pages), as well as pages on sites like Scribd ranking for some long-tail queries based mostly on domain authority and somewhat spammy on-page text.
    • Perhaps slightly more weight on exact-match domain names.
    • Perhaps a bit better understanding of related words/synonyms.
    • Tuning down some of the exposure for video and some universal search results.
    • The new search engine improves the index size and the speed of queries and, most importantly, changes the value of search engine rankings.
  • A search on the new infrastructure, for instance, returns video and news results midway down the page.
  • A search on the existing infrastructure, however, returns news at the top, video in the middle, and images at the bottom of the page.
  • Tools to evaluate the speed of a site (a rough timing sketch follows this list):
    • Page Speed: an open source Firefox/Firebug add-on that evaluates the performance of web pages and gives suggestions for improvement.
    • YSlow: a free tool from Yahoo! that suggests ways to improve website speed.
    • WebPagetest: shows a waterfall view of your pages’ load performance plus an optimization checklist.
    • In Webmaster Tools, Labs > Site Performance shows the speed of your website as experienced by users around the world.
    • We’ve also blogged about site performance.
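
A rough, minimal sketch of timing a page fetch with the requests library (assumed installed); this measures only server response time, a crude proxy for the full page-load metrics the tools above report. The URL is a placeholder:

```python
# Time a single page fetch as a quick speed check.
import requests

response = requests.get("https://www.example.com/", timeout=10)
print("status:", response.status_code)
# elapsed is the time between sending the request and the response arriving.
print("elapsed:", round(response.elapsed.total_seconds(), 3), "seconds")
print("bytes:", len(response.content))
```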
  • Contact details: Website: www.mosaic-service.com | E-mail: info@mosaic-service.com | Direct no: 0120-4626501, 0120-4626508