Seo and analytics basics
My recent presentation on SEO and Web Analytics.

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Seo and analytics basics Presentation Transcript

  • Sreekanth Narayanan
    SEO and Analytics
  • SEO and Analytics
    SEO Introduction
    Analytics Introduction
    Search Engine basics
    Analytics – methods
    Technology Considerations
    Tools for Analytics
    Tweaking your Content
    Some Key Terminologies
    Promoting Web Pages
    Tools for Web Masters
  • SEO and Analytics
    SEO Introduction
  • SEO – what’s that???
    Search Engine Optimization has been a buzzword since the advent of the major search engines.
    SEO deals with best practices outlined to make it easier for search engines to crawl, index and understand the content on your web page.
  • SEO and Analytics
    Search Engine basics
  • How do search engines work?
    Spiders (also called robots) comb the web by following links.
    The search engine formats the data it finds and stores it in its database.
    All search engines maintain extensive, highly indexed databases.
  • SEO – what’s that???
    All trademarks belong to respective owners
  • SEO – what’s that???
    Results are indexed using complex algorithms that weigh a large number of parameters.
    Webmasters have spent years analyzing the behavior of the major search engines, so there is a considerable knowledge base on what makes pages more search engine friendly.
  • Paid and Organic Search Results
    Many search engines have launched paid services such as Google AdWords.
    Organic search results are the ones that are not influenced by paid or sponsored programs.
    SEO applies to the organic results. It normally has no impact on the results shown from sponsored links.
  • SEO and Analytics
    Technology Considerations
  • User-Agent HTTP Header
    Most websites rely heavily on the User-Agent HTTP header to determine who is requesting a page.
    A website's behavior is often altered depending on the value passed in the User-Agent field.
    A typical application is serving different CSS to IE and Firefox, because of the (in)famous browser incompatibility issues.
    Another is forwarding the user to a mobile version of the website when the user agent is a mobile device.
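The mobile-forwarding idea above can be sketched as a simple check on the User-Agent value. This is only an illustration: the token list and the function name `choose_site_version` are assumptions made for this example, and real device detection needs a far more complete list.

```python
# Hypothetical substrings that often appear in mobile browser User-Agent values.
MOBILE_TOKENS = ("mobile", "android", "iphone")


def choose_site_version(user_agent: str) -> str:
    """Return "mobile" or "desktop" based on the User-Agent header value."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in MOBILE_TOKENS):
        return "mobile"
    return "desktop"
```

A server could call this on each request and redirect to the mobile site when the result is "mobile".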
  • The common Robot user agents
    The following are the most famous Robot user agent strings
  • Cloaking
    Cloaking was a very popular SEO technique in the earlier days.
    It disguises your website as a text-based site (with a lot of keywords sprinkled all over) whenever a request comes from a web robot (spider).
    Most spiders are identifiable by their User-Agent headers.
    For example, the Google robot is called “Googlebot”.
    As search engines strengthened their spam detection, they started penalizing cloaked websites, often by removing them from their indices altogether.
    Today cloaking is not a recommended practice and should be avoided in all scenarios.
  • URL Structure
    Simple-to-understand URLs convey content information easily.
    They are easier for both users and crawlers to organize.
    Crawlers typically give lower indexing priority to URLs containing arbitrary numbers and characters.
    The PageRank (TM – Google Inc.) algorithm gives a lot of weight to the number of pages that link to your page.
    If your URLs are simpler, it is easier for users to link to your page.
    If your URL contains relevant words, it gives users and search engines more information about the page than an ID or oddly named parameter would.
  • URL best practices
    Avoid using lengthy URLs with unnecessary parameters and session IDs
    Avoid choosing generic page names like "page1.html"
    Keep the directory nesting as simple as possible
    Keep the directory names relevant to the content provided in the directory. Avoid using numbers for directory names
    Do not mix upper and lower case in URLs (for example, CreateOrder.html). Users prefer a single case, and lower case is the convention.
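As a rough sketch of these practices, a page title can be turned into a short, lowercase, keyword-bearing URL segment. The function name `slugify` is chosen for this example and is not part of the original slides.

```python
import re


def slugify(title: str) -> str:
    """Turn a page title into a lowercase, hyphen-separated URL segment."""
    slug = title.strip().lower()
    # Replace every run of characters other than a-z and 0-9 with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", slug)
    return slug.strip("-")
```

For example, a page titled "Create Order" would be served from a segment such as `create-order` instead of `CreateOrder.html`.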
  • URL best practices
    Websites should be as flat as possible, with content for highly competitive keywords placed on pages high in the hierarchy.
    Rewrite URLs on the server side to make them simpler and less nested.
    Note that search engines tend to assign a lower relevance score to content nested deep inside the website. Content in the top-level folders is considered much more relevant.
  • Canonical URL
    More often than not, there are multiple ways to reach the same page on a website.
    Canonicalization is the process of picking the best URL when there are several choices; it most often concerns the homepage of a website.
    For example, consider http://www.google.com and http://google.com. Both URLs provide the same content. Another example is “domain.com/aboutus.htm” and “blog.domain.com/aboutus.htm”.
    Search engines are usually intelligent enough to recognize that the content on the pages is the same, and they will pick one of the URLs, which might not be our preferred one.
  • Canonical URL – best practices
    There are a few ways to ensure that the proper URL is indexed:
    When linking to your homepage always point to the same URL
    When requesting links from other sites, always point to the same URL
    Redirect the non-www homepage to the www version of the homepage using a 301 permanent redirect. A 301 redirect example (JSP) is shown below.
    <%
      // Permanent (301) redirect to the preferred www URL.
      response.setStatus(HttpServletResponse.SC_MOVED_PERMANENTLY);
      response.setHeader("Location", "http://www.new-url.com/");
      response.setHeader("Connection", "close");
    %>
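The same non-www to www rule can be sketched independently of JSP. The function below only decides whether a redirect is needed and where it should point; the host `www.example.com` and the name `canonical_redirect` are placeholders assumed for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url: str, canonical_host: str = "www.example.com"):
    """Return (301, location) if the URL uses the non-www host, else None."""
    parts = urlsplit(url)
    # The non-www variant of the canonical host, e.g. "example.com".
    bare_host = canonical_host[4:] if canonical_host.startswith("www.") else canonical_host
    if parts.netloc.lower() == bare_host:
        location = urlunsplit(parts._replace(netloc=canonical_host))
        return 301, location
    return None
```

A request handler would send the returned status and `Location` value, and serve the page normally when the result is `None`.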
  • HTTP 301 & HTTP 302
    302 is a temporary redirect.
    301 is a permanent redirect.
    As far as possible, use only 301 for redirection (explained on the previous slide).
    Always redirect from the server (sample on the previous slide).
    A 302 redirect indicates that the move is temporary and will change in the near future. Popularity attained by the previous site or page is not passed on to the new one.
    A 301 permanent redirect should be used when the change is long-term or permanent; it allows PageRank and link popularity to transfer. The indexing engines of all major search engines take care of this.
  • Name Value pairs in URLs
    Name-value pairs are used in URLs to pass the information needed to produce dynamic content.
    URLs tend to become lengthy with name-value pairs.
    They often contain numbers, which search engines typically treat as junk.
    Further, a parameter name like “prod_code” makes no sense to an ordinary user. A product name would be better.
    Use valuable keywords in the name-value pairs whenever possible, and keep the number of pairs to no more than three.
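A small audit along these lines might flag URLs that break the two rules above. This is a sketch under the slide's own assumptions (at most three pairs, numeric values treated as junk); the function name and warning texts are made up for illustration.

```python
from urllib.parse import parse_qsl, urlsplit

MAX_PAIRS = 3  # limit suggested in the slide above


def audit_query_string(url: str) -> list:
    """Return a list of warnings about the name-value pairs in a URL."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    warnings = []
    if len(pairs) > MAX_PAIRS:
        warnings.append(f"{len(pairs)} name-value pairs (more than {MAX_PAIRS})")
    for name, value in pairs:
        if value.isdigit():
            warnings.append(f"'{name}' carries a purely numeric value")
    return warnings
```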
  • User Input Fronting Screens
    Many sites have a front page where you must enter your location or other details before the site shows you information about products.
    Search engines cannot enter information or make selections from form drop-downs. This means search engine spiders are effectively locked out of relevant content and cannot index or rank it.
    Another problem is a splash screen with a country chooser that does not let visitors past the page until they select a country to set the locale.
    It is better to start with a default locale, let visitors in, and then offer an option to change it. A robot will be able to index your pages with such a design.
  • Using mostly text for navigation
    A lot of sites use Flash or JavaScript for navigation.
    Search engine spiders are unable to follow JavaScript or Flash navigation and therefore cannot find pages that are accessible only through it.
    Flash might not be supported on all browsers.
    The user might not have installed the plug-in, or could have disabled JavaScript.
    Use only HTML-based navigation.
    You might have seen that most Web 2.0 sites include a full sitemap in the footer. This makes sure that all the Flash/script navigation links are replicated in HTML form for the spiders to use.
  • The Web 2.0 footer
    Page copyright mint.com
  • Provide alternative to flash content
    Spiders cannot read Flash content.
    Links embedded in Flash are never navigated or indexed.
    If you cannot do away with Flash for usability reasons, implement a site with the same links in HTML.
    Implement user-agent detection to deliver the HTML site to spiders and the Flash version to human visitors. Keep the content of both versions the same so that this is not treated as cloaking.
  • Excessive In page Scripting
    All web crawlers limit the amount of content they index from a page.
    Typically this limit is about 100 KB of data.
    If you have too much in-page scripting, the script might be the only thing the search engine sees on your page.
    Content beyond the limit is ignored. Crawlers skip the contents of the <script> tag, but the scripts still count toward the total content read (100 KB).
    It is always sensible to keep your scripts in separate files and include them in your page. This way you do not risk hitting the crawler's content limit and can still write a lot of code for dynamic behavior.
  • Excessive In page Scripting
    The following example shows the right way of doing this: the page only references external files.
    <link href="${ctx}/content/css/style.css" rel="stylesheet" type="text/css" />
    <script type="text/javascript" src="${ctx}/js/jquery-1.4.2.js"></script>
    <script type="text/javascript" src="${ctx}/js/jquery.ui.core.js"></script>
    <script type="text/javascript" src="${ctx}/js/jquery.dataTables.js"></script>
    <script type="text/javascript" src="${ctx}/js/highcharts.js"></script>
    <script type="text/javascript" src="${ctx}/content/js/page.mypage.js"></script>
    // Page-specific code like the function below goes in an external file (presumably page.mypage.js), not inline.
    function setDefaults()
    {
        $('#genericError').hide();
        $("#catgErr").hide();
        $("#allCatgs").attr('checked', false);
        $.ajax({
            url: "../callsomething",
            type: "POST",
            async: false,
            success: function (data) {
                var len = data.map.entry.length;
                for (var i = 0; i < len; i++)
                {
                    // do something
                }
            }
        });
    }
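To get a feel for how much of a page is taken up by in-page scripting, the inline script characters can be counted with Python's standard `html.parser`. This is a minimal sketch; the class and function names are invented for this example, and comparing the result to the 100 KB figure from the slides is left to the reader.

```python
from html.parser import HTMLParser


class InlineScriptCounter(HTMLParser):
    """Count the characters of JavaScript written inline in a page."""

    def __init__(self):
        super().__init__()
        self.in_inline_script = False
        self.inline_chars = 0

    def handle_starttag(self, tag, attrs):
        # A <script> tag without a src attribute holds inline code.
        if tag == "script" and "src" not in dict(attrs):
            self.in_inline_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_inline_script = False

    def handle_data(self, data):
        if self.in_inline_script:
            self.inline_chars += len(data)


def inline_script_chars(html: str) -> int:
    """Return the number of characters found inside inline <script> blocks."""
    counter = InlineScriptCounter()
    counter.feed(html)
    counter.close()
    return counter.inline_chars
```

Scripts loaded through a `src` attribute contribute nothing to the count, which is the point of moving code into external files.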
  • Session Ids on the URL
    Some web servers put a unique session ID variable in the URL on each visit for tracking purposes.
    A search engine spider revisiting a URL is assigned a different session ID on each visit, so every visit to a page appears as a unique URL. This causes indexing inconsistencies and possibly duplicate content penalties.
    Implement user-agent detection to remove the session IDs for search engine visits.
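One way to sketch this is a helper that strips session identifiers from a URL when the visitor looks like a search engine spider. The parameter names in `SESSION_PARAMS` and the crawler tokens are illustrative assumptions, not an exhaustive or authoritative list.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; adjust to whatever your server emits.
SESSION_PARAMS = {"jsessionid", "sessionid", "sid", "phpsessid"}
# Illustrative substrings of well-known crawler User-Agent values.
CRAWLER_TOKENS = ("googlebot", "bingbot", "slurp")


def strip_session_id(url: str) -> str:
    """Return the URL without session identifiers in the path or query."""
    parts = urlsplit(url)
    # Java servlet containers append ";jsessionid=..." to the path.
    path = re.sub(r";jsessionid=[^/?#;]*", "", parts.path, flags=re.IGNORECASE)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name.lower() not in SESSION_PARAMS]
    return urlunsplit(parts._replace(path=path, query=urlencode(kept)))


def url_for_visitor(url: str, user_agent: str) -> str:
    """Serve session-free URLs to crawlers and unchanged URLs to everyone else."""
    if any(token in user_agent.lower() for token in CRAWLER_TOKENS):
        return strip_session_id(url)
    return url
```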
  • “nofollow” settings
    Setting the value of a link's "rel" attribute to "nofollow" tells search engine robots that the link should not be followed and should not pass your page's reputation to the page linked to.
    This is especially relevant for pages that allow user comments.
    Say you are a famous company and allow people to post feedback on your blog. Always set “nofollow” on links in comments so that links like the following gain nothing from your reputation.
    Sample: <a href="http://www.cheapdrugs123.com" rel="nofollow">Comment by a spammer</a>
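A comment system can apply this automatically whenever it renders a user-submitted link. The helper below is a minimal sketch (the name `render_comment_link` is an assumption for this example); it also escapes the user's input before placing it in the markup.

```python
from html import escape


def render_comment_link(url: str, text: str) -> str:
    """Build an anchor for user-submitted content with rel="nofollow"."""
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(url, quote=True), escape(text)
    )
```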
  • 404 pages
    Pages or content that is moved, removed, or changed can result in errors, such as a 404 Page Not Found.
    Having a custom 404 page that kindly guides users back to a working page on your site can greatly improve a user's experience
    Your 404 page should probably have a link back to your root page and could also provide links to popular or related content on your site.
    NEVER EVER allow your 404 pages to be indexed in search engines
    Do not use a design for your 404 pages that isn't consistent with the rest of your site
    Repair all broken links as soon as possible
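A custom 404 page should still report the 404 status so that search engines do not index it as regular content. The WSGI sketch below shows one way this might look. The page text is a placeholder, and the `X-Robots-Tag: noindex` header is an extra precaution that assumes the crawler honors that header.

```python
NOT_FOUND_BODY = (
    "<html><head><title>Page not found</title></head><body>"
    "<h1>Sorry, we could not find that page</h1>"
    '<p><a href="/">Back to the home page</a></p>'
    "</body></html>"
)


def not_found_app(environ, start_response):
    """WSGI app that serves a friendly page while still reporting status 404."""
    body = NOT_FOUND_BODY.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Assumption: the crawler honors X-Robots-Tag and skips indexing.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("404 Not Found", headers)
    return [body]
```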
  • SEO and Analytics
    Tweaking your Content
  • The <title> tag
    Most search engines give a lot of weight to the content of the <title> HTML tag.
    A title tag tells both users and search engines what the topic of a particular page is.
    The <title> tag should be placed within the <head> tag of the HTML document
    Ideally, you should create a unique title for each page on your site.
  • <title> tag tips
    Always give every page a sensible title. Do not repeat the same text across all pages or a group of pages unless it makes sense.
    Make sure your important business terms are reflected in the title.
    Never choose a title that has no relation to the content on the page.
    Never use default or vague titles like "Untitled" or "New Page 1".
    Google displays 63 characters of the page title in its search results, so the first 63 characters should contain all the relevant detail you need.
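These title tips lend themselves to a quick automated check. The sketch below uses the 63-character figure quoted in the slides and the two vague titles named there; the function name and the wording of the messages are assumptions for illustration.

```python
VAGUE_TITLES = {"untitled", "new page 1"}
MAX_VISIBLE_CHARS = 63  # figure quoted in the slide above


def audit_title(title: str) -> list:
    """Return a list of problems found in a page title."""
    problems = []
    cleaned = title.strip()
    if not cleaned or cleaned.lower() in VAGUE_TITLES:
        problems.append("vague or missing title")
    if len(cleaned) > MAX_VISIBLE_CHARS:
        problems.append("title may be cut off in search results")
    return problems
```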
  • <meta> tags
    A page's description meta tag gives search engines a summary of what the page is about.
    Limit descriptions to 250 characters.
    Include all targeted key phrases.
    Write the copy with users in mind (the description appears in search results).
    Create a unique meta description for every page.
  • <meta> keywords tag
    Keywords are listed in the head section of the HTML.
    Google gives very little importance to this tag.
    Bing and Yahoo give it some importance, so it still makes sense to specify it.
    Search engines normally do not display this content in the search results.
    Use only relevant phrases in this tag, and use distinct phrases for each page.
  • Header tags <h1>, <h2>, <h3>
    Search engines give a lot of importance to the content that appears inside the header tags.
    Use strictly one <h1> tag per page, for the most important heading on the page.
    <h2> and <h3> tags should also be used for the most relevant headings.
    Always keep the natural hierarchy: first h1, then h2, and then h3.
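The heading rules above (exactly one h1, no skipped levels) can be checked mechanically. The following is a minimal sketch built on the standard `html.parser` module; the class name, function name and message texts are invented for this example.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record the level of every heading tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.levels.append(int(tag[1]))


def audit_headings(html: str) -> list:
    """Return a list of problems with the heading structure of a page."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels = collector.levels
    problems = []
    if levels.count(1) != 1:
        problems.append(f"expected exactly one <h1>, found {levels.count(1)}")
    previous = 0
    for level in levels:
        # Going deeper by more than one level skips part of the hierarchy.
        if level > previous + 1:
            problems.append(f"<h{level}> skips a heading level")
        previous = level
    return problems
```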
  • Importance of Anchor text
    Anchor text is the clickable text that users will see as a result of a link, and is placed within the anchor tag <a href="..."></a>.
    e.g. <a href="http://www.mydomain.com/articles/our-prices.htm">Lowest prices on earth for international calls</a>
    This text tells search engines something about the page you're linking to.
    Avoid writing generic anchor text like "page", "article", or "click here"
    Avoid using text that is off-topic or has no relation to the content of the page linked to
    Avoid using CSS or text styling that make links look just like regular text
  • Duplication of Content
    Duplicate content exists when two or more pages within a website, or on different domains, share identical content.
    Different domain names do not create distinct content, for example company.com/aboutus.html and blog.company.com/aboutus.html.
    Major search engines consider duplicate content to be spam and are continually improving their spam filtering process to penalize and remove offenders.
    Avoid duplication of content as far as possible
    Use 301 permanent redirects to inform search engines of the proper URL to utilize.
  • Optimizing image content
    Images form an integral part of any website
    The "alt" attribute allows you to specify alternative text for the image if it cannot be displayed for some reason
    This is a very important usability aspect, as the screen reader programs used by blind people identify and read out the alt text.
    Another reason is that if you're using an image as a link, the alt text for that image will be treated similarly to the anchor text of a text link.
    Optimizing your image filenames and alt text makes it easier for image search projects like Google Image Search to better understand and rank the images on your website.
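Finding images that lack alt text is easy to automate. The sketch below lists the `src` of every `<img>` whose alt attribute is missing or blank; the names `MissingAltFinder` and `images_missing_alt` are assumptions for this example.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of each <img> that has no usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            # Treat an absent or blank alt attribute as missing.
            if not (attributes.get("alt") or "").strip():
                self.missing.append(attributes.get("src", ""))


def images_missing_alt(html: str) -> list:
    """Return the src values of images without alt text, in document order."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing
```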
  • The robots.txt file
    Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.
    A sample can be seen here : http://www.robotstxt.org/robotstxt.html
    All major search engine robots read this file to see which pages may be crawled.
    The Disallow directives specify which paths the crawler should ignore.
    A robots.txt file typically contains entries such as:
    Disallow: /residential/customerService/
    Disallow: /residential/customerService/contacts.html
    Disallow: /residential/customerService/contactus/billing.html
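Python's standard `urllib.robotparser` module can show how a well-behaved robot would interpret such rules. In this sketch the `User-agent: *` line is added as an assumption (the slide shows only the Disallow lines), and `/residential/plans.html` is a hypothetical path used for contrast.

```python
from urllib.robotparser import RobotFileParser

# Sample rules from the slide, with an assumed "User-agent: *" line on top.
RULES = """\
User-agent: *
Disallow: /residential/customerService/
Disallow: /residential/customerService/contacts.html
Disallow: /residential/customerService/contactus/billing.html
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_allowed(path: str, user_agent: str = "*") -> bool:
    """Return True if a robot honoring these rules may fetch the path."""
    return parser.can_fetch(user_agent, path)
```

Note that the first Disallow line already covers the two more specific paths below it, since rules match by path prefix.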
  • The robots.txt file
    There are some important considerations when using /robots.txt:
    Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it.
    The /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.
    You could put all the files you don't want robots to visit in a separate sub directory, make that directory un-list-able on the web (by configuring your server), then place your files in there, and list only the directory name in the /robots.txt. Now an ill-willed robot won't traverse that directory unless you put a direct link on the web to one of your files, and then it's not /robots.txt fault.
  • SEO and Analytics
    Promoting Web Pages
  • Linking your websites
    Internal linking between pages within a web site, such as navigational elements or a site map, plays an important role in how search engines perceive the relevancy and theme of both web pages.
    Proper intra‐site linking will help facilitate effective spidering, in addition to increasing relevancy of pages
    Maintain a sitemap.
    Keep sitemap pages to fewer than 100 links per page.
    Sitemaps should be linked directly from homepage and other major pages throughout the web site
  • Promotion through external channels
    Effectively promoting your new content leads to faster discovery by those who are interested in the same subject.
    Increasing the number of back-links to your site is one option, but it should be done properly.
    Social media links (for example, a Facebook Like) add to your link count. It is typically not advisable to link every small update this way, as search engines nowadays recognize such patterns.
    You could include your updates in an RSS feed.
    You could get them linked from the blogs of people in the related community.
    Today's search engines do not go by PageRank alone when determining relevance. Relevance also depends on traffic and content.
  • SEO and Analytics
    Tools for Web Masters
  • Webmaster tools
    Every major search engine has launched its own set of webmaster tools.
    Google: http://www.google.com/webmasters/
    Yahoo: http://siteexplorer.search.yahoo.com/
    Bing: http://www.bing.com/toolbox/webmasters/
    We will examine some of the most important tools which Google provides.
  • Webmaster tools
    Google provides the following services:
    see which parts of a site Googlebot had problems crawling
    notify Google of an XML Sitemap file
    analyze and generate robots.txt files
    remove URLs already crawled by Googlebot
    specify your preferred domain
    identify issues with title and description meta tags
    understand the top searches used to reach a site
    get a glimpse at how Googlebot sees pages
    remove unwanted site links that Google may use in results
    receive notification of quality guideline violations and request a site reconsideration
  • SEO and Analytics
    Analytics Introduction
  • Web Analytics - Introduction
    Web analytics is the measurement, collection, analysis and reporting of internet data for purposes of understanding and optimizing web usage.
    It is a very important tool for business and market research.
    Web analytics provides data on the number of visitors, page views, etc. to gauge traffic and popularity trends, which helps with market research.
    There are predominantly two types:
    Off-site
    On-site
  • Web Analytics - Introduction
    Off-site web analytics refers to web measurement and analysis regardless of whether you own or maintain a website. It includes the measurement of a website's potential audience (opportunity), share of voice (visibility), and buzz (comments) happening on the Internet as a whole.
    On-site web analytics measure a visitor's journey once on your website. This includes its drivers and conversions; for example, which pages encourage people to make a purchase. On-site web analytics measures the performance of your website in a commercial context.
  • SEO and Analytics
    Analytics – methods
  • Methods for measuring
    Log file analysis
    All web servers record most of their transactions in a log file (the access log, for Apache).
    This was the most prominent method when the web evolved in the late 90s.
    It involved running a tool that identifies the hits to a page from the log file and derives statistics from them.
    It became very inaccurate over time, as there are thousands of “non-human” actors on the web today. Googlebot is an example.
  • Methods for measuring
    Log file analysis – contd..
    The tools adapted to robots by counting hits based on cookie tracking and ignoring the known robots.
    This is not fully practical, as robots are written not only by search engines but also by spammers.
    Log file analysis also failed once browsers began caching pages. When the user requested a cached page again, the content was delivered from the cache and no hit reached the web server.
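The log file approach, including the "ignore known robots" adaptation, can be sketched in a few lines. The regular expression assumes Apache's "combined" log format, and the robot token list is illustrative and incomplete, which is exactly the weakness the slides describe.

```python
import re
from collections import Counter

# Matches the request, status and User-Agent parts of an Apache "combined" log line.
LOG_LINE = re.compile(
    r'"(?:GET|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
# Illustrative substrings of known crawler User-Agent values.
KNOWN_ROBOTS = ("googlebot", "bingbot", "slurp")


def count_page_hits(lines):
    """Count hits per path, skipping requests made by known robots."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # not a request line we understand
        agent = match.group("agent").lower()
        if any(robot in agent for robot in KNOWN_ROBOTS):
            continue
        hits[match.group("path")] += 1
    return hits
```

A robot that sends a browser-like User-Agent value would still be counted as a visitor, and a page served from the browser cache would not appear in the log at all.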
  • Methods for measuring
    Page tagging
    Developed during the later stages of the web.
    Embeds a JavaScript code segment in the page.
    When a tracking operation is triggered, the script collects data from the HTTP request, browser/system information, and cookies.
    The script submits the data as parameters attached to an image request (for a single-pixel image) sent to the analytics server.
    For example, take a look at the data collection request that Google Analytics sends out:
    http://www.google-analytics.com/__utm.gif?utmwv=4&utmn=769876874&utmhn=example.com&utmcs=ISO-8859-1&utmsr=1280x1024&utmsc=32-bit&utmul=en-us&utmje=1&utmfl=9.0%20%20r115&utmcn=1&utmdt=GATC012%20setting%20variables&utmhid=2059107202&utmr=0&utmp=/auto/GATC012.html?utm_source=www.gatc012.org&utm_campaign=campaign+gatc012&utm_term=keywords+gatc012& …..etc…..
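The core of page tagging is packing the collected values into the query string of an image request. The sketch below builds such a URL for a hypothetical collector; the endpoint `http://stats.example.com/pixel.gif` and the parameter names `p`, `sr` and `ul` are made up for illustration and are not Google Analytics' actual API.

```python
from urllib.parse import urlencode


def build_beacon_url(collector: str, page_path: str, screen: str, language: str) -> str:
    """Return the image URL a tracking script would request to report a page view."""
    params = {
        "p": page_path,   # page being viewed
        "sr": screen,     # screen resolution
        "ul": language,   # browser language
    }
    return collector + "?" + urlencode(params)
```

In the browser, the tracking script would set this URL as the source of a one-pixel image, which causes the request (and the data in it) to reach the analytics server.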
  • Methods for measuring
    Page tagging contd..
    After the advent of XHR (XMLHttpRequest), some page tagging scripts have used an AJAX submission of user data to the collection server.
    This often fails because of the same-origin restrictions that most modern browsers place on XHR.
    As the page tagging approach involves downloading a one-pixel image from another domain (like Google's), it adds a DNS (Domain Name System) lookup to your page, which is sometimes seen as slowing down page loading.
  • Page tagging is the new Analytics
    Page tagging is the de facto standard today.
    A significant advantage is that it works even for pages hosted in the cloud, so you do not need dedicated web servers or have to monitor their logs.
    Analytics today is mostly an outsourced service. There are many specialist providers such as Google and Adobe, and page tagging is the only method they support.
  • SEO and Analytics
    Tools for Analytics
  • Major tools – Web Analytics
    Google Analytics
    Free from Google (with a cap of 5M page views per month for non-AdWords advertisers).
    Uses page tagging as its analytics method.
    The user embeds a script in the page.
    The script collects information on the page actions and submits it to the analytics server as parameters on an image fetch.
    Detailed reports are available after logging in to your Google account.
  • Google Analytics Results
  • Major tools – Web Analytics
    Omniture Fusion (Adobe)
    Uses page tagging for information collection
    You include a Script snippet on all the pages which are tracked.
    The information is submitted through a script call, almost the same way Google does it, as parameters to a request for a 1px x 1px transparent image.
    <body>
    <script language="javascript" src="INSERT-DOMAIN-AND-PATH-TO-CODE/s_code.js" type="text/javascript"></script>
    <script language="javascript" type="text/javascript">
    <!--
    /* Copyright 1997-2004 Omniture, Inc. */
    s.pageName=""
    var s_code=s.t();if(s_code)document.write(s_code)
    //--></script>
    </body>
    </html>
  • Omniture Reports
  • SEO and Analytics
    Some Key Terminologies
  • Web Analytics KPIs
    KPIs are those metrics which give information on what changes could drive more effectiveness on your website
    All KPIs are metrics, but not all metrics are KPIs.
    In Web Analytics it becomes very critical to measure the right things.
  • First and Third Party Cookies
    First-party cookies are cookies that are associated with the host domain.
    Third-party cookies are cookies from any other domain.
    You go to the site http://yahoo.com
    There is a banner ad on this site for http://youbuy.com
    Both yahoo.com and youbuy.com place cookies on your browser
    So for you, the cookie from yahoo.com is a First Party cookie and the one from youbuy.com is a Third Party cookie.
  • First and Third Party Cookies
    So if I had placed the Google Analytics script on our page http://mozvo.com, and it had set a cookie for the domain “google.com”, that would have been a third-party cookie.
    Third-party cookies are widely discouraged, as quite a few sites use them to plant tracker cookies.
    A lot of users (about 40%) disable third-party cookies.
    All of the analytics providers have switched to first-party cookies to track information.
    This means the user will see only cookies from mozvo.com even though the Google Analytics code is embedded on the page.
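The first-party versus third-party distinction comes down to comparing the cookie's domain with the host of the page being viewed. The sketch below is a simplification: it treats a cookie as first-party when the page host equals the cookie domain or is a subdomain of it, and it ignores edge cases that real browsers handle.

```python
def cookie_party(page_host: str, cookie_domain: str) -> str:
    """Classify a cookie as "first-party" or "third-party" for a given page host."""
    page = page_host.lower().lstrip(".")
    domain = cookie_domain.lower().lstrip(".")
    # First-party: the page host is the cookie domain or one of its subdomains.
    if page == domain or page.endswith("." + domain):
        return "first-party"
    return "third-party"
```

With the examples from the slides, a cookie for yahoo.com on a yahoo.com page is first-party, while cookies for youbuy.com on yahoo.com, or for google.com on mozvo.com, are third-party.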
  • Bounce Rate and Click through rate
    The Bounce Rate : The bounce rate for the homepage, or any other page through which visitors enter your site, tells you how many people 'bounce' away (leave) from your site after viewing one page.
    Hence having a low bounce rate is preferred.
    Click Through Rate : Click-through rate (or click-thru rate) tells you how many people are clicking through to your site from a third-party. For example from a link, search engine, banner, advertising or email campaign.
    A Higher Click Through rate is preferred.
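Both metrics are simple ratios. The sketch below assumes the common definitions: bounce rate as single-page visits divided by entrances to a page, and click-through rate as clicks divided by the number of times the link or ad was shown (impressions). The slides do not spell out the denominators, so these are assumptions.

```python
def bounce_rate(single_page_visits: int, entrances: int) -> float:
    """Share of visits entering on a page that left after viewing only that page."""
    return single_page_visits / entrances if entrances else 0.0


def click_through_rate(clicks: int, impressions: int) -> float:
    """Share of impressions of a link, ad or email that resulted in a click."""
    return clicks / impressions if impressions else 0.0
```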
  • Click Stream Analysis
    Clickstreams, also known as clickpaths, are the route that visitors choose when clicking or navigating through a site.
    A clickstream is a list of all the pages viewed by a visitor, presented in the order the pages were viewed, also defined as the ‘succession of mouse clicks’ that each visitor makes.
    A clickstream will show you when and where a person came in to a site, all the pages viewed, the time spent on each page, and when and where they left.
    The most obvious reason for examining clickstreams is to extract specific information about what people are doing on your site.
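Given raw page-view records, a clickstream per visitor can be reconstructed by sorting on time. This is a minimal sketch; the record layout `(visitor_id, timestamp, page)` and the function name are assumptions for this example.

```python
from collections import defaultdict


def build_clickstreams(page_views):
    """Group (visitor_id, timestamp, page) records into ordered page lists per visitor."""
    streams = defaultdict(list)
    # Sort by visitor, then by time, so each list follows the order of viewing.
    for visitor, _timestamp, page in sorted(page_views, key=lambda view: (view[0], view[1])):
        streams[visitor].append(page)
    return dict(streams)
```

The first and last entries of each list show where a visitor entered and left the site; time spent per page could be derived from the differences between consecutive timestamps.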
  • Thank You !
    http://nsreekanth.blogspot.com/