Analysis Of The Modern Methods For Web Positioning
Faculty of Computer Science and Management
field of study: Computer Science
specialization: Software Engineering

Master thesis
Analysis of the Modern Methods for Web Positioning
Paweł Kowalski

keywords: search engine, SEO, personalization, optimization, web positioning
The thesis contains an analysis of the mechanism of search results personalization in popular search engines. It presents experiments and considerations about the impact of personalization on search rankings and how it affects Search Engine Optimization (SEO). Some new SEO methods that take advantage of personalization in search engines are proposed.
Supervisor: dr inż. Dariusz Król
For archival purposes, the diploma thesis has been classified as:*
a) category A (permanent records)
b) category BE 50 (subject to expert appraisal after 50 years)
* delete as appropriate
Stamp of the institute
Wrocław 2010
Abstract
Modern search engines are constantly improved. The most recent big step introduced into their algorithms concerns the personalization mechanism. Its goal is to extract information about a user's preferences implicitly from his search behaviour, as well as from factors such as location, phrase language and search history. This information is the basis for building the user's search profile. The motivation for this process is to provide search results that are more relevant to a specific user and his interests. The thesis concerns the details of this personalization mechanism and tries to examine how various factors affect search results. The author also analyses the methods used by search engines for collecting behavioural data. He attempts to define the possible impact of the customization of search results on Search Engine Optimization (SEO) issues such as metrics, spam filtering or changes in the significance of website optimization factors. Then the author tries to evaluate the possibility of manipulating personalized search rankings through a proposed system for generating human-like web traffic.
Chapter 1
Introduction
Before the Web and present-day search engines, searching meant simply matching the terms in a query to the exact occurrences of those terms in a database of textual documents. Some database searches only let you locate documents where certain words appeared within a defined distance of other specified words in the same document. Sorting documents by relevance or importance would have been a monumental task, if possible at all.
1.1. Beginning of the SEO Concept
When the Internet was introduced, it revolutionized the worldwide sharing of information. Free access for everyone, without any restrictions, is the reason why the Internet is considered one of the greatest inventions of the 20th century. But this freedom has a serious implication – many problems with organizing this enormous set of information.

Hyperlinks turned out to be insufficient for the task. This is why the first search engines were introduced. They quickly became the main source of visits to commercial websites. Good search results became a very important issue for content publishers. That moment was the beginning of the SEO¹ concept, which is still a major element of Internet marketing.
The early search engines like AltaVista or Lycos were launched around 1994–1995 [7]. Their algorithms analysed only the content of websites and the keywords in meta tags. It was easy to circumvent these algorithms by placing false information into keyword tags. Another popular fraud was filling website content with irrelevant text which was visible only to search engine robots, but not to the user. As a result, search engine result pages (hereafter SERPs) contained websites filled with spam and inappropriate content [21].
1. Search Engine Optimization (SEO) – the process of improving the volume or quality of traffic to a website from search engines. The term Web Positioning is often used as a synonym.
1.2. Search Engines Evolution
However, the relevance of search results to the query was still based on keyword matching. But search engines started to understand differences in the importance of words located in different parts of a page. For example, if you searched for a certain phrase, pages containing those words in their titles and headlines might be considered more relevant than other pages where those words also appeared, but not in those "important" parts of the page.
Google, the company which started in September 1998, revolutionized search engines. Its co-founders, Larry Page and Sergey Brin, developed the PageRank algorithm [3]. This algorithm redefined the search problem. The content of websites became slightly less significant: instead of text content, PageRank rates websites mainly on the basis of the quantity and quality of links leading to them. With the help of such improvements, the Internet works as a kind of voting system. Every link is a vote for the website it leads to.
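For reference, a commonly cited formulation of PageRank (the damping factor d is conventionally set to about 0.85; the exact variant used by Google in production is not public) expresses the rank of page p_i in terms of the set M(p_i) of pages linking to it and the number L(p_j) of outbound links on each such page:

PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}

where N is the total number of indexed pages; in practice the values are computed iteratively until they converge.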
Relevance was also derived by indexing the words of links pointing to other pages. If a link leading to a page used the phrase "american basketball" as anchor text², the page being pointed to would be considered relevant to American basketball. The existence of links to pages has also been used to help define the perceived importance of a page. Information about the quality and quantity of links to a page can be used by search engines to get a sense of the implied importance of the linked page.
2. Anchor text – link label or link title; the text in a hyperlink that is visible to and clickable by the user.
Nevertheless, after a short time, techniques [21] for spoiling the results of PageRank were also discovered. Basically, most of them work by increasing the number of links leading to a particular website in order to enhance its PageRank value. Such activity on a large scale is usually called linkbaiting. There are many scripts and web catalogs to facilitate and automate such activity. However, Google is also constantly working to improve its search engine algorithm and to make it resistant to linkbaiting. According to [11], many new factors are being introduced into the website evaluation process in order to reduce the impact of linkbaiting, which is a sort of spam.
Besides, there is a limit to the effectiveness of this type of keyword matching. When two people perform a search at one of the major search engines, there is a chance that even if they use the same search terms, they might be looking for something completely different. For example, when an anthropologist searches for the phrase "jaguar", he expects websites with information about big cats as a result. But he can also receive a collection of websites about Jaguar cars instead.
As search engines progressed and users were given more and more websites with valuable information, the engines needed to respond with a refined approach to search. The main idea for improving the relevance of search results was to better understand the user's intent and expectations when he types a certain phrase into the search box. So it seems that the next step in search engine improvement is tracking regular users on the Internet. Collecting data on their activity might give useful information about which websites are valuable to them. The major engines such as Google, Yahoo and Bing guard their search secrets closely, so one can never be absolutely certain how they operate. But they are evolving, and personalization seems to be the wave of the future.
1.3. The Goal
It seems quite clear that search engines fitted with a personalization mechanism would have two main benefits:
1. Improvement of the relevance of search results for a specific user.
2. Decrease in the number of spam entries in SERPs.
Google, the leader in the search engine market, has already taken the first steps in this area, as reported on the official company blog [13]. Moreover, it already holds several patents connected with the personalization mechanism. For this reason, this thesis will be mainly concerned with Google Search. But the high competition in the Internet market suggests that other popular search engines like Yahoo and Bing are also being improved in this direction.
The goal of this thesis is to analyse the possible aspects of the personalization mechanism in Google Search, on the basis of available information. Several factors which can influence the changeability of SERPs will be taken into account:
• geolocation
• language of the query
• web search history
• query complexity
• search behaviour (e.g. bounce rates³, time of visits)
These will be the basis for several experiments which should determine how advanced the current level of personalization in the considered search engine is. This research also includes an analysis of the data used to describe users' search behaviour – particularly the methods of collecting such data and the types of data collected.
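To make the notion of SERP changeability concrete, a minimal sketch is given below; the function names, measures and data are illustrative assumptions, and the thesis experiments do not necessarily use these exact measures. It compares two rankings returned for the same query under different conditions using a simple top-k overlap and an average rank shift.

# Illustrative sketch: comparing two SERPs obtained for the same query
# under different conditions (e.g. personalized vs. anonymous session).

def overlap_at_k(serp_a, serp_b, k=10):
    """Fraction of URLs shared by the top-k of both result lists."""
    top_a, top_b = set(serp_a[:k]), set(serp_b[:k])
    return len(top_a & top_b) / k

def average_rank_shift(serp_a, serp_b, k=10):
    """Mean absolute change in position for URLs present in both top-k lists."""
    pos_b = {url: i for i, url in enumerate(serp_b[:k])}
    shifts = [abs(i - pos_b[url]) for i, url in enumerate(serp_a[:k]) if url in pos_b]
    return sum(shifts) / len(shifts) if shifts else None

# Example usage with made-up URLs:
serp_anonymous = ["a.com", "b.com", "c.com", "d.com"]
serp_personalized = ["b.com", "a.com", "e.com", "c.com"]
print(overlap_at_k(serp_anonymous, serp_personalized, k=4))       # 0.75
print(average_rank_shift(serp_anonymous, serp_personalized, k=4))  # 1.0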
The obtained results will be used to specify the potential impact on SEO and its metrics, in particular:
• the possibility of using search engine personalization to create new SEO techniques
• the usability of website ranking in search results as a measure of success of SEO activity
3. Bounce rate is a term used in website traffic analysis. It essentially represents the percentage of initial visitors to a site who "bounce" away to a different site, rather than continue on to other pages within the same website.
Personalization is an opportunity for search engines to make spam less significant for search results and to make SEO workers' lives harder – but only spam in the sense of collections of websites with content of no value to humans and of irrelevant hyperlinks. Personalization also opens the door to another kind of information noise – behavioural data spam. The thesis presents the architecture of a distributed system that generates artificial web traffic and thereby imitates the search activity of a real user. However, using such a system can be seen as unethical, so the thesis contains only a concept and a design. The author has no intention of implementing such a system, but he tries to examine with the available tools whether building it would be feasible. In this way, possible harmful actions against which search engines should be protected may be identified.
After that, there is a short analysis of the known, up-to-date information about significant factors in web positioning. Together with the results of the personalization research, this helped to prepare a collection of advice on how to build a website attractive to search engines – a sort of guide for webmasters.

At the end of the thesis there is a short conclusion. It contains the author's thoughts about future trends in search engines and SEO.
Chapter 2
Personalized Search
Pretschner [27] in 1999 wrote: With the exponentially growing amount of information
available on the Internet, the task of retrieving documents of interest has become in-
creasingly difficult. Search engines usually return more than 1,500 results per query,
yet out of the top twenty results, only one half turn out to be relevant to the user. One
reason for this is that Web queries are in general very short and give an incomplete
specification of individual users’ information needs.
To be more specific, Speretta [31] in 2005 wrote: [...] most common query length sub-
mitted to a search engine (32.6%) was only two words long and 77.2% of all queries
were three words long or less. These short queries are often ambiguous, providing little
information to a search engine on which to base its selection of the most relevant Web
pages among millions.
According to Wikipedia, in 2006 Google had indexed over 25 billion web pages and 1.3 billion images, served 400 million queries per day, and stored over one billion Usenet messages. The Internet grows very quickly. For this reason, search accuracy is a crucial area for constant improvement in modern search engines. One of the major solutions to meet this challenge is personalization.
Personalized search is simply an attempt to deliver more relevant and useful results to the end user (searcher) and to minimize less useful results. The personalization mechanism uses information about a user's past actions and behaviour to build his profile and to match search results to that profile. It should provide a more useful set of results, or a set of results with fewer irrelevant or spam entries. For this reason, personalized search seems desirable to the end user.
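The general idea can be illustrated with a minimal sketch. The profile representation and the scoring rule below are deliberately simplistic assumptions and are not Google's actual method; they only show how a base relevance score could be combined with a user's interest profile before the final ordering.

# Minimal illustration of profile-based re-ranking (hypothetical, not Google's algorithm).
# A profile is a mapping from topic to interest weight, built from past behaviour.

def personalize(results, profile, alpha=0.3):
    """Re-rank results: final score = base relevance + alpha * profile affinity."""
    def score(result):
        affinity = sum(profile.get(topic, 0.0) for topic in result["topics"])
        return result["relevance"] + alpha * affinity
    return sorted(results, key=score, reverse=True)

profile = {"cars": 0.1, "animals": 0.9}   # this user mostly reads about animals
results = [
    {"url": "jaguar-cars.example",    "relevance": 0.80, "topics": ["cars"]},
    {"url": "jaguar-species.example", "relevance": 0.75, "topics": ["animals"]},
]
print([r["url"] for r in personalize(results, profile)])
# -> ['jaguar-species.example', 'jaguar-cars.example']

For the ambiguous "jaguar" query mentioned earlier, such a profile would push the big-cat page above the car page even though its base relevance is slightly lower.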
Google puts it this way: Search algorithms that are designed to take your personal preferences into account, including the things you search for and the sites you visit, have better odds of delivering useful results [13]. The goal is simple: to reduce spam and to deliver better results. This looks like a dangerous weapon against SEO workers, who are major offenders in generating spam.
Official information [13], [25], [38] indicates that Google is the only major search engine that has already introduced a personalization mechanism. The first personalized search results appeared almost 5 years ago [13], and since then the mechanism has been constantly evolving.
2.1. Operation Principles
Of course, the details of Google's search algorithms are not public. But it can be expected that the main principles are based on ideas which can be found in the scientific literature.
According to [31], personalization can be applied to search in two different ways:
1. by providing tools that help users organize their own past searches, preferences, and visited URLs;
2. by creating and maintaining sets of the user's interests, stored in profiles, that can be used by the retrieval process of a search engine to provide better results.
His research proved that user profiles can be created implicitly out of the limited amount of information available to the search engine itself. The profiles are built on the basis of the user's interactions with a particular search engine. Google has applied this second approach in its search engine, because it does not provide any additional tools like toolbars or browser add-ons for personalizing search.
As stated in [31]: In order to learn about a user, systems must collect personal information,
analyze it, and store the results of the analysis in a user profile. Information can be
collected from users in two ways: explicitly, for example asking for feedback such as
preferences or ratings; or implicitly, for example observing user behaviors such as the
time spent reading an on-line document.
Google Search does not provide any forms that let users specify their interests
and preferences, so to build a user profile this information must be collected in another
way. According to [31], user browsing histories are the most frequently used source
of information for creating interest profiles. But browsing history (such as that
presented in figure 2.1) is not the only significant signal.
For example, a user sends a search query and gets search results. He selects a specific
entry that seems interesting, clicks on it, and the website is saved in his browsing
history. However, the user quickly realizes that the selected website does not fit his
interests and goes back to the search results after a few seconds. Such a visit should be
scored rather negatively. So not only browsing history, but also the user's behaviour
should be taken into consideration by the personalization mechanism.
Studying a series of searches from the same user may also offer a glimpse into modified
search behaviour. How does an individual change their queries after receiving
unsatisfactory results? Are search terms shortened, lengthened or combined with new
terms? There is much other information that a search engine might collect about a user
when a search is performed – location, language preferences indicated in the browser,
or the type of device being used (mobile phone, handheld or desktop).
Figure 2.1. Google Web History panel
But how can such behavioural data be collected by a search engine? The answer is in the next
section.
2.2. Methods for the Analysis of Behavioural Data
Search engine robots, hereafter crawlers [26], continuously gather information from
almost every website on the Internet. It is well known that Google collects an enormous
amount of data through this process. These data have the greatest significance for the
search engine algorithm, which is why classic web positioning methods are based on
maintaining links, mainly on acquiring them.
Google processes these data and sorts websites according to their value to the user.
The user sends queries to the search engine and gets the corresponding SERPs. Because Google
knows what people search for, it is able to determine the popularity of specific information on the Internet.
Eventually, however, it is the user who decides which website
is valuable to him and which is not. The value of a particular website is reflected in
users' activity – which links have been clicked and the time between these actions.
This information is called behavioural data.
It is reasonable to make all these data useful for the search engine, and Google certainly
knows that too. This is probably why it collects an enormous amount of behavioural
data in addition to the data collected by crawlers. This kind of information is what this
study is most interested in.
2.2.1. Methods of behavioural data collecting
The entire web is based on the HTTP protocol, in which requests contain the following
information:
• the IP address of the user making the request, which can be used for geolocating this
user,
• the date and time of the request,
• the language preferred by the user,
• the operating system of the user,
• the browser of the user,
• the address of the website whose link redirected the user to the requested website.
These HTTP requests are used by Google in:
Click tracking – Google logs all of its users' clicks on all of its services,
Forms – Google logs every piece of information typed into every submitted form,
Javascript execution – requests, and sometimes even more data, are sent when a
user's browser executes a script embedded in a website,
Web beacons – small (1 pixel by 1 pixel) transparent images placed in websites, which
cause a request to be sent every time a user's browser tries to download such an
image,
Cookies – small pieces of text stored on the user's computer which let Google track
users' movement around the web whenever they land on any page that carries
Google advertisements.
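To make the web beacon idea concrete, a minimal, purely illustrative HTML snippet is shown below; the tracking host and parameter are invented for the example and are not an actual Google endpoint:

<!-- Hypothetical web beacon: a 1x1 transparent image whose download
     forces the browser to send an HTTP request (with cookies and the
     Referer header) to the tracking server -->
<img src="http://tracker.example.com/beacon.gif?page=article-42"
     width="1" height="1" alt="" />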
But all these elements have to be placed on the websites being indexed by Google's crawlers.
Fortunately for Google, it offers many services that are very useful for Internet publishers.
Because they are mostly free to use, webmasters gladly embed them in their websites.
Google Analytics
One of these attractive services is Google Analytics. It generates detailed statistics
about:
• the visitors to the website,
• the previous website of the visitor,
• the activity (navigation) of the user on the website.
Figure 2.2. Google Analytics main panel
It is the most popular, and one of the most powerful, tools for examining the web traffic on
a website, and it gives the owner a lot of useful information about its visitors.
Figure 2.2 shows a few features of this piece of software. The information it
provides is certainly very interesting for Google itself. For this reason there is much
discussion among SEO workers about possible disadvantages of using Google
Analytics in SEO campaigns: poor results for a particular website reported
through Google Analytics could prompt Google's search engine to decrease the value of this
website in the search ranking. This is, however, only unconfirmed speculation.
Google Toolbar
Another Google’s tool which provides them with even more valuable data is Google
Toolbar. It is the plug-in adding a few new features to popular browsers, mainly a
quick access to Google’s services. One of these features is checking the PageRank value
of the currently viewed webpage. This gives Google information about every website
that users with the Toolbar installed are viewing.
Google AdSense
There is also Google AdSense – a contextual advertising program for website publishers.
Millions of websites use this service to generate some financial profit
for their authors. As a result, all these websites display ads served
by Google's servers, which can provide Google with similar information to Google Analytics
and Google Toolbar.
Google Public DNS
The latest service launched by Google is Public DNS (Domain Name System) [23]. It
is said to be faster and more secure than other resolvers, and this is how Google encourages
users to start using it. It can generate a massive amount of information about web
traffic, since every single DNS query can be analyzed by its provider. So the more
popular their DNS becomes, the better for Google: it can provide a lot of information
helpful in determining website popularity.
However, because of DNS caching mechanisms [23], Google does not get all the desired
information. A DNS client sends a query only when the user visits a domain for the
first time. After it gets the IP address of this domain from the DNS, it caches it for an
interval determined by a value called time to live (TTL); subsequent visits during this
interval do not send any query. Consequently, Google still needs other services
to gather the desired information about the activity of a particular website's visitors.
Other Google Services
Google has other very popular services, for example YouTube, Google Maps (Fig.
2.3), etc. They allow users to embed objects like videos or maps on their own websites.
There is also Google Reader, which can indicate the popularity of particular websites
by counting their RSS1 subscribers.
There are many other ways for Google to gain useful data [9]. In fact, Google itself
admits to using all the described techniques in its privacy policy [12]. Most of these
data are probably used to improve the accuracy of its search engine and the quality
of its services.
2.2.2. Process of tracking user
The described services can be a great source of behavioural data; there is no doubt about that.
But the process of tracking a user's search activity would be incomplete without the data provided by
the search engine itself. The next few sections present what the tracking process looks like.
1. RSS (most commonly expanded as Really Simple Syndication) is a family of web feed formats used
to publish frequently updated works – such as blog entries, news headlines, audio, and video – in
a standardized format
Figure 2.3. Google Maps example screen
Starting the session
When the user opens the search engine site (types the www.google.com address
into a browser), he sends an HTTP Request [37] to the server. This request contains the
IP address of the user's computer. Thanks to this information the search engine is
able to relate subsequent search queries to particular users. Each of them is
assigned a unique session identifier, stored on the server. This is the beginning of
the user's search session.
The identifier expires after a certain period of the user's inactivity, which terminates
the search session.
Sending the search query
The view presented in figure 2.4 should be familiar to every Internet surfer. This is the
place where the user can type his search query.
After the search query is sent, two things happen:
1. The query is stored in a database and connected with the user's session identifier,
so that the personalization mechanism can later take advantage of it.
2. The query is analysed and used by the search algorithm to provide relevant search
results to the user.
Figure 2.4. Google Search main screen
After that the user receives an HTTP Response [37] with the search results as HTML
code.
Result selection
Figure 2.5 presents one of the results for the ”query example” query, with the
hyperlink highlighted.
Figure 2.5. Example of the search result
Normally, clicking on the link sends an HTTP Request to the server the link leads to. So
in this case it should be sent to:
http://www.wisegeek.com/what-is-query-by-example.htm
But when you look into the source code of the SERP, you will find a URL like this:
http://www.google.com/url?sa=t&source=web&cd=6&ved=0CDMQFjAF&
url=http://www.wisegeek.com/what-is-query-by-example.htm&
ei=CekOTLT4EpHu0gTYitWXDg&usg=AFQjCNE3t34-kSehUAK8TFNwh5CV9K-OWg&
sig2=PdwrnqnhLhowpC8t5-06bw
The most important thing to notice is that the links in the SERPs lead to
Google's server. The chosen website still finally appears on the user's screen, because
Google's server performs URL redirection (forwarding). The downside of this technique
is the short delay caused by the additional request to the search engine's server.
However, in this way the search engine can log every user's click in the SERPs. What is more,
there are some additional data in the result's URL which probably provide extra
information. For example, the value of the cd parameter is the position of the result in the
current search ranking. Moreover, this URL can be seen by the target server, because the
browser places it in the HTTP Request data as the Referer field [37]. This fact is used by
software like Google Analytics to aggregate the traffic sources of websites that include the
Analytics scripts. Thanks to this, the website owner can learn:
• the most popular search phrases that result in visits to his website
• the ranking position of his website for a particular search phrase and a particular user
(it can vary due to the personalization mechanism)
Of course, the same information is taken into consideration by the search engine.
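As a small, purely illustrative sketch, such a redirect URL can be decomposed with standard URL-parsing tools; this is only an assumption about what could be read out of the logged link, not a description of Google's internal processing:

# Extract the target URL and the ranking position (cd parameter)
# from a Google SERP redirect link (example values from above).
from urllib.parse import urlparse, parse_qs

redirect = ("http://www.google.com/url?sa=t&source=web&cd=6"
            "&url=http://www.wisegeek.com/what-is-query-by-example.htm")

params = parse_qs(urlparse(redirect).query)
target = params["url"][0]        # the website the user actually visits
position = int(params["cd"][0])  # position of the clicked result in the SERP
print(target, position)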
Behavioural data extraction
According to much research [1], [6], [7], [10], [20] and [36], more than half of a website's
visits come from SERPs. In the case of e-commerce websites this value is even higher,
because such services usually do not have regular visitors; visitors mostly come from
search engines (even 90% of visits) or from ads appearing on other websites.
According to the analysis of real users' web traffic [29], a typical user spends about 2
hours per session and 5 minutes per page.
These statistics concern a website of good quality, relevant to the user's interests. A visit
to a website with poor content would be terminated after just a few seconds, a so-called
bounce. Such a visit indicates the irrelevance of the website selected by the user, so it
would be desirable for it not to appear in the search results for that particular phrase.
Not only what you select and interact with from a given set of search results (or the ads
served with them), but also what you do not select or have minimal interaction with
(bounce rates) can have an effect. These metrics can be used to create a better probability
model for future search result sets.
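A minimal sketch of how logged visits could be classified from dwell time alone is given below; the thresholds are assumptions only (the 3-minute cut-off mirrors the activity diagram in figure 2.6) and are not values known to be used by any search engine:

# Hypothetical classification of a visit based purely on dwell time.
BOUNCE_THRESHOLD = 10     # assumed: visits shorter than this are bounces
ENGAGED_THRESHOLD = 180   # assumed: "visit longer than 3 minutes"

def classify_visit(dwell_seconds):
    if dwell_seconds < BOUNCE_THRESHOLD:
        return "bounce"    # negative signal for the clicked result
    if dwell_seconds >= ENGAGED_THRESHOLD:
        return "engaged"   # positive signal, the visit is logged
    return "neutral"

print(classify_visit(5))    # bounce
print(classify_visit(300))  # engaged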
What is more, a mechanism based on cookies has recently been introduced in Google
Search. It makes it possible to learn the history of search queries from the last 180 days
for every user (including users not logged into any Google service). Officially it is used
to personalize SERPs according to the past interests of the user. Google [13] says about it: Because
many people might search from a single computer, the browser cookie may be associated
with more than one person's search activity. For this reason, we don't provide a method
for viewing this signed-out search activity.
The diagram in figure 2.6 shows the process of tracking a user who has not signed in
to a Google Account.
This is what Google [14] says about personalized search for signed-out users:
When you search using Google, you get more relevant, useful search results, recom-
mendations, and other personalized features. By personalizing your results, we hope to
deliver you the most useful, relevant information on the Internet.
In the past, the only way to receive better results was to sign up for personalized search.
Now, you can get customized results whenever you use Google.
Figure 2.6. Activity diagram of a visit session
or not you’re signed in to a Google Account when you search, the information we use
for customizing your experience will be different:
Signed-in personalization: When you’re signed in, Google personalizes your search
experience based on your Web History. If you don’t want to receive personalized
results while you’re signed in, you can turn off Web History and remove it from
your Google Account. You can also view and remove individual items from your
Web History.
Signed-out customization: When you’re not signed in, Google customizes your search
experience based on past search information linked to your browser, using a cookie.
Google stores up to 180 days of signed-out search activity linked to your browser’s
cookie, including queries and results you click.
Table 2.1. Information used by Google Search in personalization

Place of data storage:
  Signed-in Personalized Search – Web History, linked to the Google Account
  Signed-out Personalized Search – Google's servers, linked to an anonymous browser cookie
Time interval of data storage:
  Signed-in Personalized Search – indefinitely, or until the user removes it
  Signed-out Personalized Search – up to 180 days
Searches used to customize:
  Signed-in Personalized Search – only signed-in search activity, and only if the user is signed up for Web History
  Signed-out Personalized Search – only signed-out search activity
2.3. Research
The goal of this section is to evaluate the current level of personalization based on
several factors. What Google [14] says about the types of results customization should
be helpful for this task:
When you use Google to search, we try to provide the best possible results. To do that,
we sometimes customize your search results based on one or more factors:
Search history: Sometimes, we customize your search results based on your past
search activity on Google, such as searches you’ve done or results you’ve clicked.
If you’re signed in to your Google Account and have Web History enabled, these
customizations are based on your Web History. If you’re signed in and don’t have
Web History enabled, no search history customizations will be made. (Using Web
History, you can control exactly what searches are stored and used to personalize
your results. Learn about using Web History)
If you aren’t signed in to a Google Account, your search results may be customized
based on past search information linked to your browser using a cookie. Because
many people might be searching on one computer, Google doesn’t show a list of
previous search activity on this computer. Learn how to turn off these customiza-
tions
Location: We try to use information about your location to customize your search
results if there’s a reason to believe it’ll be helpful (for example, if you search for
a restaurant chain, you may want to find the one near you). If you’re signed in
to your Google Account, that customization may rely on a default location that
you’ve previously specified (for example, in Google Maps). If you’re not signed
in, the results may be customized for an approximate location based on your IP
address.
If you’d like Google to use a different location, you can sign in to or create a
Google Account and provide a city or street address. Your specific location will be
used not only for customizing search results, but also to improve your experience
in Google Maps and other Google products.
2.3.1. Location
While you can search at google.com just about anywhere in the world, you can also
access Google at a number of country-specific addresses, such as google.co.uk,
www.google.fr or www.google.co.in. In fact, Google automatically redirects you to the
proper domain by using your IP address to determine your geolocation. The browser
setting with a recommended language was cleared for this experiment.
This experiment was performed in one location in Poland. However, to simulate requests
from other locations, a software environment similar to the one described
in section 5.6.1 was used. The phrase ”jaguar” is multi-lingual, so the language of the
phrase does not affect the search results.
The first query was sent through three Tor hosts, where the exit host was located in
Los Angeles, California, United States. The result of the query is presented in figure
2.7 (only the first several entries).
All websites in this SERP are in English, which is the prevailing language at the described
location. Moreover, near the bottom there are some places indicated on Google
Maps which are physically close to the location of the exit host.
The second query was sent via an exit host located in Erfurt, Thuringia, Germany.
Figure 2.8 presents the results of this query.
The Official Google Blog [13] states that the same query typed in multiple countries
may deserve completely different results. The presented results clearly show that those
words are true.
Unfortunately, the author was unable to check whether a search for the query ”football” provides
different results in the US, the UK, and Australia, where the term refers to completely
different sports; it is, however, quite likely. A preferred country might
include the country of the searcher as well as other countries that the searcher might find
acceptable, such as showing search results from the United States to people located in
Canada.
2.3.2. Phrase language
It is rather clear that the language of the searched phrase is significant for the results.
Despite personalization, search engines still use matching of phrases against the content of
indexed pages as the major factor in evaluating search relevance. For this reason,
phrases identical in semantic meaning but expressed in different languages generally
produce completely different results.
So serving search results with English pages about birds would be senseless if the
user typed the phrase ”Vogel” into the search box, which means ”bird” in German.
Figure 2.7. Search results of the query sent via host located in USA
2.3.3. Search history
The most interesting factor which is said to influence the personalization mechanism
in the Google search engine is the user's search history. Figure 2.9 shows search results which
were slightly modified by re-ranking based on search history. In the fifth position, right
after two video thumbnails, there is a link to a website which was visited 4 times
(the exact number of visits is visible on the right side of the hyperlink) by the author
of this study while gathering information.
These fluctuations appear only when the user is signed in to a Google Account; otherwise
there is no access to the web history (figure 2.1).
Figure 2.8. Search results of the query sent via host located in Germany
the author’s intentionally. The phrase ”personalized search” was not the object of the
experiment. However, this result shows, that search history affects future search results
on similar areas of information.
To compare the modified results with the original ones (without the impact of personalization),
there are two ways to disable results customization:
1. signing out of the Google Account
2. using ”View customization”, which is available at the bottom of the results screen
After using one of these options, we can check the original position of the visited website
in the ranking. In this particular case the website holds 17th position in the results
with no customization, so the personalization re-ranking increased its position by
12 places.
Most important, however, is the fact that this change places the visited website on the first
SERP of the search ranking. In most cases (more than 90% of searches) users do not
go beyond the first page of results, so such a change in ranking causes a huge increase
in visitors arriving via this phrase.
Figure 2.9. Search results personalized by user’s search history
Unfortunately, the author did not succeed in forcing the search engine to re-rank search results
intentionally, so after this experiment an approximation of the re-ranking algorithm is not
possible.
2.4. Spam Issue
There is a huge amount of value in getting to the top of the search results, especially
for competitive phrases related to business. This is often a marketing area with
millions of dollars in it, so spammers are highly motivated because there
is a lot of money at stake. Unfortunately, regular users searching for valuable content
are the main victims of these practices.
One of the more interesting aspects of implicit/explicit user feedback in the search
personalization process is that it can be very effective in dealing with spam. The more
personalized the results, the smaller the chance that spam will appear in the search ranking.
In most cases spammy websites do get clicked by users (who are tricked by a
link carrying false information about the target website), but after realizing the real value of
such a website, users quickly go away and do not come back.
Not only does this enable Google to limit spam through personalization, it is also
a great source of query/click analysis. It is worth considering the case where
the click data across multiple users shows that a given entry in a query space is rarely
clicked, or shows a high bounce rate; Google might use that signal as a dampening
factor for spam results.
Chapter 3
Impact of Personalized Search on SEO
3.1. Metrics
For quite a long time, SEO workers have used the position in the search ranking for particular
phrases as the indicator of web positioning success. An increase in the website's position has
always been the desired consequence of SEO actions.
After the implementation of the personalization mechanism the issue is not so simple.
Although customizations in rankings are still not very influential (only one entry in the whole
SERP), the clearly visible benefits of personalization suggest that the impact will
keep increasing. For this reason ranking position can no longer be the major metric of success,
because the ranking position of a particular website can be different for
every user, especially for those users who are regular visitors of this website.
Of course, this indicator is still measurable, because position monitors1 are not subject to
personalization (they do not use cookies or a Google Account). It can also give useful
information about the position seen by users searching for the keywords in question for the first
time. But it is the increase of inbound traffic which has always been the main motivation
of SEO actions, so the major metrics of these actions should be closely related
to this motivation. For example, these are possible metrics for SEO in the era of personalized
search:
Number of unique visitors. A higher value indicates a good result of the advertising
campaign and growing popularity among new customers.
Previous search queries. As an example: if the searcher has recently been searching
for the term ‘diabetes’ and submits a query for ‘organic food’, the system attempts
to learn from this and presents additional results relating to organic foods that are helpful
in fighting diabetes.
1. Software which automates monitoring of a website’s position in search rankings for given phrases
Previously presented results. Results that have been presented to the end user
can be omitted in future results for a given period of time in exchange for other
potentially viable results.
User query selection. Past selected or preferred documents can be analysed, and
similar documents or linking documents can be used to refine subsequent results.
Furthermore, certain document types can be seen as preferred, in what would be
a combination with Universal Search concepts. Commonly accessed websites can
also be tagged as preferred locations for further weighting.
Selection and bounce rates (and user activity on website). An editorial scor-
ing can be devised from the amount of time a user spends on a page, the amount
of scrolling activity, what has been printed, or even what has been saved or book-
marked. All can be used to further refine the ‘intent’ and ‘satisfaction’ with a
given result that has been accessed.
Advertising activity. The advertisements clicked on can also begin to add to a
clearer understanding of the end user's preferences and interests.
User preferences. The end user can also provide specific information as to personal
interests or location-specific ranking prominence. It could also include favourite
types of music or sports, inclusive of geographic preferences such as a favourite
sport in a given city.
Historical user patterns. A person's surfing habits over a given period of time (e.g.
6 months) can also play a role in defining what is more likely to be of interest
to them in a given query result. More recent information (on the above factors) is
likely to be weighted more heavily than older historical performance metrics within a set
of results.
Past visited sites. Many of the above metrics, such as time spent and scrolling on a
given web page or historical patterns and preferred locations can also be collected
in a variety of ways (invasive or non-invasive). Cookies actually save resources for
the Search Engine, an added benefit.
Advice on how to improve the values of such metrics is presented in the next chapter.
A higher position in the rankings does not always mean more visitors. Moreover, there is no
significant difference between positions 6 and 10. Very often, proper optimization of the
page title and description visible in a SERP is more important and
brings more visitors than a higher position. Better website titles and meta-descriptions
have an advantage, as getting the user to engage with the SERP listing upon
initial presentation is at a premium. Quality content would also begin to take
on a more meaningful role than it has in the past, as bounce rates and user satisfaction
now start to play into actual search result rankings.
3.1.1. Areas for Consideration
Author’s experience in commercial SEO, which is closely related to the topic of mar-
keting, is rather small. Thus this section is based on [16].
Demographics
It should be ensured that any obvious demographics that may apply to your site are leveraged.
Whether it is geographic, topical (sports, politics) or even a given age group, ensuring that this
is targeted effectively is important, because the ‘topical’ nature of personalized search
can group results prior to even ranking them. If a particular website is not clear in
each of these areas, it risks lower weighting within tighter demographic starting document
sets. Even your off-site activities (link building, Social Media Marketing, etc.) should
be as tightly targeted as possible.
Relevance profile
Of particular interest is potential categorization in terms of topical relevance. Ensuring
that your site provides a strong relevance train would be particularly valuable.
Much like phrase-based indexing and retrieval concepts, probabilities play a large role:
when refining results the search engine looks at related probable matches. Through
a concerted effort at on-site and off-site relevance strengthening, you increase the
odds of making it into a given set of results in a world of ‘flux’. It never hurts to review
the concepts surrounding Phrase Based Indexing and Retrieval, as many of the related
patents addressed deriving concepts/topics from phrases.
One would also have to imagine that tightening up the relevance profile in your Social
Media Marketing efforts would be beneficial to a tighter topical link profile. Furthermore,
many topically targeted visitors that enter a site may bookmark your site (or it may be
collected passively), which adds to the organic search profile without the site ever being included
in a search result. As such, there are many exterior opportunities to be had beyond
traditional off-site SEO.
Keyword Targeting
Building out from your core terms will be important for understanding search
behaviour. The long tail as we know it would be targeted towards potential query
refinements for a given subset of searcher types, so building out logical phrase extensions
and potential query refinements would be something to look at. Furthermore, with
changeable personalized ranks we would measure SEO success in actual traffic and
conversions, which puts term targeting in a new light as far as nailing money terms
and having a cohesive plan that targets query-refinement long-tail opportunities.
Quality Content
In considering the value of a website, user interaction becomes a consideration as far
as bounce rates, time spent on page and scrolling activities are concerned. Producing
compelling and resourceful content would be at a premium to best leverage these
tendencies of the system. If a searcher has selected and interacted with your site on
multiple occasions, your site will be given weight in their personal rankings as well
as for related topics and searcher types. The more effective the resource, the greater
the increase in ranking weight.
Search result conversion
Working with the page title, meta-description and snippets takes on a more important
role in your SEO efforts when adjusting for personalized search. I dare say that using
analytics and a form of split testing would be a great advantage in finding what
not only ranks, but also converts.
Freshness
Another area which may be important is document freshness, in that people could
set default date ranges, or the system could passively begin to see a pattern of
a user accessing more current content. A valuable website that has been ranking well for
a year may no longer be getting all the traffic it is used to. Updating such pages with
fresh information, or creating new related pages and passing the flow to them via internal
links, should be considered. Depending on the nature of the content (searcher
group profile), more current content may be more popular over the larger data set and
thus newer content would be weighted more overall.
Site Usability
From the crawler's or the end user's perspective, having a logical architecture and a quality
end-user experience is also at a premium. If similar searcher types embark on similar
pathways and related actions (bookmarking, printing, navigating, and subscribing to RSS),
then this will give greater value to those target pages within that community of searcher types.
This also strengthens the relevance profile.
Analytics
It can be noticed that there is a strong need for analytics in understanding
traffic flows, common pathways, bottlenecks, paths to conversion,
and much more. These data will be of immeasurable use in dealing with many of the
factors that can affect personalized web positioning. This issue is closely connected
with psychology (particularly behavioural targeting).
Chapter 4
SEO Guide
This chapter presents a set of areas for consideration during the process of website optimization.
The advice concerns only areas which may have a positive effect on the popularity of the website.
It should be helpful in achieving a higher position in search rankings, which should increase the
number of visitors, and it should also make the website more attractive for users. This will
probably decrease the bounce rate, which has a negative impact on the website in the
personalization re-ranking process.
The areas covered in this guide do not include any SEO techniques connected with
external actions, i.e. those that require contact with other websites, such as:
• linking (the acquisition of links), free or paid
• advertising
• presell pages1
The listed methods are closely connected with generating spam. Because of this, they reduce
the share of quality content on the Internet, so Internet surfers gain no benefit from
them.
This chapter is based mainly on the information from [13], [15], [21] and [36].
1. Presell page – a page created only for SEO purposes. The text on such a page is only a surrounding for a link
leading to the positioned website. The content has no value for a human reader; it is only prepared
to look natural to crawlers, so that it is not filtered as spam.
4.1. Website Presentation in Google Search
4.1.1. Title
The title is the first piece of information about a particular website in a SERP. It is also one of the
main factors with an impact on the website's ranking. An example of such a title in HTML
code looks like this:
<title>Jaguars, Jaguar Pictures, Jaguar Facts -
National Geographic</title>
Such a title presented in a SERP looks as in figure 4.1.
Figure 4.1. Presentation of a website title in Google Search
These are the issues connected with the website title which are significant in SEO:
Length up to about 65 characters
Longer titles can also be indexed by the crawler, but a title of up to 65 characters is
optimal and fits entirely in the SERP. Longer titles are shortened with an ellipsis.
Diversity of titles
Each of the website's pages presents slightly different information (e.g. a product page, a
contact form, etc.). The title should be prepared individually for each of them.
Keywords
There are 3 principles related to creating a title:
1. Keywords should be distributed across all the pages. Each page should be
optimized for only 3–4 keywords. The front page title should use the most general
expression, the titles of product pages should contain words characterizing the type of
these products, etc. Sticking to this rule is very important, because otherwise
the pages of the website could be treated by the crawler as duplicate content.
2. The most important keywords should be placed at the beginning of the title.
3. Google can combine keywords from the title into different phrases, but those which
appear one after another have the greatest impact on the ranking position.
For this reason a key phrase should not be split up.
4.1.2. Description
The description is the second piece of information about a particular website, presented right after
the title in a SERP. Such a description presented in a SERP looks as in figure 4.2.
Figure 4.2. Presentation of a website description in Google Search
The description presented in a SERP can be generated from the following sources:
• the description metatag, for example:
<meta name="description" content="Learn all you wanted to know
about jaguars with pictures, videos, photos, facts,
and news from National Geographic." />
• a fragment of the website content (in case the description metatag is too long or
there is none in the source code)
Here are some tips on the page description in the metatag:
Length up to about 150 characters
Longer descriptions will not be presented in the SERP exactly as they were written.
Diversity of descriptions
Just like titles, the description of a particular page should be slightly different from the
others. It should be specific to the information presented on the page.
Keywords
The description should contain the keywords targeted by the SEO strategy. When it does, the
keywords will be shown in bold in the search results for query phrases based on such keywords,
which should draw users' attention to our website. At the same time, the description should be
prepared in a way that encourages users to visit the website.
4.1.3. Sitelinks
Sitelinks are links leading to other pages of the same website. They can be presented in
SERPs in 2 ways:
1. Horizontally – 4 links in 1 row (presented in figure 4.3)
2. Vertically – 8 links in 2 columns
Figure 4.3. Presentation of a website sitelinks in Google Search
There is no manual way for publishers to force sitelinks to be presented in a SERP; it depends
on how the website was indexed. But it can be made easier for the crawler to do it
correctly. There are two things which can be done:
1. First of all, the source code related to navigation on our website must be well designed
and its syntax must be very clear.
2. Prepare a sitemap of the website (e.g. in XML format). This issue will be described
later.
4.2. Website Content
4.2.1. Unique content
The basis of proper content optimization is its uniqueness. This means that the
same text, or larger fragments of it, should not be reproduced on other websites or on
different pages of our website.
In order to verify the degree of uniqueness of our content, this tool can be used:
http://www.copyscape.com
4.2.2. Keywords
It is very important to enable search engines to relate our website to a specific theme
and keywords. To make this possible, keywords must be considered not only
when designing the website title and description; they must also be contained in the
website content.
In preparing the text for the website it is suggested to stick to the following principles:
Repetition
Keywords should be repeated several times on every page, but it cannot be forgotten
that the text should be written primarily for users. The task is to find a compromise
between text attractive for users and text good for SEO. Too high a density of keywords on
a particular page can be treated by the search engine as abuse; in such a situation our
website will be penalized by exclusion from the ranking.
Variations and synonyms
The website content will be more natural if the keywords it contains are used in many
(grammatical) variations. Modern search engines are proficient enough to also detect
the use of synonyms. For this reason we can, for example, use the word ”drug” in content
being optimized for the ”medicine” keyword.
Location
Keywords should be spread over the whole page with a similar density. This will give a better
result in positioning than an accumulation of keywords, for example, only at the beginning
of the page.
4.3. Source Code
Website’s source code has not direct influence on the position in search ranking. How-
ever, some errors can cause problem with proper indexing by search engine robots. For
this reason it is worth to ensure that the code contains no errors and it is compatible
with current WWW standards.
A very useful code validation tool is provided by the World Wide Web Consortium
(W3C). It can be found here:
http://validator.w3.org
4.3.1. Headers
HTML header tags (h1–h6) are very significant for proper indexation of the website
content, and using them correctly is very important in the design of a website. There are
a couple of issues which must be considered from the SEO point of view.
Hierarchy
Header tags are designed to separate particular sections of a document. They must
be used in the correct order and only when there is a need for them.
Repetition
According to current HTML standards, a first-level header (the h1 tag) may occur only once
in the whole document. Other headers can be used repeatedly.
Keywords
It is suggested to put keywords into header tags, because they have more ”positioning
power” than regular text. This power probably corresponds to the header
hierarchy, so the most important keywords should be placed in the h1 tag.
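For illustration, a page optimized for the phrase ”jaguar facts” might structure its headers as follows (the texts are made up for the example):

<!-- Hypothetical header hierarchy with the main keyword in h1 -->
<h1>Jaguar Facts</h1>
<h2>Jaguar habitat and range</h2>
<h2>What do jaguars eat?</h2>
<h3>Hunting behaviour</h3>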
4.3.2. Highlights
Keywords can be distinguished from the rest of the text by using the tags <strong> (bold)
and <em> (italics). In this way, keywords are highlighted both for users and for crawlers.
However, it should be done with restraint: not every occurrence of a keyword should
be highlighted, but only the most important ones.
4.3.3. Alternative texts
Images are often placed in a document. It is recommended to include
alternative texts for those images. It can be done in this way:
<img src="path/to/image" alt="alternative text" />
The alternative text is displayed on the screen when the browser cannot display images (e.g. when
they are unavailable on the server).
These alternative texts are also interpreted by search engine robots; the data are then
used in image search (when the search engine has such an option).
4.3.4. Layout
A well-indexed website should have a clear and minimalistic layout. The content is the
most important factor, so even the ratio of text to HTML code is significant: the
higher this value, the better and more valuable the website from the search engine's point of
view.
4.4. Internal Linking
Quite an important issue in website optimization is internal linking. An internal link is a
hyperlink which leads to another page of the same website. There are some recommendations
connected with this link type.
4.4.1. Distribution
Each page should be reachable in 3 or 4 clicks at most; if not, the website
navigation must be redesigned. Attention should be given especially to the links on
the main page, and the structure of the website must be clear.
What is more, this is also very important for the usability of the website, as complicated
navigation can discourage the user from continuing the visit.
4.4.2. Links anchors
A link anchor is the clickable text. It is displayed to the user on a website instead of the
plain URL, which is rather unreadable for a human. It looks like this:
<a href="some_url.html">Anchor text</a>
Anchors should describe the content of the pages their links lead to. If links are
located within other text, they should match the context of the whole text. For example,
it is not advisable to write ”click here”, as was popular a couple of years ago.
4.4.3. Broken links
A very important thing in website positioning is to beware of links which lead to unavailable
URLs. Such an issue is very annoying and discouraging for visitors.
A website with broken links will also be less valuable for search engines, because
robots crawl the web using links. After indexing a page, a robot uses one of the links
placed on this page to go to another page. When such a link is broken, the crawler can
interrupt the indexing process, with the result that not every page of the
website will be indexed.
4.4.4. Nofollow attribute
Nofollow is an HTML attribute value used to instruct some search engines that a
hyperlink should not influence the link target's ranking in the search engine's index.
This is an example of such a hyperlink:
<a href="some_url" rel="nofollow">Some website</a>
It is intended to reduce the effectiveness of certain types of search engine spam, thereby
improving the quality of search engine results and preventing a particular website from being
indexed as spam. The nofollow attribute is commonly used in outbound links2, for example in
paid advertising.
2. Links which point to other websites
4.4.5. Sitemap
A sitemap is a list of pages of a website accessible to crawlers or users. This helps
visitors and search engine bots find pages on the website.
Sitemap for users
A page can be prepared which contains links leading to all of the website's pages, or
only to the most important ones. Thanks to this, users having problems with navigation
will be able to find quickly what they are looking for. An example of such a sitemap
located in the footer is presented in figure 4.4.
Figure 4.4. Example of sitemap for visitor
Sitemap for robots
A sitemap for crawlers must be easy to process automatically. Such a sitemap is usually
prepared in the XML document format. This is what an example looks like:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.example.com/catalog?item=12</loc>
<changefreq>weekly</changefreq>
</url>
<url>
<loc>http://www.example.com/catalog?item=73</loc>
<lastmod>2004-12-23</lastmod>
<changefreq>weekly</changefreq>
</url>
<url>
<loc>http://www.example.com/catalog?item=74</loc>
<lastmod>2004-12-23T18:00:15+00:00</lastmod>
<priority>0.3</priority>
</url>
<url>
<loc>http://www.example.com/catalog?item=83</loc>
<lastmod>2004-11-23</lastmod>
</url>
</urlset>
As can be noticed, such a document contains some information about each link:
loc: the URL of the particular page
lastmod: the time of the last modification of the page
changefreq: the average period between changes to the page
priority: the priority with which the crawler should index the particular page
Such information is welcomed by crawlers, and it can benefit the publisher through faster
indexing.
In most cases such documents are prepared using software tools such as:
http://www.xml-sitemaps.com/
After the sitemap document is prepared, the search engine must be notified about
its existence via a dedicated form.
4.5. Addresses and Redirects
In addition to the previously described factors used in the ranking algorithm, search engines also
consider the form of the indexed website's URLs and information included in HTTP Responses.
4.5.1. Friendly addresses
Search engines give a higher rank value to websites whose pages have URLs more
readable for humans. For example, an address like this:
http://www.example.com/index.php?page=product&num=5
can be written in this way:
http://www.example.com/product/5
Such an effect can be achieved using mod_rewrite. It is a module for the Apache server which
allows regular expression patterns to be created for mapping URLs to particular pages. Modern
web frameworks, such as the Django Framework or Ruby on Rails, also offer this
possibility.
Moreover, this gives another opportunity for placing keywords. For this reason, a page
being optimized for a particular keyword should have this keyword in its URL.
If it is a phrase of several words, it is suggested to separate them with dashes.
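A minimal sketch of such a mapping in an Apache .htaccess file is shown below; the rule assumes the product pages are actually served by index.php and is only illustrative:

RewriteEngine On
# Map the friendly URL /product/5 onto the real script
# index.php?page=product&num=5 (internal rewrite, no redirect)
RewriteRule ^product/([0-9]+)$ index.php?page=product&num=$1 [L,QSA]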
4.5.2. Redirect 301
A 301 redirect is a permanent redirect from one address to another. After using it:
• visitors typing the old address into the browser's address bar will be redirected
to the new one
• some search engines will replace the old address in their database with the new one
So it is very useful after a domain change.
Earlier the author stated that the content of a website should not be duplicated. It is often
forgotten that allowing a website to be entered through several addresses has the same
effect. Sometimes the same website is served via:
• example.com
• www.example.com
• example.com/index.html
• www.example.com/index.html
• example.com/default
• www.example.com/default
In such a case it must be decided whether the main address of the website will have the ”www”
prefix. If it will, a .htaccess file should be placed in the main folder of the server, with
the following content:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
Similarly we can manage the redirection from the index.html file:
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]
There are also many other possibilities which can be managed likewise.
4.6. Other Issues
There are a couple of other things which have some influence on the website ranking.
4.6.1. Information for robots
Sometimes publishers do not want robots to index some of a website's pages, while keeping them
available for regular visitors. For example:
• results from internal search engines
• data sorting results
• print versions of pages
• pages which should not be indexed, like the login page of the administration panel
To manage this issue, a robots.txt file can be prepared with content like this:
User-agent: *
Disallow: /admin-panel/
It should prevent robots from indexing pages whose URLs start with www.example.com/admin-panel/.
4.6.2. Performance
One of the most recent factors introduced into the Google search engine algorithm is website
performance. Google promotes websites with a short download time. It is not as significant
as internal linking or the quality of content: big information portals are very complex, so
they cannot be downloaded as fast as, for example, a small blog, and valuable information
remains the most important factor.
However, good performance can increase the rank value of a website compared to
another one with similar content but lower efficiency.
To improve website performance several tools can be used, such as PageSpeed by Google:
http://code.google.com/speed/page-speed
It provides an analysis of the website's download efficiency and gives some tips on
how to improve performance. The most common suggestions given by this software
are presented below.
Gzip compression
Modern web browsers allow the gzip compression mechanism to be used to reduce the size of
website files (images, CSS files, Javascript files). If there is such a possibility, it is
recommended to use it.
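On an Apache server this can typically be enabled with the mod_deflate module; the snippet below is only one illustrative configuration, not the only way to do it:

# .htaccess: compress text-based resources before sending them
<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html text/css
    AddOutputFilterByType DEFLATE application/javascript text/plain
</IfModule>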
Number of DNS lookups
The DNS caching mechanism [23] means there is no need to look up the IP address matching
a particular domain several times. For this reason, when every file used by the website
is located on the same server (or on another server in the same domain), there is only one
DNS lookup during the download process.
So placing media files (images, CSS, Javascript) on a different domain without a clear need
should be avoided.
External files
Information commonly located in external files, like CSS style sheets or Javascript,
can also be placed inside the HTML document. However, this should be avoided, because
it makes parsing of the source code more complex, so the browser needs more
time to display the website on the screen.
4.7. Summary
The tips presented in this chapter should significantly increase the rank value of every
website. With a higher ranking in the search engine there will be more inbound
traffic; in other words, the website will gain more popularity.
Making the website more attractive to visitors should lead to better results in
personalization re-ranking. The assumptions about the impact of personalized search results
on the global ranking are very plausible, so improving the quality of the website on the basis of
the presented guide should increase the website's ranking in general, both through the
personalization impact and through collecting inbound links as an effect of the increase in
popularity.
Chapter 5
The System for Global Personalization
The goal of this chapter is to propose a method for improving a website's search
ranking by affecting the personalization mechanism in the search engine. The idea of
this method is to generate artificial behavioural data. The author of this thesis is a
co-author of the article [19] on which this chapter is based.
In section 2.2 of this thesis it has been shown that a lot of data goes into Google
and a lot of useful, manipulated data comes out. But we can only guess what happens
in between, or try to learn from observing the data coming out of Google.
Evans wrote [10] that identifying the factors involved in a search engine ranking algo-
rithm is extremely difficult without a large dataset of millions of SERPs and extremely
sophisticated data-mining techniques.
That is why observation, experience and common sense are the main sources
of knowledge about Search Engine Optimization (SEO) methods. It was according to this
knowledge that the Search Engine Ranking Factors [11] was created. Its latest edition
assumes that traffic generated by the visitors of a website accounts for 7% of the importance in
Google's evaluation of a website's value. After links to the specific website and its
content value, it is the most significant factor in the website evaluation process. On the basis of
the previous editions of the ranking, one can notice that the importance of this factor
is increasing.
Because all of these are only reasonable assumptions, the intention is to evaluate the
validity of the described factor in web positioning efforts. For this purpose we
need a simulation tool which will generate the necessary human-like traffic on a tested
website. The tool is going to be a multi-agent system (MAS) which will imitate real
visitors of the websites.
5.1. Problems to Solve
Fig. 5.1 presents the main reason why the system must be distributed. A few queries
to Google, sent frequently one after another from the same IP address, are
Figure 5.1. Information displayed by Google on the abuse detection
detected by Google and treated as abuse. Google suspects automated activity and
requires completion of a captcha form in order to continue searching. With a
distributed system, the queries would be sent from many different IP addresses, which
should guarantee that Google will not treat them as abuse. This issue cannot
be solved by using a set of public proxy servers, since Google has probably put them on
its black list; every single query to Google via such a proxy server leads to the same
end, a captcha request.
What is more, following Tuzhilin [35], we can say that Google puts a lot of reasonable
effort into filtering invalid clicks on advertisements. There is a big chance that Google uses
some of those mechanisms in the analysis of web traffic. This is why the way behavioural
data are generated should be our concern: artificial web traffic that is recognized as such
could be treated by Google as abuse and cause punishment (a decline in the website's
position).
5.2. Objectives
5.2.1. Web Positioning
The main goal of the system is to improve the position of a website by generating traffic
related to that website. The system only needs to care about activity which is visible to Google,
which means there is no need to download all the content from a particular
website; that would only waste bandwidth. The system should only send requests to the Google
services used by a particular website, for example:
• links to the website on SERPs,
• Google Analytics scripts,
• Google Public DNS queries,
• Google media embedded on the website, like AdSense advertisements, maps, YouTube
videos, calendars, etc.
5.2.2. Cooperation
The whole idea of the system is to spread positioning traffic across worldwide IP
addresses. As a result of this distributed character, the system requires a large group of
cooperating users, and nobody will use the system if there is no benefit for him. A mechanism
must be introduced which will let the system's users share their Internet connections in order to
help each other in web positioning. What is more, the mechanism
must treat all users equally and fairly, which means it should not allow anyone to take benefits
without any contribution.
5.2.3. Control
According to [36], web positioning is not a single action but a process. It should be possible
to control this process; otherwise it could be destructive instead of improving
the website's position. For this reason, the system should allow users to:
• control the impact of the system activity on their websites,
• check the current results of the system activity (changes in the website position
on SERPs),
• check the current state of the website in the web positioning process.
5.3. Architecture
Fig. 5.2 presents the architecture of the system which take under consideration all
specified problems and objectives. Server is necessary to control the whole process of
generating the web traffic by specified algorithm. It gives the orders for clients to start
generating traffic on the specified websites. It also gets the information from clients
with amount of requests sent to particular Google’s services on the website account.
Database serves as storage for process statistics. They can be presented to clients via
web interface. They are also useful to server for creating the orders in accordance with
the algorithm.
Clients are the agents of the presented MAS. They take orders from the server with
particular webiste registered in the database to be processed. Processing the website
is to mimic its real visitor. Client performs this autonomously using the visitor session
algorithm described in the next section.
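The server–client exchange can be summarized with two small data structures. This is only a sketch under the assumption of a JSON-style message format; the field names are invented for illustration and are not part of the specified architecture.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Order:
    """Sent by the server: which registered website a client should process next."""
    website: str            # e.g. "example-positioned-site.com" (illustrative)
    phrase: str             # the phrase the site is being positioned for
    max_pages: int = 22     # upper bound on pages per session (see Section 5.4)

@dataclass
class Report:
    """Returned by the client: how many requests reached each Google service."""
    website: str
    requests_per_service: Dict[str, int] = field(default_factory=dict)
    # e.g. {"google-search": 1, "google-analytics": 22, "youtube": 2}
```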
Figure 5.2. System architecture (server and database, a cloud of clients, the positioned websites, Google Search and the other Google services)
5.4. Visitor Session Algorithm
According to much research [1], [6], [7], [10], [20] and [36], more than half of a website’s visits come from the SERPs. That is why it is reasonable to start a single visitor session (the sequence of requests concerning one website registered in the system) by querying Google Search – but only if the considered website appears on one of the first few SERPs. Otherwise, the visitor session should start directly on the processed website or follow an existing incoming link, if there is one. Because Google is likely to take actions to detect abuse, the visitor session should be as human-like as possible. The analysis of real users’ web traffic [29] is very useful here. According to it, a typical user:
• visits about 22 pages on 5 websites in one sitting,
• follows 5 links before jumping to a new website,
• spends about 2 hours per session and 5 minutes per page.
These statistics clearly describe a typical visit to a website of good quality; a visit to a poor website would be aborted after only a few seconds, and such a visit could have a negative impact on Google’s evaluation of the website’s quality.
The visitor session algorithm, illustrated in Fig. 5.3, proceeds as follows (a sketch in code follows the list):
1 – The server retrieves from the database information about the next website to be processed.
2 – The server assigns the task to a client.
3 – The client starts the visitor session.
4 – The client searches the SERPs for a link to the processed website.
5 – If a link has been found, the client clicks on it; otherwise it requests the website directly.
6 – The client processes the visit session.
7 – The client asks the server for another website to process.
Figure 5.3. Visitor session algorithm
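The sketch below walks through one pass of these steps from the client’s perspective. The server endpoints (/next-order, /report) and the order fields are hypothetical and only mirror the Order/Report structures sketched in Section 5.3; the dwell times are drawn loosely from the statistics quoted above.

```python
import random
import time
import requests

def run_visitor_session(server_url: str) -> None:
    """One pass through steps 1-7 of the visitor session algorithm (Fig. 5.3)."""
    browser = requests.Session()
    browser.headers["User-Agent"] = "Mozilla/5.0"   # look like a regular browser

    # Steps 1-2: the server picks the next website and assigns it to this client.
    order = requests.get(f"{server_url}/next-order", timeout=10).json()

    # Steps 3-4: start the session by searching Google for the positioning phrase.
    serp = browser.get("https://www.google.com/search",
                       params={"q": order["phrase"]}, timeout=10)
    link_found = order["website"] in serp.text

    # Step 5: click the SERP link if present, otherwise request the site directly.
    headers = {"Referer": serp.url} if link_found else {}
    browser.get(f"http://{order['website']}/", headers=headers, timeout=10)

    # Step 6: browse a handful of pages with human-like dwell times
    # (roughly 5 links per site and about 5 minutes per page, after [29]).
    pages = random.randint(3, 6)
    for _ in range(pages):
        time.sleep(random.uniform(120, 400))   # dwell on the current page
        # Here only the Google-hosted resources embedded on the page would be
        # requested (Analytics script, AdSense, YouTube embeds), as in Section 5.2.1.

    # Step 7: report the session and ask for another website to process.
    requests.post(f"{server_url}/report",
                  json={"website": order["website"], "pages_visited": pages},
                  timeout=10)
```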
5.5. Task Assignation Algorithm
Task assignation algorithm helps server to build a queue of registered websites ordered
by the visitor session priority. The website with the highest value of the priority is the
next one to start visitor session. In other words, client always receive the website with
the highest priority value to process.
The priority value PV is calculated using the function:

PV(α) = r(α) · t(α) · v(α) / T(α)      (5.1)

where
α — a record in the system (a website together with its web positioning phrase),
r(α) — the current position of α in the search engine ranking (0 if α is absent from the ranking),
t(α) — the time since the end of the last visitor session on α (in seconds),
v(α) — the number of visitor sessions performed by the client of α’s owner,
T(α) — the time since the registration of α in the system (in days).
The presented function gives the greatest ”power” to the ranking factor. The reason is that websites already ranked high should have more real visitors, so the system’s efforts are less crucial for their popularity.
The time of participation in the system is not very significant: novice participants have the same chance to gain attention for their websites as senior ones. The function does, however, reward continuous activity of the clients.
It is also worth considering the possibility of dynamically modifying the weights of individual factors depending on the results. Because the website queue is built by the server, the whole function can be changed while the system is running.
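A direct implementation of equation (5.1) is sketched below. The exponent weights are only an illustrative hook for the dynamic re-weighting mentioned above and are not part of the original formula; with all of them equal to 1 the function reduces exactly to (5.1).

```python
from dataclasses import dataclass

@dataclass
class Record:
    """A website/phrase pair registered in the system (alpha in equation 5.1)."""
    ranking_position: int             # r(alpha): 0 if the site is absent from the ranking
    seconds_since_last_visit: float   # t(alpha)
    owner_sessions: int               # v(alpha): sessions performed by the owner's client
    days_since_registration: float    # T(alpha)

def priority(rec: Record, w_r: float = 1.0, w_t: float = 1.0, w_c: float = 1.0) -> float:
    """PV(alpha) = r(alpha) * t(alpha) * v(alpha) / T(alpha), equation (5.1).

    w_r, w_t, w_c are illustrative weights for dynamic tuning; all equal to 1
    gives the original formula.
    """
    if rec.days_since_registration <= 0:
        return 0.0
    contribution = rec.owner_sessions / rec.days_since_registration
    return (rec.ranking_position ** w_r) * (rec.seconds_since_last_visit ** w_t) * (contribution ** w_c)

# The server simply processes the registered record with the highest priority next:
# next_record = max(queue, key=priority)
```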
5.6. Proof Study
The presented system requires a large number of users to work properly; otherwise the generated traffic would not be distributed enough and would look unnatural. As was shown, centralized series of queries are treated as abuse. Unfortunately, the author’s resources were insufficient for such a deployment. However, a simulation has been performed which was intended to prove the proposed concept.
5.6.1. Tools
The idea was to use the Tor application (http://www.torproject.org) to make a single host (the author’s computer) generate distributed traffic. In this way, the behavioural data of one real user could be seen by the search engine as multi-user traffic.
Tor is free software enabling Internet anonymity by thwarting network traffic analysis: it aims to conceal its users’ identity and network activity. Volunteers operate an overlay network of onion routers which provides anonymity of network location as well as anonymous hidden services.
Users of the Tor network run an onion proxy on their machine. The Tor software periodically negotiates a virtual circuit through the Tor network, and an application such as a browser can be pointed at Tor, which then multiplexes the traffic through that virtual circuit. Once inside the Tor network, the encrypted traffic is sent from one router to another, ultimately reaching an exit node where the decrypted packet becomes available and is forwarded to its original destination. Viewed from the destination, the traffic appears to originate at the Tor exit node.
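For an automated client rather than a full browser, pointing traffic at Tor can look like the sketch below. It assumes a local Tor daemon with its default SOCKS listener on 127.0.0.1:9050 and the requests library installed with SOCKS support (pip install "requests[socks]"); the socks5h scheme also routes DNS resolution through Tor.

```python
import requests

# Assumes a local Tor client is running with its default SOCKS listener on 127.0.0.1:9050.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",   # socks5h: resolve DNS inside the Tor network too
    "https": "socks5h://127.0.0.1:9050",
}

session = requests.Session()
session.proxies.update(TOR_PROXIES)
session.headers["User-Agent"] = "Mozilla/5.0"

# Viewed from Google, this request appears to originate at the current Tor exit node.
resp = session.get("https://www.google.com/search", params={"q": "example"}, timeout=30)
print(resp.url)   # often redirected to the country-specific domain of the exit node
```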
As figure 5.4 shows, Tor has become quite popular and its network involves a large number of users, which makes it fit the objective of this study. The Mozilla Firefox browser was used, connected to Tor, and the iMacros plug-in was additionally installed in order to automate the execution of visitor sessions.
To analyse the behavioural data received by Google during the study, Google Analytics (shown in figure 2.2) was used; it was installed on every examined website.
Figure 5.4. Tor interface screen
5.6.2. Results
Distributing the traffic turned out to be a success. After the Google Search main page (www.google.com) was opened, the server redirected the browser to the domain of the country in which the Tor exit node of the particular session was located. For example, when the exit node was in Germany, the Google server redirected the browser from google.com to google.de and displayed the Google Search page in the appropriate language, despite the fact that the browser’s default language setting had been removed. After visits to the examined pages, Google Analytics also indicated that the source of the visits was not Poland (where the study was actually conducted) but the countries of the traffic’s exit nodes.
However, routing the traffic through the distributed Tor network turned out to be an insufficient solution. Firstly, traffic routed by Tor is significantly slowed down; from time to time there were even difficulties with downloading the complete search engine page. What is more, despite the large number of visit sources, there was still only one browser and one real user. Because of this, the simulation could only imitate one signed-in user or a group of signed-out users.
Signed-in user
In the first case there is essentially no difference between visiting through the Tor proxy network and visiting directly. As described above, Tor is a tool for concealing a user’s identity, but after signing in to a Google Account the identity is evident. From Google’s point of view, such visits look like a regular user travelling very quickly all around the world (metaphorically speaking). It is still only one user, however, and the personalized search results applied to him should not be globally significant.
Group of signed-out users
As described earlier, Google introduced the personalization mechanism not only for users with a Google Account: search results are also personalized for users without such a profile. This is based on cookies stored in the user’s browser for up to 180 days, which contain information about past search activities. Cookies, however, are tied not to a specific user but to a browser, so in this case search results are re-ranked not for the person but for the particular computer this person uses.
The simulating system uses only one browser, so it was not possible to evaluate the impact of personalization re-ranking on search rankings from a global perspective. Disabling the storing of cookies in the browser makes personalization impossible, because there is no way to relate past search engine queries to a particular user. Moreover, a browser with blocked cookies is a rather rare situation nowadays, so the Google search engine is rather suspicious of traffic with blocked cookies and serves the ”Sorry page” (figure 5.1) for such requests.
5.7. Summary
Generating artificial traffic on the Internet does not seem very praiseworthy, as it is dangerously close to spam and introduces noise into visitor statistics. On the other hand, it is no worse than other SEO activities such as linkbaiting.
Following [6], today’s search engines use mainly link-popularity metrics to measure the ”quality” of a page – this is the main idea of the PageRank algorithm [3]. This fact causes the ”rich-get-richer” phenomenon: more popular websites appear higher on SERPs, which brings them even more popularity.
Unfortunately, this is not very beneficial for new, unknown pages which have not gained popularity yet. These websites may well contain more valuable information than the popular ones, and despite this they are ignored by search engines because of their small number of links. These sites in particular need SEO efforts, and classic techniques will probably be more effective for them than the one presented in this thesis. Nevertheless, the methods presented here are likely to improve the rate of web positioning, because web traffic can be noticed by search engines immediately. It