A real-time Web Analytics System

                           Mahesh Patwardhan
             Digital and New Media Consulta...
A Real Time Web Analytics System

Implementation of a real-time web-log capture and reporting system, developed to report traffic parameters such as page views, visits, and unique visitors in real time.

Published in: Technology, Business
Transcript of "A Real Time Web Analytics System"

  1. A Real-Time Web Analytics System
     Mahesh Patwardhan, Digital and New Media Consultant
  2. Contents
     1. Introduction
     2. The Requirements
     3. The Architecture
     4. The Reports
     5. The Implementation
     6. Conclusion
  3. Introduction
     • This document describes an implementation of a real-time web-log capture and reporting system.
     • The system was developed to report traffic parameters such as page views, visits, and unique visitors in real time.
     • It was designed and built to replace a batch-process system that generated reports in a deferred mode.
     • It was built to allow real-time monitoring of, and action on, the various online services.
  4. Requirements
     ◦ Shortcomings of the existing system
        • It generated reports from the previous day's logs, not in real time.
        • It could not be scaled up.
        • It was not equipped to handle heavy traffic.
        • It had no scope for adding new services.
        • It had no scope for adding or editing logs.
     ◦ Requirements of the new system
        • Real-time web-log capture from web servers at geographically dispersed locations
        • A robust web-log data warehouse
        • Extensive real-time reports from the web logs
     ◦ Advantages of the new system
        • Data can be accessed in real time.
        • The process can be scaled up to handle more traffic.
        • A new service can be added, or an existing one deleted, with the change available from the very next day.
        • Logs can be added and modified.
  5. …Requirements
     ◦ The system was required to capture, collate, and aggregate the web logs that accumulate on the web-app servers.
     ◦ The aggregates needed to be produced in near-real time.
     ◦ A multi-layer architecture needed to be deployed:
        • a layer of capture agents, one deployed on every web-app server
        • a layer of collation server applications, which collate data from the capture agents
        • a layer of computation servers, which aggregate data at high speed
     ◦ This multi-layer architecture would aggregate data into industry-standard RDBMS tables, which could then be queried and viewed through user-interface screens.
     ◦ The aggregate tables were to be updated in near-real time.
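
The requirement that aggregate tables be updated in near-real time can be illustrated with a minimal sketch. It assumes SQLite as a stand-in for the unnamed RDBMS, and the table name `hits_by_minute`, its columns, and the `(service, timestamp)` record shape are all hypothetical; the slides do not describe the actual schema.

```python
import sqlite3

# Hypothetical aggregate table: one row per (service, minute) with a running count.
SCHEMA = """
CREATE TABLE IF NOT EXISTS hits_by_minute (
    service TEXT NOT NULL,
    minute  TEXT NOT NULL,
    hits    INTEGER NOT NULL,
    PRIMARY KEY (service, minute)
)
"""


def open_store(path=":memory:"):
    """Open the database (in-memory by default) and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_batch(conn, batch):
    """Fold a batch of (service, 'YYYY-MM-DD HH:MM:SS') records into the table."""
    counts = {}
    for service, timestamp in batch:
        key = (service, timestamp[:16])  # truncate the timestamp to the minute
        counts[key] = counts.get(key, 0) + 1
    with conn:  # one transaction per batch, committed on success
        for (service, minute), n in counts.items():
            updated = conn.execute(
                "UPDATE hits_by_minute SET hits = hits + ? "
                "WHERE service = ? AND minute = ?",
                (n, service, minute),
            ).rowcount
            if updated == 0:  # first hit seen for this service and minute
                conn.execute(
                    "INSERT INTO hits_by_minute (service, minute, hits) "
                    "VALUES (?, ?, ?)",
                    (service, minute, n),
                )


def hits(conn, service, minute):
    """Read one aggregate value, as a reporting screen might."""
    row = conn.execute(
        "SELECT hits FROM hits_by_minute WHERE service = ? AND minute = ?",
        (service, minute),
    ).fetchone()
    return row[0] if row else 0
```

Pre-counting each batch in memory keeps the number of database writes proportional to the number of distinct (service, minute) pairs rather than to the number of log lines, which is one plausible way to keep the tables close to real time under load.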
  6. Architecture
  7. …Architecture
     ◦ The architecture has four layers, backed by a database server:
        • Collation clients (L1)
        • Collation servers (L2)
        • Computation servers (L3)
        • Reporting server (L4)
        • A database server to store the aggregated results
     ◦ By design, the first three layers (L1, L2, L3) are completely scalable.
     ◦ All the layers communicate with each other over TCP/IP.
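
Because TCP delivers a byte stream rather than discrete messages, layers that exchange log records over TCP/IP need some way to mark record boundaries. The slides do not say how this was done; the sketch below assumes a 4-byte length prefix followed by a JSON payload, and the names `encode_record` and `FrameDecoder` are hypothetical.

```python
import json
import struct

# Assumed wire format: 4-byte big-endian payload length, then UTF-8 JSON.
HEADER = struct.Struct("!I")


def encode_record(record):
    """Serialize one log record into a length-prefixed frame."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return HEADER.pack(len(payload)) + payload


class FrameDecoder:
    """Reassembles records from arbitrarily split chunks of a TCP stream."""

    def __init__(self):
        self._buffer = b""

    def feed(self, chunk):
        """Add bytes read from the socket; return every complete record."""
        self._buffer += chunk
        records = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # the rest of this frame has not arrived yet
            body = self._buffer[HEADER.size:end]
            records.append(json.loads(body.decode("utf-8")))
            self._buffer = self._buffer[end:]
        return records
```

A sending layer would write `encode_record(...)` to its socket, and the receiving layer would pass whatever `recv` returns to `feed`, regardless of how the network split the data.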
  8. …Architecture
     • Each collation client in L1 connects to one collation server in L2.
        ◦ A maximum of 30 collation clients can connect to one collation server.
        ◦ Primary/backup fail-over is provided: if one of the collation servers fails, the clients connected to it automatically shift to other servers in the cluster.
     • Computation is distributed to the computation servers (L3) by service.
        ◦ The computation required for a service is handled by that service's computation server.
        ◦ Primary/backup fail-over is not possible in this layer.
        ◦ If required, the computation for one service can be distributed further; for example, two servers can perform computations for a service such as e-mail.
     • The computed (aggregated) information is stored in a database, which is used by the L4 (reporting) layer.
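
The fail-over rule for the collation layer can be sketched as a small bookkeeping class. The limit of 30 clients per server comes from the slide; the least-loaded placement policy and the names `CollationCluster`, `connect`, and `server_failed` are assumptions made for illustration, not the system's actual mechanism.

```python
MAX_CLIENTS_PER_SERVER = 30  # limit stated in the slides


class CollationCluster:
    """Tracks which collation clients (L1) are attached to which server (L2)."""

    def __init__(self, servers):
        # live server name -> set of connected client ids
        self.clients = {name: set() for name in servers}

    def _least_loaded(self):
        candidates = [
            server for server, attached in self.clients.items()
            if len(attached) < MAX_CLIENTS_PER_SERVER
        ]
        if not candidates:
            raise RuntimeError("no collation server has spare capacity")
        # Assumed policy: fewest clients first, ties broken by server name.
        return min(candidates, key=lambda s: (len(self.clients[s]), s))

    def connect(self, client_id):
        """Attach a client to one server and return that server's name."""
        server = self._least_loaded()
        self.clients[server].add(client_id)
        return server

    def server_failed(self, server):
        """Drop a failed server and shift its clients to the remaining ones."""
        orphans = sorted(self.clients.pop(server))
        return {client_id: self.connect(client_id) for client_id in orphans}
```

If the surviving servers lack capacity for every orphaned client, `server_failed` raises partway through; a production version would need to queue such clients rather than lose them.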
  9. Reports
     ◦ Hits by time
     ◦ Page views by time, by page
     ◦ Visits by time, by page
     ◦ Unique visitors by time, by page
     ◦ Return frequency
     ◦ Return visit
     ◦ Visiting frequency by visitor
     ◦ Average time spent
     ◦ By page average time spent
     ◦ Referrer by domain, URL
  10. …Reports
     ◦ Search engines
     ◦ Search engine keywords
     ◦ By search engine by keyword
     ◦ Browser type, version, OS
     ◦ Parameter analysis
     ◦ Country, city, state wise reports
     ◦ By country top pages
     ◦ By ISP
     ◦ Top entry pages
     ◦ Top exit pages
     ◦ Path reporting (across service)
     ◦ Directory filter based reporting
     ◦ Fall-out reports
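
Several of the reports above (page views, visits, unique visitors, average time spent) can be derived from the same stream of records. The slides do not define how a visit is delimited, so this sketch assumes the common convention that a gap of more than 30 minutes between a visitor's requests starts a new visit; the record shape and the name `summarize` are hypothetical.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed cut-off, not from the slides


def summarize(records):
    """Summarize (visitor_id, unix_time, page) tuples given in any order."""
    times_by_visitor = defaultdict(list)
    for visitor, timestamp, _page in records:
        times_by_visitor[visitor].append(timestamp)

    page_views = 0
    visits = 0
    total_duration = 0
    for times in times_by_visitor.values():
        times.sort()
        page_views += len(times)
        visits += 1
        visit_start = previous = times[0]
        for timestamp in times[1:]:
            if timestamp - previous > SESSION_TIMEOUT:
                # Close the current visit and start a new one.
                total_duration += previous - visit_start
                visits += 1
                visit_start = timestamp
            previous = timestamp
        total_duration += previous - visit_start  # close the last visit

    return {
        "page_views": page_views,
        "visits": visits,
        "unique_visitors": len(times_by_visitor),
        "avg_visit_seconds": total_duration / visits if visits else 0.0,
    }
```

Time spent is measured here from the first to the last request of a visit, so a single-page visit counts as zero seconds; that is a limitation of log-based measurement in general rather than of this sketch alone.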
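
The search engine and keyword reports depend on extracting the search phrase from each request's referrer URL. A minimal sketch follows; the `SEARCH_ENGINES` table is an illustrative assumption (a real deployment would maintain a much longer list), and the function name `search_keywords` is hypothetical.

```python
from urllib.parse import parse_qs, urlparse

# Illustrative mapping: host fragment -> query parameter carrying the keywords.
SEARCH_ENGINES = {
    "google.": "q",
    "bing.": "q",
    "yahoo.": "p",
}


def search_keywords(referrer):
    """Return (engine, keywords) for a search referrer, else None."""
    parts = urlparse(referrer)
    host = parts.netloc.lower()
    for fragment, param in SEARCH_ENGINES.items():
        if fragment in host:
            values = parse_qs(parts.query).get(param)
            if values:
                # Normalize so that "Web Logs" and "web logs" aggregate together.
                return fragment.rstrip("."), values[0].strip().lower()
    return None
```

Counting the returned pairs per time bucket would give the "by search engine by keyword" report; counting only the first element gives the search engines report.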
  11. Implementation
     • The solution was implemented incrementally. Deliverables were planned for each increment based on the specified requirements. There were five development cycles, detailed below.
     • Incremental cycle 1
        ◦ Setting up the framework for real-time log capture
        ◦ Health monitoring system
        ◦ Hits by time
        ◦ Page views by time, by page
  12. …Implementation
     • Incremental cycle 2
        ◦ Visits by time, by page
        ◦ Unique visitors by time, by page
        ◦ Return frequency
        ◦ Return visit
        ◦ Visiting frequency by visitor
        ◦ Average time spent
        ◦ By page average time spent
     • Incremental cycle 3
        ◦ Referrer by domain, URL
        ◦ Search engines
        ◦ Search engine keywords
        ◦ By search engine by keyword
        ◦ Browser type, version, OS
        ◦ Parameter analysis
  13. …Implementation
     ◦ Incremental cycle 4
        • Country, city, state wise reports
        • By country top pages
        • By ISP
        • Top entry pages
        • Top exit pages
        • Path reporting (across service)
     ◦ Incremental cycle 5
        • Directory filter based reporting
        • Fall-out reports
     ◦ The deliverables in each phase required elements of every layer to be developed, implemented, tested, and deployed. For instance, a few tables of the final aggregate table schema had to be designed in the first cycle itself, along with the corresponding reports.
  14. Conclusion
     ◦ This document describes an implementation of a real-time web-log capture and reporting system.
     ◦ The system was developed to report traffic parameters such as page views, visits, and unique visitors in real time.
     ◦ It was designed and built to replace the batch-process system, which generated reports in a deferred mode and did not allow real-time monitoring of, and action on, the various online services.
     ◦ The architecture consists of four layers: the collation client agents, the collation layer, the computation layer, and the reporting layer.
     ◦ The new system overcomes the shortcomings of the existing one, which was not scalable and reported in a deferred mode: it has a highly scalable architecture and provides reports in real time.
