How we deployed the Piwik web analytics system to handle a huge amount of unpredictable traffic, adding some cloud and modern scalability techniques. Files: https://github.com/lorieri/piwik-presentation
1. Extending Piwik at r7.com - Phase 1: Collecting data - Adding some cloud and modern scalability to a traditional LAMP stack - Leonardo Lorieri, r7.com system architect, 'lorieri at gmail.com', Feb/2012
2. Why Piwik? - Open Source = flexible, understandable, free! - Great interface - Mobile app - REST API - Developers know what the market needs - Efficient on small machines - Lots of possible improvements - Lots of improvements already on the roadmap - Great and supportive community (Thank you all!)
3. Our plan, goals and trade-offs - Don't change the original code (reduces development and maintenance costs) - Count only visits and page views, to be fast and focused (you can still use the .js tracker, but it is easy to get lost in the UI's beauty and all its functionality) - Handle odd, unexpected traffic peaks from TV announcements - Count more than websites: media delivery, internal searches, debugs - At least 99% accuracy - Have numbers to compare with other analytics tools - We've lost P3P for now
5. Regular Piwik Setup based on Rodrigo Campos's presentation http://www.slideshare.net/xinu/capacity-planning-for-linux-systes - Apache/Nginx - PHP - MySQL
6. Bigger Piwik Setup based on Rodrigo Campos's presentation http://www.slideshare.net/xinu/capacity-planning-for-linux-systes - Apache/Nginx - PHP - MySQL
7. Regular PHP Scaling Piwik Setup based on Rodrigo Campos's presentation http://www.slideshare.net/xinu/capacity-planning-for-linux-systes - Load balancer/Nginx - Apache/Nginx - PHP - MySQL Replication (slave for backup only, Piwik is not "slave ready")
9. Asynchronous Piwik Setup based on Rodrigo Campos's presentation http://www.slideshare.net/xinu/capacity-planning-for-linux-systes - Load balancer/Nginx - Nginx, NOT even PHP (user cookie, access logs) - Perl/Python worker to process logs (manages user cookies) - MySQL Master - MySQL Slave - Apache+PHP for Admin UI - Archive cron - Diagram labels: Visits, <img src=> request, REST API, Admin/Reports
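A minimal sketch of what the log-processing worker on this slide could look like, not the code actually used at r7.com. It assumes nginx writes the standard "combined" access log format, that the page being counted is the Referer of the <img src=> request, and that hits are replayed through Piwik's tracking endpoint with the idsite, rec, url, cip, cdt and token_auth parameters (check these against the tracking API of your Piwik version). The function names, the endpoint URL and the token are hypothetical.

```python
import re
from datetime import datetime, timezone
from urllib.parse import urlencode

# nginx "combined" log format (assumed; adjust to the real log_format)
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def parse_line(line):
    """Return a dict describing one access-log hit, or None if it does not parse."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    hit = match.groupdict()
    local = datetime.strptime(hit["time"], "%d/%b/%Y:%H:%M:%S %z")
    # Normalise the hit time to UTC so every machine reports the same clock.
    hit["utc"] = local.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return hit


def tracking_url(hit, piwik_endpoint, idsite, token_auth):
    """Build the tracking request that replays one hit, or None if no page is known."""
    if hit["referer"] in ("", "-"):
        return None  # assumption: without a Referer we cannot tell which page was viewed
    params = {
        "idsite": idsite,
        "rec": 1,
        "url": hit["referer"],     # page that embedded the tracking image
        "cip": hit["ip"],          # original visitor IP
        "cdt": hit["utc"],         # original hit time
        "token_auth": token_auth,  # needed when overriding IP and time
    }
    return piwik_endpoint + "?" + urlencode(params)
```

The worker would loop over the log lines, call parse_line on each, and send an HTTP GET to every URL returned by tracking_url. Keeping parsing and URL building separate from the network call makes the slow part (the requests to Piwik) easy to batch or retry without touching the front-end Nginx machines.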
17. Our setup diagram - Visits - ELB (Elastic Load Balancer) - nginx autoscaling pool - S3 bucket (one file per virtualhost per machine, every 5 minutes) - SNS Notifications (one notification per S3 file) - SQS queues - Other workers/processors for other projects - worker (MySQL slave, Apache, Piwik API, python-boto, python-twisted) - MySQL connection - BigAss MySQL (MySQL master, Piwik) - Piwik Users - Datacenter
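The hand-off in this diagram (one log file per virtualhost per machine every 5 minutes, one notification per S3 file, workers reading from SQS) can be sketched as follows. This is an illustration under assumptions, not the deployed code: the key layout, the JSON payload with "bucket" and "key" fields and all function names are hypothetical. The only AWS behaviour relied on is that SNS, when delivering to an SQS queue without raw delivery, wraps the published text in a JSON envelope with "Type" and "Message" fields. The actual S3, SNS and SQS calls (python-boto in the slides) are left out so the sketch runs offline.

```python
import json
from datetime import datetime


def log_key(virtualhost, machine, when):
    """Name the S3 object for the 5-minute window that contains `when` (hypothetical layout)."""
    window_start = when.replace(
        minute=when.minute - when.minute % 5, second=0, microsecond=0
    )
    return "%s/%s/%s.log" % (virtualhost, machine, window_start.strftime("%Y%m%d-%H%M"))


def notification_for(bucket, key):
    """Text an nginx machine would publish to SNS after uploading one log file."""
    return json.dumps({"bucket": bucket, "key": key})


def unwrap_sqs_body(body):
    """Extract the published payload from an SQS message body delivered by SNS."""
    envelope = json.loads(body)
    if envelope.get("Type") != "Notification":
        raise ValueError("not an SNS notification: %r" % envelope.get("Type"))
    return json.loads(envelope["Message"])
```

A worker would poll its queue, pass each message body to unwrap_sqs_body, download the named S3 object and process it. Because SNS fans out to several queues, other workers for other projects can consume the same notifications from their own queues without the nginx pool knowing about them.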