Scalable Event Tracking


Scalable event tracking: how to track user actions without slowing down the core application.

  • An event is a happening inside the SPiD Core,
    e.g. signup, login, logout, verify email, purchase, etc.
    Events are triggered so we can:
    Get insight into user behavior.
    Measure conversion in our processes.
    We need events to be able to improve our software.
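An event like the ones listed above is just a small structured record. A minimal sketch of what such a payload could look like, with field names that are purely illustrative (not the actual SPiD event schema):

```javascript
// A minimal sketch of a tracking event. The field names are illustrative,
// not the actual SPiD event schema.
function makeEvent(type, userId, extra) {
  return Object.assign({
    type: type,                          // e.g. "signup", "login", "purchase"
    userId: userId,
    timestamp: new Date().toISOString()  // when it happened
  }, extra || {});
}

console.log(JSON.stringify(makeEvent('signup', 42, { source: 'web' })));
```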
  • Events are sent from the SPiD Core to our UDP logger.
    The UDP logger inserts all events into an SQS queue.
  • DataPiper retrieves events from the main SQS queue.
    Events are filtered and inserted into Redshift, Mixpanel and other SQS queues.
  • The UDP logger is written in Node.js for performance.
    We have an admin interface that monitors the server so we can detect issues at an early stage.
    Why UDP?
    We use UDP to avoid adding latency inside the SPiD Core. This way we can send as many events as we like without having to worry about latency.
    Packet loss is not an issue as long as the UDP server is located on the same network as the core application.
  • The DataPiper is also written in Node.js.
    We use Amazon CloudWatch to monitor the DataPiper performance.
    Incoming messages, messages in queue, messages in flight, etc.
    Based on these numbers we can fine-tune the DataPiper.
    DataPiper flow:
    Retrieve data from the main SQS queue
    Filter data.
    Insert data into Mixpanel, Redshift or another SQS queue.
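The filter-and-route step above can be sketched as a routing table keyed on event type. The rules and sink names here are invented for illustration; the real DataPiper's filters are not public:

```javascript
// Sketch of the DataPiper filter/route step, assuming each event carries
// a "type" field. The routing table and sink names are made up.
const routes = {
  purchase: ['mixpanel', 'redshift'],
  login: ['redshift']
};

function route(event) {
  return routes[event.type] || [];  // events with no rule go nowhere
}

function dispatch(event, sinks) {
  for (const name of route(event)) {
    sinks[name](event);  // e.g. insert into Mixpanel, Redshift or another SQS queue
  }
}
```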
  • Deployment of our servers on the Amazon cloud platform. How do we do it?
    What do we need to think about?
    We need to make sure our queues never pile up.
    Our queues need to be up at all times. Luckily, Amazon provides that with SQS.
    Our DataPiper needs to be up all the time to keep our data flowing.
  • Auto Scaling provides the key to this solution.
    Auto Scaling:
    Makes sure we always have the desired amount of servers running.
    Makes it possible to scale up when traffic increases and then scale down afterwards.
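The idea behind such a scaling policy can be shown with a toy decision function. The thresholds here are invented; a real setup would drive this from CloudWatch alarms on the queue metrics mentioned earlier:

```javascript
// Toy scaling decision: scale out when the queue piles up, scale in when
// it drains. The thresholds are invented for illustration.
function desiredInstances(current, queueDepth, min, max) {
  if (queueDepth > 10000) return Math.min(current + 1, max);  // pile-up: add a server
  if (queueDepth < 1000) return Math.max(current - 1, min);   // idle: remove one
  return current;
}
```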
  • Auto Scaling is actually configured as an Auto Scaling Group:
    It provides the desired amount of EC2 instances based on your scaling policy.
    An Auto Scaling Group is tied to a Launch Configuration:
    AMI type : Type of predefined server image.
    Instance type : Hardware type.
    Storage : Storage size and type.
    Security group : Firewalls around this group of servers.
    User data : bash script run at launch, used to automate installation.
  • When the Auto Scaling Group fires up a new server it’s done like this:
    AMI is booted with the desired instance type, storage and security group.
    User data script installs:
    S3cmd config from public S3 bucket.
    S3cmd tools.
    Puppet via npm.
  • When the first step is done and you are able to connect to the private S3 bucket:
    User data script then downloads:
    The standalone Puppet config (masterless) from the private S3 bucket.
    Then it executes the Puppet client (standalone, no master server needed):
    Installing required packages (Node.js, npm, etc.)
    Preparing software install
    Creating dirs and setting ownership
    Installing DataPiper
    Software, config, upstart, logrotate
    Starting DataPiper service.
  • No ssh login.
    No manual labor.
    All is automated - Look, no hands :)
  • How do we deploy new versions of our software?
    Software deployment can be a tedious process.
    We’re working hard to simplify it and minimize the risk of downtime due to deployment.
  • This is how it’s done:
    The deployment master prepares
    A new release of our software.
    A new config file.
    All is uploaded to our private S3 bucket.
    Before proceeding, please wait a few minutes and enjoy a good cup of coffee; there can be a replication delay inside the S3 platform.
  • Start the deployment of new instances:
    The number of desired instances is increased by the number of new instances you want to deploy with the new software version.
    One at a time is good to be sure everything works smoothly.
  • Auto Scaling fires up new instances with our new software and config files. This usually takes a couple of minutes.
  • When these new instances are up, you decrease the number of desired instances back to the original number.
    Auto Scaling will destroy the old instances, and you’re good to go with your new version.
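This capacity bump can be done from code with the AWS SDK for Node.js (aws-sdk v2). The group name below is illustrative, and the SDK call itself is left commented out so the sketch runs offline:

```javascript
// Sketch of adjusting an Auto Scaling Group's desired capacity.
// The group name is made up; the real call goes through the AWS SDK.
function capacityUpdate(groupName, desired) {
  return {
    AutoScalingGroupName: groupName,
    DesiredCapacity: desired,
    HonorCooldown: false  // apply now instead of waiting out the cooldown
  };
}

// const AWS = require('aws-sdk');
// new AWS.AutoScaling().setDesiredCapacity(capacityUpdate('datapiper-group', 2), cb);
```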

    1. SCALABLE EVENT TRACKING, by Øistein Sørensen, Schibsted Payment
    2. WHAT IS AN EVENT?
    3. EVENTS: SPiD Core, Events, UDP Logger (file_logger, aws_sqs), Amazon SQS
    4. EVENTS: Amazon SQS, DataPiper (EC2, Auto Scaling), Mixpanel, Redshift
    5. UDP LOGGER
    6. DATAPIPER
    8. EC2 DEPLOYMENT: Auto Scaling, EC2 instances
    9. EC2 DEPLOYMENT: Auto Scaling Group, Launch Config, bash < User Data, EC2 (Ubuntu 12.04 LTS, m1.medium)
    10. EC2 DEPLOYMENT: Public S3 Bucket, S3cmd, S3cmd config, Puppet, bash < User Data, EC2 (Ubuntu 12.04 LTS, m1.medium)
    11. EC2 DEPLOYMENT: Private S3 Bucket, Node.js, npm modules, Puppet config, DataPiper, Upstart and logrotate configs, bash < User Data, EC2 (Ubuntu 12.04 LTS, m1.medium)
    12. EC2 DEPLOYMENT: EC2 (Ubuntu 12.04 LTS, m1.medium), DataPiper, Mixpanel, Redshift, SQS
    14. SOFTWARE DEPLOYMENT: Upload to Private S3 Bucket
    15. SOFTWARE DEPLOYMENT: Auto Scaling (2 instances)
    16. SOFTWARE DEPLOYMENT: Auto Scaling, EC2 (Ubuntu 12.04 LTS, m1.medium), DataPiper, Mixpanel, Redshift
    17. SOFTWARE DEPLOYMENT: Auto Scaling (1 instance)
    18. SOFTWARE DEPLOYMENT: EC2 (Ubuntu 12.04 LTS, m1.medium), DataPiper, Mixpanel, Redshift
    19. QUESTIONS?