
Ingesting Click Data for Analytics

The challenges of everyday life as the CTO of ClickMeter. Crafting and managing a "big data" ready infrastructure is no easy task, but it can be done step by step, even by startups. The cloud is a cool starting ground that provides many of the toys you'll need, so you can focus on the part of "big data" that gives you the most value.

Published in: Software

  1. Ingesting click data for analytics
     Francesco Furiani, CTO @
  2. $ whoami
     Francesco Furiani (@ilfurio):
     • Backend Engineer
     • Roamed these halls not too long ago
     Loves:
     • Studying new CS stuff
     • PlayStation / Bike / Traveling / Soccer
     • O RLY? books
     How do I make a living:
     • CTO @ ClickMeter
     • Backend Engineer @ ClickMeter
     • Enum.take_random(IT_ROLES,1) @ ClickMeter
  3. ClickMeter
  4. ClickMeter
     • 100k+ customers
     • Getting events for customers from 10 to 3000 req/sec
  5. Getting the data
     We receive data anytime someone:
     • Clicks our links
     • Views our pixels
     • Calls our postbacks
     Our customers use us:
     • Inside a famous app the day of the big release ✔
     • Advertising on an extremely big video portal ✔
     • A tiny travel blog ✔
     • A physical device for advertising ✔
  6. The challenge
     We need to:
     • Try not to lose the events we receive (duh)
     • Show customers data for better insight on their campaigns
     • Scale up/down according to the incoming traffic
     • Improve the product by using the data we get
     • Do it as fast as possible (wasn’t this ready a week ago?)
     • Do it as cheaply as possible
  7. Size
     Find the size of the problem you’re trying to solve:
     • How much data do you expect? At what rate?
     • What do you have to do with it?
     • Do you have to do something with ALL of it?
     • How fast do you have to do it?
     Answers to these questions are a starting point.
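
As a rough illustration of the sizing questions on the "Size" slide, the sketch below turns a request rate into events per day and a daily data volume. The 10 and 3000 req/sec figures come from the ClickMeter slide. Sustaining a rate for a full day and the 1,000-byte event size are assumptions made only for the arithmetic, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption for illustration only: the slides do not state an event size.
ASSUMED_EVENT_SIZE_BYTES = 1_000


def events_per_day(rate_per_sec: float) -> int:
    """Events received in one day if the rate were sustained all day."""
    return int(rate_per_sec * SECONDS_PER_DAY)


def daily_volume_gb(rate_per_sec: float,
                    event_size_bytes: int = ASSUMED_EVENT_SIZE_BYTES) -> float:
    """Raw data volume per day in gigabytes (10**9 bytes)."""
    return events_per_day(rate_per_sec) * event_size_bytes / 1e9


# The low and high ends of the range quoted in the slides.
for rate in (10, 3000):
    print(f"{rate} req/sec: {events_per_day(rate):,} events/day, "
          f"{daily_volume_gb(rate):.3f} GB/day")
```

Even with made-up event sizes, running the numbers at both ends of the range shows a factor of 300 between the quiet case and the busy case, which is why scaling up and down appears on the list of requirements.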
  8. Design
     Once we know how big and bad the beast is, we need to design the ranch that will keep it in check.
     It is an iterative process, prone to a lot of failures, but the world is out there to help us.
     Think, write and draw a lot.
  9. Design
     … draw too much ...
  10. Which bricks should I use
     Most of us will never have the joy (and the horror) of creating a new stack, novel in theory and practice.
     Still, we need to understand the theory behind every brick.
     Read the info, read the opinions, try small proofs of concept of the moving parts: it helps a lot!
  11. The cloud is a brick too
     A very important brick.
     Elastic computing power, many *aaS offerings and managed solutions are a great help in terms of saved manpower and fast iterations.
     It comes at a great cost to consider:
     • $$$ (ymmv)
     • Possible lock-ins
  12. Design with bricks
     … well it’s never definitive ...
  13. How we did it
     Obviously we haven’t followed those guidelines.
     One becomes savvy after crashing and burning many times.
     Still, thanks to those errors we got there and built a better infrastructure at every iteration.
  14. How we did it
     ClickMeter was already live and growing.
     It needed an overhaul of its infrastructure/backend.
     The growth fueled the need to be ready for more power to handle more data.
     Obviously this had to be a tablecloth-trick migration.
  15. How we did it
     Already on the cloud (AWS), we thought of having a hybrid approach but it didn’t make sense.
     We reviewed the old components already in production to see what to kill, keep or update.
     We kept the good stuff and designed some new layers to make it work flawlessly in the new infrastructure.
  16. Ingesting clicks data for analytics
  17. Redirect engine aka events collector
     Pretty important, they need to:
     • Stay up
     • Scale up/down depending on the incoming traffic
     • Never lose anything
     • Be as fast as possible in processing
     They’re a custom web application that undergoes a lot of testing.
     We used stuff like Beanstalk, scaling groups, load balancers and health routing offered by our cloud provider to manage the web app’s scaling and availability.
  18. Tracking engine and friends
     Pipeline
     Most of this part uses our cloud provider’s technology.
     This simplifies maintenance and provisioning, keeping the focus on the value of our product.
     Some moving parts are custom-made by us to interact with the cloud technology (which might be proprietary or just a repackaged, well-known one).
  19. Tracking engine and friends
     Diagram labels: SQS, Pipeline, Kinesis
     • Events
     • Preprocessing
     • Postprocessing
     • DynamoDB
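
The stages named on the pipeline diagram (events, preprocessing, postprocessing, DynamoDB) can be pictured with a minimal in-memory sketch. This is not ClickMeter's implementation: a `deque` stands in for the SQS/Kinesis queue, a plain `dict` stands in for the DynamoDB table, and the event fields (`link_id`, `ts`, `kind`), the stage order and every function name are assumptions made for illustration.

```python
from collections import deque
from typing import Optional


def preprocess(raw: dict) -> Optional[dict]:
    """Validate and normalize a raw event; return None for malformed input."""
    if "link_id" not in raw or "ts" not in raw:
        return None
    return {
        "link_id": str(raw["link_id"]),
        "ts": int(raw["ts"]),
        "kind": raw.get("kind", "click"),
    }


def postprocess(event: dict) -> dict:
    """Enrich an event with the start of its hour, a typical reporting bucket."""
    return {**event, "hour": event["ts"] // 3600 * 3600}


def run_pipeline(queue: deque, table: dict) -> int:
    """Drain the queue, run both stages, and store the results.

    Returns the number of events stored. Malformed events are skipped here;
    a real system would more likely park them somewhere for inspection.
    """
    stored = 0
    while queue:
        event = preprocess(queue.popleft())
        if event is None:
            continue
        item = postprocess(event)
        table[(item["link_id"], item["ts"])] = item
        stored += 1
    return stored


incoming = deque([{"link_id": 1, "ts": 7250}, {"garbage": True}])
store: dict = {}
print(run_pipeline(incoming, store), "event(s) stored")
```

Keeping each stage a small pure function makes it easy to test a stage on its own and to move it behind a managed queue later without changing its logic.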
  20. Pipeline
     A combination of real-time and batch technologies.
     One of the scaling parts that actually provides value to the customers.
     It computes analyses on the events data, from a simple count to some predictions.
     Check the data produced by your processing system to improve the pipeline step by step!
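
One common way to combine batch and real-time processing for the "simple count" case is to add a periodically recomputed batch view to a counter that is updated as events arrive. The slides do not say how ClickMeter combines the two, so the sketch below is only an assumption-laden illustration; the `link_id` field, the class and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def batch_counts(events: Iterable[dict]) -> Counter:
    """Batch view: recount clicks per link from the full stored history."""
    return Counter(event["link_id"] for event in events)


class RealtimeCounter:
    """Real-time view: counts events that arrived after the last batch run."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def observe(self, event: dict) -> None:
        self.counts[event["link_id"]] += 1


def clicks_for(link_id: str, batch_view: Counter, realtime: RealtimeCounter) -> int:
    """Answer a query by adding the batch view and the real-time delta."""
    return batch_view[link_id] + realtime.counts[link_id]


history = [{"link_id": "a"}, {"link_id": "a"}, {"link_id": "b"}]
batch_view = batch_counts(history)

live = RealtimeCounter()
live.observe({"link_id": "a"})  # a click that arrived after the batch run

print(clicks_for("a", batch_view, live))
```

In this arrangement the real-time counter has to be reset whenever a new batch view is published, otherwise recent events are counted twice.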
  21. Pipeline
  22. Storage and data delivery
     We employ different storage based on speed of delivery and data type.
     All the data is accessible via a REST API.
     This makes it relatively easy to develop a frontend layer and lets customers take control of the data and use it in ways we may not have considered.
  23. Operations
     Managed services on the cloud help us a lot!
     Most of the team can focus on improvements and shipping (users are happy, so is the CEO).
     Some of us (me) still have to be the CloudOp/DevOp.
     P.S.: always prepare a Plan B for when you break things!
  24. Cloud co$t$
     Cloud is typically more expensive than your own metal.
     This extra money you have to spend is actually well spent:
     • Flexibility
     • Easier provisioning
     • Easier management
     • Easier operations
     There are different types of clouds, so choose wisely.
  25. Conclusions
     Creating and managing a “big data” ready infrastructure is no easy task, but it can be done step by step, even by startups.
     The cloud is a cool starting ground providing you with many of the toys you need, so you can focus on the part of “big data” that gives you value!
     Use the wisdom shared by the big/medium players that have already been there (and built most of the stuff you’re using).
  26. Thank You
     Any questions?
     @il_furio
     francesco@clickmeter.com
