Shimon Tolts
General Manager, Data Solutions
Atom
Data Pipeline Processing 200B Events with Node.js and Docker on AWS
About ironSource: Hypergrowth
● 800M people reached each month
● 4,200 apps installed every minute with the ironSource platform
● 200B registered & analyzed data events every month
[Chart: registered & analyzed data events per month, Jun 2015 to May 2016 (y-axis 0 to 200B)]
Our Business Challenge
We needed a way to manage this data:
Collect → Process → Store
Collection
● Multi-region layer: latency-based routing
● Low latency from clients to Atom servers
● High availability: AWS regions do fail!
● Raw data and headers stored upon receipt
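Keeping the raw payload together with its request headers at receipt time can be sketched as below. This is a minimal sketch assuming a plain-object header map like the one Node's `http` module provides; the helper `wrapRawEvent`, the header allow-list, and the record field names are hypothetical, not Atom's actual storage format.

```javascript
// Hypothetical allow-list of request headers worth keeping with each event.
const KEPT_HEADERS = ["user-agent", "x-forwarded-for", "content-type"];

/**
 * Wraps a raw request body with selected headers and a receive timestamp,
 * so the original payload is preserved unmodified for later reprocessing.
 * (Hypothetical helper; field names are illustrative only.)
 */
function wrapRawEvent(rawBody, headers, region, now = Date.now()) {
  const kept = {};
  for (const name of KEPT_HEADERS) {
    // Node's http module lower-cases incoming header names, so look them up in lower case.
    if (headers[name] !== undefined) kept[name] = headers[name];
  }
  return {
    receivedAt: new Date(now).toISOString(),
    region,       // which regional collection endpoint accepted the event
    headers: kept,
    raw: rawBody, // stored as received; enrichment happens downstream
  };
}

const record = wrapRawEvent(
  '{"event":"install"}',
  { "user-agent": "Mozilla/5.0", "x-forwarded-for": "203.0.113.7", cookie: "x" },
  "us-east-1",
  0
);
console.log(JSON.stringify(record));
```

Storing the untouched body first means a bug in a later enrichment step can be fixed and the data replayed, rather than lost.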
Data Enrichment
● Enrich data before storing it in your data lake and/or warehouse
○ IP to country
○ Currency conversion
○ Data decryption
○ User-agent parsing: OS, browser, device...
● Any custom logic you like: fully extensible
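An extensible enrichment layer can be modeled as an ordered list of functions, each taking an event and returning an enriched copy. The sketch below assumes that shape; the lookup tables are hypothetical stand-ins for a real GeoIP database, a live exchange-rate feed, and a full User-Agent parser, and decryption is omitted (it would simply be another function in the list).

```javascript
// Hypothetical lookup data; a real pipeline would query proper data sources.
const COUNTRY_BY_IP_PREFIX = { "203.0.113.": "AU", "198.51.100.": "US" };
const USD_RATE = { USD: 1, EUR: 1.1 }; // illustrative rates, not real quotes

function ipToCountry(event) {
  const ip = event.ip || "";
  const prefix = Object.keys(COUNTRY_BY_IP_PREFIX).find((p) => ip.startsWith(p));
  return { ...event, country: prefix ? COUNTRY_BY_IP_PREFIX[prefix] : "unknown" };
}

function convertCurrency(event) {
  const rate = USD_RATE[event.currency];
  if (rate === undefined || typeof event.amount !== "number") return event;
  // Round to cents to avoid floating-point noise such as 11.000000000000002.
  return { ...event, amountUsd: Math.round(event.amount * rate * 100) / 100 };
}

function parseUserAgent(event) {
  const ua = event.userAgent || "";
  // Deliberately naive OS detection; use a dedicated UA parser in practice.
  const os = /Android/.test(ua) ? "Android"
    : /iPhone|iPad/.test(ua) ? "iOS"
    : /Windows/.test(ua) ? "Windows"
    : "other";
  return { ...event, os };
}

// Applies enrichers left to right; each receives the previous one's output.
function buildPipeline(...enrichers) {
  return (event) => enrichers.reduce((acc, enrichStep) => enrichStep(acc), event);
}

// Custom logic is just another function appended to the list.
const tagSource = (event) => ({ ...event, source: "sdk" });

const enrich = buildPipeline(ipToCountry, convertCurrency, parseUserAgent, tagSource);
const out = enrich({
  ip: "203.0.113.7",
  currency: "EUR",
  amount: 10,
  userAgent: "Mozilla/5.0 (Linux; Android 13)",
});
console.log(out);
```

Because every enricher returns a new object instead of mutating its input, steps can be added, removed, or reordered per customer without side effects leaking between them.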
Data Targets
● Near-real-time data insertion: within 1 minute!
● Stream data to Google Cloud Storage and/or AWS S3
● Smart insertion of data into AWS Redshift
○ Set the number of parallel COPY operations
○ Configure priority per table
● BigQuery: stream data using batch file imports (saves 20% in cost)
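The Redshift options above (a cap on parallel copies plus per-table priority) imply a scheduling step before any COPY is issued. A minimal sketch, assuming pending jobs arrive as `{ table, queuedAt }` objects and priorities come as a plain map; `planCopyWaves` and all table names are hypothetical, and the sketch only plans the order without running any SQL.

```javascript
/**
 * Hypothetical planner: orders pending per-table COPY jobs by table priority
 * (lower number = higher priority) and groups them into waves of at most
 * `maxParallel` concurrent COPYs. A real scheduler would start the next job
 * as soon as any slot frees up; fixed waves keep the sketch short.
 */
function planCopyWaves(jobs, tablePriority, maxParallel) {
  if (!Number.isInteger(maxParallel) || maxParallel < 1) {
    throw new RangeError("maxParallel must be a positive integer");
  }
  const DEFAULT_PRIORITY = Number.MAX_SAFE_INTEGER; // unlisted tables go last
  const ordered = [...jobs].sort(
    (a, b) =>
      (tablePriority[a.table] ?? DEFAULT_PRIORITY) -
        (tablePriority[b.table] ?? DEFAULT_PRIORITY) ||
      a.queuedAt - b.queuedAt // same priority: oldest batch first
  );
  const waves = [];
  for (let i = 0; i < ordered.length; i += maxParallel) {
    waves.push(ordered.slice(i, i + maxParallel));
  }
  return waves;
}

const waves = planCopyWaves(
  [
    { table: "events_raw", queuedAt: 3 },
    { table: "installs", queuedAt: 5 },
    { table: "sessions", queuedAt: 1 },
    { table: "purchases", queuedAt: 2 },
  ],
  { purchases: 0, installs: 0, sessions: 1 },
  2
);
console.log(waves.map((w) => w.map((j) => `${j.table}@${j.queuedAt}`)));
```

Here `purchases` and `installs` share the top priority, so they form the first wave (oldest first), and the unlisted `events_raw` table waits until last.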
Micro-Services Architecture
● Everything is a service
● Decoupling
● Distributed systems
● Separate lifecycles
● Communication via REST, queues, or streams
Docker
● Linux containers
● Saves provisioning time
● Infrastructure as code
● Identical container across dev, test, and production
● Ship easily
Cloud infrastructure
● Pay as you go (and grow)
● SaaS services
● Auto Scaling groups
● DynamoDB
● RDS (*SQL databases)
● Redshift data warehouse
Continuous Integration
● From commit to production
● Jenkins commit hook
● Git branching model
● AWS dynamic slaves
● Unit tests
● Docker builds
● Updating the live environment
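One way a commit hook can tie the branching model to Docker builds and live updates is to derive an image tag and a target environment from the branch name and commit SHA. The mapping below (`master` to production, `develop` to staging, other branches build and test only) is a hypothetical git-flow-style convention, not necessarily the one used at ironSource; `planBuild` and the repository name are illustrative.

```javascript
// Hypothetical branch-to-environment convention (git-flow style).
// A Map avoids accidental hits on Object.prototype keys like "constructor".
const ENV_BY_BRANCH = new Map([
  ["master", "production"],
  ["develop", "staging"],
]);

/**
 * Returns the Docker image tag to build and the environment to deploy to.
 * Tags include the commit SHA, so every build is traceable and re-deployable.
 */
function planBuild(repo, branch, commitSha) {
  const shortSha = commitSha.slice(0, 7);
  // Docker tags allow only [A-Za-z0-9_.-]; replace anything else (e.g. "/").
  const safeBranch = branch.replace(/[^A-Za-z0-9_.-]/g, "-");
  return {
    image: `${repo}:${safeBranch}-${shortSha}`,
    deployTo: ENV_BY_BRANCH.get(branch) ?? null, // null: build and unit-test only
  };
}

console.log(planBuild("example/atom-api", "feature/geo-ip", "a1b2c3d4e5f6a7b8c9d0"));
```

Tagging by commit rather than reusing `latest` makes a rollback a matter of redeploying an older, already-built tag.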
[Diagram]
Starting Point
Pre-baked images (AMIs)
Supervisor
Nginx reverse proxy
Node.js * cpu-count
Provisioning time * instances
Bash provisioning scripts
Minimum Viable Product
Infrastructure as code
Nginx
Node.js * cpu-count
Supervisor
Docker Hub
No Bash scripts!
No provisioning time * instances
https://github.com/ironSource/docker-config/blob/bb6be85b97132cbdd10084305ee1ee2f414b0b50/Dockerfile
Interactive Cycle
Nginx
Supervisor
Infrastructure as code
Node.js * cpu-count
Docker Hub
No Bash scripts!
No provisioning time * instances
https://github.com/ironSource/docker-config/blob/c4bbad11a323fd6e36ff31505c43e7c8dc51b1eb/Dockerfile-iojs-cluster
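The recurring "Node.js * cpu-count" item means running one Node.js process per CPU core, because a single Node.js process runs JavaScript on one thread. The linked Dockerfiles are named `*-cluster`; one way to get the same effect inside a single container is Node's built-in `cluster` module, sketched below. The port, the trivial request handler, and the always-restart policy are assumptions for illustration.

```javascript
const cluster = require("node:cluster");
const http = require("node:http");
const os = require("node:os");

// One worker per core. os.availableParallelism() only exists in newer Node
// releases, so fall back to os.cpus().length on older ones.
function workerCount() {
  const n = typeof os.availableParallelism === "function"
    ? os.availableParallelism()
    : os.cpus().length;
  return Math.max(1, n); // some environments report an empty CPU list
}

// Hypothetical entry point: call startCluster(3000) at the top level of the
// container's entry script. Forked workers re-run that same script.
function startCluster(port) {
  // isPrimary replaced the deprecated isMaster in Node 16.
  if (cluster.isPrimary ?? cluster.isMaster) {
    for (let i = 0; i < workerCount(); i++) cluster.fork();
    // Replace crashed workers so capacity stays at one process per core.
    cluster.on("exit", (worker, code) => {
      console.error(`worker ${worker.process.pid} exited (${code}); restarting`);
      cluster.fork();
    });
  } else {
    // Workers share the listening port; the primary distributes connections.
    http.createServer((req, res) => res.end("ok\n")).listen(port);
  }
}
```

Inside a container, these calls may report the host's cores rather than a CPU quota set on the container, so passing an explicit worker count can be safer in production.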
User Data
https://github.com/ironSource/docker-config/blob/2f4ccc7c277850de928cc432f47b2fc58fb8732a/Dockerfile-nodejs-cluster
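"User Data" here refers to EC2 user data, the script an instance runs on first boot; with images on Docker Hub it can shrink to pulling and running one container. A minimal sketch, assuming an AMI with Docker already installed and running; the image name, port mapping, and the `buildUserData` helper are hypothetical. The EC2 `RunInstances` API expects user data base64-encoded, though depending on the SDK or CLI that encoding may be done for you.

```javascript
/**
 * Builds a boot script that pulls and runs one container image.
 * Assumes the AMI already has Docker installed and running (hypothetical setup).
 */
function buildUserData(image, hostPort, containerPort) {
  const script = [
    "#!/bin/bash",
    "set -euo pipefail",
    `docker pull ${image}`,
    // --restart=always restarts the container if it exits and when the Docker daemon starts.
    `docker run -d --restart=always -p ${hostPort}:${containerPort} ${image}`,
  ].join("\n") + "\n";
  return {
    script,
    // Base64 form for passing to RunInstances directly.
    base64: Buffer.from(script, "utf8").toString("base64"),
  };
}

const ud = buildUserData("example/atom-api:latest", 80, 3000);
console.log(ud.script);
```

Compared with the earlier Bash provisioning scripts, the boot script no longer installs anything; everything the service needs is already baked into the image.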
docker-common.yml
docker-compose.yml
https://stash.ironsrc.com/projects/INFRA-IB/repos/ironbeastcompserter/browse/docker-compose.yml
Docker Compose Example #1 (Using ‘extends’):
User Data
Docker Compose Example #2 (Using ‘links’):
10 Million
Free Monthly Events
Thank you!
ironsrc.com/atom
shimont@ironsrc.com @shimontolts

How Docker Accelerates Continuous Development at ironSource: Containers #101 Meetup