Reliability Practices for an Advertising System on Docker/Mesos
Michael Apr.2014 聚效广告(MediaV)
Who Am I ?
Where is our system?
Small Impression with Huge Computing
AD Request: 1 billion / 20 billion+
QPS: 1 million+ / 10 thousand
Latency: 500 ms / 10 ms
60 DevOps Engineers
2000+ physical servers
100+ modules with real-time services
99.95% service availability
Why Container?
Why Scheduler?
• Human-caused incidents: debugging, environment changes, etc.
• Unintentional failures: bugs, crashes, OOM, memory leaks, full disks, etc.
• External causes: ad code
• On-call recovery
• Scaling services
• Resource utilization
We are in 2016
We are in 2014
2014Q4: touch lmctfy
2015Q1: try docker with k8s
2015Q2: docker on mesos/yarn?
2015Q3: we are running docker/mesos etc.
2015Q4: more service, ci/release
2016Q1: more batch job & LTS online
How to start?
What can Mesos bring to the team?
[Figure: resource usage of typical LTS, ad hoc tasks, and lightweight services, with free capacity shown against 100%]
Resource usage distribution DEMO
Typical problems when containerizing services with Docker
SE7EN
1/7
“If you run SSHD in your Docker containers, you're doing it wrong!”
–Jérôme Petazzoni
https://jpetazzo.github.io/2014/06/23/docker-ssh-considered-evil/
2/7 Where are my debug logs?
3/7 Is Docker network performance poor?
/machinezone.github.io/research/networking-solutions-for-kubern
4/7 How do we write local files? How do we persist storage?
5/7 Service registration and discovery?
6/7 How do we make services schedulable?
This is a big problem, and it is left to every Dev engineer.
7/7 How do servers load their data?
Abandon: rsync, cp, scp, ftp
Embrace: Everything API/Thrift
Marathon Framework on MESOS
Chronos Framework on MESOS
Chronos: a replacement for batch jobs on distributed systems
                                chronos          cron     azkaban
distributed                     Yes              No       Half
Web UI                          Yes              No       Yes
Job history                     Yes, simple      Manual   Yes, full
dependency                      Yes, simple      No       Yes, full
User Auth                       No               No       Yes
Resource limit (cpu/mem/disk)   Yes              No       No
Debug log                       mesos sandbox    Manual   web UI
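To make the comparison concrete, here is a minimal sketch of how a cron-style job might be described for Chronos, including the cpu/mem/disk limits that the table lists as a Chronos advantage. The field names (`name`, `command`, `schedule`, `cpus`, `mem`, `disk`, `owner`) follow the Chronos REST job format as commonly documented; the job name, command, and owner address are hypothetical, and the exact fields should be checked against the Chronos version in use.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def chronos_schedule(start: datetime, period: str, repeats: Optional[int] = None) -> str:
    """Build an ISO 8601 repeating interval: R[n]/<start>/<period>.

    An empty repeat count ("R/") means "repeat forever".
    """
    count = "" if repeats is None else str(repeats)
    start_utc = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"R{count}/{start_utc}/{period}"


def chronos_job(name: str, command: str, schedule: str,
                cpus: float = 0.1, mem: int = 128, disk: int = 256) -> dict:
    """Assemble a job description with explicit resource limits (mem/disk in MB)."""
    return {
        "name": name,
        "command": command,
        "schedule": schedule,
        "cpus": cpus,
        "mem": mem,
        "disk": disk,
        "owner": "ops@example.com",  # hypothetical owner address
    }


# Example: a daily job starting at 02:00 UTC (name and command are made up).
start = datetime(2016, 1, 1, 2, 0, 0, tzinfo=timezone.utc)
job = chronos_job("nightly-report", "echo hello", chronos_schedule(start, "P1D"))
print(json.dumps(job, indent=2))
```

The printed JSON is what would be submitted to the scheduler; unlike a crontab line, it carries its resource request with it, so the job can be placed on any agent with enough free capacity.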
Things to watch out for when running Docker/Mesos in practice
Health check with Marathon on Mesos
{
"protocol": "COMMAND",
"command": { "value": "curl -f -X GET http://$HOST:$PORT0/health" },
"gracePeriodSeconds": 300,
"intervalSeconds": 60,
"timeoutSeconds": 20,
"maxConsecutiveFailures": 3
}
{
"protocol": "COMMAND",
"portIndex": 0,
"command": {
"value": "nc localhost 8119"
},
"gracePeriodSeconds": 300,
"intervalSeconds": 60,
"timeoutSeconds": 20,
"maxConsecutiveFailures": 3,
"ignoreHttp1xx": false
}
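The `curl -f` check above only passes if the service answers `/health` with a success status. As a minimal sketch of the service side, assuming a Python service (the talk does not say what language the services use), the handler below returns 200 on `/health` and reads its listen port from the `PORT0` environment variable referenced in the health-check command. The fallback port 8080 and the helper names are hypothetical.

```python
import http.client
import os
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health with 200 so that `curl -f` exits with status 0."""

    def do_GET(self):
        if self.path == "/health":
            status, body = 200, b"ok"
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging in this sketch


def listen_port() -> int:
    """Use the port assigned by the scheduler; 8080 is an assumed local fallback."""
    return int(os.environ.get("PORT0", "8080"))


def start_server(port: int, host: str = "127.0.0.1") -> HTTPServer:
    """Start the server in a background thread; port 0 picks a free port.

    A real deployment would bind to an address reachable at $HOST.
    """
    server = HTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(port: int, path: str = "/health") -> int:
    """Return the HTTP status code, roughly what the curl check inspects."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

A service would call `start_server(listen_port())` at startup. With `gracePeriodSeconds` set to 300, failures during the first five minutes after start are ignored, so slow data loading does not get the task killed prematurely.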
Marathon port resource
--resources="ports(*):[8000-9000, 31000-32000]"
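A fixed service port can only be scheduled if it falls inside the ranges offered by the `--resources` flag above. The following sketch parses that ranges string and checks a port against it; the function names are hypothetical helpers, not part of Mesos or Marathon.

```python
import re
from typing import List, Tuple


def parse_port_ranges(resources: str) -> List[Tuple[int, int]]:
    """Parse 'ports(*):[8000-9000, 31000-32000]' into [(8000, 9000), (31000, 32000)]."""
    match = re.search(r"ports\([^)]*\):\[([^\]]*)\]", resources)
    if match is None:
        return []
    ranges = []
    for part in match.group(1).split(","):
        part = part.strip()
        if not part:
            continue  # tolerate an empty list such as 'ports(*):[]'
        low, high = part.split("-")
        ranges.append((int(low), int(high)))
    return ranges


def port_is_offered(port: int, ranges: List[Tuple[int, int]]) -> bool:
    """True if the port lies inside any offered range (bounds inclusive)."""
    return any(low <= port <= high for low, high in ranges)


ranges = parse_port_ranges("ports(*):[8000-9000, 31000-32000]")
print(port_is_offered(8119, ranges))  # the port used by the nc health check
print(port_is_offered(80, ranges))    # outside both ranges
```

A check like this could run in CI before an app definition is submitted, so that a port outside the offered ranges is caught before the task sits unscheduled.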
Dockerfile review rules
Every Dockerfile must go through code review
Everything in codebase: code/config
Do not use unstable wget/curl sources
Port resources must be requested and registered
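The rule against unstable wget/curl sources can be partly automated ahead of human review. The sketch below flags download URLs whose host is not on an allow-list; the allow-list entry `mirror.example.internal` and the function name are hypothetical, and this is one possible check rather than the tooling the team actually used.

```python
import re
from urllib.parse import urlparse

# Hypothetical allow-list of internal mirrors considered stable.
TRUSTED_HOSTS = {"mirror.example.internal"}

URL_RE = re.compile(r"https?://[^\s'\"]+")
DOWNLOAD_RE = re.compile(r"\b(wget|curl)\b")


def find_untrusted_downloads(dockerfile_text, trusted_hosts=TRUSTED_HOSTS):
    """Return (line_number, url) pairs for wget/curl URLs outside the allow-list.

    Limitation: only URLs on the same line as the wget/curl command are seen,
    so a URL placed on a backslash-continued line is missed.
    """
    findings = []
    for lineno, line in enumerate(dockerfile_text.splitlines(), start=1):
        if not DOWNLOAD_RE.search(line):
            continue
        for url in URL_RE.findall(line):
            if urlparse(url).hostname not in trusted_hosts:
                findings.append((lineno, url))
    return findings


example = (
    "FROM ubuntu:14.04\n"
    "RUN wget http://mirror.example.internal/pkg.tar.gz\n"
    "RUN curl -O https://random-site.example.com/tool.sh\n"
)
for lineno, url in find_untrusted_downloads(example):
    print(f"line {lineno}: untrusted download source {url}")
```

A check like this does not replace code review; it only makes the "unstable source" rule cheap to enforce consistently.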
Q&A ?
ye.mikez@gmail.com
zhangye@mvad.com