Dive into Sentry
The modern error logging and aggregation platform
XTao
09.19.2015 Beijing
Beijing/Shanghai/Guangzhou 0xFF Life's pathetic, go Pythonic!
徐涛
● @ Douban
● (?:Product Development|Operations) Engineer
● (?:CODE|DevOps|Git|Python)
● 2014 PyConChina Beijing
● https://blog.xtao.me
❏ Douban: @xtaooooo
❏ Twitter: @xtao
❏ Github: @xtao
Sentry Overview
Sentry graduated from Disqus
https://engineering.disqus.com/
Sentry History
What Is Sentry
DEMO
Origins
● 2010
● http://disqus.com/
● django-db-log (the grandfather)
● tl;dr Sentry and Raven are StarCraft 2 units.
● driven-by-open-source
commit 3c2e87573d3bd16f61cf08fece0638cc47a4fc22
Author: David Cramer <dcramer@gmail.com>
Date: Mon May 12 16:26:19 2008 +0000

    initial working code

 djangodblog/__init__.py | 35 +++++++++++++++++++++++++++++++++++
 djangodblog/models.py   | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)
Sentry 5
● 2012
● Protocol Version 3
● branch: 5.4.x-maint
Sentry 6
● 2013
● Protocol Version 4
● Protocol Version 5
● Alerts
● Filters
Sentry 7
● 2014
● Organizations
● TSDB
● Rules
● Web API
● Protocol Version 6
● BIGINT
● Help Pages
Sentry 8
● 2015 ?
● Most of the application has been overhauled and rewritten on top of React and our web API.
● beta
What Is Sentry
● An error logging and aggregation platform
○ Server: Sentry (The Sentry Open Source Server)
○ Client: The Raven Clients.
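Why Use Sentry
● Detailed error information
○ The specific line of code (Python)
○ The specific variable (Python)
● Detailed error categorization
○ Tag
● Alerts
● Sensible handling of duplicate errors
● Multi-language support
○ Python is especially well supported
❏ Side benefits
❏ Getting started
❏ A very good Django project: if you want to learn how to use Django, read the Sentry source code
❏ Going further
❏ Sentry counts as a mid-sized web project: if you lack web development experience, you can pick some up from the source code
❏ Open source
❏ An example of an open-source Python web application; its data migrations are dependable
Sentry - Server (7.x)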
● Backend
○ Python
○ Django
○ Celery
● Frontend
○ jQuery
○ Backbone
○ Underscore
○ Bootstrap
○ Moment
● Database
○ MySQL
○ PostgreSQL
● KV
○ Cassandra
○ Riak
○ Redis
● Queue
○ Redis
○ RabbitMQ
Raven - Officially Supported Clients
● Python
● JavaScript
● Node.js
● PHP
● Ruby
● Objective-C
● Java
● C#
● Go
DEMO
1. Hosted Sentry
a. https://www.getsentry.com/signup/
b. 14-day Free Trial
2. Sentry On-Premise
a. https://docs.getsentry.com/on-premise/server/installation/
b. Sentry Internal
Using Sentry
How to Submit Errors
Raven
DSN
Raven 101
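pip install raven --upgrade

from raven import Client

client = Client('___DSN___')

# Report a handled exception to Sentry.
try:
    1 / 0
except ZeroDivisionError:
    client.captureException()

# Attach user context for the duration of one request.
def handle_request(request):
    client.context.merge({'user': {
        'email': request.user.email
    }})
    try:
        ...
    finally:
        # Always clear the context so it does not leak into the next request.
        client.context.clear()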
Raven 102
● WSGI middleware
● raven/middleware.py
```
A WSGI middleware which will attempt to capture any
uncaught exceptions and send them to Sentry.

>>> from raven.base import Client
>>> application = Sentry(application, Client())
```
DSN 101
'{PROTOCOL}://{PUBLIC_KEY}:{SECRET_KEY}@{HOST}/{PATH}{PROJECT_ID}'
http://c44a73655e50454581da995bbedd392a:8d29447e0e8241b9a178fd726fb07190@onimaru.intra.douban.com/10
udp://c44a73655e50454581da995bbedd392a:8d29447e0e8241b9a178fd726fb07190@onimaru-udp.intra.douban.com:4008/10
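The DSN layout above can be taken apart with the standard library alone. This is a minimal sketch assuming Python 3; the helper name `parse_dsn`, the dictionary keys, and the `sentry.example.com` host are hypothetical and not part of Raven.

```python
from urllib.parse import urlsplit


def parse_dsn(dsn):
    """Split PROTOCOL://PUBLIC_KEY:SECRET_KEY@HOST/PATH/PROJECT_ID into its parts."""
    parts = urlsplit(dsn)
    # Everything after the last "/" is the project id; what precedes it is the path.
    path, _, project_id = parts.path.rpartition('/')
    return {
        'protocol': parts.scheme,
        'public_key': parts.username,
        'secret_key': parts.password,
        'host': parts.hostname,
        'port': parts.port,  # None when the DSN does not name a port
        'path': path + '/',
        'project_id': project_id,
    }


# Placeholder credentials, not a real DSN.
info = parse_dsn('udp://public:secret@sentry.example.com:4008/10')
print(info['protocol'], info['host'], info['port'], info['project_id'])
```

Keeping the transport in the scheme is what lets one client configuration string switch between the HTTP and UDP endpoints shown above.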
Sentry Features
(๑•̀ㅂ•́)‫✧و‬ (つд⊂)
Event
Group
Protocol
Interface
TSDB
Buffer
Cache
Event
● HTTP(DATA)
● UDP(DATA)
● EventManager
● Project
● Event
● Group
● EventMapping
○ event_id: uuid.uuid4().hex
● UserReport
○ User-submitted feedback on a Sentry issue
● post_process_group.delay
● index_event.delay
Group
● hashes
○ checksum (provided by client)
○ fingerprint / (default + fingerprint)
○ default (first interface ordered by score)
● find group
○ find group at GroupHash by hash
○ first matched group
● sample event (count, time)
● regression (resolved event)
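The hash priority listed on this slide can be sketched as follows. This is a simplified illustration, not Sentry's actual implementation: the function names `md5_of` and `pick_hash`, and the idea of passing the default components in as a list of strings, are assumptions made for the example.

```python
import hashlib

DEFAULT_MARKER = '{{ default }}'


def md5_of(parts):
    """Hash a list of strings into one hex digest."""
    digest = hashlib.md5()
    for part in parts:
        digest.update(str(part).encode('utf-8'))
    return digest.hexdigest()


def pick_hash(default_parts, checksum=None, fingerprint=None):
    """Choose the grouping hash in the slide's priority order."""
    if checksum:
        # 1. A checksum provided by the client wins outright.
        return checksum
    if fingerprint:
        # 2. A fingerprint replaces the default; '{{ default }}' inside it
        #    is expanded to the default parts, giving "default + fingerprint".
        expanded = []
        for item in fingerprint:
            if item == DEFAULT_MARKER:
                expanded.extend(default_parts)
            else:
                expanded.append(item)
        return md5_of(expanded)
    # 3. Otherwise hash the parts supplied by the highest-scoring interface.
    return md5_of(default_parts)


default = ['ValueError', 'my.module.function_name']
print(pick_hash(default) == pick_hash(default, fingerprint=[DEFAULT_MARKER]))
```

Events whose hash matches an existing GroupHash row join that group; anything else starts a new group.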
Protocol
CLIENT_RESERVED_ATTRS = (
    'project',
    'event_id',
    'message',
    'checksum',
    'culprit',
    'fingerprint',
    'level',
    'time_spent',
    'logger',
    'server_name',
    'site',
    'timestamp',
    'extra',
    'modules',
    'tags',
    'platform',
    'release',
)
{
    "event_id": "fc6d8c0c43fc4630ad850ee518f1b9d0",
    "culprit": "my.module.function_name",
    "timestamp": "2011-05-02T17:41:36",
    "message": "SyntaxError: Wattttt!",
    "sentry.interfaces.Exception": {
        "type": "SyntaxError",
        "value": "Wattttt!",
        "module": "__builtins__"
    }
}
UDP Protocol
"AUTH\n\nDATA"
● AUTH
○ "Sentry key=value, key=value, …"
● DATA
○ json string
○ zlib
○ base64
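A packet in this shape can be assembled and taken apart with the standard library. The sketch below assumes the DATA part is JSON, then zlib-compressed, then base64-encoded, as the bullets suggest; the function names and the sample auth fields are illustrative only.

```python
import base64
import json
import zlib


def build_udp_packet(auth_fields, event):
    """Return the bytes AUTH + blank line + DATA for one event."""
    auth = 'Sentry ' + ', '.join(
        '%s=%s' % (key, value) for key, value in auth_fields)
    # DATA: JSON string -> zlib -> base64
    raw = json.dumps(event).encode('utf-8')
    data = base64.b64encode(zlib.compress(raw))
    return auth.encode('utf-8') + b'\n\n' + data


def parse_udp_packet(packet):
    """Inverse of build_udp_packet: return (auth header, event dict)."""
    auth, data = packet.split(b'\n\n', 1)
    raw = zlib.decompress(base64.b64decode(data))
    return auth.decode('utf-8'), json.loads(raw.decode('utf-8'))


packet = build_udp_packet(
    [('sentry_version', '3'), ('sentry_key', 'public')],  # example values only
    {'message': 'hello'},
)
print(parse_udp_packet(packet))
```

Because base64 output never contains a newline, the first blank line in the datagram is always the boundary between AUTH and DATA.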
HTTP Protocol
● User authentication is shared with the web application
● /api/store
● GET/POST DATA
Interface
An interface is a structured representation of data, which may render differently than the default ``extra`` metadata in an event.
● to_python
● get_api_context
● to_json
● get_path
● get_alias
● get_hash
● get_score
● ...
Interface - Exception
● A standard Python exception
● type, value, module
● stacktrace == sentry.interfaces.Stacktrace
>>> {
>>>     "type": "ValueError",
>>>     "value": "My exception value",
>>>     "module": "__builtins__",
>>>     "stacktrace": {
>>>         # see sentry.interfaces.Stacktrace
>>>     }
>>> }
Interface - Message
● message (<= 1000)
● params
>>> {
>>>     "message": "My raw message with interpreted strings like %s",
>>>     "params": ["this"]
>>> }
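How the two fields combine is easiest to see by rendering them. This is a small sketch assuming the params are applied with Python's `%` string formatting; `render_message` is a hypothetical helper, not a Sentry function.

```python
def render_message(interface):
    """Interpolate params into the raw message, if any were sent."""
    message = interface['message']
    params = interface.get('params')
    if not params:
        return message
    return message % tuple(params)


print(render_message({
    'message': 'My raw message with interpreted strings like %s',
    'params': ['this'],
}))
```

Keeping the raw message and its params separate means events that differ only in their params can still share one message template.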
Interface - HTTP
● Common HTTP request parameters
>>> {
>>> "url": "http://absolute.uri/foo",
>>> "method": "POST",
>>> "data": {
>>> "foo": "bar"
>>> },
>>> "query_string": "hello=world",
>>> "cookies": "foo=bar",
>>> "headers": {
>>> "Content-Type": "text/html"
>>> },
>>> "env": {
>>> "REMOTE_ADDR": "192.168.0.1"
>>> }
>>> }
Interface - Query
● Used to record SQL
>>> {
>>>     "query": "SELECT 1",
>>>     "engine": "psycopg2"
>>> }
Interface - Template
● A rendered template (generally used like a single frame in a stacktrace).
● The attributes ``filename``, ``context_line``, and ``lineno`` are required.
>>> {
>>>     "abs_path": "/real/file/name.html",
>>>     "filename": "file/name.html",
>>>     "pre_context": [
>>>         "line1",
>>>         "line2"
>>>     ],
>>>     "context_line": "line3",
>>>     "lineno": 3,
>>>     "post_context": [
>>>         "line4",
>>>         "line5"
>>>     ]
>>> }
Interface - User
● Describes a user
>>> {
>>>     "id": "unique_id",
>>>     "username": "my_user",
>>>     "email": "foo@example.com",
>>>     "ip_address": "127.0.0.1",
>>>     "optional": "value"
>>> }
Interface - Stacktrace
● Python Frame
>>> {
>>>     "frames": [{
>>>         "abs_path": "/real/file/name.py",
>>>         "filename": "file/name.py",
>>>         "function": "myfunction",
>>>         "vars": {
>>>             "key": "value"
>>>         },
>>>         "pre_context": [
>>>             "line1",
>>>             "line2"
>>>         ],
>>>         "context_line": "line3",
>>>         "lineno": 3,
>>>         "in_app": true,
>>>         "post_context": [
>>>             "line4",
>>>             "line5"
>>>         ]
>>>     }],
>>>     "frames_omitted": [13, 56]
>>> }
TSDB - Time Series Database
● Dummy (none)
● InMemory (defaultdict)
● Redis (hashes)
Redis:
{
    "TSDBModel:epoch:shard": {
        "Key": Count
    }
}
# rollups must be ordered from highest granularity to lowest
SENTRY_TSDB_ROLLUPS = (
    # (time in seconds, samples to keep)
    (10, 360),  # 60 minutes at 10 seconds
    (3600, 24 * 7),  # 7 days at 1 hour
    (3600 * 24, 60),  # 60 days at 1 day
)
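To make the rollup tuples concrete, here is a toy in-memory counter that writes every increment into one bucket per rollup. It is a sketch of the idea only: the dictionary layout, `bucket_start`, `incr`, and `retention_seconds` are made up for illustration, and expiry of old buckets is left out.

```python
ROLLUPS = (
    # (bucket width in seconds, samples to keep)
    (10, 360),
    (3600, 24 * 7),
    (3600 * 24, 60),
)


def bucket_start(timestamp, rollup):
    """Round a Unix timestamp down to the start of its rollup bucket."""
    return int(timestamp) // rollup * rollup


def incr(store, model, key, timestamp, count=1):
    """Add `count` for `key` to the current bucket of every rollup."""
    for rollup, _samples in ROLLUPS:
        bucket = (model, rollup, bucket_start(timestamp, rollup))
        series = store.setdefault(bucket, {})
        series[key] = series.get(key, 0) + count


def retention_seconds():
    """How far back each rollup reaches: bucket width times samples kept."""
    return [rollup * samples for rollup, samples in ROLLUPS]


store = {}
incr(store, 'events', 'project:10', 1442620805)
incr(store, 'events', 'project:10', 1442620811)
print(retention_seconds())
```

The two increments land in different 10-second buckets but in the same hourly and daily buckets, which is the trade between resolution and retention that the rollup settings express.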
NodeStore - Key-Value Store
● riak
● cassandra
● django (node table)
● Stores special information alongside the data (for example, large text blobs that do not belong in the relational database)
● validate
● create
● delete
● delete_multi
● get
● get_multi
● set
● set_multi
● generate_id
● cleanup
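The method list above is small enough that a dictionary-backed stand-in shows the whole contract. This is an illustrative sketch, not one of the real backends; the class name `InMemoryNodeStorage` is hypothetical, and `validate` and `cleanup` are omitted.

```python
import uuid


class InMemoryNodeStorage(object):
    """Toy key-value node store that keeps everything in a dict."""

    def __init__(self):
        self._nodes = {}

    def generate_id(self):
        return uuid.uuid4().hex

    def create(self, data):
        """Store data under a fresh id and return that id."""
        node_id = self.generate_id()
        self.set(node_id, data)
        return node_id

    def get(self, node_id):
        return self._nodes.get(node_id)

    def get_multi(self, id_list):
        return dict((node_id, self.get(node_id)) for node_id in id_list)

    def set(self, node_id, data):
        self._nodes[node_id] = data

    def set_multi(self, values):
        for node_id, data in values.items():
            self.set(node_id, data)

    def delete(self, node_id):
        self._nodes.pop(node_id, None)

    def delete_multi(self, id_list):
        for node_id in id_list:
            self.delete(node_id)


nodes = InMemoryNodeStorage()
node_id = nodes.create({'message': 'a large event payload'})
print(nodes.get(node_id))
```

The relational row then only needs to hold the node id, while the bulky payload lives in whichever backend is configured.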
Example
class Event(Model):
    """
    An individual event.
    """
    __core__ = False
    ...
    time_spent = BoundedIntegerField(null=True)
    data = NodeField(blank=True, null=True)
Cache
● django
○ filesystem
○ memcached
○ local memory
○ dummy
● redis
● set
● get
● delete
● redis:
○ from nydus.db import create_cluster
○ Supports clusters
○ A rewrite, rb, exists but is not yet used in a released version
Example
● Cache and Model
○ db/models/manager.py
○ class BaseManager(Manager)
● get_from_cache
● updated by signal
● deleted by signal
Buffer
This is useful in situations where a single event might be happening so fast that the queue can't keep up with the updates.
● InProcess (no buffer)
● Redis
● Reduces the write QPS hitting MySQL
● Supports clustered Redis
● Redis 2.6.12 or newer
Buffer Internal
● Producer
● incr
● 'b:k:%s:%s' (hashmap, key_expire = 60 * 60  # 1 hour)
○ 'm'
○ 'f'
○ 'l+%s'
○ 'e+%s'
● 'b:p' (Sorted sets)
● Consumer
● process pending
● process
'flush-buffers': {
    'task': 'sentry.tasks.process_buffer.process_pending',
    'schedule': timedelta(seconds=10),
    'options': {
        'expires': 10,
        'queue': 'counters-0',
    }
},
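The producer and consumer halves can be imitated in memory to show why buffering cuts database writes. This is a sketch under stated assumptions: a plain dict stands in for the 'b:k:...' hashes, a set stands in for the 'b:p' sorted set, and the `flush` callback stands in for the database update. None of these names come from Sentry.

```python
from collections import defaultdict


class InMemoryBuffer(object):
    """Toy write buffer: many incr() calls collapse into one flush per key."""

    def __init__(self, flush):
        self._flush = flush
        self._pending = set()  # keys with unflushed increments
        self._counters = defaultdict(lambda: defaultdict(int))

    def incr(self, model, filters, column, amount=1):
        """Producer: accumulate the increment instead of writing it."""
        key = (model, tuple(sorted(filters.items())))
        self._counters[key][column] += amount
        self._pending.add(key)

    def process_pending(self):
        """Consumer: run periodically and write each pending key once."""
        pending, self._pending = self._pending, set()
        for key in pending:
            model, filters = key
            columns = dict(self._counters.pop(key))
            self._flush(model, dict(filters), columns)


writes = []
buf = InMemoryBuffer(lambda model, filters, columns:
                     writes.append((model, filters, columns)))
for _ in range(3):
    buf.incr('Group', {'id': 1}, 'times_seen')
buf.process_pending()
print(writes)
```

Three increments become a single write, which is the effect the scheduled flush task is after when one event fires many times within its 10-second interval.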
Membership
Roles
● Member *:read
● Admin *:write
● Owner *:delete
Scoping has access to all teams
● Organization structure and access control similar to GitHub's
● Organization - Owner, Admin, Member
● Team - (Role, Project)
● Project
Sensitive Data
● 'password',
● 'secret',
● 'passwd',
● 'authorization',
● 'api_key',
● 'apikey',
● 'access_token',
● DEFAULT_SCRUBBED_FIELDS
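A scrubber driven by this field list might look like the sketch below. It assumes a case-insensitive substring match on key names and an eight-asterisk mask; both choices, and the names `SCRUBBED_FIELDS`, `MASK`, and `scrub`, are illustrative rather than taken from Sentry's code.

```python
SCRUBBED_FIELDS = (
    'password', 'secret', 'passwd', 'authorization',
    'api_key', 'apikey', 'access_token',
)
MASK = '*' * 8


def scrub(value, key=None):
    """Return a copy of value with sensitive-looking keys masked."""
    if key is not None:
        lowered = str(key).lower()
        if any(field in lowered for field in SCRUBBED_FIELDS):
            return MASK
    if isinstance(value, dict):
        return dict((k, scrub(v, k)) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return [scrub(item) for item in value]
    return value


clean = scrub({
    'user': 'xtao',
    'Password': 'hunter2',
    'headers': {'Authorization': 'Basic abc'},
    'items': [{'api_key': 'k'}],
})
print(clean)
```

Matching on key names is cheap and catches the common cases, but a secret stored under an unrelated key passes straight through, so it complements rather than replaces care in what the client sends.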
Notifications
● Rules
○ An event is first seen (the first event in a rollup)
○ An event changes state from resolved to unresolved
● State
○ Unresolved
○ Resolved
○ Muted
● Condition
● Action
Tagging Events
● Event categorization
● We’ll automatically index all tags for an event, as well as the frequency and the last time a value has been seen.
● TagValue
● GroupTagValue
● Added by buffer
Rollups & Sampling
● Rollups
○ Raven.captureException(ex, {fingerprint: ['my', 'custom', 'fingerprint']})
○ Raven.captureException(ex, {fingerprint: ['{{ default }}', 'other', 'data']})
● Sampling
○ Count
○ Time
Web Profile
● ?prof=1
● DEBUG
● super user
● src/sentry/utils/debug.py
def can(self, request):
    if 'prof' not in request.GET:
        return False
    if settings.DEBUG:
        return True
    if hasattr(request, 'user') and request.user.is_superuser:
        return True
    return False
Sentry to Sentry
● Bootstrapping: Sentry reports its own errors to itself
● DISABLE_RAVEN
● default: project id == 1
● src/sentry/utils/raven.py
PostgreSQL & Gevent
● psycopg2
● src/sentry/utils/gevent.py
● This is probably the database the Sentry team itself uses; a non-blocking patch is available, so asynchronous operation is supported
Sentry
@douban
The good stuff
With plenty to grumble about along the way
Problems
Deployment
Monitoring
Tuning
Tips
“If you cannot solve a problem yourself, do not expect a tool to solve it for you”
乔治 (George) @ Douban
Monitor
Douban
● Applications
○ Python (the majority)
○ JavaScript (frontend)
○ Go (a small share)
○ C++/C/Java (very little)
● Error tracking
○ devtools (DIY, deprecated)
○ Sentry
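Problems
● A Sentry instance was already deployed
● 5.x used the UDP protocol
● Errors were being lost in both testing and production
● And the loss was fairly obvious
● But at that point there was no monitoring of Sentry itself
● So we started investigating the black box
● UDP worker CPU usage was fairly high
● UDP was load-balanced via DNS
● DNS caching left the load unbalanced
● Random selection mitigated the risk introduced by caching
● Read the worker code
● Gevent / Eventlet was used incorrectly: no monkey patching
● 5.x put heavy pressure on the database, so writes needed to be merged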
Upgrade
● 5.x-maint supports UDP
● but 7.x does not
● We had to port UDP support to 7.x
● Fortunately the original interface was still there
❏ src/sentry/conf/server.py:
❏ #socket.setdefaulttimeout(5)
❏ src/sentry/coreapi.py:
❏ insert_data_to_database_sync (changed from async to sync: the cache made Redis memory usage balloon)
Insert Queue
● insert_data_to_database - cache
● preprocess_event - queue
● save_event - queue
●
● insert_data_to_database_sync - queue
Deployment
● HTTP x 4 (by default Sentry uses Gunicorn to manage its workers)
● UDP x 4 (Gevent enabled; each received packet is pushed onto the queue)
● Celery x 4 (task consumers; by default CPU_NUM workers are started)
● Celery Beat x 1
● Cron: cleanup 21 (keep only 21 days of data)
● HTTP is load-balanced by LVS + Nginx in front
● UDP is load-balanced via DNS
Internal Configuration
● LDAP
○ The user account system we use
○ Sentry only needs to be configured for it
● MAIL
○ Configure the mail server for Sentry
● IRC
○ sentry-irc
○ Because we use ircbot, we modified this plugin's code slightly
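Why UDP
● Fast
● Applications do not need to care whether the Sentry service is healthy
● Even if Sentry has problems, applications are unaffected
● System-level UDP packet loss can be observed to judge whether the UDP service is healthy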
Celery
● 芹菜 (celery, the vegetable)
● Not fully digested yet
DBA
● Redis
○ Memory
○ QPS
○ CPU
○ Queue Size
● MySQL
○ QPS
■ update
■ insert
■ delete
■ select
○ thread
Sentry
● Statsd
○ celery worker cpu
○ udp worker cpu
○ http worker cpu
● In-app statistics
○ task execution time
○ task execution count
UDP Received Packet (Sentry)
● udp worker
● Counted right after each packet is received
● d = sock.recvfrom(self.BUF_SIZE)
● statsd.increment(STATSD_KEY_RECV)
UDP Received Packet (Kernel)
● Number of packets received by the UDP server (collected by Diamond)
● ~ $ sudo /sbin/iptables -t filter -I INPUT -i lan -p udp --dport 4008 -j ACCEPT
● ~ $ sudo /sbin/iptables -L INPUT 1 -nvx
● 52308183 414772916064 ACCEPT udp -- lan * 0.0.0.0/0 0.0.0.0/0 udp dpt:4008
● pkts (count of complete UDP packets; the lower layer has already handled fragmentation)
● bytes
UDP Dropped Packets (Kernel)
● cat /proc/net/udp
● sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
● 41: 00000000:80CE 00000000:0000 07 00000000:00000000 00:00000000 00000000 6561 0 4110825944 2 ffff8809c23e5e40 0
● cat /proc/net/snmp
● Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
● Udp: 5416706536 993028 290598311 22725578190 4662160 1318
● UdpLite: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
● UdpLite: 0 0 0 0 0 0
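The counters in /proc/net/snmp are easier to watch once they are paired with their column names. A minimal sketch, assuming the two-row "Udp:" layout shown above; `parse_udp_counters` is a hypothetical helper and the sample text is copied from this slide rather than read from a live system.

```python
def parse_udp_counters(snmp_text):
    """Return the 'Udp:' counters from /proc/net/snmp text as a dict of ints."""
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith('Udp:')]
    # The first 'Udp:' row holds the column names, the second the values.
    header, values = rows[0], rows[1]
    return dict(zip(header[1:], (int(v) for v in values[1:])))


SAMPLE = """\
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 5416706536 993028 290598311 22725578190 4662160 1318
UdpLite: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
UdpLite: 0 0 0 0 0 0
"""

counters = parse_udp_counters(SAMPLE)
print(counters['InErrors'], counters['RcvbufErrors'])
```

On a real host the text would come from reading /proc/net/snmp; sampling it periodically and graphing the growth of InErrors and RcvbufErrors is one way to spot a UDP worker that cannot keep up.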
TIPS
● Webhooks:
○ Access to internal-network IPs is blocked by default; the configuration needs to be changed
● Timezone
○ SENTRY_DEFAULT_TIME_ZONE = 'Asia/Shanghai' sets the default time zone for users
● Public
○ SENTRY_PUBLIC = False (this permission is somewhat problematic; do not enable it)
● Register
○ SENTRY_FEATURES['auth:register'] = False disables self-registration
Next Steps
● Per-project error statistics (QPS; the graphs Sentry provides do not yet meet our needs)
● A profiling tool to help analyze worker bottlenecks (Celery)
● A plan for handling avalanche-style error bursts (load-test Sentry)
● Try MySQL + Redis + Gevent
Jobs
● 2016 campus recruiting
● Experienced hires: always open
● TO: ruby@douban.com
● Python is of course fine too
● TO: python@douban.com
● If you want to try JS, you can apply as well
● TO: js@douban.com
● Details: http://jobs.douban.com