Dive into Sentry

PyCon China 2015

Published in: Technology

  1. Dive into Sentry: The modern error logging and aggregation platform. XTao, 09.19.2015, Beijing
  2. Beijing/Shanghai/Guangzhou 0xFF Life's pathetic, go Pythonic! Xu Tao (徐涛) ● @ Douban ● (?:Product development|Operations) engineer ● (?:CODE|DevOps|Git|Python) ● 2014 PyConChina Beijing ● https://blog.xtao.me ❏ Douban: @xtaooooo ❏ Twitter: @xtao ❏ GitHub: @xtao
  3. Sentry overview: Sentry graduated from Disqus (https://engineering.disqus.com/) ● Sentry history ● What Sentry is ● DEMO
  4. Origins ● 2010 ● http://disqus.com/ ● django-db-log (the grandfather) ● tl;dr Sentry and Raven are StarCraft 2 units. ● driven-by-open-source
          commit 3c2e87573d3bd16f61cf08fece0638cc47a4fc22
          Author: David Cramer <dcramer@gmail.com>
          Date:   Mon May 12 16:26:19 2008 +0000

              initial working code

           djangodblog/__init__.py | 35 +++++++++++++++++++++++++++++++++++
           djangodblog/models.py   | 36 ++++++++++++++++++++++++++++++++++++
           2 files changed, 71 insertions(+)
  5. Sentry 5 ● 2012 ● Protocol Version 3 ● branch: 5.4.x-maint
  6. Sentry 6 ● 2013 ● Protocol Version 4 ● Protocol Version 5 ● Alerts ● Filters
  7. Sentry 7 ● 2014 ● Organizations ● TSDB ● Rules ● Web API ● Protocol Version 6 ● BIGINT ● Help Pages
  8. Sentry 8 ● 2015 ? ● Most of the application has been overhauled and rewritten on top of React and our web API. ● beta
  9. What Sentry is ● An error logging and aggregation platform ○ Server: Sentry (The Sentry Open Source Server) ○ Client: The Raven Clients.
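  10. Why use Sentry ● Detailed error information ○ down to a specific line of code (Python) ○ down to a specific variable (Python) ● Detailed error classification ○ Tag ● Alerts ● Sensible handling of duplicate errors ● Support for many languages ○ Python is well supported ❏ Bonus benefits ❏ Beginner ❏ A very good Django project; if you want to learn how to use Django, read the Sentry source code ❏ Intermediate ❏ Sentry counts as a medium-sized web project; if you lack web development experience, you can pick some up from the source ❏ Open source ❏ An example of an open-source Python web application; its data migrations are reliable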
  11. Sentry - server side (7.x) ● Backend ○ Python ○ Django ○ Celery ● Frontend ○ jQuery ○ Backbone ○ Underscore ○ Bootstrap ○ Moment ● Database ○ MySQL ○ PostgreSQL ● KV ○ Cassandra ○ Riak ○ Redis ● Queue ○ Redis ○ RabbitMQ
  12. Raven - officially supported clients ● Python ● JavaScript ● Node.js ● PHP ● Ruby ● Objective-C ● Java ● C# ● Go
  13. DEMO 1. Hosted Sentry a. https://www.getsentry.com/signup/ b. 14-day Free Trial 2. Sentry On-Premise a. https://docs.getsentry.com/on-premise/server/installation/ b. Sentry Internal
  14. Using Sentry ● How to submit errors ● Raven ● DSN
  15. Raven 101
          pip install raven --upgrade

          from raven import Client

          client = Client('___DSN___')

          try:
              1 / 0
          except ZeroDivisionError:
              client.captureException()

          def handle_request(request):
              client.context.merge({'user': {
                  'email': request.user.email
              }})
              try:
                  ...
              finally:
                  client.context.clear()
  16. Raven 102 ● WSGI middleware ● raven/middleware.py
          ```
          A WSGI middleware which will attempt to capture any
          uncaught exceptions and send them to Sentry.

          >>> from raven.base import Client
          >>> from raven.middleware import Sentry
          >>> application = Sentry(application, Client())
          ```
  17. DSN 101
          '{PROTOCOL}://{PUBLIC_KEY}:{SECRET_KEY}@{HOST}/{PATH}{PROJECT_ID}'
          http://c44a73655e50454581da995bbedd392a:8d29447e0e8241b9a178fd726fb07190@onimaru.intra.douban.com/10
          udp://c44a73655e50454581da995bbedd392a:8d29447e0e8241b9a178fd726fb07190@onimaru-udp.intra.douban.com:4008/10
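The DSN template on this slide can be taken apart with the Python 3 standard library. This is only a minimal sketch: `parse_dsn` and `ParsedDSN` are hypothetical names for illustration, not part of Raven, and the keys and host in the usage line are placeholders.

```python
from collections import namedtuple
from urllib.parse import urlsplit

# Hypothetical container; field names follow the template on the slide.
ParsedDSN = namedtuple(
    "ParsedDSN",
    "protocol public_key secret_key host port path project_id",
)


def parse_dsn(dsn):
    """Split '{PROTOCOL}://{PUBLIC_KEY}:{SECRET_KEY}@{HOST}/{PATH}{PROJECT_ID}'."""
    parts = urlsplit(dsn)
    if not (parts.scheme and parts.username and parts.hostname):
        raise ValueError("not a valid DSN: %r" % dsn)
    # The last path segment is the project id; anything before it is the path.
    path, _, project_id = parts.path.rpartition("/")
    return ParsedDSN(
        protocol=parts.scheme,
        public_key=parts.username,
        secret_key=parts.password,
        host=parts.hostname,
        port=parts.port,  # None when the DSN does not name a port
        path=path + "/",
        project_id=project_id,
    )


if __name__ == "__main__":
    print(parse_dsn("udp://public:secret@sentry.example.com:4008/10"))
```

The same function handles both the `http://` and `udp://` forms, because `urlsplit` parses the network location for any scheme that is followed by `//`.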
  18. Sentry features (๑•̀ㅂ•́)و✧ (つд⊂) ● Event ● Group ● Protocol ● Interface ● TSDB ● Buffer ● Cache
  19. Event ● HTTP(DATA) ● UDP(DATA) ● EventManager ● Project ● Event ● Group ● EventMapping ○ event_id: uuid.uuid4().hex ● UserReport ○ user feedback on Sentry issues ● post_process_group.delay ● index_event.delay
  20. Group ● hashes ○ checksum (provided by client) ○ fingerprint / (default + fingerprint) ○ default (first interface ordered by score) ● find group ○ find group at GroupHash by hash ○ first matched group ● sample event (count, time) ● regression (resolved event)
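The hash priority and group lookup listed on this slide can be sketched as follows. This is an illustration under stated assumptions, not Sentry's implementation: the function names, the use of MD5, and the plain dict standing in for the GroupHash table are all hypothetical, and `default_parts` stands for whatever the highest-scoring interface contributes.

```python
import hashlib

DEFAULT_MARKER = "{{ default }}"  # placeholder a fingerprint may contain


def md5_of(parts):
    """Hash a list of string-like parts into one hex digest (assumed scheme)."""
    digest = hashlib.md5()
    for part in parts:
        digest.update(str(part).encode("utf-8"))
    return digest.hexdigest()


def compute_hashes(event, default_parts):
    """Priority from the slide: checksum, then fingerprint, then default."""
    if event.get("checksum"):
        return [event["checksum"]]
    fingerprint = event.get("fingerprint")
    if fingerprint:
        parts = []
        for item in fingerprint:
            if item == DEFAULT_MARKER:
                parts.extend(default_parts)  # default + custom fingerprint
            else:
                parts.append(item)
        return [md5_of(parts)]
    return [md5_of(default_parts)]


def find_group(hashes, group_hash_table):
    """Return the first group whose hash matches, or None for a new group."""
    for value in hashes:
        group = group_hash_table.get(value)
        if group is not None:
            return group
    return None
```

When `find_group` returns `None`, the caller would create a new group and register its hashes; otherwise the event is aggregated into the matched group.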
  21. Protocol
          CLIENT_RESERVED_ATTRS = (
              'project',
              'event_id',
              'message',
              'checksum',
              'culprit',
              'fingerprint',
              'level',
              'time_spent',
              'logger',
              'server_name',
              'site',
              'timestamp',
              'extra',
              'modules',
              'tags',
              'platform',
              'release',
          )

          {
              "event_id": "fc6d8c0c43fc4630ad850ee518f1b9d0",
              "culprit": "my.module.function_name",
              "timestamp": "2011-05-02T17:41:36",
              "message": "SyntaxError: Wattttt!",
              "sentry.interfaces.Exception": {
                  "type": "SyntaxError",
                  "value": "Wattttt!",
                  "module": "__builtins__"
              }
          }
  22. UDP Protocol "AUTH\n\nDATA" ● AUTH ○ "Sentry key=value, key=value, …" ● DATA ○ json string ○ zlib ○ base64
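The packet layout on this slide (an auth header, a blank line, then JSON that is zlib-compressed and base64-encoded) can be sketched with the standard library. The function names are hypothetical, and the two-newline separator is an assumption about what the slide's separator denotes; the auth string is the generic one from the slide.

```python
import base64
import json
import zlib

SEPARATOR = b"\n\n"  # assumed separator between AUTH and DATA


def build_udp_packet(auth_header, event):
    """Encode an event as AUTH + separator + base64(zlib(json))."""
    payload = json.dumps(event).encode("utf-8")
    data = base64.b64encode(zlib.compress(payload))
    return auth_header.encode("utf-8") + SEPARATOR + data


def parse_udp_packet(packet):
    """Reverse build_udp_packet; returns (auth_header, event)."""
    auth, _, data = packet.partition(SEPARATOR)
    payload = zlib.decompress(base64.b64decode(data))
    return auth.decode("utf-8"), json.loads(payload.decode("utf-8"))


if __name__ == "__main__":
    packet = build_udp_packet("Sentry key=value", {"message": "hello"})
    print(parse_udp_packet(packet))
```

Base64 output never contains a newline here, so splitting on the first separator is unambiguous.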
  23. HTTP Protocol ● User authentication is shared with the web UI ● /api/store ● GET/POST DATA
  24. Interface: An interface is a structured representation of data, which may render differently than the default ``extra`` metadata in an event. ● to_python ● get_api_context ● to_json ● get_path ● get_alias ● get_hash ● get_score ● ...
  25. Interface - Exception ● A standard Python exception ● type, value, module ● stacktrace == sentry.interfaces.Stacktrace
          >>> {
          >>>     "type": "ValueError",
          >>>     "value": "My exception value",
          >>>     "module": "__builtins__",
          >>>     "stacktrace": {
          >>>         # see sentry.interfaces.Stacktrace
          >>>     }
          >>> }
  26. Interface - Message ● message (<= 1000) ● params
          >>> {
          >>>     "message": "My raw message with interpreted strings like %s",
          >>>     "params": ["this"]
          >>> }
  27. Interface - HTTP ● Common HTTP parameters
          >>> {
          >>>     "url": "http://absolute.uri/foo",
          >>>     "method": "POST",
          >>>     "data": {
          >>>         "foo": "bar"
          >>>     },
          >>>     "query_string": "hello=world",
          >>>     "cookies": "foo=bar",
          >>>     "headers": {
          >>>         "Content-Type": "text/html"
          >>>     },
          >>>     "env": {
          >>>         "REMOTE_ADDR": "192.168.0.1"
          >>>     }
          >>> }
  28. Interface - Query ● Used to record SQL
          >>> {
          >>>     "query": "SELECT 1",
          >>>     "engine": "psycopg2"
          >>> }
  29. Interface - Template ● A rendered template (generally used like a single frame in a stacktrace). ● The attributes ``filename``, ``context_line``, and ``lineno`` are required.
          >>> {
          >>>     "abs_path": "/real/file/name.html",
          >>>     "filename": "file/name.html",
          >>>     "pre_context": [
          >>>         "line1",
          >>>         "line2"
          >>>     ],
          >>>     "context_line": "line3",
          >>>     "lineno": 3,
          >>>     "post_context": [
          >>>         "line4",
          >>>         "line5"
          >>>     ]
          >>> }
  30. Interface - User ● Describes a user
          >>> {
          >>>     "id": "unique_id",
          >>>     "username": "my_user",
          >>>     "email": "foo@example.com",
          >>>     "ip_address": "127.0.0.1",
          >>>     "optional": "value"
          >>> }
  31. Interface - Stacktrace ● Python Frame
          >>> {
          >>>     "frames": [{
          >>>         "abs_path": "/real/file/name.py",
          >>>         "filename": "file/name.py",
          >>>         "function": "myfunction",
          >>>         "vars": {
          >>>             "key": "value"
          >>>         },
          >>>         "pre_context": [
          >>>             "line1",
          >>>             "line2"
          >>>         ],
          >>>         "context_line": "line3",
          >>>         "lineno": 3,
          >>>         "in_app": true,
          >>>         "post_context": [
          >>>             "line4",
          >>>             "line5"
          >>>         ]
          >>>     }],
          >>>     "frames_omitted": [13, 56]
          >>> }
  32. TSDB - time-series database ● Dummy (none) ● InMemory (defaultdict) ● Redis (hashes)
          Redis: { "TSDBModel:epoch:shard": { "Key": Count } }

          # rollups must be ordered from highest granularity to lowest
          SENTRY_TSDB_ROLLUPS = (
              # (time in seconds, samples to keep)
              (10, 360),  # 60 minutes at 10 seconds
              (3600, 24 * 7),  # 7 days at 1 hour
              (3600 * 24, 60),  # 60 days at 1 day
          )
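To make the rollup settings concrete, here is a small in-memory counter in the spirit of the "InMemory (defaultdict)" backend. It is a sketch only: the class and helper names are hypothetical, and it ignores sharding and expiry. Each increment lands in one bucket per rollup, where a bucket is the timestamp rounded down to the rollup width.

```python
from collections import defaultdict

# Same shape as SENTRY_TSDB_ROLLUPS: (bucket width in seconds, samples to keep).
ROLLUPS = (
    (10, 360),
    (3600, 24 * 7),
    (3600 * 24, 60),
)


def bucket_start(timestamp, seconds):
    """Round a Unix timestamp down to the start of its rollup bucket."""
    timestamp = int(timestamp)
    return timestamp - (timestamp % seconds)


def retention_seconds(seconds, samples):
    """How far back a rollup reaches: bucket width times samples kept."""
    return seconds * samples


class InMemoryTSDB:
    """Hypothetical stand-in: {(model, rollup, bucket): {key: count}}."""

    def __init__(self, rollups=ROLLUPS):
        self.rollups = rollups
        self.data = defaultdict(lambda: defaultdict(int))

    def incr(self, model, key, timestamp, count=1):
        # One write per rollup, so every granularity stays up to date.
        for seconds, _samples in self.rollups:
            bucket = (model, seconds, bucket_start(timestamp, seconds))
            self.data[bucket][key] += count

    def get(self, model, key, timestamp, seconds):
        bucket = (model, seconds, bucket_start(timestamp, seconds))
        return self.data[bucket][key]
```

With the slide's settings, the 10-second rollup covers `10 * 360 = 3600` seconds, which matches the "60 minutes at 10 seconds" comment.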
  33. NodeStore - KV database ● riak ● cassandra ● django (node table) ● Used to store special information alongside the data (for example, large text that is not suited to living in the database) ● validate ● create ● delete ● delete_multi ● get ● get_multi ● set ● set_multi ● generate_id ● cleanup
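The method list on this slide suggests a small key-value interface. The following dict-backed class is a hypothetical stand-in to show how those methods fit together; the method names come from the slide, but the signatures are assumptions, and `validate` and `cleanup` are omitted.

```python
import uuid


class InMemoryNodeStorage:
    """Dict-backed sketch of a node store (not one of Sentry's backends)."""

    def __init__(self):
        self._nodes = {}

    def generate_id(self):
        # Same id style as event_id elsewhere in the talk: uuid4 hex.
        return uuid.uuid4().hex

    def create(self, data):
        node_id = self.generate_id()
        self.set(node_id, data)
        return node_id

    def get(self, node_id):
        return self._nodes.get(node_id)

    def get_multi(self, id_list):
        return {node_id: self.get(node_id) for node_id in id_list}

    def set(self, node_id, data):
        self._nodes[node_id] = data

    def set_multi(self, values):
        for node_id, data in values.items():
            self.set(node_id, data)

    def delete(self, node_id):
        self._nodes.pop(node_id, None)

    def delete_multi(self, id_list):
        for node_id in id_list:
            self.delete(node_id)
```

A model would then keep only the node id in its relational row and fetch the large payload from the store on demand.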
  34. Example
          class Event(Model):
              """
              An individual event.
              """
              __core__ = False
              ...
              time_spent = BoundedIntegerField(null=True)
              data = NodeField(blank=True, null=True)
  35. Cache ● django ○ filesystem ○ memcached ○ local memory ○ dummy ● redis ● set ● get ● delete ● redis: ○ from nydus.db import create_cluster ○ supports clusters ○ a rewrite, rb, exists but is not yet used in any released version
  36. Example ● Cache and Model ○ db/models/manager.py ○ class BaseManager(Manager) ● get_from_cache ● updated by signal ● deleted by signal
  37. Buffer: This is useful in situations where a single event might be happening so fast that the queue can't keep up with the updates. ● InProcess (no buffer) ● Redis ● Reduces the write QPS on MySQL ● Supports clustered Redis ● Redis 2.6.12 or newer
  38. Buffer Internal ● Producer ● incr ● 'b:k:%s:%s' (hashmap, key_expire = 60 * 60 # 1 hour) ○ 'm' ○ 'f' ○ 'l+%s' ○ 'e+%s' ● 'b:p' (Sorted sets) ● Consumer ● process pending ● process
          'flush-buffers': {
              'task': 'sentry.tasks.process_buffer.process_pending',
              'schedule': timedelta(seconds=10),
              'options': {
                  'expires': 10,
                  'queue': 'counters-0',
              }
          },
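The producer/consumer split on this slide can be illustrated with an in-memory stand-in for the Redis structures. This is a sketch under assumptions: the class name, the `apply_update` callback, and the argument shapes are hypothetical. `incr` accumulates deltas per (model, filters) key and marks the key pending; `process_pending`, which the periodic task would call, flushes each pending key as a single write.

```python
from collections import defaultdict


class InMemoryBuffer:
    """Coalesces many counter increments into one database write per key."""

    def __init__(self, apply_update):
        # apply_update(model, filters, deltas) performs the real write.
        self.apply_update = apply_update
        self.counters = defaultdict(lambda: defaultdict(int))  # like the hash
        self.pending = set()  # like the pending sorted set

    def incr(self, model, columns, filters):
        """Producer side: cheap in-memory increment, no database access."""
        key = (model, tuple(sorted(filters.items())))
        for column, amount in columns.items():
            self.counters[key][column] += amount
        self.pending.add(key)

    def process_pending(self):
        """Consumer side: flush every pending key exactly once."""
        keys, self.pending = self.pending, set()
        for key in keys:
            model, filters = key
            deltas = dict(self.counters.pop(key))
            self.apply_update(model, dict(filters), deltas)


if __name__ == "__main__":
    buf = InMemoryBuffer(lambda *args: print("UPDATE", args))
    for _ in range(1000):
        buf.incr("Group", {"times_seen": 1}, {"id": 42})
    buf.process_pending()  # one write instead of 1000
```

This is how a buffer lowers write QPS: the database sees one update per flush interval per row, however many events arrive in between.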
  39. Membership Roles ● Member *:read ● Admin *:write ● Owner *:delete Scoping has access to all teams ● Organization structure and access control similar to GitHub's ● Organization - Owner, Admin, Member ● Team - (Role, Project) ● Project
  40. Sensitive Data ● 'password', ● 'secret', ● 'passwd', ● 'authorization', ● 'api_key', ● 'apikey', ● 'access_token', ● DEFAULT_SCRUBBED_FIELDS
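A scrubber driven by the field names above might look like the following. This is a minimal sketch, not Sentry's processor: the `scrub` function, the mask string, and the substring-matching rule are assumptions made for illustration.

```python
# Field names taken from the slide.
SCRUBBED_FIELDS = (
    "password",
    "secret",
    "passwd",
    "authorization",
    "api_key",
    "apikey",
    "access_token",
)
MASK = "********"  # hypothetical replacement value


def is_sensitive(key):
    """Case-insensitive substring match against the scrubbed field names."""
    lowered = str(key).lower()
    return any(field in lowered for field in SCRUBBED_FIELDS)


def scrub(value, key=None):
    """Return a copy of value with sensitive entries replaced by MASK."""
    if key is not None and is_sensitive(key):
        return MASK
    if isinstance(value, dict):
        return {k: scrub(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value


if __name__ == "__main__":
    event = {"user": {"name": "xtao", "password": "hunter2"}, "API_KEY": "k"}
    print(scrub(event))
```

The function builds new containers rather than mutating the event, so the original payload stays available to the caller.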
  41. Notifications ● Rules ○ An event is first seen (the first event in a rollup) ○ An event changes state from resolved to unresolved ● State ○ Unresolved ○ Resolved ○ Muted ● Condition ● Action
  42. Tagging Events ● Event classification ● We’ll automatically index all tags for an event, as well as the frequency and the last time a value has been seen. ● TagValue ● GroupTagValue ● Added by buffer
  43. Rollups & Sampling ● Rollups ○ Raven.captureException(ex, {fingerprint: ['my', 'custom', 'fingerprint']}) ○ Raven.captureException(ex, {fingerprint: ['{{ default }}', 'other', 'data']}) ● Sampling ○ Count ○ Time
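The slide names two sampling inputs, count and time, without giving thresholds. The sketch below shows one way such a decision could combine them; every number and name in it is hypothetical and chosen only for illustration. The idea is that a group seen many times keeps the full payload of only one event in N, while a group that has been quiet for a while always keeps the next event.

```python
# Hypothetical thresholds: (times_seen upper bound, keep 1 event in N).
SAMPLE_RATES = ((50, 1), (1000, 2), (10000, 10))
MAX_SAMPLE_RATE = 50
# Hypothetical: always keep an event if the group was quiet this long.
QUIET_SECONDS = 600


def sample_rate(times_seen):
    """Count-based part: the more often a group is seen, the sparser the samples."""
    for limit, rate in SAMPLE_RATES:
        if times_seen <= limit:
            return rate
    return MAX_SAMPLE_RATE


def should_store(times_seen, seconds_since_last_seen):
    """Decide whether to keep the full event or only bump the counters."""
    # Time-based part: a long gap means the event is interesting again.
    if seconds_since_last_seen >= QUIET_SECONDS:
        return True
    return times_seen % sample_rate(times_seen) == 0
```

Counters such as the group's total would still be incremented for every event; sampling only limits how many full payloads are stored.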
  44. Web Profile ● ?prof=1 ● DEBUG ● super user ● src/sentry/utils/debug.py
          def can(self, request):
              if 'prof' not in request.GET:
                  return False
              if settings.DEBUG:
                  return True
              if hasattr(request, 'user') and request.user.is_superuser:
                  return True
              return False
  45. Sentry to Sentry ● Bootstrapping: Sentry reports its own errors to itself ● DISABLE_RAVEN ● default: project id == 1 ● src/sentry/utils/raven.py
  46. PostgreSQL & Gevent ● psycopg2 ● src/sentry/utils/gevent.py ● This is probably the database the Sentry team itself uses; there is a non-blocking patch, so asynchronous operation is supported
  47. Sentry @douban: the meaty part, with plenty of things to gripe about ● Problems ● Deployment ● Monitoring ● Tuning ● Tips
  48. "Don't expect a tool to solve the problems you can't solve yourself." George (乔治) @ Douban
  49. Monitor
  50. Douban ● Applications ○ Python (the majority) ○ JavaScript (front end) ○ Go (a small part) ○ C++/C/Java (a few) ● Errors ○ devtools (DIY, deprecated) ○ Sentry
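  51. Problems ● A Sentry instance was already deployed ● 5.x used the UDP protocol ● Errors were being lost in both testing and production ● and quite noticeably ● but at that point there was no monitoring for Sentry itself ● We started digging into the black box ● UDP worker CPU usage was fairly high ● UDP was load-balanced via DNS ● DNS caching left the load unbalanced ● Using Random mitigated the hazard introduced by caching ● We read the worker code ● Gevent / Eventlet was used incorrectly, with no monkey patching ● 5.x put heavy pressure on the database; writes needed to be merged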
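The slide does not say where the random choice was applied. One plausible reading, shown here purely as an assumption, is that the client resolves every address behind the UDP hostname and picks one at random per packet, so a cached first answer no longer pins all traffic to one worker. The function names are hypothetical.

```python
import random
import socket


def resolve_all(host, port):
    """Return every IPv4 (address, port) pair the resolver offers for host."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_DGRAM)
    return sorted({info[4] for info in infos})


def pick_address(addresses, rng=random):
    """Choose a target uniformly at random instead of always using the first."""
    return rng.choice(addresses)


def send_packet(packet, host, port):
    """Send one UDP datagram to a randomly chosen backend; returns the target."""
    address = pick_address(resolve_all(host, port))
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(packet, address)
    finally:
        sock.close()
    return address
```

A real client would cache the resolved list for a short time rather than resolving on every send.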
  52. Upgrade ● 5.x-maint supports UDP ● but 7.x does not ● We had to backport UDP to 7.x ● Fortunately the original interface was still there ❏ src/sentry/conf/server.py: ❏ #socket.setdefaulttimeout(5) ❏ src/sentry/coreapi.py: ❏ insert_data_to_database_sync (changed from async to sync; because of the cache, Redis memory amplification was too severe)
  53. Insert Queue ● insert_data_to_database - cache ● preprocess_event - queue ● save_event - queue ● insert_data_to_database_sync - queue
  54. Deployment ● HTTP x 4 (by default Sentry uses Gunicorn to manage workers) ● UDP x 4 (Gevent enabled; each packet is pushed onto the queue as soon as it is received) ● Celery x 4 (task consumers; by default CPU_NUM workers are started) ● Celery Beat x 1 ● Cron: cleanup 21 (keep only 21 days of data) ● HTTP is load-balanced by LVS + Nginx ● UDP is load-balanced via DNS
  55. Internal configuration ● LDAP ○ the user account system we use ○ Sentry only needs to be configured for it ● MAIL ○ configure a mail server for Sentry ● IRC ○ sentry-irc ○ because we use ircbot, we modified this plugin's code slightly
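  56. Why UDP ● Fast ● The application does not need to care whether the Sentry service is healthy ● Even if Sentry has problems, the application is unaffected ● System-level UDP packet loss can be observed to judge whether the UDP service is healthy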
  57. Celery ● Celery (芹菜, the vegetable) ● We have not fully digested it yet
  58. DBA ● Redis ○ Memory ○ QPS ○ CPU ○ Queue Size ● MySQL ○ QPS ■ update ■ insert ■ delete ■ select ○ thread
  59. Sentry ● Statsd ○ celery worker cpu ○ udp worker cpu ○ http worker cpu ● In-app statistics ○ task execution time ○ task execution count
  60. UDP Received Packet (Sentry) ● udp worker ● Record a counter after each packet is received ● d = sock.recvfrom(self.BUF_SIZE) ● statsd.increment(STATSD_KEY_RECV)
  61. UDP Received Packet (Kernel)
          ● Number of packets received by the UDP server (by diamond)
          ● ~ $ sudo /sbin/iptables -t filter -I INPUT -i lan -p udp --dport 4008 -j ACCEPT
          ● ~ $ sudo /sbin/iptables -L INPUT 1 -nvx
          ● 52308183 414772916064 ACCEPT udp -- lan * 0.0.0.0/0 0.0.0.0/0 udp dpt:4008
          ● pkts (number of complete UDP datagrams; fragmentation has already been handled at a lower layer)
          ● bytes
  62. UDP Dropped Packets (Kernel)
          ● cat /proc/net/udp
            sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
            41: 00000000:80CE 00000000:0000 07 00000000:00000000 00:00000000 00000000 6561 0 4110825944 2 ffff8809c23e5e40 0
          ● cat /proc/net/snmp
            Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
            Udp: 5416706536 993028 290598311 22725578190 4662160 1318
            UdpLite: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
            UdpLite: 0 0 0 0 0 0
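The `/proc/net/snmp` output above pairs a header row with a value row for each protocol, which makes it easy to turn into named counters for a monitoring agent. A minimal sketch follows; `parse_snmp` is a hypothetical helper, and the sample text is the output shown on the slide.

```python
def parse_snmp(text, protocol="Udp"):
    """Return {counter_name: value} for one protocol in /proc/net/snmp text."""
    prefix = protocol + ":"
    # Each protocol appears twice: first the column names, then the values.
    rows = [line.split()[1:] for line in text.splitlines()
            if line.startswith(prefix)]
    if len(rows) < 2:
        raise ValueError("no %s section found" % protocol)
    names, values = rows[0], rows[1]
    return dict(zip(names, (int(value) for value in values)))


SAMPLE = """\
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 5416706536 993028 290598311 22725578190 4662160 1318
UdpLite: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
UdpLite: 0 0 0 0 0 0
"""

if __name__ == "__main__":
    counters = parse_snmp(SAMPLE)
    print(counters["InErrors"], counters["RcvbufErrors"])
```

Because the prefix includes the colon, the `Udp` lookup does not accidentally match the `UdpLite` rows. On a live host the text would come from reading `/proc/net/snmp`, and a growing `RcvbufErrors` would point at receive-buffer drops.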
  63. TIPS ● Webhooks: ○ access to internal-network IPs is forbidden by default; the configuration needs to be changed ● Timezone ○ SENTRY_DEFAULT_TIME_ZONE = 'Asia/Shanghai' sets the default time zone for users ● Public ○ SENTRY_PUBLIC = False; this permission is somewhat problematic, do not enable it ● Register ○ SENTRY_FEATURES['auth:register'] = False disables self-registration
  64. Next steps ● Per-project error statistics (QPS; the graphs Sentry provides do not yet meet our needs) ● Profiling tools to help analyze worker bottlenecks (Celery) ● A plan for handling avalanche-style bursts of errors (load-test Sentry) ● Try MySQL + Redis + Gevent
  65. Jobs ● 2016 campus recruiting ● always hiring experienced engineers ● TO: ruby@douban.com ● Python is fine too, of course ● TO: python@douban.com ● If you would like to try JS, that works as well ● TO: js@douban.com ● Details: http://jobs.douban.com