WebCamp Ukraine 2016: Instant messenger with Python. Back-end development

Lessons learned from developing an instant messenger with Python 2 and Twisted.

About the project:
* 100 000+ connected users
* 100+ nodes
* REST API for integration

Expected audience: developers who are new to instant messenger development, or anyone building such a system from scratch.
Everything here is based on the speaker's own production experience.


  1. Instant messenger with Python: Back-end development
     Viacheslav Kakovskyi, WebCamp 2016
  2. Me! @kakovskyi
     Python Developer at SoftServe
     Contributor to Atlassian HipChat (Python 2, Twisted)
     Maintainer of KPIdata (Python 3, asyncio)
  3. Agenda
     ● What is 'instant messenger'?
     ● Related projects from my experience
     ● Messaging protocols
     ● Life of messaging platform
     ● Lessons learned
     ● Summary
     ● Further reading
  4. What is 'instant messenger'?
  5. What is 'instant messenger'?
     ● online chat
     ● real-time delivery
     ● short messages
  6. What is 'instant messenger'?
     ● history search
     ● file sharing
     ● mobile push notifications
     ● video calling
     ● bots and integrations
  7. Related projects from my experience
     ● Hosted chat for teams and enterprises
     ● Founded in 2009 by 3 students
     ● 100 000+ connected users
     ● 100+ nodes
     ● REST API for integrations and bots
     ● Built with Python 2 and Twisted
  8. Messaging protocols
     Protocol is about:
     ● Message format
     ● Allowed types of messages
     ● Limitations
     ● Routine
       ○ How to encode data?
       ○ How to establish/close connection?
       ○ How to authenticate?
       ○ How to encrypt?
  9. Messaging protocols
     ● OSCAR (1997)
     ● XMPP (1999)
     ● Skype (2003)
     ● WebSocket-based (2011)
     ● MQTT, MTProto, DHT-based, etc.
  10. XMPP
      ● XMPP - signaling protocol
      ● BOSH - transport protocol
      ● Grew out of Jabber (1999)
      ● XML as a message format
      ● Stanza - basic unit in XMPP
      ● Types of stanzas:
        ○ Message
        ○ Presence
        ○ Info/Query
  11. XMPP
      ● Extensions defined by XEPs (XMPP Extension Protocols):
        ○ Bidirectional-streams Over Synchronous HTTP (BOSH)
        ○ Serverless messaging
        ○ File transfer, etc.
  12. XMPP: Establishing a connection
      Client:
        <?xml version='1.0'?>
        <stream:stream to='example.com'
                       xmlns='jabber:client'
                       xmlns:stream='http://etherx.jabber.org/streams'
                       version='1.0'>
      Server:
        <?xml version='1.0'?>
        <stream:stream from='example.com'
                       id='someid'
                       xmlns='jabber:client'
                       xmlns:stream='http://etherx.jabber.org/streams'
                       version='1.0'>
  13. XMPP: Sending a message
      Client:
        <message from='juliet@example.com' to='romeo@example.net' xml:lang='en'>
          <body>Art thou not Romeo, and a Montague?</body>
        </message>
      Server:
        <message from='romeo@example.net' to='juliet@example.com' xml:lang='en'>
          <body>Neither, fair saint, if either thee dislike.</body>
        </message>
  14. XMPP: Closing a connection
      Client:
        </stream:stream>
      Server:
        </stream:stream>
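The message stanza from the XMPP slides can be built and parsed with the standard library alone. This is a minimal sketch, not the talk's Twisted code: `build_message` and `parse_message` are hypothetical helper names, and the stanza is handled standalone, so the `jabber:client` default namespace that the enclosing stream would supply is left out.

```python
import xml.etree.ElementTree as ET

# Namespace that the reserved "xml:" prefix is bound to.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_message(sender, recipient, body, lang="en"):
    """Serialize a minimal <message/> stanza like the one on the slide."""
    msg = ET.Element("message", {"from": sender, "to": recipient})
    msg.set("{%s}lang" % XML_NS, lang)  # serialized as xml:lang
    ET.SubElement(msg, "body").text = body
    return ET.tostring(msg, encoding="unicode")


def parse_message(raw):
    """Return (sender, recipient, body) from a serialized <message/> stanza."""
    msg = ET.fromstring(raw)
    body = msg.findtext("body", default="").strip()
    return msg.get("from"), msg.get("to"), body
```

Even this tiny round trip shows the cost the talk attributes to XML: every message is a document that has to be parsed into a tree before any field can be read.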
  15. XMPP: Pros
      ● Robust and standardized
      ● Extensible via XEPs
      ● Secure
      ● Native support of multi-sessions
      ● Many client implementations
  16. XMPP: Cons
      ● Overhead
        ○ Presence
        ○ Downloading the World on startup
      ● XML
        ○ Large documents
        ○ Expensive parsing
  17. XMPP and Python
      ● Servers:
        ○ TwistedWords - good place to start
        ○ Tornado-based example
        ○ aioxmpp
        ○ XMPPFlask
        ○ Punjab - BOSH-server on Twisted
  18. XMPP and Python
      ● Clients:
        ○ SleekXMPP - mature and solid
        ○ Slixmpp - asyncio-support
        ○ TwistedWords
        ○ Wokkel - Twisted-based
        ○ xmpp.py
      ● JS-client: Strophe.js
  19. WebSocket-based solutions
      ● WebSocket - transport protocol
      ● Protocol standardized in 2011 by the IETF (RFC 6455); the browser API is specified by the W3C
      ● Full-duplex communication channel
      ● JSON as a message format
      ● Custom message types
  20. WebSocket: Establishing a connection
      Client:
        GET /chat HTTP/1.1
        Host: server.example.com
        Upgrade: websocket
        Connection: Upgrade
        Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
        Origin: http://example.com
        Sec-WebSocket-Protocol: chat, superchat
        Sec-WebSocket-Version: 13
      Server:
        HTTP/1.1 101 Switching Protocols
        Upgrade: websocket
        Connection: Upgrade
        Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
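The `Sec-WebSocket-Accept` value in the server response is derived from the client's `Sec-WebSocket-Key`: RFC 6455 defines it as the base64-encoded SHA-1 of the key concatenated with a fixed GUID. A short sketch follows; `accept_key` is a hypothetical helper name. In practice a library such as Autobahn does this for you.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept header value for a client key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

Feeding it the key from the slide reproduces the accept value shown in the server response.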
  21. WebSocket: Sending a message
      Client:
        {
          "type": "message",
          "ts": 1469563519,
          "user": "kakovskyi",
          "text": "Hello, @WebCamp!"
        }
      Server:
        {
          "type": "notification",
          "ts": 1469563519,
          "user": "WebCamp Bot",
          "text": "Howdy @kakovskyi?"
        }
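Because WebSocket carries no signaling protocol of its own, the application has to define and validate its message types. The sketch below assumes the four fields and two types shown on the slide are the whole schema, which is a guess; `encode_event` and `decode_event` are hypothetical names.

```python
import json
import time

# Assumption: only the types and fields visible on the slide exist.
ALLOWED_TYPES = {"message", "notification"}
REQUIRED_FIELDS = {"type", "ts", "user", "text"}


def encode_event(event_type, user, text, ts=None):
    """Serialize one event to JSON, rejecting unknown types early."""
    if event_type not in ALLOWED_TYPES:
        raise ValueError("unsupported event type: %r" % (event_type,))
    return json.dumps({
        "type": event_type,
        "ts": int(time.time()) if ts is None else ts,
        "user": user,
        "text": text,
    })


def decode_event(raw):
    """Parse and validate an incoming event; raise ValueError if malformed."""
    event = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(event, dict):
        raise ValueError("event must be a JSON object")
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError("missing fields: %s" % ", ".join(sorted(missing)))
    if event["type"] not in ALLOWED_TYPES:
        raise ValueError("unsupported event type: %r" % (event["type"],))
    return event
```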
  22. WebSocket: Closing a connection
      Client: Close frame (opcode 0x8)
      Server: Close frame (opcode 0x8)
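On the wire, the 0x8 above is the opcode of a control frame. A sketch of the server-side close frame under RFC 6455 framing rules follows; `server_close_frame` is a hypothetical helper name. Frames sent by the client carry the same opcode but must also be masked, which this sketch does not cover.

```python
import struct

OPCODE_CLOSE = 0x8
FIN_BIT = 0x80  # marks the final (here: only) fragment


def server_close_frame(code=1000, reason=""):
    """Build an unmasked Close frame: header, 2-byte status code, UTF-8 reason."""
    payload = struct.pack("!H", code) + reason.encode("utf-8")
    if len(payload) > 125:
        raise ValueError("control frame payload is limited to 125 bytes")
    # Second byte: mask bit clear (server to client) plus the payload length.
    return bytes([FIN_BIT | OPCODE_CLOSE, len(payload)]) + payload
```

Status code 1000 means normal closure.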
  23. WebSocket: Pros
      ● Supported by most browsers
      ● Low latency
      ● Low bandwidth overhead
      ● Easy to start development
  24. WebSocket: Cons
      ● Requires designing your own signaling protocol
      ● Timeouts and reconnections must be handled separately
  25. WebSocket and Python
      ● Servers:
        ○ Autobahn - Twisted and asyncio implementations
        ○ aiohttp
        ○ Tornado
        ○ Flask-SocketIO
        ○ Flask-Sockets
  26. WebSocket and Python
      ● Clients:
        ○ Autobahn
        ○ aiohttp
        ○ Tornado-based example
        ○ Vanilla websocket-client
      ● JS-client: SocketIO
  27. Life of messaging platform
      ● Authentication
      ● Access control checks
      ● Delivery
        ○ Messages
        ○ User's presence
        ○ Push notifications
      ● History retrieval
      ● History search
  28. Life of messaging platform
      ● Parsing
        ○ Protocol
        ○ Message content
      ● Dealing with file uploads
        ○ Security checks
        ○ Thumbnails distribution
      ● Multi-session support
      ● Reconnection handling
      ● Rate-limiting
  29. Life of messaging platform
      ● Server keeps a connection open for every client
      ● A large number of long-lived concurrent connections
      ● A thread per connection is inefficient because of per-thread overhead
      ● Requires an I/O multiplexing (select-style) mechanism on the back end:
        ○ poll
        ○ epoll
        ○ kqueue
      ● Asynchronous Python frameworks are preferred for high-load solutions
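The select-style approach can be illustrated with the standard `selectors` module, which picks epoll, kqueue, or poll depending on the platform. This is only a sketch of the idea that one thread services many sockets. The talk's project used Twisted, which wraps the same mechanism in its reactor. `echo_ready` is a hypothetical name.

```python
import selectors
import socket


def echo_ready(sel, timeout=1.0):
    """Echo back data on every socket that is readable right now.

    Returns the number of sockets serviced in this pass.
    """
    handled = 0
    for key, _events in sel.select(timeout):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            conn.sendall(data)  # acceptable in a sketch; real code buffers writes
        else:
            # An empty read means the peer closed the connection.
            sel.unregister(conn)
            conn.close()
        handled += 1
    return handled
```

A server loop would register each accepted connection with `sel.register(conn, selectors.EVENT_READ)` and call `echo_ready` repeatedly.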
  30. Life of messaging platform
      ● Authentication
        ○ OAuth2
        ○ Run encryption operations in a separate Python thread
        ○ Cache user identities with Redis/Memcached
      ● Access-control checks
        ○ Make the checks lightweight and cheap
        ○ Raise an exception when an operation isn't permitted
          (EAFP: Easier to ask for forgiveness than permission)
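Both points can be sketched briefly. The talk's stack is Python 2 and Twisted; the example below swaps in Python 3 `asyncio` with the default thread pool to show the same idea of moving a CPU-heavy hash off the event loop. The iteration count, `AccessDenied`, and `post_to_room` are illustrative assumptions.

```python
import asyncio
import hashlib
import hmac


def hash_password(password, salt, iterations=100_000):
    """CPU-heavy key derivation; this would stall the event loop if run inline."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


async def verify_password(password, salt, expected):
    """Run the hash in the default thread pool so other clients keep being served."""
    loop = asyncio.get_running_loop()
    digest = await loop.run_in_executor(None, hash_password, password, salt)
    return hmac.compare_digest(digest, expected)


class AccessDenied(Exception):
    """Raised when a user tries to post to a room they are not a member of."""


def post_to_room(rooms_of_user, room, text):
    """EAFP: attempt the operation and translate the failure into an exception."""
    try:
        rooms_of_user[room].append(text)
    except KeyError:
        raise AccessDenied(room) from None
```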
  31. Delivery
      ● Make message delivery fault-tolerant
      ● Limit the size of a message
      ● Filter the content of messages:
        ○ Users like to send chars that break all the things
      ● Reduce presence traffic; it can become a bottleneck for large chats
      ● Use an asynchronous broker for delivery when a user is offline (email or push)
        ○ Celery
        ○ RQ
        ○ Amazon Simple Queue Service
        ○ Huey
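A size limit and a content filter might look like the sketch below. The 4096-character limit and the choice to strip only Unicode control characters (category `Cc`) are assumptions made for illustration; the talk does not say which characters caused trouble.

```python
import unicodedata

MAX_MESSAGE_CHARS = 4096  # hypothetical limit
KEEP = {"\n", "\t"}       # control characters that are harmless in chat text


def sanitize_message(text, limit=MAX_MESSAGE_CHARS):
    """Drop control characters, then enforce the size limit."""
    cleaned = "".join(
        ch for ch in text
        if ch in KEEP or unicodedata.category(ch) != "Cc"
    )
    if len(cleaned) > limit:
        raise ValueError("message exceeds %d characters" % limit)
    return cleaned
```

Format characters (category `Cf`) are left alone on purpose, since removing them would break emoji sequences joined with zero-width joiners.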
  32. Life of messaging platform
      ● Push notifications
        ■ Vendors
          ● Amazon SNS
          ● APNS
          ● Google Cloud Messaging
          ● Firebase Cloud Messaging
        ■ Python tools
          ● PyAPNs
          ● Python-GCM
          ● Pusher
      ● Be careful with device registration
      ● Make delivery of pushes fault-tolerant
  33. History retrieval
      ● Return the last messages for every chat instantly
        ○ Use double writes
          ■ In-memory queue only for the last messages
          ■ Persistent storage for everything
      ● Most history retrievals are for the last few days
        ○ Optimize for that case
      ● Index messages by date
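The double-write idea can be sketched with a bounded `deque` per chat next to a full log. The plain list here is a stand-in for real persistent storage, and `History`, `RECENT_LIMIT`, and the method names are hypothetical.

```python
from collections import deque

RECENT_LIMIT = 50  # hypothetical number of messages kept in memory per chat


class History:
    def __init__(self, recent_limit=RECENT_LIMIT):
        self._recent_limit = recent_limit
        self._recent = {}  # chat_id -> deque of the newest records
        self._store = []   # stand-in for persistent storage

    def append(self, chat_id, ts, text):
        """Write every message twice: to the full log and to the hot cache."""
        record = (chat_id, ts, text)
        self._store.append(record)
        cache = self._recent.setdefault(chat_id, deque(maxlen=self._recent_limit))
        cache.append(record)  # the oldest entry falls off automatically

    def last(self, chat_id):
        """Serve the common case (latest messages) from memory only."""
        return list(self._recent.get(chat_id, ()))

    def all_for(self, chat_id):
        """Slow path: scan the full log."""
        return [record for record in self._store if record[0] == chat_id]
```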
  34. History search
      ● ElasticSearch is the default solution for full-text search
      ● @a_soldatenko: What is the best full text search engine for Python?
      ● Add timing for search requests
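Timing search requests needs very little code. This sketch records elapsed milliseconds into any list-like sink; `timed` is a hypothetical helper, and in production the sink would more likely be a metrics client than a list.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(name, sink):
    """Append (name, elapsed_ms) to sink when the block exits, even on error."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink.append((name, (time.perf_counter() - start) * 1000.0))
```

Usage: wrap the search call in `with timed("history.search", timings): ...` and inspect or ship `timings` afterwards.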
  35. Parsing
      ● Protocol
        ○ Avoid pure-Python parsers; prefer C-backed ones
          ■ ujson
          ■ lxml
        ○ Run benchmarks against your typical cases
      ● Message content
        ○ Be careful with regular expressions
          ■ re2
          ■ pyre2
        ○ Alternative parsers in Python
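A benchmark against a typical payload could be as small as the sketch below. It uses the message from the WebSocket slide as the sample and treats `ujson` as optional, since it may not be installed. `benchmark` is a hypothetical name, and no timings are claimed here because they depend on the machine and library versions.

```python
import json
import timeit

try:
    import ujson  # optional third-party C-backed parser
except ImportError:
    ujson = None

SAMPLE = json.dumps({
    "type": "message",
    "ts": 1469563519,
    "user": "kakovskyi",
    "text": "Hello, @WebCamp!",
})


def benchmark(number=10_000):
    """Return seconds spent parsing SAMPLE `number` times, per available parser."""
    parsers = {"json": json.loads}
    if ujson is not None:
        parsers["ujson"] = ujson.loads
    results = {}
    for name, loads in parsers.items():
        results[name] = timeit.timeit(lambda: loads(SAMPLE), number=number)
    return results
```

Note that CPython's standard `json` module already has a C accelerator, so the gap depends on the payload; measuring with your own messages is the point of the slide.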
  36. Dealing with file uploads
      ● Security checks
        ○ File upload vulnerabilities
        ○ Image upload
          ■ Decompression bomb
          ■ Other vulnerabilities with Pillow
        ○ Amazon S3 as file storage
          ■ boto
          ■ aiobotocore
          ■ botornado
      ● Thumbnails distribution
        ○ Delegate that to S3
        ○ Requested by a client even if not needed
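One cheap guard against decompression bombs is to read the declared dimensions from the file header before any decoding happens. The sketch below does this for PNG only, using the fixed layout of the signature and IHDR chunk; the 50-megapixel ceiling and the function names are assumptions. Pillow has its own guard (`Image.MAX_IMAGE_PIXELS`), which this does not replace.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_PIXELS = 50_000_000  # hypothetical ceiling


def png_dimensions(header):
    """Return (width, height) declared in the first 24 bytes of a PNG file."""
    if (len(header) < 24
            or header[:8] != PNG_SIGNATURE
            or header[12:16] != b"IHDR"):
        raise ValueError("not a PNG header")
    # Bytes 16..24 hold width and height as big-endian unsigned 32-bit ints.
    return struct.unpack("!II", header[16:24])


def is_safe_png(header, max_pixels=MAX_PIXELS):
    """Reject images whose declared pixel count is zero or above the ceiling."""
    width, height = png_dimensions(header)
    return 0 < width * height <= max_pixels
```

The declared size can be checked on the first bytes of an upload, so an oversized image can be refused before the rest of the body is read.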
  37. Life of messaging platform
      ● Multi-session support
        ○ Set expiration time
        ○ Be ready to handle up to 4 simultaneous sessions per user
          ■ Desktop
          ■ Mobile
          ■ Tablet
          ■ Laptop
      ● Reconnection handling
        ○ Spin up a proxy layer between messaging server and clients
      ● Rate-limiting
        ○ Limit the number of operations per user/group for heavy stuff
        ○ Leaky bucket
        ○ Throttling
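The leaky bucket mentioned above can be sketched as a meter: each operation adds to a level that drains at a constant rate, and operations that would overflow the bucket are refused. The class below is an illustration with an injectable clock for testing, not the project's implementation; one instance per user or group is assumed.

```python
import time


class LeakyBucket:
    def __init__(self, capacity, leak_rate, clock=time.monotonic):
        self.capacity = capacity    # maximum burst size
        self.leak_rate = leak_rate  # units drained per second
        self._clock = clock
        self._level = 0.0
        self._updated = clock()

    def allow(self, cost=1.0):
        """Return True and account for the operation if it fits in the bucket."""
        now = self._clock()
        # Drain whatever leaked out since the last call.
        drained = (now - self._updated) * self.leak_rate
        self._level = max(0.0, self._level - drained)
        self._updated = now
        if self._level + cost > self.capacity:
            return False
        self._level += cost
        return True
```

Heavier operations, such as a history search, can be charged a larger `cost` than a plain message.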
  38. Lessons learned
      ● Bursty traffic
        ○ Load testing is a must, but not always enough
          ■ Locust
          ■ Yandex Tank
      ● A reconnect storm can be a big deal
        ○ It has to be handled both on the platform and on the client side
      ● AWS issues lead to a bad customer experience
        ○ Spread nodes across multiple Availability Zones (Multi-AZ)
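On the client side, a common way to soften a reconnect storm is exponential backoff with random jitter, so that clients dropped at the same moment do not all return at the same moment. The talk does not say which scheme was used; this sketch shows the "full jitter" variant with illustrative defaults.

```python
import random


def backoff_delays(base=1.0, cap=60.0, attempts=6, rng=random.random):
    """Yield one randomized delay (in seconds) per reconnect attempt."""
    for attempt in range(attempts):
        # The ceiling doubles on every attempt until it reaches the cap.
        ceiling = min(cap, base * 2 ** attempt)
        # Full jitter: pick uniformly between 0 and the ceiling.
        yield rng() * ceiling
```

A client would sleep for each yielded delay before the next connection attempt and stop iterating once it connects.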
  39. Lessons learned
      ● Preventing incidents is cheaper than resolving them
        ○ Collect stats and metrics about your services and storage
          ■ Redis for per-chat stats
          ■ StatsD
          ■ Grafana
        ○ Be notified when something starts going wrong
          ■ Elastalert
          ■ Monit
          ■ DataDog
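Emitting a metric to StatsD is cheap enough to do on every operation: a counter is a single UDP datagram in the form `name:value|c`. The sketch below uses only the standard library; the host, the conventional port 8125, and the metric name in the usage note are assumptions about the deployment.

```python
import socket


def format_counter(name, value=1):
    """Encode a StatsD counter increment in the plain-text line protocol."""
    return ("%s:%d|c" % (name, value)).encode("ascii")


def send_counter(name, value=1, host="127.0.0.1", port=8125):
    """Fire-and-forget: UDP means the caller never blocks on the metrics server."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_counter(name, value), (host, port))
```

For example, `send_counter("chat.messages.delivered")` after each successful delivery gives Grafana a rate to graph and alert on.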
  40. Lessons learned
      ● Don't stick with one language/stack
        ○ Python is great, but for some cases Go, Ruby or PHP are more suitable from the product side
        ○ Avoid duplicating business logic across several repos; spin up a service and just call the endpoint
      ● Releasing new features only to certain groups makes product management easier
        ○ LaunchDarkly
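The mechanism behind releasing a feature to certain groups can be sketched as a deterministic percentage rollout. This is not LaunchDarkly's API or algorithm, only an illustration of the idea; `in_rollout` and its parameters are hypothetical.

```python
import hashlib


def in_rollout(flag, group_id, percent):
    """Return True if group_id falls into the first `percent` of 100 buckets.

    Hashing flag and group together keeps the answer stable across calls
    and independent between different flags.
    """
    key = ("%s:%s" % (flag, group_id)).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

Raising `percent` over time only adds groups: a group that already has the feature keeps it.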
  41. Lessons learned
      ● Don’t F**k the Customer
        ○ Provide unit/integration tests with every PR
        ○ Have development environment same as prod
        ○ Have staging environment same as prod
        ○ Make deployments fast
        ○ Rollback faster
        ○ Have a fallback plan
  42. Summary
  43. Summary
      ● Select a messaging protocol which aligns with your needs
      ● WebSocket + JSON could be the thing for new projects
      ● Usage of asynchronous frameworks is preferred
      ● Execute blocking operations in a separate thread
      ● Collect metrics for common services operations
      ● Caching saves a lot of time
      ● Use C or Cython-based solutions for CPU-bound tasks
      ● Have fast release/deploy/rollback cycle
      ● Python is great, but don't hesitate to pick other tools
  44. Further reading
      ● How HipChat Stores and Indexes Billions of Messages Using ElasticSearch
      ● @kakovskyi: Maintaining a high load Python project for newcomers
      ● HipChat: Important improvements to staging, presence & database storage
      ● HipChat and the little connection that could
      ● Elasticsearch at HipChat: 10x faster queries
      ● Atlassian: How IT and SRE use ChatOps to run incident management
      ● A Study of Internet Instant Messaging and Chat Protocols
      ● What Is Async, How Does It Work, And When Should I Use It?
      ● Leaky Bucket & Tocken Bucket - Traffic shaping
      ● A guide to analyzing Python performance
      ● Why Leading Companies Dark Launch - LaunchDarkly Blog
      ● @bmwant: Asyncio-stack for web development
  45. Questions?
      Viacheslav Kakovskyi
      viach.kakovskyi@gmail.com
      @kakovskyi
