import rdma: zero-copy networking with RDMA and Python


Andy Grover, @groveronline
http://groveronline.com
http://blogs.oracle.com/linuxnstuff

Plan
  • Sockets, RDMA, and RDMA Sockets
  • Python and performance
  • Issues I ran into
  • (Questions anytime.)

Socket Example
  • client: sock.sendto(server, "get data.tgz")
  • server: recvfrom() -> "get data.tgz"
  • server: data = open("data.tgz").read()
  • server: sock.sendto(client, "OK " + data)
  • client: recvfrom() -> data
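
The exchange above can be sketched with plain UDP sockets from the standard library. This is a minimal illustration, not the talk's code: `serve_once`, `demo`, and the in-memory `files` dict (standing in for `open("data.tgz").read()`) are assumptions made so the example runs on loopback without any files. Note that the real `socket.sendto()` takes the data first and the address second.

```python
import socket


def serve_once(server_sock, files):
    """Answer a single 'get <name>' datagram with b'OK ' + contents."""
    request, client_addr = server_sock.recvfrom(4096)
    name = request.decode().split(" ", 1)[1]
    # `files` is a hypothetical in-memory stand-in for open(name).read().
    server_sock.sendto(b"OK " + files[name], client_addr)


def demo():
    """Run the request/response exchange once over loopback UDP."""
    files = {"data.tgz": b"pretend archive bytes"}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.settimeout(2.0)
        client.settimeout(2.0)
        client.sendto(b"get data.tgz", server.getsockname())
        serve_once(server, files)
        reply, _ = client.recvfrom(65535)
    return reply


if __name__ == "__main__":
    print(demo())
```

Every byte of the reply passes through a kernel buffer on both hosts, which is the cost the next slides count.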

Sockets
  • Sending data from server (S) to client (C), how many buffer copies are performed on S? On C?
  • 2 on S, 2 on C
  • S: OS reads from user buffer, writes to kernel buffer
  • S: HW reads from kernel buffer
  • C: HW writes to kernel buffer
  • C: OS reads from kernel buffer, writes to user buffer

So what?
  • Socket interface is easy to use
  • Extra copy on each side consumes CPU
  • Also consumes 3x RAM bandwidth!
  • What do we do???
  • Direct Data Placement (RDMA)
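
One plausible reading of the 3x figure is a count of how many times each payload byte crosses the memory bus on one host. The sketch below assumes that every OS copy costs one read plus one write and that the adapter touches the data once; `memory_passes` is a hypothetical helper, not something from the talk.

```python
def memory_passes(os_copies, hw_passes=1):
    """Count memory-bus passes per payload byte on one host.

    Assumption: each OS copy reads the source and writes the destination
    (2 passes), and the network hardware reads or writes the data once.
    """
    return 2 * os_copies + hw_passes


socket_path = memory_passes(os_copies=1)  # user -> kernel copy, then HW
direct_path = memory_passes(os_copies=0)  # HW works on the user buffer

if __name__ == "__main__":
    print(socket_path, direct_path, socket_path // direct_path)
```

Under those assumptions the socket path costs three passes where direct placement costs one.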

RDMA?
  • Target locks down (pins) a user memory region and gives the sender a key to reference it
  • Sender gives its HW the data buffer and the key; the target's HW uses the key to place received data directly in the user buffer
  • Done!
  • Way complicated
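
The register-then-place idea can be modelled in a few lines of pure Python. This is a toy simulation only: there is no hardware, no pinning, and no network here, and `ToyTarget`, `register`, and `place` are hypothetical names chosen for illustration. It shows just the bookkeeping: a key stands for a registered region, and data addressed with that key lands in the user's buffer with no intermediate copy.

```python
import secrets


class ToyTarget:
    """Pure-Python stand-in for the target host's adapter (no real RDMA)."""

    def __init__(self):
        self._regions = {}  # key -> writable view of a registered buffer

    def register(self, buf):
        """Register a writable buffer and return the key that names it."""
        key = secrets.randbits(32)
        while key in self._regions:  # avoid the unlikely key collision
            key = secrets.randbits(32)
        self._regions[key] = memoryview(buf)
        return key

    def place(self, key, offset, payload):
        """Write payload straight into the registered buffer named by key."""
        region = self._regions[key]  # raises KeyError for an unknown key
        end = offset + len(payload)
        if offset < 0 or end > len(region):
            raise ValueError("write falls outside the registered region")
        region[offset:end] = payload


def demo():
    user_buffer = bytearray(16)
    target = ToyTarget()
    key = target.register(user_buffer)  # the target hands this key to the sender
    target.place(key, 0, b"payload")    # the sender's data arrives with the key
    return bytes(user_buffer[:7])


if __name__ == "__main__":
    print(demo())
```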

Aside: InfiniBand
  • Cheap and high speed
  • Supports RDMA
  • Natively supported in RHEL 5.4+

RDMA Sockets (RDS)
  • Reliable Datagram Sockets
  • Full disclosure: my day job
  • Guaranteed delivery of datagrams
  • Allows RDMA ops via sendmsg() and CMSGs
  • Hides complexity of IB Verbs
  • Still pretty complex!

What is the simplest possible interface to use RDMA?
  • Let's try Python
  • Learning opportunity
  • Can it be done?
  • Pythonically?

Can Python do efficient networking?
  • Hell yes. Well, pretty sure
  • Interpreted, but the interpreter fits in per-CPU cache
  • Many CPU cores these days
  • Shared RAM
  • Cache misses on the data

Implementation Issues

Python strings
  • Immutable
  • Buffers shared behind the scenes
  • Solution: mmap module
      • map a file or anonymous memory
      • sliceable etc.
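
A short sketch of why `mmap` suits this job: an anonymous mapping is a mutable, sliceable byte buffer that also exposes the buffer protocol. The helper name `demo` and the 8192-byte size are illustrative assumptions; the `mmap` and `memoryview` calls are standard library.

```python
import mmap


def demo(size=8192):
    """Fill an anonymous mapping in place, two different ways."""
    buf = mmap.mmap(-1, size)  # -1: anonymous memory, not backed by a file
    buf[0:5] = b"hello"        # slice assignment mutates the mapping itself
    # memoryview gives a zero-copy window onto the same memory; release it
    # (here via the with-block) before closing the mapping.
    with memoryview(buf) as view:
        view[5:11] = b" world"
    result = buf[:11]          # slicing returns a bytes copy
    buf.close()
    return result


if __name__ == "__main__":
    print(demo())
```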

Python has no pointers
  • We need addresses of things
      • To pin it
      • To map it to the hardware
  • Solution: C extension module using "new buffer protocol" added in 2.6
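
The talk solves this with a small C extension. As an alternative sketch, not the author's method, `ctypes` can also report the address of a writable buffer-protocol object such as an anonymous `mmap`. `buffer_address` is a hypothetical helper name. This only obtains the address; it does not pin the memory or register it with any hardware.

```python
import ctypes
import mmap


def buffer_address(buf):
    """Return the start address of a writable buffer-protocol object."""
    # from_buffer() shares memory with buf rather than copying it.
    anchor = ctypes.c_char.from_buffer(buf)
    return ctypes.addressof(anchor)


def demo():
    buf = mmap.mmap(-1, 8192)
    addr = buffer_address(buf)
    ctypes.memmove(addr, b"abc", 3)  # write through the raw address
    seen = buf[:3]                   # the mapping sees the same bytes
    buf.close()
    return addr, seen


if __name__ == "__main__":
    address, data = demo()
    print(hex(address), data)
```

The address is only meaningful while the mapping is alive, so the caller has to keep the `mmap` object open for as long as anything uses it.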

Python stdlib doesn't support sendmsg/recvmsg
  • WHAT??
  • Solution: external library, python-eunuchs
  • Native support coming Real Soon Now
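
Python 3.3 and later do provide `socket.sendmsg()` and `socket.recvmsg()` in the standard library. The sketch below shows their control-message (CMSG) plumbing on a Unix socket pair, passing a file descriptor with `SCM_RIGHTS`. It is a stand-in chosen because it runs without RDS or InfiniBand; it does not show RDS-specific control messages, and `send_with_fd`, `recv_with_fd`, and `demo` are hypothetical helper names. It assumes a Unix-like system.

```python
import array
import os
import socket


def send_with_fd(sock, payload, fd):
    """Send payload plus one file descriptor as ancillary data."""
    fds = array.array("i", [fd])
    return sock.sendmsg([payload],
                        [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_with_fd(sock, bufsize=1024):
    """Receive a payload and any file descriptors sent alongside it."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole int.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    return data, list(fds)


def demo():
    left, right = socket.socketpair()  # connected AF_UNIX sockets
    read_end, write_end = os.pipe()
    send_with_fd(left, b"here is a pipe", read_end)
    data, fds = recv_with_fd(right)
    os.write(write_end, b"ping")
    echoed = os.read(fds[0], 4)        # read through the received descriptor
    for fd in (read_end, write_end, fds[0]):
        os.close(fd)
    left.close()
    right.close()
    return data, echoed


if __name__ == "__main__":
    print(demo())
```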

Python AF_RDS support
  • Can't extend socket.socket
  • Solution: forget inheritance, just implement socket object methods
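
A minimal sketch of that composition approach: the class below holds a socket and re-implements the handful of methods callers expect, rather than subclassing `socket.socket`. `DatagramSocket` is a hypothetical name, and plain UDP stands in for AF_RDS so the example runs anywhere; the real RdmaSocket in python-rds may look quite different.

```python
import socket


class DatagramSocket:
    """Socket-like object built by composition, not inheritance."""

    def __init__(self, family=socket.AF_INET):
        self._sock = socket.socket(family, socket.SOCK_DGRAM)
        self._sock.settimeout(2.0)

    def bind(self, addr):
        self._sock.bind(addr)

    def getsockname(self):
        return self._sock.getsockname()

    def sendto(self, data, addr):
        return self._sock.sendto(data, addr)

    def recvfrom(self, bufsize):
        return self._sock.recvfrom(bufsize)

    def fileno(self):
        # Exposing fileno() keeps the object usable with select()/poll().
        return self._sock.fileno()

    def close(self):
        self._sock.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


def demo():
    with DatagramSocket() as receiver, DatagramSocket() as sender:
        receiver.bind(("127.0.0.1", 0))
        sender.sendto(b"ping", receiver.getsockname())
        data, _ = receiver.recvfrom(1024)
    return data


if __name__ == "__main__":
    print(demo())
```

Duck typing does the rest: code that only calls these methods cannot tell the wrapper from a real socket.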

Implementing RdmaSocket
  • Python as much as possible
  • ctypes used heavily
  • C module solely to return an address

RDMA Socket Example
  • client: m = mmap(-1, 8192)
  • client: cookie = sock.get_mr(m)
  • client: sock.sendto(server, "get data.tgz, my cookie is <cookieval>")
  • server: recvmsg() -> "get data.tgz, my cookie is <cookieval>"
  • server: m = mmap("data.tgz")
  • server: sock.rdma_sendmsg(client, m, cookieval, length, token, "OK")
  • client: recvmsg() -> "OK"

OK...
  • Extra overhead not worth it for small sizes
  • Copied "OK" instead of "OK" + 8K: a CPU and cache win
  • It worked!

Future investigations
  • Actual performance data
  • Dogfood it: simplify RDS utility apps
  • RDMA loves async: go Twisted

Summary
  • Sysadmins
      • IB is fast, cheap networking, even without RDMA
  • DB cluster & storage cluster developers
      • A new tool in the toolbox coming soon, even to Ethernet
  • Python coders
      • Shared-something could be good if your I/O is good enough
  • C coders
      • Writing a Python or xyz wrapper is straightforward, and enables a much wider pool of users

Thanks!
http://github.com/agrover/python-rds
