Dynomite at Erlang Factory

2,057
-1

Published on

This is my presentation from the 2009 Erlang Factory conference about Dynomite, a distributed key value store.

Published in: Technology, News & Politics
0 Comments
5 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,057
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
40
Comments
0
Likes
5
Embeds 0
No embeds

No notes for slide




















































































































































  • Dynomite at Erlang Factory

    1. 1. Dynomite Yet Another Distributed Key Value Store
    2. 2. @moonpolysoft - questions
    3. 3. A Crowded Field • Cassandra • Lightcloud • Memcachedb • Redis • Tokyo Tyrannical Cabinet Device Thing • Voldemort
    4. 4. Who here has written one?
    5. 5. Alpha - 0.6.0
    6. 6. Focus on Distribution
    7. 7. Focus on Performance
    8. 8. • Latency • Average ~ 10 ms • Median ~ 5 ms • 99.9% ~ 1 s
    9. 9. • Throughput • R12B ~ 2,000 req/s • R13B ~ 6,500 req/s
    10. 10. At Powerset • 12 machine production cluster • ~6 million images + metadata • ~2 TB of data including replicas • ~139KB average size
    11. 11. Production? • Data you can afford to lose • Compatibility will break • Migration will be provided
    12. 12. The Constants
    13. 13. Consistency Throughput Durability N – Replication
    14. 14. Max Replicas per Partition
    15. 15. Latency Consistency R – Read Quorum
    16. 16. Minimum participation for read
    17. 17. Latency Durability W – Write Quorum
    18. 18. Minimum participation for write
    19. 19. Throughput Scalability Q – Partitioning
    20. 20. Partitions = 2Q
    21. 21. QNRW – Defines your cluster
    22. 22. At Powerset • Batch writes • Online reads • Wikipedia pages are obese
    23. 23. Q – 6 N–3 R–1 W–2
    24. 24. (dynomite@galva)2> mediator:put( quot;prefs:user:mervquot;, undefined, term_to_binary( [{safe_search, false}, {results, 50}])).
    25. 25. (dynomite@galva)2> mediator:put( quot;prefs:user:mervquot;, undefined, term_to_binary( [{safe_search, false}, {results, 50}])).
    26. 26. (dynomite@galva)2> mediator:put( quot;prefs:user:mervquot;, undefined, term_to_binary( [{safe_search, false}, {results, 50}])).
    27. 27. (dynomite@galva)2> mediator:put( quot;prefs:user:mervquot;, undefined, term_to_binary( [{safe_search, false}, {results, 50}])).
    28. 28. Value is always Binary
    29. 29. (dynomite@galva)2> mediator:put( quot;prefs:user:mervquot;, undefined, term_to_binary( [{safe_search, false}, {results, 50}])).
    30. 30. (dynomite@galva)1> {ok, {Context, [Bin]}} = mediator:get(quot;prefs:user:mervquot;). {ok,{[{quot;<0.630.0>quot;,1240543271.698089}], [<<131,108,0,0,0,2,...>>]}} (dynomite@galva)2> Terms = binary_to_term(Bin). [{safe_search,false},{results,50}]
    31. 31. (dynomite@galva)1> {ok, {Context, [Bin]}} = mediator:get(quot;prefs:user:mervquot;). {ok,{[{quot;<0.630.0>quot;,1240543271.698089}], [<<131,108,0,0,0,2,...>>]}} (dynomite@galva)2> Terms = binary_to_term(Bin). [{safe_search,false},{results,50}]
    32. 32. mediator:delete/1 mediator:has_key/1
    33. 33. Up In Dem Gutz
    34. 34. Client Protocols • Native Erlang Messages • Thrift • Protocol Buffers • Ascii Protocol
    35. 35. Mediator
    36. 36. N – 3 ;R – 2
    37. 37. pmap(Fun, List, ReturnNum) -> N = if ReturnNum > length(List) -> length(List); true -> ReturnNum end, SuperParent = self(), SuperRef = erlang:make_ref(), Ref = erlang:make_ref(), %% we spawn an intermediary to collect the results %% this is so that there will be no leaked messages sitting in our mailbox Parent = spawn(fun() -> L = gather(N, length(List), Ref, []), SuperParent ! {SuperRef, pmap_sort(List, L)} end), Pids = [spawn(fun() -> Parent ! {Ref, {Elem, (catch Fun(Elem))}} end) || Elem <- List], Ret = receive {SuperRef, Ret} -> Ret end, % i think we need to cleanup here. lists:foreach(fun(P) -> exit(P, die) end, Pids),
    38. 38. Membership
    39. 39. Handles Nodes Joining
    40. 40. membership:nodes() =/= nodes().
    41. 41. • Receive join notification • Recalculate partition to node mapping • Start bootstrap routines • Stops old local storage servers • Starts new local storage servers • Install the new partition table
    42. 42. Maps Partitions To Nodes
    43. 43. Gossip Girl
    44. 44. • Gossip servers randomly wake up • Pick another membership server • Merge membership tables • Versioned by vector clocks
    45. 45. gossip_loop(Server) -> #membership{nodes=Nodes,node=Node} = gen_server:call(Server, state), case lists:delete(Node, Nodes) of [] -> ok; % no other nodes Nodes1 when is_list(Nodes1) -> fire_gossip(random_node(Nodes1)) end, SleepTime = random:uniform(5000) + 5000, receive stop -> gossip_paused(Server); _Val -> ok after SleepTime -> ok end, gossip_loop(Server).
    46. 46. Storage Server
    47. 47. Merkle Trees
    48. 48. Merkle Trees
    49. 49. O(log n)
    50. 50. • Implemented on disk as a B-Tree • Includes buddy-block allocator for key space • Extremely gnarly code
    51. 51. Minimal Transfer
    52. 52. Sync Server
    53. 53. Randomly Wakes Up
    54. 54. Choses two replicas
    55. 55. Transfers Merkle Diff
    56. 56. [cli@yourmom ~]$ dynomite console (dynomite@yourmom)1 sync_manager:running(). [{3623878657,'dynomite@yourmom','dynomite@yourd ad'}, {3892314113,'dynomite@yourmom','dynomite@cableg uy'},
    57. 57. Problem Areas
    58. 58. Membership
    59. 59. Problems • New partitions online before they are ready • A priori knowledge of what is running • Possible to dilute a cluster into uselessness • Migrations are tremendously painful
    60. 60. Direction • Storage servers rally with local membership server • Use monitors on local storage • Alerts for dangerous situations
    61. 61. Merkle
    62. 62. Erlang is good at many things
    63. 63. Low level storage ain’t one
    64. 64. Direction • Rewrite Dmerkle storage in C • Use async driver pool • Build in more crash recovery to the format
    65. 65. Web Console
    66. 66. Direction • More detailed visualizations of cluster health • Perform “safe” ops tasks
    67. 67. And other things I haven’t thought of
    68. 68. Demo!
    69. 69. Spare a patch, friend?
    70. 70. • http://github.com/cliffmoon/dynomite • http://wiki.github.com/cliffmoon/dynomite
    71. 71. Q and or A

    ×