Your SlideShare is downloading. ×
0
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Dynomite at Erlang Factory
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Dynomite at Erlang Factory

1,977

Published on

This is my presentation from the 2009 Erlang Factory conference about Dynomite, a distributed key value store.

This is my presentation from the 2009 Erlang Factory conference about Dynomite, a distributed key value store.

Published in: Technology, News & Politics
0 Comments
5 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,977
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
39
Comments
0
Likes
5
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide




















































































































































  • Transcript

    • 1. Dynomite Yet Another Distributed Key Value Store
    • 2. @moonpolysoft - questions
    • 3. A Crowded Field • Cassandra • Lightcloud • Memcachedb • Redis • Tokyo Tyrannical Cabinet Device Thing • Voldemort
    • 4. Who here has written one?
    • 5. Alpha - 0.6.0
    • 6. Focus on Distribution
    • 7. Focus on Performance
    • 8. • Latency • Average ~ 10 ms • Median ~ 5 ms • 99.9% ~ 1 s
    • 9. • Throughput • R12B ~ 2,000 req/s • R13B ~ 6,500 req/s
    • 10. At Powerset • 12 machine production cluster • ~6 million images + metadata • ~2 TB of data including replicas • ~139KB average size
    • 11. Production? • Data you can afford to lose • Compatibility will break • Migration will be provided
    • 12. The Constants
    • 13. Consistency Throughput Durability N – Replication
    • 14. Max Replicas per Partition
    • 15. Latency Consistency R – Read Quorum
    • 16. Minimum participation for read
    • 17. Latency Durability W – Write Quorum
    • 18. Minimum participation for write
    • 19. Throughput Scalability Q – Partitioning
    • 20. Partitions = 2Q
    • 21. QNRW – Defines your cluster
    • 22. At Powerset • Batch writes • Online reads • Wikipedia pages are obese
    • 23. Q – 6 N–3 R–1 W–2
    • 24. (dynomite@galva)2> mediator:put( quot;prefs:user:mervquot;, undefined, term_to_binary( [{safe_search, false}, {results, 50}])).
    • 25. (dynomite@galva)2> mediator:put( quot;prefs:user:mervquot;, undefined, term_to_binary( [{safe_search, false}, {results, 50}])).
    • 26. (dynomite@galva)2> mediator:put( quot;prefs:user:mervquot;, undefined, term_to_binary( [{safe_search, false}, {results, 50}])).
    • 27. (dynomite@galva)2> mediator:put( quot;prefs:user:mervquot;, undefined, term_to_binary( [{safe_search, false}, {results, 50}])).
    • 28. Value is always Binary
    • 29. (dynomite@galva)2> mediator:put( quot;prefs:user:mervquot;, undefined, term_to_binary( [{safe_search, false}, {results, 50}])).
    • 30. (dynomite@galva)1> {ok, {Context, [Bin]}} = mediator:get(quot;prefs:user:mervquot;). {ok,{[{quot;<0.630.0>quot;,1240543271.698089}], [<<131,108,0,0,0,2,...>>]}} (dynomite@galva)2> Terms = binary_to_term(Bin). [{safe_search,false},{results,50}]
    • 31. (dynomite@galva)1> {ok, {Context, [Bin]}} = mediator:get(quot;prefs:user:mervquot;). {ok,{[{quot;<0.630.0>quot;,1240543271.698089}], [<<131,108,0,0,0,2,...>>]}} (dynomite@galva)2> Terms = binary_to_term(Bin). [{safe_search,false},{results,50}]
    • 32. mediator:delete/1 mediator:has_key/1
    • 33. Up In Dem Gutz
    • 34. Client Protocols • Native Erlang Messages • Thrift • Protocol Buffers • Ascii Protocol
    • 35. Mediator
    • 36. N – 3 ;R – 2
    • 37. pmap(Fun, List, ReturnNum) -> N = if ReturnNum > length(List) -> length(List); true -> ReturnNum end, SuperParent = self(), SuperRef = erlang:make_ref(), Ref = erlang:make_ref(), %% we spawn an intermediary to collect the results %% this is so that there will be no leaked messages sitting in our mailbox Parent = spawn(fun() -> L = gather(N, length(List), Ref, []), SuperParent ! {SuperRef, pmap_sort(List, L)} end), Pids = [spawn(fun() -> Parent ! {Ref, {Elem, (catch Fun(Elem))}} end) || Elem <- List], Ret = receive {SuperRef, Ret} -> Ret end, % i think we need to cleanup here. lists:foreach(fun(P) -> exit(P, die) end, Pids),
    • 38. Membership
    • 39. Handles Nodes Joining
    • 40. membership:nodes() =/= nodes().
    • 41. • Receive join notification • Recalculate partition to node mapping • Start bootstrap routines • Stops old local storage servers • Starts new local storage servers • Install the new partition table
    • 42. Maps Partitions To Nodes
    • 43. Gossip Girl
    • 44. • Gossip servers randomly wake up • Pick another membership server • Merge membership tables • Versioned by vector clocks
    • 45. gossip_loop(Server) -> #membership{nodes=Nodes,node=Node} = gen_server:call(Server, state), case lists:delete(Node, Nodes) of [] -> ok; % no other nodes Nodes1 when is_list(Nodes1) -> fire_gossip(random_node(Nodes1)) end, SleepTime = random:uniform(5000) + 5000, receive stop -> gossip_paused(Server); _Val -> ok after SleepTime -> ok end, gossip_loop(Server).
    • 46. Storage Server
    • 47. Merkle Trees
    • 48. Merkle Trees
    • 49. O(log n)
    • 50. • Implemented on disk as a B-Tree • Includes buddy-block allocator for key space • Extremely gnarly code
    • 51. Minimal Transfer
    • 52. Sync Server
    • 53. Randomly Wakes Up
    • 54. Choses two replicas
    • 55. Transfers Merkle Diff
    • 56. [cli@yourmom ~]$ dynomite console (dynomite@yourmom)1 sync_manager:running(). [{3623878657,'dynomite@yourmom','dynomite@yourd ad'}, {3892314113,'dynomite@yourmom','dynomite@cableg uy'},
    • 57. Problem Areas
    • 58. Membership
    • 59. Problems • New partitions online before they are ready • A priori knowledge of what is running • Possible to dilute a cluster into uselessness • Migrations are tremendously painful
    • 60. Direction • Storage servers rally with local membership server • Use monitors on local storage • Alerts for dangerous situations
    • 61. Merkle
    • 62. Erlang is good at many things
    • 63. Low level storage ain’t one
    • 64. Direction • Rewrite Dmerkle storage in C • Use async driver pool • Build in more crash recovery to the format
    • 65. Web Console
    • 66. Direction • More detailed visualizations of cluster health • Perform “safe” ops tasks
    • 67. And other things I haven’t thought of
    • 68. Demo!
    • 69. Spare a patch, friend?
    • 70. • http://github.com/cliffmoon/dynomite • http://wiki.github.com/cliffmoon/dynomite
    • 71. Q and or A

    ×