Distributed unique id generation

Published in: Technology

  1. Distributed Unique ID Generation. Septeni TechTalk, 30/11/17, TungNT
  2. What does a distributed system need from an ID?
     ● Guaranteed uniqueness across all nodes
     ● Fast and highly available
       ○ No coordination between nodes
     ● Compact
     ● Optional:
       ○ Sortable
       ○ Indexable
       ○ No common storage
  3. Existing solutions
  4. DB Auto-increment
     ● Pros:
       ○ Simple
       ○ Fast
       ○ Compact
     ● Cons:
       ○ Not suitable for distributed systems
       ○ Not friendly to a DDD/OOP approach
       ○ Information disclosure
       ○ Not available in some DBMSs (e.g. Cassandra, Oracle...)
  5. DB Ticket Servers
     ● Two ticket DBs (one issuing odd numbers, the other even) to avoid a single point of failure
       ○ Flickr uses this approach
     ● Pros:
       ○ Same as DB auto-increment
       ○ Distributed
     ● Cons:
       ○ Can eventually become a write bottleneck
       ○ Requires a couple of additional machines
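The odd/even split can be sketched without a database. The following is a minimal in-memory illustration, not Flickr's implementation: `TicketServer` and `TicketClient` are hypothetical names, and each "server" is just a counter that starts at a different offset and advances in steps of 2, so the two ranges never overlap.

```python
import itertools


class TicketServer:
    """In-memory stand-in for one ticket database (hypothetical, for illustration)."""

    def __init__(self, offset, step=2):
        # offset=1 yields 1, 3, 5, ...; offset=2 yields 2, 4, 6, ...
        self._counter = itertools.count(offset, step)
        self.available = True

    def next_id(self):
        if not self.available:
            raise ConnectionError("ticket server is down")
        return next(self._counter)


class TicketClient:
    """Alternates between servers and skips any server that is down."""

    def __init__(self, servers):
        self._servers = servers
        self._turn = 0

    def next_id(self):
        for _ in range(len(self._servers)):
            server = self._servers[self._turn % len(self._servers)]
            self._turn += 1
            try:
                return server.next_id()
            except ConnectionError:
                continue  # try the other server
        raise RuntimeError("no ticket server available")


odd = TicketServer(offset=1)
even = TicketServer(offset=2)
client = TicketClient([odd, even])

ids = [client.next_id() for _ in range(4)]
even.available = False  # simulate losing one server
ids_after_failure = [client.next_id() for _ in range(3)]
print(ids, ids_after_failure)
```

IDs stay unique when one server is lost, but they are no longer consecutive, and every ID still costs a round trip to a ticket server, which is where the write bottleneck in the cons list comes from.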
  6. Universally Unique IDentifier (UUID)
     ● Versions:
       ○ Version 1 (date-time and MAC address)
       ○ Version 2 (date-time and MAC address, DCE security version)
       ○ Versions 3 and 5 (namespace name-based, with MD5 and SHA-1 hashing respectively)
       ○ Version 4 (random)
     ● Pros:
       ○ Easy to use
       ○ Unique across all nodes for practical purposes
       ○ DBMS independent
     ● Cons:
       ○ Needs more storage (128 bits)
       ○ Hard to index or order
       ○ Not human-friendly
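As a short illustration of the versions listed above, Python's standard `uuid` module can generate versions 1, 3, 4 and 5 (it has no generator for version 2). The name `example.com` is an arbitrary value chosen for this sketch.

```python
import uuid

name = "example.com"  # arbitrary name, chosen only for illustration

time_based = uuid.uuid1()                          # version 1: timestamp + node (MAC) address
md5_based = uuid.uuid3(uuid.NAMESPACE_DNS, name)   # version 3: namespace + name, MD5
random_based = uuid.uuid4()                        # version 4: random
sha1_based = uuid.uuid5(uuid.NAMESPACE_DNS, name)  # version 5: namespace + name, SHA-1

for u in (time_based, md5_based, random_based, sha1_based):
    # Every UUID is 16 bytes (128 bits), twice the size of a 64-bit integer key.
    print(u.version, u, len(u.bytes) * 8, "bits")

# Name-based UUIDs are deterministic: the same namespace and name give the same ID.
same_again = uuid.uuid5(uuid.NAMESPACE_DNS, name)
```

The random layout of version 4 is what makes these IDs hard to order: two UUIDs generated one after another carry no information about which came first.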
  7. Twitter’s Snowflake
  8. Twitter’s Snowflake
     ● Fast
       ○ Uncoordinated (after startup)
       ○ Minimum 10k IDs per second per process
       ○ Response time: 2 ms (plus network latency)
     ● Compact
       ○ 64 bits
     ● Roughly sorted (K-ordered)
     ● Distributed
  9. Twitter’s Snowflake
     ● The ID is composed of:
       ○ Timestamp - 41 bits:
         ■ Millisecond precision with a custom epoch gives about 69 years
       ○ Sharding - 10 bits:
         ■ Configured machine ID, allowing up to 1024 machines
       ○ Sequence number - 12 bits:
         ■ Rolls over every 4096 IDs per machine (with protection against rollover within the same millisecond)
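The 41/10/12-bit layout above can be sketched as plain bit shifting. This is a minimal single-process illustration, not Twitter's code: the names `compose`, `decompose` and `SnowflakeGenerator` are hypothetical, and the custom epoch (2017-01-01 UTC) is an arbitrary choice for the example.

```python
import threading
import time

MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1  # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1   # 4095
CUSTOM_EPOCH_MS = 1483228800000           # 2017-01-01T00:00:00Z, assumed for this sketch


def compose(timestamp_ms, machine_id, sequence):
    """Pack the three fields as [timestamp | machine id | sequence]."""
    elapsed = timestamp_ms - CUSTOM_EPOCH_MS
    return (elapsed << (MACHINE_BITS + SEQUENCE_BITS)) | (machine_id << SEQUENCE_BITS) | sequence


def decompose(snowflake_id):
    """Unpack an ID into (timestamp_ms, machine_id, sequence)."""
    sequence = snowflake_id & MAX_SEQUENCE
    machine_id = (snowflake_id >> SEQUENCE_BITS) & MAX_MACHINE_ID
    timestamp_ms = (snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)) + CUSTOM_EPOCH_MS
    return timestamp_ms, machine_id, sequence


class SnowflakeGenerator:
    def __init__(self, machine_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id must fit in 10 bits")
        self._machine_id = machine_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued in this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return compose(now, self._machine_id, self._sequence)


generator = SnowflakeGenerator(machine_id=7)
generated = [generator.next_id() for _ in range(10000)]
print(decompose(generated[0]))
```

The three fields use 63 bits, so the result fits in a signed 64-bit integer with the sign bit left at zero. With 41 bits of milliseconds the timestamp field covers 2^41 ms, roughly 69.7 years from the chosen epoch.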
  10. Disclaimers
     ● Some programming languages, such as JavaScript, cannot represent integers wider than 53 bits exactly.
       ○ Ex:
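The precision limit can be reproduced in Python, whose `float` is the same IEEE 754 double that JavaScript uses for its `Number` type. This is a sketch for illustration: `survives_double` is a hypothetical helper and `sample_id` is an arbitrary 63-bit value, not a real Snowflake ID.

```python
import json

MAX_SAFE_INTEGER = 2**53 - 1  # same value as JavaScript's Number.MAX_SAFE_INTEGER


def survives_double(n: int) -> bool:
    """Return True if n is unchanged after a round trip through a 64-bit float."""
    return int(float(n)) == n


sample_id = (1 << 62) + 1  # arbitrary 63-bit value standing in for a Snowflake ID

print(survives_double(MAX_SAFE_INTEGER))  # True
print(survives_double(sample_id))         # False: the low bits are rounded away

# A common workaround is to also send the ID as a string, so clients that
# parse JSON numbers as doubles can read the exact value.
payload = json.dumps({"id": sample_id, "id_str": str(sample_id)})
restored = int(json.loads(payload)["id_str"])
```

Twitter's own API takes this route by exposing an `id_str` field next to the numeric `id`.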
  11. Twitter’s Snowflake Drawbacks
     ● Not suitable for small or non-distributed systems
       ○ Hard to embed
       ○ Requires adding components to your system’s architecture
         ■ Apache Zookeeper
         ■ Apache Thrift
  12. SepTech’s Snowflake4S
  13. SepTech’s Snowflake4S
     ● Inspired by Twitter Snowflake
     ● Uses the same encoded ID format as Twitter Snowflake
     ● Decentralized
     ● Easily embeddable
  14. SepTech’s Snowflake4S
     ● Extendable
       ○ You can add your own custom algorithm
     ● Bulk generation support
     ● Easy to use
  15. References
     ● https://blog.twitter.com/engineering/en_us/a/2010/announcing-snowflake.html
     ● http://code.flickr.net/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/
     ● https://engineering.instagram.com/sharding-ids-at-instagram-1cf5a71e5a5c/
     ● https://tools.ietf.org/html/rfc4122/
     ● https://en.wikipedia.org/wiki/Linear_congruential_generator
     ● K-sorted (mathematical term): http://ci.nii.ac.jp/naid/110002673489/
  16. Thank You!
