Distributed Unique ID Generation
Septeni TechTalk 30/11/17
TungNT
What does a distributed-system ID need?
● Guaranteed uniqueness, across all nodes
● Fast & Highly Available
○ No coordination between nodes
● Compact
● Optional:
○ Sortable
○ Indexable
○ No common storage
Existing solutions
DB Auto-increment
● Pros:
○ Simple
○ Fast
○ Compact
● Cons:
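○ Not suitable for distributed systems
○ Not friendly to a DDD/OOP approach (the ID is unknown until the record is persisted)
○ Information disclosure (sequential IDs reveal how many records exist)
○ Not available in some DBMSs (e.g. Cassandra, Oracle before 12c...)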
DB Ticket Servers
● Two ticket DBs (one issuing odd numbers, the other even) to
avoid a single point of failure.
○ Flickr uses this approach
● Pros:
○ Same as DB-autoincrement
○ Distributed
● Cons:
○ Can eventually become a write bottleneck
○ Requires a couple of additional machines
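
The odd/even split can be sketched in a few lines of Python. This is only an in-memory illustration, not Flickr's implementation: the classes `TicketServer` and `TicketClient` are hypothetical names, and a counter stands in for each ticket database (the Flickr article listed in the references configures MySQL's auto-increment increment and offset to get the same effect).

```python
import itertools


class TicketServer:
    """Hypothetical in-memory stand-in for one ticket database."""

    def __init__(self, offset, increment=2):
        # offset=1 yields 1, 3, 5, ...; offset=2 yields 2, 4, 6, ...
        self._counter = itertools.count(offset, increment)
        self.available = True

    def next_id(self):
        if not self.available:
            raise ConnectionError("ticket server is down")
        return next(self._counter)


class TicketClient:
    """Round-robins over the servers and skips any that are down."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._turn = 0

    def next_id(self):
        count = len(self._servers)
        for attempt in range(count):
            index = (self._turn + attempt) % count
            try:
                ticket = self._servers[index].next_id()
            except ConnectionError:
                continue  # try the other server
            self._turn = (index + 1) % count
            return ticket
        raise RuntimeError("no ticket server available")


odd = TicketServer(offset=1)
even = TicketServer(offset=2)
client = TicketClient([odd, even])

first = [client.next_id() for _ in range(4)]

# Simulate losing the "even" server: IDs keep flowing from the odd one.
even.available = False
after_failure = [client.next_id() for _ in range(3)]

print(first, after_failure)
```

The two servers never hand out the same number because their sequences are disjoint, so losing one server only loses its half of the number space, not availability.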
Universally Unique IDentifier (UUID)
● Versions:
○ Version 1 (date-time and MAC address)
○ Version 2 (date-time and MAC Address, DCE security version)
○ Versions 3 and 5 (namespace name-based with MD5 hash/SHA-1 hash)
○ Version 4 (random)
● Pros:
○ Easy to use
○ Practically unique across all nodes (collisions are extremely unlikely rather than impossible)
○ DBMS independent
● Cons:
○ Needs more storage (128 bits)
○ Hard to index or order
○ Not human-friendly
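
As a quick illustration of the versions listed above, Python's standard `uuid` module exposes versions 1, 3, 4 and 5 (it has no version 2 generator). The name `example.com` is just a placeholder input.

```python
import uuid

v1 = uuid.uuid1()                                   # date-time and node (MAC) based
v3 = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")  # namespace + name, MD5
v4 = uuid.uuid4()                                   # random
v5 = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")  # namespace + name, SHA-1

for u in (v1, v3, v4, v5):
    # Every version occupies 16 bytes (128 bits) and prints as 36 characters.
    print(u.version, u, len(u.bytes) * 8, len(str(u)))

# Name-based UUIDs are reproducible: the same namespace and name give the same ID.
repeat_v5 = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
print(v5 == repeat_v5)
```

The 36-character text form and the lack of a useful ordering for version 4 are what the storage and indexing cons refer to.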
Twitter’s Snowflake
● Fast
○ Uncoordinated (after startup)
○ Minimum 10k IDs per second per process
○ Response time of 2 ms (plus network latency)
● Compact
○ 64 bits
● Roughly-sorted (K-ordered)
● Distributed
● The ID is composed of (63 of the 64 bits; the top bit stays 0):
○ Timestamp - 41 bits:
■ millisecond precision with a custom epoch gives us about 69 years
○ Sharding - 10 bits:
■ configured machine ID - gives us up to 1024 machines
○ Sequence number - 12 bits:
■ rolls over every 4096 IDs per machine (with protection to avoid rollover in the same ms)
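
The layout above can be sketched as follows. This is a minimal illustration of the 41/10/12-bit scheme, not Twitter's code: `CUSTOM_EPOCH_MS` is an arbitrary value chosen for the example, and `compose`, `decompose` and `SnowflakeGenerator` are hypothetical names.

```python
import threading
import time

TIMESTAMP_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1   # 1023, so 1024 machines
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1    # 4095, so 4096 IDs per ms
CUSTOM_EPOCH_MS = 1_500_000_000_000        # assumption: arbitrary epoch (July 2017)


def compose(timestamp_ms, machine_id, sequence):
    """Pack the three fields into one integer, timestamp in the high bits."""
    elapsed = timestamp_ms - CUSTOM_EPOCH_MS
    return (elapsed << (MACHINE_BITS + SEQUENCE_BITS)) | (machine_id << SEQUENCE_BITS) | sequence


def decompose(snowflake_id):
    """Unpack an ID back into (timestamp_ms, machine_id, sequence)."""
    sequence = snowflake_id & MAX_SEQUENCE
    machine_id = (snowflake_id >> SEQUENCE_BITS) & MAX_MACHINE_ID
    elapsed = snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)
    return elapsed + CUSTOM_EPOCH_MS, machine_id, sequence


class SnowflakeGenerator:
    def __init__(self, machine_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id must fit in 10 bits")
        self._machine_id = machine_id
        self._clock = clock
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 4096 values used in this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return compose(now, self._machine_id, self._sequence)


gen = SnowflakeGenerator(machine_id=7)
ids = [gen.next_id() for _ in range(10_000)]

# 2**41 milliseconds expressed in years: roughly 69.7.
years = (1 << TIMESTAMP_BITS) / (1000 * 60 * 60 * 24 * 365.25)
print(decompose(ids[0]), round(years, 1))
```

Because the timestamp sits in the most significant bits, IDs from a single generator increase over time, and IDs from different machines are ordered to within clock skew, which is what "roughly sorted" means here.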
Disclaimers
● Some programming languages, such as JavaScript, cannot
represent integers wider than 53 bits exactly in their default number type.
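
The precision loss can be reproduced in Python, whose `float` is the same IEEE 754 double that JavaScript uses for its `Number` type. The ID below is a made-up value just above 2^53; sending the ID as a string is one common workaround, shown here as an assumption rather than a recommendation from the talk.

```python
# Hypothetical 64-bit ID just above the 53-bit limit of a double's mantissa.
snowflake_id = 2 ** 53 + 1            # 9007199254740993

# Passing it through a double (what a JavaScript Number would do) rounds it.
as_double = float(snowflake_id)
round_tripped = int(as_double)        # 9007199254740992, off by one

# Carrying the ID as a string keeps every digit intact.
as_string = str(snowflake_id)

print(snowflake_id, round_tripped, as_string)
```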
Twitter’s Snowflake Drawbacks
● Not suitable for small or non-distributed systems
○ Hard to embed
○ Requires adding components to your system’s architecture
■ Apache Zookeeper
■ Apache Thrift
SepTech’s Snowflake4S
● Inspired by Twitter Snowflake
● Using the same encoded ID format as Twitter Snowflake
● Decentralized
● Easily embeddable
● Extendable
○ You can add your own custom algorithm
● Bulk generation support
● Easy to use
References
● https://blog.twitter.com/engineering/en_us/a/2010/announcing-snowflake.html
● http://code.flickr.net/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/
● https://engineering.instagram.com/sharding-ids-at-instagram-1cf5a71e5a5c/
● https://tools.ietf.org/html/rfc4122/
● https://en.wikipedia.org/wiki/Linear_congruential_generator
● K-sorted (mathematical term): http://ci.nii.ac.jp/naid/110002673489/
Thank You!


Editor's Notes

  • #7 (UUID): DCE stands for Distributed Computing Environment.
  • #7 (UUID): Java doesn’t provide an implementation of type 5.
  • #7 (UUID): v4 ~> deterministic algorithm using a linear congruential generator.
  • #9 (Twitter’s Snowflake): NTP, the networking protocol for clock synchronization, is supported.
  • #12 (Twitter’s Snowflake drawbacks): Apache Zookeeper ~> distributed config management.
  • #12 (Twitter’s Snowflake drawbacks): Apache Thrift ~> RPC.