ID generation



PHP London 2012-08-02
@davegardnerisme
@davegardnerisme




hailoapp.com/dave
(for a £5 discount)
MySQL auto increment



                                    DC 1



               1,2,3,4…
       MySQL              Web App
MySQL auto increment


   • Numeric IDs
   • Go up with time
   • Not resilient
MySQL multi-master replication



                                     DC 1

                1,3,5,7…
        MySQL
                           Web App
                2,4,6,8…
        MySQL
MySQL multi-master replication


   • Numeric IDs
   • Do not go up with time
   • Some resilience
DC 1

DC 2   DC 4
                       DC 5




                              DC 6
         DC 3




                 Going global…
MySQL in multi DC setup

                            DC 1

           1,2,3…   Web
   MySQL
                    App
                                   WAN LINK


                     DC 2




                            ?                 Web
                                              App
Flickr MySQL ticket server

                             DC 1


   Ticket   1,3,5…   Web                 WAN link not required to
   Server            App                          generate an ID

                                      WAN LINK


                      DC 2


                             Ticket   4,6,8…     Web
                             Server              App
Flickr MySQL ticket server


    • Numeric IDs
    • Do not go up with time
    • Resilient and distributed
    • ID generation separated from
      data store
The anatomy of a ticket server

                                        DC


    Web      Web            Web   Web
    App      App            App   App




                   Ticket
                   Server
Making things simpler



                                          DC


    Web      Web        Web      Web
    App      App        App      App

   ID gen    ID gen     ID gen   ID gen
UUIDs


   • 128 bits
   • Could use type 4 (Random) or
     type 1 (MAC address with time
     component)
   • Can generate on each machine
     with no co-ordination
Type 4 – random


                             version

 xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
        variant (8, 9, A or B)



 f47ac10b-58cc-4372-a567-0e02b2c3d479
5.3 x          1036


possible values for a type 4 UUID
1.1 x        1019


UUIDs we could generate per second
     since the Universe began
2.1 x          1027


Olympic swimming pools filled if each
possible value contributed a millilitre
Type 1 – MAC address


 51063800-dc76-11e1-9fae-001c42000009


   • Time component is based on 100
     nanosecond intervals since
     October 15, 1582
   • Most significant bits of timestamp
     shifted to least significant bits of
     UUID
Type 1 – MAC address


   • The address (MAC) of the
     computer that generated the ID is
     encoded into it
   • Lexical ordering essentially
     meaningless
   • Deterministically unique
There are some other options…
No co-ordination needed


Deterministically unique


K-ordered (time-ordered
       lexically)
Twitter Snowflake


   • Under 64 bits
   • No co-ordination (after startup)
   • K-ordered
   • Scala service, Thrift interface,
     uses Zookeeper for configuration
Twitter Snowflake


   41 bits   Timestamp
             millisecond precision, bespoke epoch

   10 bits   Configured machine ID
   12 bits   Sequence number
Twitter Snowflake


   77669839702851584


   = (timestamp << 22)
    | (machine << 12)
    | sequence
Boundary Flake


   • 128 bits
   • No co-ordination at all
   • K-ordered
   • Erlang service
Boundary Flake


   64 bits   Timestamp
             millisecond precision, 1970 epoch

   48 bits   MAC address
   16 bits   Sequence number
PHP Cruftflake


   • Based on Twitter Snowflake
   • No co-ordination (after startup)
   • K-ordered
   • PHP, ZeroMQ interface, uses
     Zookeeper for configuration
Questions?
References
Flickr distributed ticket server
http://code.flickr.com/blog/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-
the-cheap/

UUIDs
http://tools.ietf.org/html/rfc4122

How random are random UUIDs?
http://stackoverflow.com/a/2514722/15318

Twitter Snowflake
https://github.com/twitter/snowflake

Boundary Flake
https://github.com/boundary/flake

PHP Cruftflake
https://github.com/davegardnerisme/cruftflake
private function mintId64($timestamp, $machine, $sequence)
{
    $timestamp = (int)$timestamp;
    $value = ($timestamp << 22) | ($machine << 12) | $sequence;
    return (string)$value;
}

private function mintId32($timestamp, $machine, $sequence)
{
    $hi = (int)($timestamp / pow(2,10));
    $lo = (int)($timestamp * pow(2, 22));

    // stick in the machine + sequence to the low bit
    $lo = $lo | ($machine << 12) | $sequence;

    // reconstruct into a string of numbers
    $hex = pack('N2', $hi, $lo);
    $unpacked = unpack('H*', $hex);
    $value = $this->hexdec($unpacked[1]);
    return (string)$value;
}
public function generate()
{
    $t = floor($this->timer->getUnixTimestamp()
             - $this->epoch);
    if ($t !== $this->lastTime) {
        $this->sequence = 0;
        $this->lastTime = $t;
    } else {
        $this->sequence++;
        if ($this->sequence > 4095) {
             throw new OverflowException('Sequence overflow');
        }
    }

    if (PHP_INT_SIZE === 4) {
        return $this->mintId32($t, $this->machine,
             $this->sequence);
    } else {
        return $this->mintId64($t, $this->machine,
             $this->sequence);
    }
}

Unique ID generation in distributed systems

  • 1.
    ID generation PHP London2012-08-02 @davegardnerisme
  • 2.
  • 3.
    MySQL auto increment DC 1 1,2,3,4… MySQL Web App
  • 4.
    MySQL auto increment • Numeric IDs • Go up with time • Not resilient
  • 5.
    MySQL multi-master replication DC 1 1,3,5,7… MySQL Web App 2,4,6,8… MySQL
  • 6.
    MySQL multi-master replication • Numeric IDs • Do not go up with time • Some resilience
  • 7.
    DC 1 DC 2 DC 4 DC 5 DC 6 DC 3 Going global…
  • 8.
    MySQL in multiDC setup DC 1 1,2,3… Web MySQL App WAN LINK DC 2 ? Web App
  • 9.
    Flickr MySQL ticketserver DC 1 Ticket 1,3,5… Web WAN link not required to Server App generate an ID WAN LINK DC 2 Ticket 4,6,8… Web Server App
  • 10.
    Flickr MySQL ticketserver • Numeric IDs • Do not go up with time • Resilient and distributed • ID generation separated from data store
  • 11.
    The anatomy ofa ticket server DC Web Web Web Web App App App App Ticket Server
  • 12.
    Making things simpler DC Web Web Web Web App App App App ID gen ID gen ID gen ID gen
  • 13.
    UUIDs • 128 bits • Could use type 4 (Random) or type 1 (MAC address with time component) • Can generate on each machine with no co-ordination
  • 14.
    Type 4 –random version xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx variant (8, 9, A or B) f47ac10b-58cc-4372-a567-0e02b2c3d479
  • 15.
    5.3 x 1036 possible values for a type 4 UUID
  • 16.
    1.1 x 1019 UUIDs we could generate per second since the Universe began
  • 17.
    2.1 x 1027 Olympic swimming pools filled if each possible value contributed a millilitre
  • 18.
    Type 1 –MAC address 51063800-dc76-11e1-9fae-001c42000009 • Time component is based on 100 nanosecond intervals since October 15, 1582 • Most significant bits of timestamp shifted to least significant bits of UUID
  • 19.
    Type 1 –MAC address • The address (MAC) of the computer that generated the ID is encoded into it • Lexical ordering essentially meaningless • Deterministically unique
  • 20.
    There are someother options…
  • 21.
    No co-ordination needed Deterministicallyunique K-ordered (time-ordered lexically)
  • 22.
    Twitter Snowflake • Under 64 bits • No co-ordination (after startup) • K-ordered • Scala service, Thrift interface, uses Zookeeper for configuration
  • 23.
    Twitter Snowflake 41 bits Timestamp millisecond precision, bespoke epoch 10 bits Configured machine ID 12 bits Sequence number
  • 24.
    Twitter Snowflake 77669839702851584 = (timestamp << 22) | (machine << 12) | sequence
  • 25.
    Boundary Flake • 128 bits • No co-ordination at all • K-ordered • Erlang service
  • 26.
    Boundary Flake 64 bits Timestamp millisecond precision, 1970 epoch 48 bits MAC address 16 bits Sequence number
  • 27.
    PHP Cruftflake • Based on Twitter Snowflake • No co-ordination (after startup) • K-ordered • PHP, ZeroMQ interface, uses Zookeeper for configuration
  • 28.
  • 29.
    References Flickr distributed ticketserver http://code.flickr.com/blog/2010/02/08/ticket-servers-distributed-unique-primary-keys-on- the-cheap/ UUIDs http://tools.ietf.org/html/rfc4122 How random are random UUIDs? http://stackoverflow.com/a/2514722/15318 Twitter Snowflake https://github.com/twitter/snowflake Boundary Flake https://github.com/boundary/flake PHP Cruftflake https://github.com/davegardnerisme/cruftflake
  • 30.
    private function mintId64($timestamp,$machine, $sequence) { $timestamp = (int)$timestamp; $value = ($timestamp << 22) | ($machine << 12) | $sequence; return (string)$value; } private function mintId32($timestamp, $machine, $sequence) { $hi = (int)($timestamp / pow(2,10)); $lo = (int)($timestamp * pow(2, 22)); // stick in the machine + sequence to the low bit $lo = $lo | ($machine << 12) | $sequence; // reconstruct into a string of numbers $hex = pack('N2', $hi, $lo); $unpacked = unpack('H*', $hex); $value = $this->hexdec($unpacked[1]); return (string)$value; }
  • 31.
    public function generate() { $t = floor($this->timer->getUnixTimestamp() - $this->epoch); if ($t !== $this->lastTime) { $this->sequence = 0; $this->lastTime = $t; } else { $this->sequence++; if ($this->sequence > 4095) { throw new OverflowException('Sequence overflow'); } } if (PHP_INT_SIZE === 4) { return $this->mintId32($t, $this->machine, $this->sequence); } else { return $this->mintId64($t, $this->machine, $this->sequence); } }