What makes an LDAP server
run fast?
Emmanuel Lécharny
Apache Software Foundation member
Chairman of MINA project
PMC of Apache Directory Project
IKTEK Owner (www.iktek.com)
www.iktek.com, elecharny@iktek.com
Latency numbers
every programmer should know !
(https://gist.github.com/hellerbarde/2843375)

Main memory reference ......................         100 ns
Read 1 MB sequentially from memory .........     250,000 ns = 250 µs
Read 1 MB sequentially from SSD* ...........   1,000,000 ns =   1 ms
Send 1 MB over 1 Gbps network ..............  10,000,000 ns =  10 ms
Disk seek ..................................  10,000,000 ns =  10 ms
Read 1 MB sequentially from disk ...........  20,000,000 ns =  20 ms
A request...
Search : From client to client
ASN.1
ASN.1 codec
FROM:
connection.bind( "uid=akarasulu,dc=example,dc=com", "password" );

TO:
0x30, 0x33,            // LDAPMessage ::= SEQUENCE {
0x02, 0x01, 0x01,      //   messageID MessageID
0x60, 0x2E,            //   CHOICE { ..., bindRequest BindRequest, ...
                       //   BindRequest ::= APPLICATION[0] SEQUENCE {
0x02, 0x01, 0x03,      //     version INTEGER (1..127),
0x04, 0x1F,            //     name LDAPDN,
'u', 'i', 'd', '=', 'a', 'k', 'a', 'r', 'a', 's', 'u', 'l', 'u', ',',
'd', 'c', '=', 'e', 'x', 'a', 'm', 'p', 'l', 'e', ',',
'd', 'c', '=', 'c', 'o', 'm',
( byte ) 0x80, 0x08,   //     authentication AuthenticationChoice
                       //     AuthenticationChoice ::= CHOICE { simple [0] OCTET STRING,
                       //     ...
'p', 'a', 's', 's', 'w', 'o', 'r', 'd'
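The byte dump above can be reproduced by hand to see what the codec has to do for every message. The following is a minimal sketch, assuming short-form BER lengths only (content under 128 bytes) and a message ID between 0 and 127. The class and method names (`BindRequestSketch`, `tlv`, `encodeSimpleBind`) are hypothetical and are not the Apache Directory codec API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not the Apache Directory codec: hand-encodes a simple
// LDAPv3 BindRequest using short-form BER lengths only.
final class BindRequestSketch {

    // Emits one tag-length-value triple (length must fit in 7 bits).
    static byte[] tlv(int tag, byte[] value) {
        if (value.length > 127) {
            throw new IllegalArgumentException("long-form lengths are not handled in this sketch");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(tag);
        out.write(value.length);
        out.write(value, 0, value.length);
        return out.toByteArray();
    }

    // Concatenates already-encoded elements.
    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }

    static byte[] encodeSimpleBind(int messageId, String dn, String password) {
        if (messageId < 0 || messageId > 127) {
            throw new IllegalArgumentException("sketch only handles message IDs 0..127");
        }
        byte[] version = tlv(0x02, new byte[] { 0x03 });                       // version INTEGER = 3
        byte[] name = tlv(0x04, dn.getBytes(StandardCharsets.UTF_8));          // name LDAPDN
        byte[] simple = tlv(0x80, password.getBytes(StandardCharsets.UTF_8));  // simple [0] OCTET STRING
        byte[] bindRequest = tlv(0x60, concat(version, name, simple));         // APPLICATION[0] SEQUENCE
        byte[] id = tlv(0x02, new byte[] { (byte) messageId });                // messageID
        return tlv(0x30, concat(id, bindRequest));                             // LDAPMessage SEQUENCE
    }

    public static void main(String[] args) {
        byte[] pdu = encodeSimpleBind(1, "uid=akarasulu,dc=example,dc=com", "password");
        StringBuilder hex = new StringBuilder();
        for (byte b : pdu) {
            hex.append(String.format("0x%02X ", b));
        }
        System.out.println(hex.toString().trim());
    }
}
```

Each nested length (0x1F, 0x2E, 0x33) depends on the size of everything inside it, which is why the encoder has to build the message from the inside out.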

Searching
Search, search, search

It's all about search performance!
Check, please !
Search Request Checks
Checks done before the first entry is returned :
- Normalize the filter
- Check if the password should be reset
- Check if the user is authenticated
- Check the filter attributes
- Find the backend

This represents 9% of the initial search processing
(for a search returning one entry).
The candidates
Selecting candidates

Candidates are references to entries
(in other words, they are just pointers...)
Selecting candidates

            AND    OR    NOT
No index     ∀     ∀     ∀
Index        ∩     ∪     ∀

Remember: we don't actually fetch any entry!
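The table can be read as plain set operations on entry identifiers. Here is a minimal sketch, assuming candidates are held as in-memory sets of `Long` IDs; the class `CandidateSets` and its methods are hypothetical, and a real server would more likely walk index cursors than materialize full sets.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: candidates are sets of entry IDs, never the entries themselves.
final class CandidateSets {

    // AND over indexed attributes: intersection of the candidate ID sets.
    static Set<Long> and(Set<Long> left, Set<Long> right) {
        Set<Long> result = new TreeSet<>(left);
        result.retainAll(right);
        return result;
    }

    // OR over indexed attributes: union of the candidate ID sets.
    static Set<Long> or(Set<Long> left, Set<Long> right) {
        Set<Long> result = new TreeSet<>(left);
        result.addAll(right);
        return result;
    }

    // NOT, or any filter without a usable index: every entry ID stays a candidate.
    static Set<Long> all(Set<Long> allEntryIds) {
        return new TreeSet<>(allEntryIds);
    }

    public static void main(String[] args) {
        Set<Long> byUid = Set.of(1L, 2L, 3L);   // made-up index lookup results
        Set<Long> byMail = Set.of(2L, 3L, 4L);
        System.out.println("AND: " + and(byUid, byMail));
        System.out.println("OR:  " + or(byUid, byMail));
    }
}
```

Only identifiers are combined here; no entry is read from the backend at this stage.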
Candidates & AND filter
Maximum = min(filters candidates)
Minimum = min(filters candidates)

Here, max = 72 and min = 72
Candidates & OR filter
Maximum = sum(filters candidates)
Minimum = sum(filters candidates)

Here, max = 8098 and min = 8098
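The counting rules on these two slides can be written down directly. This is a minimal sketch of the rules as stated (AND keeps the smallest count, OR adds the counts); the class `CandidateCount` is hypothetical, and the per-filter counts in `main` are assumptions chosen only so that the results match the figures quoted on the slides.

```java
import java.util.stream.LongStream;

// Hypothetical sketch of the slides' counting rules for candidate sets.
final class CandidateCount {

    // AND: the candidate count is the smallest count among the sub-filters.
    static long and(long... counts) {
        return LongStream.of(counts).min().orElse(0L);
    }

    // OR: the candidate count is the sum of the sub-filter counts.
    static long or(long... counts) {
        return LongStream.of(counts).sum();
    }

    public static void main(String[] args) {
        // Assumed per-filter counts; the slides only give the resulting totals.
        long first = 72L;
        long second = 8026L;
        System.out.println("AND candidates: " + and(first, second));
        System.out.println("OR candidates:  " + or(first, second));
    }
}
```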
Cost of creating candidates

Looking for the best index
+ Creating the set of references
= 20% of the search processing
Index

No index, no gain
Filters

Build your search filters with caution
The Cache
Cache

It's all about memory vs disk latency
Reminder

Disk is from 4x to 80x slower !
Read 1 MB sequentially from memory .........     250,000 ns = 250 µs
Read 1 MB sequentially from SSD* ...........   1,000,000 ns =   1 ms
Read 1 MB sequentially from disk ...........  20,000,000 ns =  20 ms
Cache, the good

No Disk access
=>
Fast (very!)
Entry Cache

It caches Objects
Hash map
Or
Ordered data structure
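A hash-map-based entry cache can be sketched with the JDK alone. The following is a minimal sketch of a bounded, least-recently-used cache built on `java.util.LinkedHashMap`; the class name `LruEntryCache` is hypothetical, and this is not the cache implementation used by any particular LDAP server.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a bounded LRU cache on top of LinkedHashMap.
// Not thread-safe; callers must synchronize externally.
final class LruEntryCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    LruEntryCache(int maxEntries) {
        // accessOrder = true: iteration order goes from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruEntryCache<String, String> cache = new LruEntryCache<>(2);
        cache.put("uid=a,dc=example,dc=com", "entry A");
        cache.put("uid=b,dc=example,dc=com", "entry B");
        cache.get("uid=a,dc=example,dc=com");            // touch A so that B becomes the eldest
        cache.put("uid=c,dc=example,dc=com", "entry C"); // evicts B
        System.out.println(cache.keySet());
    }
}
```

The map is keyed by DN here purely for illustration; an ordered structure such as a `TreeMap` would be the alternative when range lookups matter.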
Cache, the bad

Locks...
Algorithms...
Memory...
Cache, the ugly

- Has to be 'warm'
- Immutable objects
=>
A kind of copy is needed
45% of the search processing time
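One way to read "a kind of copy is needed" is that the cached object must never be modified by callers, so the cache hands out copies. Here is a minimal sketch under that assumption, with an entry modelled as a map from attribute name to values; the class `CopyOnReadCache` is hypothetical and much simpler than a real entry object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the cache keeps a private copy and returns a fresh copy on each read,
// so callers can modify what they receive without corrupting the cached entry.
final class CopyOnReadCache {

    private final Map<String, Map<String, List<String>>> entries = new HashMap<>();

    void put(String dn, Map<String, List<String>> attributes) {
        entries.put(dn, deepCopy(attributes));
    }

    // Returns a copy of the cached entry, or null when the DN is not cached.
    Map<String, List<String>> get(String dn) {
        Map<String, List<String>> cached = entries.get(dn);
        return cached == null ? null : deepCopy(cached);
    }

    private static Map<String, List<String>> deepCopy(Map<String, List<String>> source) {
        Map<String, List<String>> copy = new HashMap<>();
        for (Map.Entry<String, List<String>> attribute : source.entrySet()) {
            copy.put(attribute.getKey(), new ArrayList<>(attribute.getValue()));
        }
        return copy;
    }

    public static void main(String[] args) {
        CopyOnReadCache cache = new CopyOnReadCache();
        Map<String, List<String>> entry = new HashMap<>();
        entry.put("cn", new ArrayList<>(List.of("Alex")));
        cache.put("uid=akarasulu,dc=example,dc=com", entry);

        Map<String, List<String>> result = cache.get("uid=akarasulu,dc=example,dc=com");
        result.get("cn").add("modified by caller");   // only the copy changes
        System.out.println(cache.get("uid=akarasulu,dc=example,dc=com"));
    }
}
```

The copy on every read is exactly the kind of per-search cost the slide points at.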
DISK
Backend

Storage !
Backend

Remember :
memory vs disk latency...
Memory
Of Price and Men

Memory : 64 GB = 1000$
vs
1 day of consulting to 'tune'
your servers for little gain = ???
Machines
(Olivia) Newton (John) Theory

Let's get physical,

physical
VM vs Bare metal

http://www.altechnative.net/2012/08/04/virtual-performance-part-1-vmware/
VM vs Bare metal

From 16h to 16 mins...
Most certainly IOs and/or disk access
(Spinning disks on a SAN)
Disk

Own your Disks !
Don't share them...

SSD is a winner !
SSD 1TB = 600$
HD 1TB = 100$
Network

Own your network !
Don't share it...
Network
It's 4x to 40x slower than memory:
Read 1 MB sequentially from memory .........     250,000 ns = 250 µs
Read 1 MB sequentially from SSD* ...........   1,000,000 ns =   1 ms
Send 1 MB over 1 Gbps network ..............  10,000,000 ns =  10 ms

But still: you can send up to 100,000 1 KB entries per second
over a 1 Gbps network...
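The order of magnitude is easy to check. This is a back-of-the-envelope sketch that divides raw link bandwidth by entry size and ignores LDAP, TCP and Ethernet overhead; the class `NetworkThroughput` is hypothetical. Protocol overhead is one plausible reason the slide rounds down to 100,000.

```java
// Hypothetical sketch: upper bound on entries per second from raw link bandwidth.
final class NetworkThroughput {

    // Bits per second divided by 8 gives bytes per second; protocol overhead is ignored.
    static long entriesPerSecond(long linkBitsPerSecond, long entrySizeBytes) {
        return (linkBitsPerSecond / 8) / entrySizeBytes;
    }

    public static void main(String[] args) {
        long oneGbps = 1_000_000_000L;
        System.out.println("1000-byte entries/s: " + entriesPerSecond(oneGbps, 1_000L));
        System.out.println("1024-byte entries/s: " + entriesPerSecond(oneGbps, 1_024L));
    }
}
```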
Network

Get a fast network !
Miscellaneous
Authorization

AUTHz
Is
Not
Free.
Thanks!

What makes an LDAP server run fast? A bit of insight into the various bottlenecks and the solutions to avoid them