2. Web Performance & Operations Meetup
2 March 2012
Ahmed Omar – omar@nimbuzz.nl
Nico Klasens – nico@nimbuzz.nl
3. What is Nimbuzz?
Nimbuzz is a communication platform that provides services to
• make audio and video calls to Internet and regular phone numbers
• send instant messages
• share files
on any mobile device, desktop computer or, where possible, Internet browser.
It connects with popular instant messaging and social networks:
Facebook, Windows Live Messenger (MSN), Google Talk, Yahoo!
and SIP/VoIP accounts.
4. Architecture
Guidelines
• Implement business features in external components,
not as modules inside the XMPP router
• Bundle functionality in small services
• Implement services as stateless as possible
• Communicate with users over XMPP
• Communicate internal data over other connections
This makes it possible to do multiple deployments to
Nimbuzz every day.
6. What is XMPP?
Extensible Messaging and Presence Protocol, formerly known as Jabber.
A full XMPP session is one XML document.
The client opens a <stream:stream>, exchanges XML packets (stanzas)
and closes it with </stream:stream> at the end.
This requires long-running TCP connections.
Packets always have a
• From JID (Jabber ID: user@nimbuzz.com/resource)
• To JID (Jabber ID: user@nimbuzz.com/resource)
Three packet types (Message, Presence and IQ)
• Message (type=normal/chat, subject, body)
• Presence (type=unavailable/subscribe/probe, show=chat/away/dnd)
• IQ (id, type=get/set/result/error, one child element with an extension namespace)
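The addressing rules above can be sketched in a few lines of Python. This is only an illustration, not Nimbuzz code: the helper names `split_jid` and `build_message` are hypothetical, and the sketch uses nothing beyond the standard library.

```python
import xml.etree.ElementTree as ET


def split_jid(jid):
    """Split 'user@domain/resource' into (user, domain, resource)."""
    bare, _, resource = jid.partition("/")
    user, _, domain = bare.rpartition("@")
    return user, domain, resource


def build_message(from_jid, to_jid, body, msg_type="chat"):
    """Build a <message/> stanza carrying the from/to JIDs and a body."""
    msg = ET.Element("message", {"from": from_jid, "to": to_jid, "type": msg_type})
    ET.SubElement(msg, "body").text = body
    return ET.tostring(msg, encoding="unicode")


print(split_jid("user@nimbuzz.com/pc"))
print(build_message("user1@nimbuzz.com/pc", "user3@nimbuzz.com", "Hey"))
```

A JID without a user or resource part (for example a bare server name) simply yields empty strings for the missing parts.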
7. What is XMPP?
Instant Messaging and Presence extension
• Roster: the central point of focus is a list of one's contacts or "buddies"
  o Local Nimbuzz friends
  o Transports (gateways to external IM systems)
  o Transport friends
• Presence information: network availability of particular contacts
• Presence subscription: authorize contacts to receive "presence"
• Privacy lists
8. Ejabberd / Erlang
Ejabberd is an XMPP instant messaging server written in
Erlang/OTP. Nimbuzz runs a modified version with its
own extensions.
Erlang is a programming language whose runtime system has
built-in support for concurrency, distribution and fault tolerance.
OTP is a set of Erlang libraries. It includes its own
distributed database as well as debugging and release
handling tools.
9. Erlang/OTP
- Quick History
- Why Erlang?
o Concurrency
o Fault tolerance
o Distribution
o Hot code swapping / high availability
- Who uses Erlang?
10. XMPP in action
• XML stanzas (presence, iq, message)
<presence to='user3@server-x' type='subscribe'/>
<presence><show>chat</show><status>Just talk</status></presence>
<iq from='user2@server-x/pc' type='get' id='roster_req1'>
  <query xmlns='jabber:iq:roster'/>
</iq>
<message to='user3@server-x' from='user1@server-x/pc' type='chat'>
  <body>Hey</body>
</message>
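To make the three stanza kinds concrete, the sketch below parses stanzas like the ones above and reports their kind, type and recipient. The function name `describe` is made up for this example, and a real server works on a streaming parser rather than on complete strings.

```python
import xml.etree.ElementTree as ET

STANZA_KINDS = {"message", "presence", "iq"}


def describe(stanza_xml):
    """Return (kind, type, to) for a single stanza string."""
    root = ET.fromstring(stanza_xml)
    # Strip a namespace prefix such as '{jabber:client}' if one is present.
    kind = root.tag.rsplit("}", 1)[-1]
    if kind not in STANZA_KINDS:
        raise ValueError("not an XMPP stanza: " + kind)
    return kind, root.get("type"), root.get("to")


print(describe("<presence to='user3@server-x' type='subscribe'/>"))
print(describe(
    "<message to='user3@server-x' from='user1@server-x/pc' type='chat'>"
    "<body>Hey</body></message>"
))
```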
11. Ejabberd
Why XMPP?
o Real time communication
o Extensibility
Why Ejabberd?
o Flexible
   - easy to set up a cluster
   - easy to configure
   - easy to extend
   - support for external services
o Powerful
o Scalable... with caution.
12. Persistence Bridge
An application introduced to migrate to ejabberd while still
using the database schema of the old XMPP server.
Data requests: 1,839,333,853 per day ≈ 21,288 per second
A REST service written in Java
Applies validation and business rules, and enhances data.
Caches the most-accessed data stored in MySQL in memcached
Caches community gateway rosters in Redis
Migrates data to more efficient storage backends or database
tables.
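The caching role described above can be illustrated with a read-through lookup. This is a sketch under stated assumptions, not the actual Java service: plain dictionaries stand in for memcached and MySQL, and the class name `ReadThroughStore` and the key `profile:42` are invented for the example. The last line re-derives the per-second rate from the daily figure.

```python
class ReadThroughStore:
    """Serve reads from a cache and fall back to the database on a miss."""

    def __init__(self, cache, database):
        self.cache = cache        # stand-in for memcached or Redis
        self.database = database  # stand-in for MySQL
        self.db_reads = 0

    def get(self, key):
        value = self.cache.get(key)
        if value is None:
            # Cache miss: read the source of truth and remember the result.
            self.db_reads += 1
            value = self.database.get(key)
            if value is not None:
                self.cache[key] = value
        return value


store = ReadThroughStore(cache={}, database={"profile:42": {"name": "omar"}})
store.get("profile:42")  # miss, loads from the database
store.get("profile:42")  # hit, served from the cache
print(store.db_reads)

# Requests per day divided by seconds per day gives the per-second rate.
print(1_839_333_853 // 86_400)
```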
13. Cache server Practices
Use a naming convention for your keys: namespace:sequence@identifier
The sequence has to be configurable.
Choose a good balance between memory size, expiry and number of
servers.
Memcached is/was single-threaded.
Decide on connection, read and data retrieval timeouts.
Use command pipelining on every connection.
Use different connections for different key namespaces.
Write only to MySQL and try to read only from the cache server. Update the
cache server on every write.
Use the Check And Set (CAS) command for partial updates and back out
when the retries are exhausted.
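Two of the practices above, the key naming convention and the CAS retry loop, might look roughly like this. `FakeCache` is an in-memory stand-in invented for this sketch; a real memcached client exposes comparable `gets`/`cas` operations, but its exact signatures are not assumed here. A configurable sequence presumably lets a whole namespace be invalidated by bumping one number instead of deleting keys.

```python
class FakeCache:
    """In-memory stand-in for a cache server that supports gets/cas."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def gets(self, key):
        return self._data.get(key, (None, 0))

    def cas(self, key, value, version):
        # Store only if nobody changed the entry since it was read.
        if self._data.get(key, (None, 0))[1] != version:
            return False
        self._data[key] = (value, version + 1)
        return True


def make_key(namespace, sequence, identifier):
    """Build a key as namespace:sequence@identifier."""
    return "{}:{}@{}".format(namespace, sequence, identifier)


def cas_update(cache, key, change, max_retries=3):
    """Apply a partial update with CAS; back out after max_retries."""
    for _ in range(max_retries):
        value, version = cache.gets(key)
        if cache.cas(key, change(value), version):
            return True
    return False


cache = FakeCache()
key = make_key("roster", 7, "user1")
cas_update(cache, key, lambda old: (old or []) + ["user3"])
print(key, cache.gets(key))
```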
14. MySQL Practices
Use connection pooling (the MySQL driver for Java is slow to create
connections).
Test connections with a MySQL ping, not with a "SELECT 1"
statement.
Make every transaction a single SQL statement and turn
autocommit on by default.
Split read and write statements over different connections.
Add the requesting host and application as an SQL comment to
each statement.
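The last two practices can be sketched as small helpers. The names `tag_statement` and `pick_pool`, the application name `persistence-bridge` and the host `app01` are all hypothetical, and the routing rule (anything starting with SELECT goes to the read connection) is a deliberately simple assumption.

```python
import socket


def tag_statement(sql, application, host=None):
    """Append the requesting host and application as an SQL comment."""
    host = host or socket.gethostname()
    # Strip comment terminators so the tag cannot break out of the comment.
    safe = "{}@{}".format(application, host).replace("*/", "")
    return "{} /* {} */".format(sql, safe)


def pick_pool(sql):
    """Route SELECT statements to the read pool, everything else to write."""
    return "read" if sql.lstrip().upper().startswith("SELECT") else "write"


stmt = tag_statement(
    "SELECT name FROM users WHERE id = ?", "persistence-bridge", "app01"
)
print(pick_pool(stmt), stmt)
```

With the tag in place, the origin of a query is visible wherever the statement text is logged.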
15. MySQL Practices
Keep SQL statements outside the programming code.
Never use "SELECT *": the column order might change, and it is hard to check
which columns are still used.
Never use "INSERT INTO table_name VALUES ( ... )".
Use "INSERT INTO table_name (column1, column2) VALUES ( ... )".
Do not rely on database DEFAULT values. Provide all values on INSERT.
A TIMESTAMP field is an exception to this.
All columns in the database have to be NOT NULL and have a DEFAULT value.
Use primary keys as much as possible.
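The practices above can be tried out with a small runnable example. SQLite from the Python standard library stands in for MySQL here, and the table and statement names are invented. The table is deliberately created with its columns in a different order than the statements list them, to show that explicit column lists do not depend on that order.

```python
import sqlite3

# Statements kept in one place, outside the business logic (hypothetical names).
STATEMENTS = {
    "insert_user": "INSERT INTO users (id, name, country) VALUES (?, ?, ?)",
    "select_user": "SELECT id, name, country FROM users WHERE id = ?",
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " country TEXT NOT NULL DEFAULT '',"
    " name TEXT NOT NULL DEFAULT '')"
)

# The column list makes the statement independent of the table's column order.
conn.execute(STATEMENTS["insert_user"], (1, "omar", "NL"))
row = conn.execute(STATEMENTS["select_user"], (1,)).fetchone()
print(row)
```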
16. Data tweaks
Users send a lot of junk. Validate and drop it. Do not try to correct it.
Do not store data that is implied, like the + of phone numbers.
Bulk insert, update and delete:
INSERT INTO table (data) VALUES (?), (?)
INSERT INTO table (id, data) VALUES (?, ?), (?, ?)
  ON DUPLICATE KEY UPDATE data = VALUES(data)
Do not use REPLACE if you don't want to DELETE and INSERT:
it has very bad I/O performance.
Sort rows by primary key before update and delete. This improves
InnoDB page locking.
Use a compound primary key to store the records of one user together on
disk (user_id, auto_increment_id).
A MySQL index on a large table with text columns does not perform. Use
full-text search engines to get an index which is not fully in memory.
Remove foreign keys to reduce storage. Trust the application to update
and delete.
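A bulk upsert that also sorts its rows by primary key could be assembled as in the sketch below. The function name `bulk_upsert` and the table name `user_data` are hypothetical, and the `?` placeholders follow the slide; some database drivers expect a different placeholder style.

```python
def bulk_upsert(table, rows):
    """Build one multi-row INSERT ... ON DUPLICATE KEY UPDATE statement.

    rows is a list of (id, data) tuples; table is a trusted identifier.
    """
    if not rows:
        raise ValueError("nothing to insert")
    # Sort by primary key so concurrent writers touch pages in the same order.
    ordered = sorted(rows, key=lambda row: row[0])
    placeholders = ", ".join(["(?, ?)"] * len(ordered))
    sql = (
        "INSERT INTO {} (id, data) VALUES {} "
        "ON DUPLICATE KEY UPDATE data = VALUES(data)"
    ).format(table, placeholders)
    params = [value for row in ordered for value in row]
    return sql, params


sql, params = bulk_upsert("user_data", [(3, "c"), (1, "a")])
print(sql)
print(params)
```

The statement and its flattened parameter list are then passed to the driver in a single call, which replaces one round trip per row with one round trip per batch.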