PubSub in ejabberd
How it works and how to extend it for fun and profit
PubSub overview
• generic publish-subscribe functionality, specified in XEP-0060
v1.13
• More than 100 pages of specifications
• 12 very detailed use cases with many possible options and
situations: Subscribe, Unsubscribe, Configure
subscription, Retrieve items, Publish item, Delete item, Create
node, Configure node, Delete node, Purge node, Manage
subscriptions, Manage affiliations.
• XEP-0163 v1.2 (PEP) based on PubSub
• XEP-0248 (deprecated) for Collection Nodes
History
• Initial implementation by Aleksey Shchepin
(ejabberd author)
• Ability to organise nodes in a tree added in 2007
by Christophe Romain
• First attempt at a plugin API in 2007
• Improvements up to 2015
Implementation
• A pool of IQ handlers called by the ejabberd router
• A sending process
• A core router that performs the high-level actions for every
use case
• Plugins that handle nodes, affiliations/subscriptions
and items at a lower level, and interface with the data
backend
Nodetree plugins
• They handle storage and organisation of PubSub
nodes. Called on get, create and delete node.
• Three nodetree plugins are available:
    tree (default): internal and odbc backends
    virtual: no backend
    dag: handles XEP-0248
Node plugins
• They handle affiliations, subscriptions and items. They
provide the default node configuration and features. Called for
every PubSub use case.
• Responsible for the checks that cover all possible cases
• Return the result of the action to the PubSub engine, which
handles the routing
• Many plugins: flat, hometree, pep, dag, public, private, ...,
yours?
• Few backends: internal, odbc (others possible)
Plugin design
• Due to the complexity of XEP-0060, the PubSub engine makes
successive calls to the nodetree and node plugins in
order to check validity, perform the corresponding
action and return the result or an appropriate error
• The plugin design follows this requirement and divides
actions by type of data, so that a backend can be
implemented without any PubSub engine change
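The successive calls described above can be sketched as follows for the Create Node case. This is a minimal, self-contained illustration, not the ejabberd engine code: the module name, the map keys and the three funs are hypothetical stand-ins for a node plugin and a nodetree plugin, and the return shapes are assumptions.

```erlang
-module(pubsub_flow_sketch).
-export([create_node/3]).

%% Plugins is a map of hypothetical callbacks (plain funs) standing in for
%% the node plugin and the nodetree plugin. None of these names come from
%% ejabberd itself.
create_node(Plugins, Owner, Node) ->
    #{check_permission := Check,
      tree_create := TreeCreate,
      node_create := NodeCreate} = Plugins,
    %% Step 1: ask the node plugin whether Owner may create Node.
    case Check(Owner, Node) of
        true ->
            %% Step 2: let the nodetree plugin store the node record.
            case TreeCreate(Node, Owner) of
                {ok, Nidx} ->
                    %% Step 3: let the node plugin set up owner state.
                    NodeCreate(Nidx, Owner);
                {error, _} = Err ->
                    Err
            end;
        false ->
            %% Validity check failed: return an error, perform no action.
            {error, forbidden}
    end.
```

Because the engine only sees the results of each step, the funs could be backed by mnesia, odbc or anything else without the flow itself changing.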
[Diagram slides, one per action: Create Node, Delete Node, Subscribe, Unsubscribe, Publish item, Delete item, Purge Node, Get item]
node_flat
• Default plugin: no node hierarchy, handles the standard
PubSub case. Default node options:
[{deliver_payloads, true},
{notify_config, false},
{notify_delete, false},
{notify_retract, true},
{purge_offline, false},
{persist_items, true},
{max_items, 10},
{subscribe, true},
{access_model, open},
{roster_groups_allowed, []},
{publish_model, publishers},
{notification_type, headline},
{max_payload_size, 60000},
{send_last_published_item, on_sub_and_presence},
{deliver_notifications, true},
{presence_based_delivery, false}].
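A derived plugin would typically keep most of these defaults and change only a few. The sketch below shows one way to do that with a plain key replacement over the list above. The module name and the chosen overrides are illustrative assumptions; in a real plugin the defaults would come from the base plugin rather than being copied.

```erlang
-module(options_sketch).
-export([default_options/0, options/0]).

%% Copy of the node_flat defaults listed above, so the sketch runs standalone.
default_options() ->
    [{deliver_payloads, true},
     {notify_config, false},
     {notify_delete, false},
     {notify_retract, true},
     {purge_offline, false},
     {persist_items, true},
     {max_items, 10},
     {subscribe, true},
     {access_model, open},
     {roster_groups_allowed, []},
     {publish_model, publishers},
     {notification_type, headline},
     {max_payload_size, 60000},
     {send_last_published_item, on_sub_and_presence},
     {deliver_notifications, true},
     {presence_based_delivery, false}].

%% Hypothetical overrides: keep no history, only the last item.
options() ->
    Overrides = [{persist_items, false}, {max_items, 1}],
    lists:foldl(fun({Key, _} = Opt, Acc) ->
                        %% Replace the tuple with the same key in place.
                        lists:keystore(Key, 1, Acc, Opt)
                end,
                default_options(),
                Overrides).
```

Replacing entries in place keeps the list the same length and order, so any option not overridden behaves exactly as in the flat plugin.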
node_hometree
• Uses exactly the same features as the flat plugin
• Organises nodes in a tree, following the same scheme as
paths in a filesystem.
• Every user can create nodes under their own home
root: /home/user and/or /home/domain/user
• Each node can contain items and/or sub-nodes.
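The home-root rule can be illustrated with a small path check. This is a sketch under assumptions, not the node_hometree source: the module and function names are made up for the example, and only the /home/domain/user layout is handled.

```erlang
-module(hometree_sketch).
-export([node_to_path/1, allowed/3]).

%% Split a node name such as <<"/home/example.org/alice/blog">> into its
%% path components, dropping the empty component before the leading slash.
node_to_path(Node) when is_binary(Node) ->
    [Part || Part <- binary:split(Node, <<"/">>, [global]), Part =/= <<>>].

%% A user may create a node only under /home/Domain/User.
%% Domain and User are already bound, so the pattern compares against them.
allowed(User, Domain, Node) ->
    case node_to_path(Node) of
        [<<"home">>, Domain, User | _] -> true;
        _ -> false
    end.
```

For example, `hometree_sketch:allowed(<<"alice">>, <<"example.org">>, <<"/home/example.org/alice/blog">>)` accepts the node, while the same call for `<<"bob">>` rejects it.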
node_pep
• Handles XEP-0163: Personal Eventing Protocol
• Does not persist items
• Just keeps the last item in a memory cache
• Node names are raw namespaces attached to a
given bare JID
• Every user can have their own node under a common
namespace shared with others
node_dag
• Handles XEP-0248: PubSub Collection Nodes
• Contribution from Brian Cully
• Every node takes its place in a tree and is either a
collection node (has only sub-nodes) or a leaf
node (contains only items)
• No restriction on the tree structure
Available backends
• Flat, hometree and PEP support the mnesia and odbc
backends.
• Any derived plugin can support the same (public,
private, club, buddy...)
• Business Edition also supports ets and mdb
• Adding a backend does not require any PubSub
engine change. The plugin just needs to comply with the API.
Storage choices
• The nodetree plugin handles the pubsub_node table
• The node plugin handles subscriptions/affiliations in
pubsub_state (or spread elsewhere, depending on the
implementation) and items in the pubsub_item table
• If all nodes share the same configuration, I/O on
pubsub_node can be avoided (nodetree_virtual)
Customisation
• Write your own plugin, implementing the required functions:
[init/3, terminate/2, options/0, features/0,
create_node_permission/6, create_node/2, delete_node/1,
purge_node/2, subscribe_node/8, unsubscribe_node/4,
publish_item/6, delete_item/4, remove_extra_items/3,
get_entity_affiliations/2, get_node_affiliations/1,
get_affiliation/2, set_affiliation/3,
get_entity_subscriptions/2, get_node_subscriptions/1,
get_subscriptions/2, set_subscriptions/4,
get_pending_nodes/2, get_states/1, get_state/2,
set_state/1, get_items/7, get_items/3, get_item/7,
get_item/2, set_item/1, get_item_name/3, node_to_path/1,
path_to_node/1]
• Generic functions must call their counterpart
in node_flat
Example
• Customize options/0 and features/0 to match your
needs, using any of the features available from the PubSub
engine. (This determines how the PubSub engine
calls the plugin.)
• Implement create_node_permission, for example to
check an LDAP directory for an access flag
• Write your own checks on publish or create node,
forbid explicit access to items, etc.
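A permission check of the kind mentioned above might look like the following sketch. It is deliberately simplified: the real callback is create_node_permission/6, whereas this version takes only three arguments, and the `Lookup` fun is a hypothetical stand-in for an LDAP query. The `{result, Boolean}` return shape is an assumption about what the engine expects.

```erlang
-module(permission_sketch).
-export([create_node_permission/3]).

%% Owner        : user name as a binary (illustrative, not a full JID)
%% RequiredFlag : the access flag a user must carry to create nodes
%% Lookup       : hypothetical fun(User) -> [Flag], standing in for a
%%                directory query; no real LDAP call is made here.
create_node_permission(Owner, RequiredFlag, Lookup) when is_function(Lookup, 1) ->
    Flags = Lookup(Owner),
    Allowed = lists:member(RequiredFlag, Flags),
    {result, Allowed}.
```

Passing the lookup as a fun keeps the check testable without a directory server; in a plugin it would be replaced by whatever directory client the deployment already uses.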
Clustering
• ejabberd's implementation aims to cover the most generic and
standard uses. It is good for common use, but far from optimal
for edge cases or specific needs.
• Nodes, affiliations, subscriptions and items are stored in a
replicated database.
• Each ejabberd node has access to all the data.
• Each ejabberd node handles part of the load, but writes to
node records (pubsub_node) lock the database cluster-wide
• Affiliations, subscriptions and items use non-blocking writes
(pubsub_state and pubsub_item)
Optimisations
• Take advantage of clustering depending on your needs:
    Millions of nodes and few subscribers: split nodes over the cluster, using a
    hash and very few replicas. If nodes are not configurable, just use the
    virtual nodetree.
    Few nodes and lots of subscribers: split subscriptions over the cluster; each
    ejabberd node only stores and handles local subscribers, and publish_item
    is called on every ejabberd node (multi call).
    No subscription options: remove the pubsub_subscriptions call from
    the plugin.
    High publish rate (real-time notification): remove item persistence and
    enable the memory cache of the last item only if needed (pubsub_last_item table).
    Keep the cache replicated or local depending on the clustering scheme.
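For the "millions of nodes" case, splitting nodes over the cluster with a hash could be sketched as below. This is only an illustration of the idea: the module and function names are hypothetical, and the simple modulo placement shown here reshuffles most nodes whenever the cluster size changes, which a production scheme would want to avoid.

```erlang
-module(shard_sketch).
-export([owner/2]).

%% Pick the ejabberd node responsible for a given PubSub node id.
%% erlang:phash2/2 returns an integer in 0..Range-1, so adding 1
%% gives a valid 1-based index for lists:nth/2.
owner(NodeId, ClusterNodes) when ClusterNodes =/= [] ->
    Index = erlang:phash2(NodeId, length(ClusterNodes)),
    lists:nth(Index + 1, ClusterNodes).
```

Every ejabberd node computes the same owner for the same PubSub node id, so no lookup table has to be replicated to route a request.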
Possible improvement
Questions?
Christophe Romain <cromain@process-one.net>

Deep Dive Into ejabberd Pubsub Implementation
