HOW TO IMPROVE ON MQTT 3.1.1
Tim Kellogg 
@kellogh 
MQTT: An Implementor’s Perspective 
AOL Keywords: vasters mqtt
MQTT
Servers
Wired Networks
Vertically scaled
fuel 
location
Funnel Protocol 
temp 
Kafka | AMQP | HTTP
MQTT 
“Uptime” isn’t straightforward 
“Exactly Once” Violates CAP 
Security Problems 
• Retained messages 
• Provenance 
Error Handling
CAP Theorem
Availability
Network Partitions
Consistency
CP
AP
CAP Theorem
Availability
Partition Tolerance
Guaranteed In-Order Delivery
CP
AP
Subscriber dies prematurely
Receive
• Via QoS 2
Do Work
• Not idempotent work
• Duplicate message would cause errors
Crash
• ACK isn’t sent
Subscriber dies prematurely (2)
Receive
• Via QoS 2
ACK
• Auto-acknowledge
Crash
• Before work is done
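
The two "subscriber dies prematurely" cases can be sketched as a toy simulation. Everything here is hypothetical: the `Broker` class, `run_subscriber`, and the simulated crash are illustrative names and not part of any MQTT library. The sketch only models the ordering of work and acknowledgement, assuming a broker that redelivers until it sees an ACK.

```python
class Broker:
    """Toy broker (hypothetical): redelivers one message until it is acknowledged."""

    def __init__(self, message):
        self.message = message
        self.acked = False

    def deliver(self):
        # Return the message while it is unacknowledged, otherwise None.
        return None if self.acked else self.message

    def ack(self):
        self.acked = True


class Crash(Exception):
    """Simulated subscriber crash."""


def run_subscriber(broker, ledger, ack_first):
    """Handle deliveries, crashing once between the two steps.

    Returns the number of restarts the subscriber needed.
    """
    restarts = 0
    crashed = False
    while True:
        msg = broker.deliver()
        if msg is None:
            return restarts
        try:
            if ack_first:
                broker.ack()          # case 2: auto-acknowledge on receipt
                if not crashed:
                    crashed = True
                    raise Crash()     # crash before the work is done
                ledger.append(msg)    # non-idempotent work
            else:
                ledger.append(msg)    # case 1: do the work first
                if not crashed:
                    crashed = True
                    raise Crash()     # crash before the ACK is sent
                broker.ack()
        except Crash:
            restarts += 1             # subscriber restarts and reconnects


work_then_ack = []
run_subscriber(Broker("debit 5"), work_then_ack, ack_first=False)

ack_then_work = []
run_subscriber(Broker("debit 5"), ack_then_work, ack_first=True)

print(work_then_ack)  # the work ran twice (duplicate)
print(ack_then_work)  # the work never ran (message lost)
```

In this model, neither ordering gives exactly-once processing when the subscriber can crash between the two steps: working first risks a duplicate, acknowledging first risks a loss.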
MQTT Recommendations 
Message Dedup 
• Have clients implement “QoS 2” via QoS 1 
Token-based Security 
Provenance story 
Metadata about messages 
Error codes
CoAP-PubSub 
Resource Directory 
Garbage Collection 
Polling Allowed 
Retained Messages 
Error Codes
QUESTIONS? 
@kellogh 
app.thingfabric.com 
mqtt.io – No Bullshit, Just MQTT™ 
coap.io – No Bullshit, Just CoAP-PubSub™

ThingMonk 2014: How To Improve On MQTT 3.1.1

Editor's Notes

  • #3 MQTT was created in 1999, when “scaling” meant “scaling vertically”. MQTT scales well vertically, but runs into many issues when scaling horizontally.
  • #4 MQTT works great when used as a funnel protocol, piping data into a more appropriate mechanism for intra-cloud communication. It still doesn’t do a great job of catering to constrained sensors and actuators.
  • #5 Uptime: inconsistent perception of “available” between client and broker. The broker may be available but not accessible due to stale DNS entries (assuming DNS load balancing).
    Exactly-once: more about this on the next slides.
    Security: infinite topics can be stuffed with retained messages, causing a possible DoS. Lack of proper errors makes it difficult to communicate authorization failures and rate limits.
    Provenance: either use end-to-end security (and give up pub-sub features) or use a hacky topic-level ACL implementation.
    Error handling: can’t communicate rate limits, authorization errors, message size errors, etc.
  • #7 The 2-byte packet identifier needs to be unique across the cluster, and it is not big enough for that. The protocol takes client failures into account, but mostly ignores server failures and auto-failover.
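
The "Message Dedup" recommendation (having clients implement "QoS 2" on top of QoS 1) might look roughly like the sketch below. It assumes the publisher embeds its own unique ID in the payload. The `msg_id` field, the `DedupingHandler` class, and the size of the remembered window are all hypothetical choices for illustration; MQTT 3.1.1 defines none of them. Because QoS 1 may deliver a message more than once, the subscriber remembers recently seen IDs and skips repeats.

```python
import json
import uuid
from collections import OrderedDict


class DedupingHandler:
    """Skips messages whose application-level ID was already processed (hypothetical helper)."""

    def __init__(self, process, max_remembered=10_000):
        self._process = process
        self._seen = OrderedDict()   # msg_id -> None, oldest entry first
        self._max = max_remembered

    def on_message(self, payload: bytes) -> bool:
        """Return True if the message was processed, False if it was a duplicate."""
        envelope = json.loads(payload)
        msg_id = envelope["msg_id"]
        if msg_id in self._seen:
            return False             # duplicate: skip the work, still safe to ACK
        self._process(envelope["body"])
        self._seen[msg_id] = None
        if len(self._seen) > self._max:
            self._seen.popitem(last=False)   # forget the oldest ID
        return True


def make_payload(body) -> bytes:
    # Assumption: a 128-bit UUID serves as the message ID,
    # instead of relying on the 2-byte packet identifier.
    return json.dumps({"msg_id": str(uuid.uuid4()), "body": body}).encode()


readings = []
handler = DedupingHandler(readings.append)
payload = make_payload({"temp": 21.5})
first = handler.on_message(payload)    # processed
second = handler.on_message(payload)   # simulated QoS 1 redelivery, skipped
```

Note that this sketch keeps the seen-ID window in memory. A subscriber that crashes between doing the work and recording the ID would still repeat the work after a restart, so a real deployment would likely need to persist the ID together with the result of the work.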