Rnotify

A Scalable Distributed Filesystem Notifications Solution for Applications
Ashwin Raghav
www.rnotifications.com
github.com/ashwinraghav/rnotify-c/
Tuesday, April 30, 2013

Agenda
• Motivation
• Problem Statement / State of the art
• General Overview
• Hypothesis
• Approach
• Evaluation
• Conclusion

Motivation
• Applications need file system notifications
• Previously, applications polled file systems naively
• Now, all major operating systems provide a file system notifications API

Problem
• VFS is an abstraction to treat all filesystems uniformly
• All FS reads/writes happen via VFS, so it is the ideal place to implement notifications
• This does not work with distributed file systems

Problems / State of the art
• Use inotify to subscribe to local filesystems.
• Use ad-hoc (polling) implementations for distributed file systems.
• Polling creates an unfortunate tension between resource consumption and timeliness.
• Any general solution must be location transparent, scalable, and tunable.

Requirements
• Compatibility with existing applications that use Inotify
• Provide Horizontal Scalability, Decomposition of Functionality, Tunable Performance
• Location Transparency
• High Throughput notifications per client

Assumptions
• Relaxing Reliability Guarantees
• Modifying Notification Semantics
• Congestion Control Semantics
• Failure Notification Semantics

Related Work
• FAM (File Alteration Monitor): does not scale
• Internet-scale systems like Thialfi and ZooKeeper are built for larger numbers of clients
• Bayeux, Scribe, Siena, Hermes, Swag, etc. assume overlay networks to establish multicast trees for message dissemination
• Inotify was introduced in Linux kernel 2.6.13 for local FS notifications

Overview
• Multiplexing / Proxying Subscriptions
• Serializing Notifications
• Demultiplexing Notifications

Hypothesis
As a result of clearly decomposing functionality into
replicable components, Rnotify can be tuned to fit different
notification workloads to consistently deliver notifications
at low latency.

Key Properties
• Low Latency Notifications (under 10ms)
• Compatible with applications that use Inotify
• Tuned to fit workloads
• Greedy Applications can use Rnotify by distributing their workloads across nodes

Approach
• Registration
• Notification
• Replica Configuration Management

Registration
• Inform the Proxy about the newly watched file
• Place Registrations on preferred Publishers

Client Library & API usage
• Client-driven registration
• Registration is transactional from the application's point of view
• Client-driven migration of subscriptions

Notification Pipeline
• Congestion Control
• Opportunistic Batching
• Publisher Selection

Dispatchers
• Serialize notification blocks
• Congestion Control
• Dispatch to Publisher

Congestion Control at Dispatcher
Frequency List:
Subscription Id    Number of notifications in time window
1                  1000
2                  3000
NOTIFICATION_BURST is sent to Publisher
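
A minimal sketch of how a dispatcher's frequency list could drive congestion control. The slides only show the per-subscription counts and the NOTIFICATION_BURST message; the threshold value, the class name `FrequencyList`, the method names, and the idea that a burst message replaces the individual notifications for that window are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical cutoff: the slides do not state the actual threshold.
BURST_THRESHOLD = 2000


class FrequencyList:
    """Counts notifications per subscription within the current time window."""

    def __init__(self, threshold=BURST_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()

    def record(self, subscription_id, n=1):
        """Note that n notifications arrived for this subscription."""
        self.counts[subscription_id] += n

    def close_window(self):
        """Return one message per subscription, then reset the counters.

        Assumption: a subscription over the threshold is collapsed into a
        single NOTIFICATION_BURST message instead of individual notifications.
        """
        messages = []
        for sub_id, count in sorted(self.counts.items()):
            kind = "NOTIFICATION_BURST" if count > self.threshold else "NOTIFICATIONS"
            messages.append((sub_id, kind, count))
        self.counts.clear()
        return messages


# The two rows from the slide's frequency list.
freq = FrequencyList()
freq.record(1, 1000)
freq.record(2, 3000)
window_messages = freq.close_window()
```

With the assumed threshold of 2000, subscription 2 would be reported to the publisher as a burst while subscription 1 passes through normally.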

Avoid atomic broadcasts
[Diagram: four copies of the Frequency List]

Publishers
• Identify the subscribers for a notification
• Dispatch to the subscribers

Representing State - Publisher
Operations: Get all Subscribers, Get all Notifications, Append new Notification

File Id    IP address of Subscribers
1          192.168.1.2:3000, 192.168.3.4:3001
2          192.168.1.2:3000, 192.168.3.4:3001

Subscriber          Undelivered Notifications
192.168.1.2:3000    N1, N2, N3
192.168.3.4:3001    N4, N5, N6

File Id    Notifications
1          N1, N2, N3
2          N4, N5
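
The three tables above can be sketched as in-memory maps. This is an illustration only: the class `PublisherState`, its method names, and the choice to fan out each appended notification to every subscriber of the file are assumptions, not the actual rnotify-c data structures.

```python
from collections import defaultdict


class PublisherState:
    """Hypothetical in-memory version of the three publisher tables."""

    def __init__(self):
        self.subscribers_by_file = defaultdict(list)    # File Id -> subscriber addresses
        self.notifications_by_file = defaultdict(list)  # File Id -> notifications
        self.undelivered = defaultdict(list)            # Subscriber -> undelivered notifications

    def subscribe(self, file_id, address):
        """Register a subscriber for a file; repeated calls have no extra effect."""
        if address not in self.subscribers_by_file[file_id]:
            self.subscribers_by_file[file_id].append(address)

    def get_all_subscribers(self, file_id):
        return list(self.subscribers_by_file.get(file_id, []))

    def append_notification(self, file_id, notification):
        """Record the notification and queue it for each subscriber of the file."""
        self.notifications_by_file[file_id].append(notification)
        for address in self.get_all_subscribers(file_id):
            self.undelivered[address].append(notification)

    def get_all_notifications(self, address):
        """Drain and return the notifications still owed to one subscriber."""
        return self.undelivered.pop(address, [])


state = PublisherState()
state.subscribe(1, "192.168.1.2:3000")
state.subscribe(1, "192.168.3.4:3001")
state.append_notification(1, "N1")
pending = state.get_all_notifications("192.168.1.2:3000")
```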

Publisher Selection
How do the dispatchers and Registrar maintain a shared understanding of 'preferred' publishers?

Partition and Placement of Publishers
pos3 = SHA1(Publisher3_IP_ADDR)

Partition and Placement of Subscriptions
file3 = SHA1(File_Path3)
file1 = SHA1(File_Path1)
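
Hashing both publisher addresses and file paths with SHA1 places them on the same ring. A minimal sketch of the lookup follows, assuming the usual consistent-hashing rule that a file's preferred publisher is the first publisher at or after the file's position, wrapping around at the end. The slides do not spell out this rule, and `PublisherRing`, `ring_position`, and the example addresses and path are hypothetical.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Position on the ring: the SHA1 digest read as a 160-bit integer."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class PublisherRing:
    def __init__(self, publisher_addrs):
        # Sort publishers by ring position so lookups can use binary search.
        self._ring = sorted((ring_position(a), a) for a in publisher_addrs)

    def preferred_publisher(self, file_path: str) -> str:
        """First publisher at or after SHA1(file_path), wrapping around."""
        positions = [pos for pos, _ in self._ring]
        i = bisect.bisect_left(positions, ring_position(file_path))
        return self._ring[i % len(self._ring)][1]


addrs = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
ring = PublisherRing(addrs)
owner = ring.preferred_publisher("/mnt/gluster/a.txt")
```

Because the result depends only on the set of publisher addresses and the path, any dispatcher or registrar holding the same publisher list would compute the same preferred publisher without coordinating.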

Arrival of Publisher
new_publisher = SHA1(New_Pub_IP_Addr)
Reissue_registrations_between(pos1, pos2)
Lock-free way to make configuration eventually consistent
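
A sketch of which registrations need to be reissued when a publisher arrives, assuming the successor-on-the-ring placement rule common to consistent hashing. The helper names (`sha1_pos`, `owner`, `registrations_to_reissue`), the addresses, and the paths are hypothetical; the slides only name `Reissue_registrations_between(pos1, pos2)`.

```python
import bisect
import hashlib


def sha1_pos(key):
    """Ring position: the SHA1 digest read as an integer."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


def owner(publishers, file_path):
    """Preferred publisher: first one at or after the path's ring position."""
    ring = sorted((sha1_pos(p), p) for p in publishers)
    i = bisect.bisect_left([pos for pos, _ in ring], sha1_pos(file_path))
    return ring[i % len(ring)][1]


def registrations_to_reissue(publishers, new_publisher, watched_paths):
    """Paths whose preferred publisher changes once new_publisher joins."""
    after = publishers + [new_publisher]
    return [p for p in watched_paths if owner(publishers, p) != owner(after, p)]


publishers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
paths = [f"/mnt/gluster/file{i}" for i in range(200)]
moved = registrations_to_reissue(publishers, "10.0.0.4", paths)
```

Only the paths that fall between the new publisher and its predecessor on the ring change owner, and they all move to the new publisher. Every other registration stays where it is, which is why no global lock is needed under this placement rule.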

Dispatcher Replication
• The Dispatcher is provided the registrar location at startup
• It acquires the publisher list from the registrar transactionally
• It informs the Proxies independently

Evaluation Strategy
• Mid-size GlusterFS deployment on EC2
• PostMark benchmark to represent FS activity
• Using Chef to start up serviced clients
• Measure latency end to end
• 8xl machines with 32 cores each helped simulate several clients each
• All machines were acquired within a placement group

Evaluation - Scalability
Tune Dispatchers based on FS throughput
Tune Publishers based on number of clients

Scalability - Overactive FileSystems
[Figure: PostMark threads writing to different directories]
[Figure: PostMark threads writing to same directory]
[Figure: PostMark threads writing to different files; PostMark threads writing to same files. Annotations: applications like web/mail server; HPC applications]

Scalability - Servicing many clients

Performance
• Demonstrate consistency
• Demonstrate footprint in comparison to naive polling

Performance - Consistency

Comparison to naive Polling
• Developed a poller (Node.js REST API)
• For just 100 clients and 5 files: 50,000 stats per second
• Has an extremely heavy footprint on FS performance
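
To make the polling cost concrete, here is a small sketch of one naive polling round plus the arithmetic behind the stat rate. This is not the Node.js poller from the evaluation. The function names are hypothetical, and the rate of 100 polls per second per file (one poll every 10 ms) is an assumption that happens to reproduce the 50,000 figure if each of the 100 clients watches all 5 files.

```python
import os
import tempfile


def poll_once(paths, last_mtimes):
    """One naive polling round: stat every path, report those whose mtime changed."""
    changed = []
    for path in paths:
        mtime = os.stat(path).st_mtime_ns
        if last_mtimes.get(path) != mtime:
            changed.append(path)
            last_mtimes[path] = mtime
    return changed


def stat_calls_per_second(clients, files_per_client, polls_per_second):
    """Every client stats every watched file on every polling round."""
    return clients * files_per_client * polls_per_second


# Assumed polling rate: 100 rounds per second, i.e. one round every 10 ms.
rate = stat_calls_per_second(clients=100, files_per_client=5, polls_per_second=100)

# Small demonstration on temporary files: the first round sees every file as
# new, the second round sees no changes but still pays for every stat call.
with tempfile.TemporaryDirectory() as d:
    paths = [os.path.join(d, f"f{i}") for i in range(5)]
    for p in paths:
        with open(p, "w") as fh:
            fh.write("x")
    seen = {}
    first = poll_once(paths, seen)
    second = poll_once(paths, seen)
```

The stat load grows with clients × files × polling rate even when nothing changes, which is the tension between resource consumption and timeliness.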

Greedy Applications
• Increasing the number of notifications delivered per client
• Linear increase in latency
• Messages spend more time in queues

Inotify - Inefficient Applications

Greedy Applications
If you need to consume more notifications, distribute yourself.
[Figure label: Inefficient Application]

Summary - Why is this work different?
• FAM does not scale and is obsolete.
• PubSub systems do not cater to many notifications per client.
• Multicast trees are established for reliability (performance suffers).
• PubSub systems provide a richer set of semantics with lower performance.

Future Work
• Introduce a security model
• Introduce message ordering
• Provide message delivery reliability

Conclusion
• Rnotify is a solution for receiving notifications from POSIX-compliant distributed file systems
• Tuned to fit different notification workloads
• Incrementally scalable, location transparent, and mimics Inotify
• We have tested Rnotify to scale to 2.5 million notifications per second
• Latency under 10ms for 88% of notifications

Questions

Subscription Proxy
• Resides on the File Host and proxies subscriptions and notifications
• Idempotent API wrappers for subscription

Design Alternatives
• File System Modification
• VFS Modification
• Modifying Inotify Implementation

Latency Tests - Zero

Throughput Tests - Zero
