Accumulo is primarily used as a Big Data storage facility in a clustered environment. Accumulo’s columnar arrangement of rows, key-value pair indices, and cell-level security make it attractive for non-Big Data applications as well. In this talk, we describe how to use Accumulo to implement message queuing that provides confidentiality protection. One feature of message queuing is broadcasting messages from a producer to multiple consumers. The messages could be part of a stream that the producer is providing to multiple consumers. In some cases, not all consumers should see every message in the stream. In a traditional queuing system, a separate queue would be created for each level of access, so the messages would be duplicated for each level. In this talk, we show how to use Accumulo to create a queuing system that does not require duplication. We also present results from experiments testing the performance of such a system under different loads, and results comparing the performance of streaming messages through a queuing system based on Accumulo with that of traditional queuing systems, such as Apache Qpid.
Accumulo Summit 2014: Using Accumulo to Implement Confidentiality Protection in Message Queuing
1. Dr. Rod Moten
Chief Scientist
PROARC, Inc.
6/17/2014 | PROARC, Inc. | 300 E. Lombard Suite 640 Baltimore MD 21202 | info@proarc-inc.com | 410-665-2230
2. Ensure confidential information is only
accessible by those with the correct privileges
Example
◦ Ensure only people with Secret clearances can read
Secret documents
PROARC, INC. PROPRIETARY INFORMATION: The information contained herein may not be used in whole or in part except for
the limited purpose for which it was furnished. Do not distribute, duplicate, or reproduce in whole or in part without the
prior written consent of an authorized official of PROARC, Inc.
3. Artifacts are tagged with
attributes that specify their
confidentiality level
Portions of a single artifact
can have different
confidentiality levels
Entire artifact will be
protected at the highest
level of its parts
Reduce confidentiality level
by stripping out portions
with higher levels
Example
Protection level of this document
is Trade Secret
(Public) Sweeping fingers in
shapes across the screen of a
smartphone or tablet can be
used to unlock devices.
(Confidential) The CEO of Acme
uses the same shape for all his
devices.
(Trade Secret) When near a CEO,
exploit the Bluetooth bleed bug
to send a fake notification to his
device and study his gesture.
(Public) The free-form gestures
have an inherent appeal as
passwords.
4. Mark each frame or collection of frames with
a confidentiality level
◦ Consumers can only receive frames that they
are privileged to read
Consumers cannot directly transfer frames to
producers.
◦ A broker is required
Use a traditional message queuing system with
access control, such as Qpid.
Queue per Confidentiality Level
5. [Diagram: queue per confidentiality level. The producer's frames are Frame 1 (A,B), Frame 2 (A), Frame 3 (A,B), and Frame 4 (A,B). Copies of the frames are placed in separate queues labeled "Queue for Confidentiality Level A", "Queue for Confidentiality Level B", and "Queue for A, but Not B" (Frame 2).]
A separate queue for each protection level
Consumers read all frames from the queue to which they have access
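The duplication cost of the queue-per-level design can be illustrated with a small in-memory model. This is only a sketch: `Frame` and `route_to_queues` are hypothetical names, plain Python lists stand in for broker queues (no Qpid API is used), and it assumes that a frame labeled "A,B" is copied into both the A queue and the B queue.

```python
from collections import defaultdict
from typing import FrozenSet, NamedTuple


class Frame(NamedTuple):
    frame_id: int
    levels: FrozenSet[str]  # confidentiality levels allowed to read the frame
    payload: bytes


def route_to_queues(frames):
    """Copy each frame into one queue per level named in its label (assumed policy)."""
    queues = defaultdict(list)
    for frame in frames:
        for level in sorted(frame.levels):
            queues[level].append(frame)
    return queues


# The four frames from the slide: frames 1, 3, 4 are labeled A,B; frame 2 is labeled A.
frames = [
    Frame(1, frozenset({"A", "B"}), b"frame-1"),
    Frame(2, frozenset({"A"}), b"frame-2"),
    Frame(3, frozenset({"A", "B"}), b"frame-3"),
    Frame(4, frozenset({"A", "B"}), b"frame-4"),
]

queues = route_to_queues(frames)
stored_copies = sum(len(queue) for queue in queues.values())
print(stored_copies)  # 7 copies stored for 4 distinct frames
```

Under this assumption the broker stores seven copies of four frames, and the overhead grows with the number of levels on each label.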
6. A single queue contains all frames for all
confidentiality levels
Consumers only read frames for which they
have access.
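A minimal sketch of the single-queue idea follows, using an in-memory list in place of an Accumulo table. The names `Entry` and `read_visible` are hypothetical, and the sketch assumes that a label such as "A,B" means a consumer holding either A or B may read the frame (in Accumulo this would presumably be expressed as a column visibility such as `A|B`, with the filtering done by the server at scan time rather than by the client).

```python
from typing import FrozenSet, Iterable, Iterator, NamedTuple


class Entry(NamedTuple):
    frame_id: int
    label: FrozenSet[str]  # levels permitted to read this frame
    payload: bytes


def read_visible(queue: Iterable[Entry], authorizations: FrozenSet[str]) -> Iterator[Entry]:
    """Yield only the entries whose label shares a level with the reader's authorizations."""
    for entry in queue:
        if entry.label & authorizations:
            yield entry


# One stored copy of each frame, regardless of how many levels may read it.
QUEUE = [
    Entry(1, frozenset({"A", "B"}), b"frame-1"),
    Entry(2, frozenset({"A"}), b"frame-2"),
    Entry(3, frozenset({"A", "B"}), b"frame-3"),
    Entry(4, frozenset({"A", "B"}), b"frame-4"),
]

visible_to_a = [e.frame_id for e in read_visible(QUEUE, frozenset({"A"}))]
visible_to_b = [e.frame_id for e in read_visible(QUEUE, frozenset({"B"}))]
print(visible_to_a)  # [1, 2, 3, 4]
print(visible_to_b)  # [1, 3, 4]
```

Each frame is written once; the reader's authorizations, not the queue it connects to, determine what it receives.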
7. [Diagram: single queue. One queue holds Frame 1 (A,B), Frame 2 (A), Frame 3 (A,B), and Frame 4 (A,B). Two groups read from it: "Consumers with Access to A" and "Consumers with Access to B".]
A single queue contains all frames for all protection levels
Consumers only read frames for which they have access.
8. Treat queue as an unbounded buffer
◦ Single writer – multiple readers
Buffer implemented as an Accumulo table
◦ Technically it is a very large bounded buffer
◦ Theoretically it can hold 26^32 ≈ 1.9 x 10^45 entries
Each row contains a frame
Row ID: string of 32 characters from the set [a-z]
◦ 26^32 frames ≈ 1.9 x 10^45 frames
◦ 1st frame: aaa…aaa
◦ 2nd frame: aaa…aab
◦ 27th frame: aaa…aba
Security label: confidentiality level
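The row ID scheme above amounts to writing the frame's position as a fixed-width base-26 number over the letters a to z. The sketch below shows one way this could be done; `row_id_for` is a hypothetical helper, not code from the talk, and it assumes frames are counted from zero internally (so the 1st frame has index 0).

```python
import string

ALPHABET = string.ascii_lowercase  # the set [a-z]
ROW_ID_LENGTH = 32
CAPACITY = len(ALPHABET) ** ROW_ID_LENGTH  # 26^32 possible row IDs


def row_id_for(index: int) -> str:
    """Return the fixed-width base-26 row ID for a zero-based frame index."""
    if not 0 <= index < CAPACITY:
        raise ValueError("index is outside the buffer's capacity")
    chars = []
    for _ in range(ROW_ID_LENGTH):
        index, digit = divmod(index, len(ALPHABET))
        chars.append(ALPHABET[digit])  # least significant digit first
    return "".join(reversed(chars))


print(row_id_for(0)[-3:])   # aaa  (1st frame)
print(row_id_for(1)[-3:])   # aab  (2nd frame)
print(row_id_for(26)[-3:])  # aba  (27th frame)
print(f"{CAPACITY:.1e}")    # 1.9e+45
```

Because every ID has the same width, lexicographic order of the strings matches the order in which frames were written, which fits a table whose rows are kept sorted by row ID.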
9. The frame is stored as the values of one or more columns.
◦ A frame will be partitioned into multiple values if it is large.
Column Family
◦ Contains the column index number
Column Qualifier
◦ First column – total size of frame
◦ Subsequent columns – size of value
Example – 1KB frame divided into two columns
Row ID  | Column Family | Column Qualifier | Value
aaa…aaa | 0             | 1024             |
aaa…aaa | 1             | 512              | <512 bytes>
aaa…aaa | 2             | 512              | <512 bytes>
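The column layout in the example can be sketched as a pair of helper functions. This is an illustration under stated assumptions: `partition_frame` and `reassemble` are hypothetical names, entries are modeled as plain tuples rather than Accumulo mutations, and the 512-byte chunk size is taken from the example rather than from any stated limit.

```python
def partition_frame(row_id, frame, chunk_size):
    """Return (row ID, column family, column qualifier, value) tuples for one frame."""
    # First column: family "0", qualifier holds the total frame size, empty value.
    entries = [(row_id, "0", str(len(frame)), b"")]
    # Subsequent columns: family is the column index, qualifier is the chunk size.
    for number, start in enumerate(range(0, len(frame), chunk_size), start=1):
        chunk = frame[start:start + chunk_size]
        entries.append((row_id, str(number), str(len(chunk)), chunk))
    return entries


def reassemble(entries):
    """Rebuild the frame from its entries and check it against the recorded size."""
    total_size = int(entries[0][2])
    ordered = sorted(entries[1:], key=lambda entry: int(entry[1]))
    frame = b"".join(entry[3] for entry in ordered)
    if len(frame) != total_size:
        raise ValueError("frame is incomplete")
    return frame


frame = bytes(1024)  # a 1KB frame of zero bytes
entries = partition_frame("a" * 32, frame, chunk_size=512)
for _row, family, qualifier, value in entries:
    print(family, qualifier, len(value))
```

Storing the total size in the first column lets a reader detect a partially written frame before handing it to the consumer.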
11. Batch writing of rows
◦ Currently, Writers flush after writing one row.
Reduce polling
◦ Currently, a Reader polls for a new row when it has
reached the end of the buffer
◦ Writers can notify Readers via multicast when a row
is written
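The effect of batching writes, and of a reader that polls from its last position, can be modeled in memory. The classes below are hypothetical stand-ins (no Accumulo client calls are used), and the sketch assumes rows become visible to readers only when the writer flushes. The multicast notification is not modeled.

```python
class InMemoryBuffer:
    """Stand-in for the table: holds flushed rows and counts flush operations."""

    def __init__(self):
        self.rows = []
        self.flush_count = 0


class BatchingWriter:
    """Collects rows and flushes them to the buffer once batch_size rows are pending."""

    def __init__(self, buffer, batch_size):
        self.buffer = buffer
        self.batch_size = batch_size
        self.pending = []

    def write(self, row):
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.buffer.rows.extend(self.pending)
            self.pending.clear()
            self.buffer.flush_count += 1


class PollingReader:
    """Remembers its position and returns only rows added since the last poll."""

    def __init__(self, buffer):
        self.buffer = buffer
        self.position = 0

    def poll(self):
        new_rows = self.buffer.rows[self.position:]
        self.position += len(new_rows)
        return new_rows


buffer = InMemoryBuffer()
writer = BatchingWriter(buffer, batch_size=4)
reader = PollingReader(buffer)

for frame_number in range(10):
    writer.write(frame_number)
first_poll = reader.poll()   # rows 0..7: two full batches have been flushed
writer.flush()               # push the remaining partial batch
second_poll = reader.poll()  # rows 8..9
print(buffer.flush_count, len(first_poll), len(second_poll))  # 3 8 2
```

With a batch size of 1 the same ten rows would cost ten flushes. The trade-off is that batched rows reach readers later, so the batch size bounds the added latency.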
12. Comparison between Qpid and our POC
messaging system
◦ Compare the average time to read and write a
frame at a specific rate
Frame sizes: 2MB and 8KB
Frame rate: one frame every 50 ms
Number of Consumers: 1, 10, 100, 1000
Number of confidentiality levels: 1 and 5
We didn’t make any special configurations to
Qpid or Accumulo.
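For orientation, the test parameters translate into the following offered load per producer. The arithmetic assumes that the 50 ms figure is the interval between frames and that KB and MB are binary units (1024 bytes and 1024 x 1024 bytes); neither is stated explicitly on the slide.

```python
FRAME_INTERVAL_MS = 50  # assumed: one frame every 50 ms
FRAME_SIZES_BYTES = {"8KB": 8 * 1024, "2MB": 2 * 1024 * 1024}

frames_per_second = 1000 / FRAME_INTERVAL_MS

# Bytes per second a producer offers at this frame rate, per frame size.
offered_load = {
    name: size * frames_per_second for name, size in FRAME_SIZES_BYTES.items()
}

print(frames_per_second)                          # 20.0
print(offered_load["8KB"] / 1024)                 # 160.0 (KB per second)
print(offered_load["2MB"] / (1024 * 1024))        # 40.0 (MB per second)
```

At this rate a write has to finish within 50 ms on average for the writer to keep up with the producer.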
13. Accumulo
# of Levels | Frame Size | Avg. Write Time | Avg. Read Time
1           | 8KB        | 0.18ms          | 4.3ms
1           | 2MB        | 111ms           | 196ms
5           | 8KB        | 0.18ms          | 4.3ms
5           | 2MB        | 111ms           | 196ms
The number of access levels had no impact on the read and write times.
Qpid
# of Levels | Frame Size | Avg. Write Time | Avg. Read Time
1           | 8KB        | 0.93ms          | 47ms
1           | 2MB        | 129ms           | 3.98s
5           | 8KB        | 2.21ms          | 47ms
5           | 2MB        | 3.58s           | 3.98s
As expected, duplicating the frame for each confidentiality level slows down writes.
14. Accumulo
# of Levels | Frame Size | Avg. Write Time | Avg. Read Time
1           | 8KB        | 0.21ms          | 28.3ms
1           | 2MB        | 236ms           | 2.23s
5           | 8KB        | 0.21ms          | 28.3ms
5           | 2MB        | 236ms           | 2.23s
Impacted by the number of consumers.
Qpid
# of Levels | Frame Size | Avg. Write Time | Avg. Read Time
1           | 8KB        | 0.93ms          | 47ms
1           | 2MB        | 129ms           | 3.98s
5           | 8KB        | 2.21ms          | 47ms
5           | 2MB        | 3.58s           | 3.98s
The read and write times for 1 and 100 consumers were so close we only show the results from 1 consumer.
15.
# of Levels | Frame Size | Avg. Write Time | Avg. Read Time | Frame Rate
1 & 5       | 8KB        | 2.43ms          | 209ms          | 50 ms
1 & 5       | 2MB        | 12.9s           | 11.4s          | 50 ms
1 & 5       | 2MB        | 512ms           | 18.6s          | Write: 50 ms, Read: 30 s
Read times impacted by multiple consumers
on the same VM and disk contention.
We didn’t test Qpid with 1000 Consumers
because the queues are kept in RAM and we
didn’t have enough RAM for 1000 consumers.
16. [Chart: 8KB Frames. X-axis: # of Consumers (1, 10, 100, 1000). Y-axis: Read/Write times in milliseconds (0 to 250). Series: Read, Write. Data labels: 4.3, 5.38, 28.3, 209.]
Read times are almost the same when there is only 1 consumer per VM.
Write times remain flat while read times increase as the number of consumers increases on the same VM.
17. Accumulo may be suitable as the backbone for a
message queuing system
◦ Accumulo outperforms Qpid for complex attribute
policies.
◦ A messaging system based on Accumulo isn’t restricted
by RAM like Qpid.
◦ Drawback: May require a lot of polling.
Large frames
◦ Small number of consumers and no more than 5 frames
per second.
Small frames
◦ Hundreds of consumers per buffer and no more than 40
frames per second.