Practical NoSQL: Accumulo's dirlist Example

May 21, 2019
John Highcock & Henry Sowell

© Cloudera, Inc. All rights reserved.
Changing the paradigm
Moving to NoSQL
• They want NoSQL…
• Everyone understands RDBMS
• Shift our thinking toward storing the same data in a NoSQL store

Accumulo Examples
Since ACCUMULO-1

Accumulo Examples: dirlist
Emulating Filesystem Characteristics

Background for understanding the dirlist example
• HDFS small file abuse
• Accumulo manages lots of small things well
• Scalability

Background for understanding the dirlist example (cont’d)
Accumulo K/V structure
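Every Accumulo entry pairs a key (row, column family, column qualifier, column visibility, timestamp) with a value, and a table is kept sorted by key. As a reading aid for the table snippets on the following slides, here is a minimal plain-Java sketch of that ordering. `KeyOrder` and its `EntryKey` record are hypothetical stand-ins, not Accumulo's `Key` class. The sketch assumes row, family, qualifier, and visibility sort ascending and timestamp sorts descending (newest version first), and it compares `String`s rather than the raw bytes Accumulo uses, which is equivalent for ASCII.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class KeyOrder {
    // Hypothetical stand-in for an Accumulo key (value omitted). Java 16+ for records.
    record EntryKey(String row, String family, String qualifier,
                    String visibility, long timestamp) {}

    // Assumed order: row, family, qualifier, visibility ascending;
    // timestamp descending, so the newest version of a cell sorts first.
    static final Comparator<EntryKey> ORDER =
            Comparator.comparing(EntryKey::row)
                    .thenComparing(EntryKey::family)
                    .thenComparing(EntryKey::qualifier)
                    .thenComparing(EntryKey::visibility)
                    .thenComparing(Comparator.comparingLong(EntryKey::timestamp).reversed());

    // Returns a sorted copy and leaves the input untouched.
    static List<EntryKey> sorted(List<EntryKey> keys) {
        List<EntryKey> copy = new ArrayList<>(keys);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        // "attr" is a placeholder column family.
        List<EntryKey> keys = List.of(
                new EntryKey("004/opt/files/prod/done", "attr", "hidden", "prod", 1L),
                new EntryKey("004/opt/files/prod/done", "attr", "exec", "prod", 1L),
                new EntryKey("003/opt/files/prod", "attr", "hidden", "prod", 1L));
        sorted(keys).forEach(System.out::println);
    }
}
```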

Setup
• Create some sample files

Setup (cont’d)
• /opt/files/prod/done is executable
• /opt/files/prod/.shh is hidden

Setup (cont’d)
• Compute MD5s for later reference
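To cross-check those reference MD5s from Java, a minimal sketch using `java.security.MessageDigest`. `Md5Check` is a hypothetical name, and the sketch assumes the values of interest are lowercase hex digests of each file's full contents, the same format `md5sum` prints.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Check {
    // Lowercase hex MD5 of a byte array.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 ships with standard JDKs; fail loudly if it is missing.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    static String md5Hex(String text) {
        return md5Hex(text.getBytes(StandardCharsets.UTF_8));
    }

    // Digest of a whole file (fine for the small sample files).
    static String md5Hex(Path file) {
        try {
            return md5Hex(Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same "hash  path" layout as md5sum output.
        for (String arg : args) {
            Path p = Path.of(arg);
            System.out.println(md5Hex(p) + "  " + p);
        }
    }
}
```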

Ingest and Accumulo Setup
• Ingest the files/directories
• Different authorizations for prod and test
• chunkSize set arbitrarily low to force chunking (more on that later)
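What a low chunkSize does to a file can be sketched in a few lines: the content is cut into pieces of at most chunkSize bytes, and only the last piece may be shorter. `ChunkSplitter.split` is a hypothetical helper for illustration, not the ingest tool's own code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChunkSplitter {
    // Split content into pieces of at most chunkSize bytes; a lower
    // chunkSize simply yields more chunks for the same file.
    static List<byte[]> split(byte[] content, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int start = 0; start < content.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, content.length);
            chunks.add(Arrays.copyOfRange(content, start, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] content = "chunk me into small pieces".getBytes(StandardCharsets.UTF_8);
        System.out.println(split(content, 8).size() + " chunks"); // prints "4 chunks"
    }
}
```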

Accumulo Setup (cont’d)
• Created 3 tables
• Set auths for the user

dirTable eye chart
• A row for each node in the file tree
• Entries for various filesystem characteristics:
  • hidden
  • executable
  • md5 for files
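A minimal sketch of one dirTable row per node, assuming the row is a three-digit, zero-padded depth (here, the count of `/` separators in an absolute path without a trailing slash) followed by the path, so `/opt/files/prod/done` becomes `004/opt/files/prod/done`. The class `DirRow`, the `attr` column family, and the qualifier names are hypothetical placeholders, not the example's actual schema. A name starting with `.` is treated as hidden, following the Unix convention.

```java
import java.util.ArrayList;
import java.util.List;

final class DirRow {
    // Depth = number of '/' separators (absolute path, no trailing slash).
    static int depth(String path) {
        int slashes = 0;
        for (int i = 0; i < path.length(); i++) {
            if (path.charAt(i) == '/') slashes++;
        }
        return slashes;
    }

    // Zero-padded depth first, so nodes group by depth, then sort by path.
    static String row(String path) {
        return String.format("%03d", depth(path)) + path;
    }

    // One entry per characteristic; "attr" and the qualifiers are placeholders.
    static List<String> entries(String path, boolean executable, String md5OrNull) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        String r = row(path);
        List<String> out = new ArrayList<>();
        out.add(r + " attr:hidden " + name.startsWith("."));  // Unix dotfile rule
        out.add(r + " attr:exec " + executable);
        if (md5OrNull != null) {
            out.add(r + " attr:md5 " + md5OrNull);            // files only
        }
        return out;
    }

    public static void main(String[] args) {
        entries("/opt/files/prod/done", true, "<md5 of done>").forEach(System.out::println);
        entries("/opt/files/prod", true, null).forEach(System.out::println);
    }
}
```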

dirTable snippet
• Example for /opt/files/prod/done
• Filesystem attributes replicated as table entries
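The filesystem side of that snippet can be read with `java.nio.file.Files`. This sketch gathers the hidden and executable bits plus size and modification time; the extra attributes, the qualifier names (`hidden`, `exec`, `length`, `lastmodified`), and the class `FsBits` are assumptions for illustration. `Files.isHidden` is platform specific: on Unix it reports names beginning with `.` as hidden.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

final class FsBits {
    // Pure mapping from attribute values to qualifier -> value strings
    // (qualifier names are hypothetical placeholders).
    static Map<String, String> toEntries(boolean hidden, boolean exec,
                                         long length, long lastModifiedMillis) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("hidden", Boolean.toString(hidden));
        m.put("exec", Boolean.toString(exec));
        m.put("length", Long.toString(length));
        m.put("lastmodified", Long.toString(lastModifiedMillis));
        return m;
    }

    // Read the bits for one path from the local filesystem.
    static Map<String, String> read(Path p) throws IOException {
        return toEntries(Files.isHidden(p), Files.isExecutable(p),
                Files.size(p), Files.getLastModifiedTime(p).toMillis());
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + " " + read(Path.of(arg)));
        }
    }
}
```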

dataTable (file content storage)
• done file example:
row       cf:cq                    [vis]    value
eb3d...   refs:377ff...\x00name    [prod]   /opt/files/prod/done
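The row in this snippet looks like a hex content digest, which would let identical files share one dataTable row, with one `refs` entry per path pointing at it. The sketch below builds such a name reference under stated assumptions: MD5 hex for both the row and the path hash (the slide elides the digests and does not name the algorithm), a NUL byte before `name` (which a shell dump would display as `\x00`), and a hypothetical `DataRef` class.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DataRef {
    // One entry in the slide's "row cf:cq [vis] value" shape. Java 16+ for records.
    record Entry(String row, String family, String qualifier, String visibility, String value) {}

    static String md5Hex(byte[] data) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("MD5").digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Assumed layout: row = content digest, cq = path digest + NUL + "name", value = path.
    static Entry nameRef(String path, byte[] content, String visibility) {
        String pathHash = md5Hex(path.getBytes(StandardCharsets.UTF_8));
        return new Entry(md5Hex(content), "refs", pathHash + '\0' + "name", visibility, path);
    }

    public static void main(String[] args) {
        byte[] same = "same bytes".getBytes(StandardCharsets.UTF_8);
        // Two hypothetical paths with identical content land in the same row.
        Entry a = nameRef("/opt/files/prod/done", same, "prod");
        Entry b = nameRef("/opt/files/test/done", same, "test");
        System.out.println(a.row().equals(b.row()));
    }
}
```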

dataTable (file content storage) (cont’d)

dataTable
• Chunked based on size
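If a file's chunks sit in one row, their qualifiers have to sort in chunk order. One way to guarantee that, sketched here as an assumption rather than the example's confirmed layout, is to encode the chunk size and chunk index as 4-byte big-endian integers: for non-negative values, unsigned byte-wise order (how Accumulo compares keys) matches numeric order, so chunk 9 sorts before chunk 10. `ChunkQualifier` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ChunkQualifier {
    // Hypothetical layout: 4-byte chunk size, then 4-byte chunk index,
    // both big-endian (ByteBuffer's default byte order).
    static byte[] encode(int chunkSize, int chunkIndex) {
        return ByteBuffer.allocate(8).putInt(chunkSize).putInt(chunkIndex).array();
    }

    static int chunkSize(byte[] cq) {
        return ByteBuffer.wrap(cq).getInt(0);
    }

    static int chunkIndex(byte[] cq) {
        return ByteBuffer.wrap(cq).getInt(4);
    }

    public static void main(String[] args) {
        // Byte-wise comparison agrees with numeric order for non-negative indexes.
        byte[] c9 = encode(100, 9);
        byte[] c10 = encode(100, 10);
        System.out.println(Arrays.compareUnsigned(c9, c10) < 0); // prints true
    }
}
```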

ChunkCombiner
• Configured for dataTable
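As we understand its role here, the ChunkCombiner lets identical content ingested under different visibilities be stored once: duplicate copies of a chunk collapse into one, labeled with the OR of the visibilities on the row's `refs` entries. The sketch below shows only that label arithmetic in plain Java; `CombineVis` is hypothetical, and the real iterator's exact output format may differ.

```java
import java.util.List;
import java.util.StringJoiner;
import java.util.TreeSet;

final class CombineVis {
    // One stored copy of a chunk: its visibility label and bytes. Java 16+ for records.
    record Chunk(String visibility, byte[] data) {}

    // OR together the distinct ref visibilities, e.g. [test, prod] -> (prod)|(test).
    // An empty label means "visible to everyone", so it absorbs the rest.
    static String combined(List<String> refVisibilities) {
        TreeSet<String> distinct = new TreeSet<>(refVisibilities);
        if (distinct.isEmpty()) throw new IllegalArgumentException("no refs");
        if (distinct.contains("")) return "";
        if (distinct.size() == 1) return distinct.first();
        StringJoiner or = new StringJoiner("|");
        for (String v : distinct) or.add("(" + v + ")");
        return or.toString();
    }

    // Keep a single copy of an identical chunk, relabeled with the combined visibility.
    static Chunk dedupe(List<Chunk> copies, List<String> refVisibilities) {
        if (copies.isEmpty()) throw new IllegalArgumentException("no chunk copies");
        return new Chunk(combined(refVisibilities), copies.get(0).data());
    }

    public static void main(String[] args) {
        byte[] bytes = {1, 2, 3};
        List<Chunk> copies = List.of(new Chunk("prod", bytes), new Chunk("test", bytes));
        System.out.println(dedupe(copies, List.of("prod", "test")).visibility()); // (prod)|(test)
    }
}
```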

Index Table
• Forward and reverse tokens in row
• Provides lookup for dirTable
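A minimal sketch of the forward and reverse index rows for one file name. The `f`/`r` prefixes, the depth-prefixed dirTable row used as the pointer, and the `IndexRows` class are illustrative assumptions. Storing the reversed name is what makes a leading wildcard cheap: `*ne` becomes a prefix scan for `en` over reversed names.

```java
import java.util.List;

final class IndexRows {
    // Index row plus the dirTable row it points to. Java 16+ for records.
    record IndexEntry(String row, String dirRow) {}

    // Depth-prefixed dirTable row, e.g. /opt/files/prod/done -> 004/opt/files/prod/done.
    static String dirRow(String path) {
        long depth = path.chars().filter(c -> c == '/').count();
        return String.format("%03d", depth) + path;
    }

    // Forward entry keyed by the name, reverse entry keyed by the name spelled backwards.
    static List<IndexEntry> entriesFor(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        String target = dirRow(path);
        return List.of(
                new IndexEntry("f" + name, target),
                new IndexEntry("r" + new StringBuilder(name).reverse(), target));
    }

    public static void main(String[] args) {
        entriesFor("/opt/files/prod/done").forEach(System.out::println);
    }
}
```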

Query
• Leading / Middle / Trailing Wildcard
• Exact Term
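One plausible way to serve these query shapes from forward and reverse index rows, sketched over an in-memory `TreeMap` instead of an Accumulo scanner: an exact term or trailing wildcard is a forward prefix scan, a leading wildcard is a prefix scan over reversed names, and a middle wildcard scans the forward prefix and filters on the suffix. The key layout (index row, NUL, dirTable row) and all names here are assumptions, and only a single `*` is supported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class WildcardQuery {
    private static final char SEP = '\0';

    static String dirRow(String path) {
        long depth = path.chars().filter(c -> c == '/').count();
        return String.format("%03d", depth) + path;
    }

    // Index one path under its forward and reversed name; key = index row + NUL + dirTable row.
    static void add(NavigableMap<String, String> index, String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        String row = dirRow(path);
        index.put("f" + name + SEP + row, row);
        index.put("r" + new StringBuilder(name).reverse() + SEP + row, row);
    }

    // Entries whose key starts with prefix (the '\uffff' upper bound is fine for ASCII names).
    static NavigableMap<String, String> scan(NavigableMap<String, String> index, String prefix) {
        return index.subMap(prefix, true, prefix + '\uffff', false);
    }

    // Exact term, or one '*' at the start, middle, or end.
    static List<String> query(NavigableMap<String, String> index, String term) {
        int star = term.indexOf('*');
        if (star < 0) {                                    // exact: whole forward name
            return new ArrayList<>(scan(index, "f" + term + SEP).values());
        }
        if (term.indexOf('*', star + 1) >= 0) {
            throw new IllegalArgumentException("only one '*' supported");
        }
        String head = term.substring(0, star);
        String tail = term.substring(star + 1);
        if (head.isEmpty()) {                              // leading: reversed-name prefix
            return new ArrayList<>(scan(index, "r" + new StringBuilder(tail).reverse()).values());
        }
        List<String> hits = new ArrayList<>();             // trailing or middle
        for (Map.Entry<String, String> e : scan(index, "f" + head).entrySet()) {
            String name = e.getKey().substring(1, e.getKey().indexOf(SEP));
            if (name.length() >= head.length() + tail.length() && name.endsWith(tail)) {
                hits.add(e.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> index = new TreeMap<>();
        add(index, "/opt/files/prod/done");
        add(index, "/opt/files/prod/.shh");
        System.out.println(query(index, "*ne"));
        System.out.println(query(index, "do*"));
    }
}
```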

Query Flexibility
• The depth prefix on each row lets dirTable scans start and stop at arbitrary depths
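Because every dirTable row starts with its zero-padded depth, a range of depths is one contiguous key range, and the nodes below a directory at a given depth form another. A sketch over a `TreeSet` of row keys, assuming the depth is the number of `/` separators; `DepthScan` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class DepthScan {
    // Depth = number of '/' separators; row = zero-padded depth + path.
    static String row(String path) {
        long depth = path.chars().filter(c -> c == '/').count();
        return String.format("%03d", depth) + path;
    }

    // All rows with minDepth <= depth <= maxDepth: a single key range,
    // because the padded depth leads every row.
    static NavigableSet<String> depthRange(NavigableSet<String> rows, int minDepth, int maxDepth) {
        if (minDepth < 0 || maxDepth >= 999 || minDepth > maxDepth) {
            throw new IllegalArgumentException("need 0 <= minDepth <= maxDepth < 999");
        }
        return rows.subSet(String.format("%03d", minDepth), true,
                String.format("%03d", maxDepth + 1), false);
    }

    // Rows below dir at each depth in [minDepth, maxDepth]: one short range per depth.
    static List<String> under(NavigableSet<String> rows, String dir, int minDepth, int maxDepth) {
        List<String> out = new ArrayList<>();
        for (int d = minDepth; d <= maxDepth; d++) {
            String prefix = String.format("%03d", d) + dir + "/";
            out.addAll(rows.subSet(prefix, true, prefix + '\uffff', false));
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> rows = new TreeSet<>();
        for (String p : List.of("/opt", "/opt/files", "/opt/files/prod",
                "/opt/files/prod/done", "/opt/files/test")) {
            rows.add(row(p));
        }
        System.out.println(depthRange(rows, 2, 3));
        System.out.println(under(rows, "/opt/files", 3, 4));
    }
}
```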

FileCount
• Count file and directory depths per node
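A sketch of per-node counting that leans on depth: visiting nodes deepest first means each directory's totals are complete before its parent adds them in. It reports direct and recursive directory and file counts for every directory. `NodeCounter`, the `Counts` fields, and the input shape (a map from every path under `/` to whether it is a directory) are assumptions, not the example's FileCount code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class NodeCounter {
    // Java 16+ for records.
    record Counts(int dirs, int files, int recursiveDirs, int recursiveFiles) {}

    static long depth(String path) {
        return path.chars().filter(c -> c == '/').count();
    }

    static String parent(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }

    // isDir maps every node below "/" (no trailing slashes) to true for directories.
    static Map<String, Counts> count(Map<String, Boolean> isDir) {
        // Per directory: {dirs, files, recursiveDirs, recursiveFiles}.
        Map<String, int[]> acc = new HashMap<>();
        acc.put("/", new int[4]);
        isDir.forEach((path, dir) -> { if (dir) acc.put(path, new int[4]); });

        // Deepest first, so a directory's totals are final before its parent reads them.
        List<String> order = new ArrayList<>(isDir.keySet());
        Comparator<String> byDepth = Comparator.comparingLong(NodeCounter::depth);
        order.sort(byDepth.reversed());

        for (String path : order) {
            int[] up = acc.get(parent(path));
            if (up == null) throw new IllegalArgumentException("parent missing for " + path);
            if (isDir.get(path)) {
                int[] self = acc.get(path);
                up[0] += 1;
                up[2] += 1 + self[2];
                up[3] += self[3];
            } else {
                up[1] += 1;
                up[3] += 1;
            }
        }

        Map<String, Counts> result = new TreeMap<>();
        acc.forEach((path, c) -> result.put(path, new Counts(c[0], c[1], c[2], c[3])));
        return result;
    }
}
```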

Filesystem App on Accumulo
Simple file viewer for navigating the dirTable and displaying dataTable content
• Opened with root of /opt/files
• Displays file/directory metadata in upper right frame
• File content (as applicable) in lower right frame

Full Circle with “done”

Chunked file “code” stitched back together
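Reading a chunked file back is the reverse of splitting it: order the chunks by index and concatenate them, failing if one is missing. A plain-Java sketch with a hypothetical `Stitch.stitch` helper that takes a map from chunk index to bytes; in the viewer this data would come from a dataTable scan, which is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

final class Stitch {
    // Concatenate chunks in index order; indexes must run 0, 1, 2, ... with no gaps.
    static byte[] stitch(Map<Integer, byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int expected = 0;
        for (Map.Entry<Integer, byte[]> e : new TreeMap<>(chunks).entrySet()) {
            if (e.getKey() != expected) {
                throw new IllegalStateException("missing chunk " + expected);
            }
            out.writeBytes(e.getValue());   // Java 11+
            expected++;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Chunks may arrive in any order; index order restores the content.
        Map<Integer, byte[]> chunks = Map.of(
                1, " back together".getBytes(StandardCharsets.UTF_8),
                0, "stitched".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(stitch(chunks), StandardCharsets.UTF_8)); // stitched back together
    }
}
```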

Obligatory authorization example
• Top viewer started with only “prod” authorization
• Middle viewer with only “test”
• Bottom with no authorizations
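The rule behind the three viewers is that a scan returns an entry only if the scanning user's authorizations satisfy the visibility expression on its key. A simplified evaluator, assuming Accumulo-style expressions with `&`, `|`, and parentheses. `VisCheck` is hypothetical, and unlike Accumulo's own parser it neither handles quoted labels nor rejects unparenthesized mixes of `&` and `|` (here `&` simply binds tighter).

```java
import java.util.List;
import java.util.Set;

final class VisCheck {
    private final String expr;
    private final Set<String> auths;
    private int pos;

    private VisCheck(String expr, Set<String> auths) {
        this.expr = expr;
        this.auths = auths;
    }

    static boolean canSee(String expr, Set<String> auths) {
        if (expr.isEmpty()) return true;  // unlabeled entries are visible to everyone
        VisCheck p = new VisCheck(expr, auths);
        boolean result = p.orExpr();
        if (p.pos != expr.length()) {
            throw new IllegalArgumentException("unexpected '" + expr.charAt(p.pos) + "' in " + expr);
        }
        return result;
    }

    // or := and ('|' and)*   (every operand is parsed; no short-circuit)
    private boolean orExpr() {
        boolean v = andExpr();
        while (peek('|')) { pos++; v |= andExpr(); }
        return v;
    }

    // and := term ('&' term)*
    private boolean andExpr() {
        boolean v = term();
        while (peek('&')) { pos++; v &= term(); }
        return v;
    }

    // term := '(' or ')' | label
    private boolean term() {
        if (peek('(')) {
            pos++;
            boolean v = orExpr();
            if (!peek(')')) throw new IllegalArgumentException("missing ')' in " + expr);
            pos++;
            return v;
        }
        int start = pos;
        while (pos < expr.length() && isLabelChar(expr.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("expected a label at " + start + " in " + expr);
        return auths.contains(expr.substring(start, pos));
    }

    private boolean peek(char c) {
        return pos < expr.length() && expr.charAt(pos) == c;
    }

    private static boolean isLabelChar(char c) {
        return Character.isLetterOrDigit(c) || "_-:./".indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        for (Set<String> auths : List.of(Set.of("prod"), Set.of("test"), Set.<String>of())) {
            System.out.println(auths + " sees a prod entry: " + canSee("prod", auths));
        }
    }
}
```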

dirTable Query
• Search by path
• Directory Metadata
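With depth-prefixed rows, both lookups on this slide are small scans: a node's metadata is a single-row read, and a directory's immediate children form one contiguous range (depth plus one, then the directory path and `/`). Sketched over an in-memory `TreeMap` standing in for dirTable; `PathQuery` and the metadata qualifiers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class PathQuery {
    // Depth = number of '/' separators; row = zero-padded depth + path.
    static String row(String path) {
        long depth = path.chars().filter(c -> c == '/').count();
        return String.format("%03d", depth) + path;
    }

    // Metadata for one node: a single-row lookup (empty if the node is absent).
    static Map<String, String> metadata(NavigableMap<String, Map<String, String>> dirTable, String path) {
        return dirTable.getOrDefault(row(path), Map.of());
    }

    // Names of the immediate children of dir ("/" lists the top level).
    static List<String> children(NavigableMap<String, Map<String, String>> dirTable, String dir) {
        String base = dir.equals("/") ? "" : dir;
        long depth = base.chars().filter(c -> c == '/').count() + 1;
        String prefix = String.format("%03d", depth) + base + "/";
        List<String> names = new ArrayList<>();
        for (String r : dirTable.subMap(prefix, true, prefix + '\uffff', false).keySet()) {
            if (r.length() > prefix.length()) names.add(r.substring(prefix.length()));
        }
        return names;
    }

    public static void main(String[] args) {
        NavigableMap<String, Map<String, String>> dirTable = new TreeMap<>();
        dirTable.put(row("/opt/files"), Map.of("hidden", "false"));
        dirTable.put(row("/opt/files/prod"), Map.of("hidden", "false"));
        dirTable.put(row("/opt/files/prod/done"), Map.of("hidden", "false", "exec", "true"));
        System.out.println(children(dirTable, "/opt/files/prod"));
        System.out.println(metadata(dirTable, "/opt/files/prod/done"));
    }
}
```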
