Practical NoSQL: Accumulo's dirlist Example
May 21, 2019
John Highcock & Henry Sowell
© Cloudera, Inc. All rights reserved. 2
Changing the paradigm
Moving to NoSQL
• They want NoSQL…
• Everyone understands RDBMS
• Transition thinking about storing the same data in a NoSQL store
Accumulo Examples
Since ACCUMULO-1
Accumulo Examples: dirlist
Emulating Filesystem Characteristics
Background for understanding the dirlist example
• HDFS small file abuse
• Accumulo manages lots of small things well
• Scalability
Background for understanding the dirlist example (cont’d)
Accumulo K/V structure
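To make the key/value structure concrete, here is a minimal in-memory sketch of how an Accumulo key is laid out and ordered: row, column family, column qualifier, visibility, then timestamp, with newer timestamps sorting first. The `Key` class and `sorted_entries` helper are illustrative names for this sketch, not part of the Accumulo API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    """Illustrative stand-in for an Accumulo key (not the real client class)."""
    row: bytes
    cf: bytes   # column family
    cq: bytes   # column qualifier
    vis: bytes  # column visibility expression
    ts: int     # timestamp

    def sort_key(self):
        # Fields compare in ascending order, except the timestamp,
        # which is negated so the newest version of a cell comes first.
        return (self.row, self.cf, self.cq, self.vis, -self.ts)


def sorted_entries(entries):
    """Sort (Key, value) pairs the way a table would store them."""
    return sorted(entries, key=lambda kv: kv[0].sort_key())
```

Because everything is kept in this sorted order, a scan over a row prefix touches a contiguous range of entries, which is what the dirlist tables rely on.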
Setup
• Create some sample files
Setup (cont’d)
• /opt/files/prod/done is executable
• /opt/files/prod/.shh is hidden
Setup (cont’d)
• Compute MD5s for later reference
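One way to compute the reference MD5s, sketched with Python's standard `hashlib`. The helper names `md5_hex` and `md5_of_file` are made up for this sketch; the slides do not show which tool was actually used.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of an in-memory byte string."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path, block_size=65536):
    """Return the hex MD5 digest of a file, read in fixed-size blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

For example, `md5_of_file("/opt/files/prod/done")` would give the value to compare against the md5 stored at ingest.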
Ingest and Accumulo Setup
• Ingest the files/directories
• Different authorization between prod and test
• chunkSize arbitrarily low to force chunking (more on that later)
Accumulo Setup (cont’d)
• Created 3 tables
• Setting auths for the user
dirTable eye chart
• Row for each node in the filetree
• Entry for various FS characteristics
  • hidden
  • executable
  • md5 for files
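The per-node entries listed above can be sketched as a small function that turns filesystem attributes into qualifier/value pairs. The qualifier names (`hidden`, `executable`, `md5`) and the string values are assumptions chosen to mirror the bullets, not the exact column layout used by the dirlist example.

```python
import stat


def node_entries(path, mode, md5=None):
    """Build illustrative dirTable-style entries for one filesystem node.

    path: absolute path of the node
    mode: permission bits, e.g. from os.stat(path).st_mode
    md5:  hex digest for files, None for directories
    """
    name = path.rstrip("/").rsplit("/", 1)[-1]
    entries = {
        # Unix convention: a leading dot marks a hidden file.
        "hidden": "true" if name.startswith(".") else "false",
        # Owner-execute bit stands in for "executable" in this sketch.
        "executable": "true" if mode & stat.S_IXUSR else "false",
    }
    if md5 is not None:
        entries["md5"] = md5
    return entries
```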
dirTable snippet
• Example for /opt/files/prod/done
• Bits from filesystem replicated to table entries
dataTable (file content storage)
• Example for the done file:
row      cf:cq                 [vis]   value
eb3d...  refs:377ff...\x00name [prod]  /opt/files/prod/done
dataTable
• Chunked based on size
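Size-based chunking can be sketched as follows: file content is cut into fixed-size pieces tagged with a chunk index, and reading the file means concatenating the pieces in index order. The function names and the `(index, bytes)` tuple layout are illustrative assumptions, not the on-disk encoding used by the dataTable.

```python
def chunk_bytes(data: bytes, chunk_size: int):
    """Split data into (chunk_index, chunk) pairs of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (offset // chunk_size, data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


def reassemble(chunks):
    """Stitch chunks back together, ordering them by chunk index."""
    return b"".join(part for _, part in sorted(chunks, key=lambda c: c[0]))
```

With a deliberately small `chunk_size`, as in the ingest step, even short files produce several chunks, which makes the behavior easy to see.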
ChunkCombiner
• Configured for dataTable
Index Table
• Forward and reverse tokens in row
• Provides lookup for dirTable
Query
• Leading / Middle / Trailing Wildcard
• Exact Term
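The reason for storing both forward and reverse tokens is that every supported query then becomes an exact match or a prefix scan over sorted rows. The sketch below works over an in-memory list of names. The `f`/`r` row prefixes, the single `*` wildcard, and the choice to answer a middle wildcard with a forward scan plus a filter are assumptions made for illustration. The name `dome` in the usage note is a made-up extra entry.

```python
import fnmatch


def plan_query(term):
    """Translate a term with at most one '*' into (row_prefix, exact, filter)."""
    if term.count("*") > 1:
        raise ValueError("this sketch supports a single wildcard")
    if "*" not in term:
        return ("f" + term, True, None)       # exact term: forward row
    head, tail = term.split("*")
    if tail == "":
        return ("f" + head, False, None)      # trailing wildcard: forward prefix
    if head == "":
        return ("r" + tail[::-1], False, None)  # leading wildcard: reverse prefix
    return ("f" + head, False, term)          # middle wildcard: scan, then filter


def search(names, term):
    """Run a query against an in-memory forward/reverse index of names."""
    prefix, exact, pattern = plan_query(term)
    # Each name yields one forward row and one reverse row.
    rows = sorted(r for n in names for r in ("f" + n, "r" + n[::-1]))
    hits = set()
    for row in rows:
        matched = (row == prefix) if exact else row.startswith(prefix)
        if not matched:
            continue
        name = row[1:] if row[0] == "f" else row[1:][::-1]
        # fnmatchcase also treats '?' and '[' as special; fine for a sketch.
        if pattern is None or fnmatch.fnmatchcase(name, pattern):
            hits.add(name)
    return sorted(hits)
```

For instance, with names `done`, `.shh`, `code`, and `dome`, the term `*e` is answered from the reverse rows starting with `re`, while `d*e` scans forward rows starting with `fd` and filters the results.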
Query Flexibility
• Can pick arbitrary depths to start and stop scans of dirTable due to depth prefix
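A depth prefix works because rows sort by depth first and path second, so all nodes at one level under a directory are contiguous. The following sketch assumes a three-digit zero-padded depth in front of the path. The padding width, the depth convention, and the `/opt/files/test` directory are assumptions for illustration.

```python
def depth(path):
    """Number of path components; the root counts as depth 0."""
    if path in ("", "/"):
        return 0
    return path.strip("/").count("/") + 1


def row_for(path):
    """Build a row ID of the form <3-digit depth><path>."""
    return f"{depth(path):03d}{path}"


def children_prefix(path):
    """Row prefix shared by every direct child of the given directory."""
    return f"{depth(path) + 1:03d}{path.rstrip('/')}/"


def list_children(rows, path):
    """Return the paths of direct children, given an iterable of row IDs."""
    prefix = children_prefix(path)
    return [row[3:] for row in sorted(rows) if row.startswith(prefix)]
```

Listing a directory is then a single prefix scan, and starting or stopping at a different depth only changes the numeric prefix of the scan range.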
FileCount
• Count file and directory depths per node
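As a rough illustration of per-node counting, the sketch below computes, for each directory, how many files and directories it contains directly and recursively. This is one plausible reading of the slide. The function name, input shape, and counter names are assumptions, and the real FileCount job may differ in what it stores.

```python
import posixpath


def count_per_directory(nodes):
    """nodes: iterable of (path, is_dir). Returns counts keyed by directory."""
    nodes = list(nodes)
    counts = {
        path: {"dirs": 0, "files": 0, "rec_dirs": 0, "rec_files": 0}
        for path, is_dir in nodes if is_dir
    }
    for path, is_dir in nodes:
        kind = "dirs" if is_dir else "files"
        current, direct = path, True
        while True:
            parent = posixpath.dirname(current)
            if parent == current:  # reached "/" or an empty path
                break
            if parent in counts:
                if direct:
                    counts[parent][kind] += 1      # immediate child
                counts[parent]["rec_" + kind] += 1  # any descendant
            current, direct = parent, False
    return counts
```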
Filesystem App on Accumulo
Simple file viewer for navigating the dirTable and displaying dataTable content
• Opened with root of /opt/files
• Displays file/directory metadata in upper right frame
• File content (as applicable) in lower right frame
Filesystem App on Accumulo
Full Circle with “done”
Filesystem App on Accumulo
Chunked file “code” stitched back together
Filesystem App on Accumulo
Obligatory authorization example
• Top viewer started with only “prod” authorization
• Middle viewer with only “test”
• Bottom with no authorizations
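The three viewers differ only in the authorizations passed to the scan. A simplified model of that filtering is sketched below, assuming each entry carries a single visibility label. Real Accumulo visibility expressions can combine labels with `&` and `|`, which this sketch does not attempt. The `visible` helper and the `/opt/files/test/code` path are hypothetical.

```python
def visible(entries, authorizations):
    """Keep entries whose single-label visibility the reader is authorized for.

    entries: iterable of (key, visibility_label, value)
    authorizations: iterable of labels granted to the reader
    """
    auths = set(authorizations)
    return [
        (key, value)
        for key, vis, value in entries
        # An empty visibility means the entry is readable by anyone.
        if vis == "" or vis in auths
    ]
```

With entries labeled `prod` and `test`, a reader holding only `prod` sees the prod entries, a reader holding only `test` sees the test entries, and a reader with no authorizations sees neither.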
THANK YOU
Backup
dirTable Query
• Search by path
• Directory Metadata
