Look Ma!
No more blobs
Aparna Chaudhary
NoSQL matters, @Cologne Germany 2013
EMBRACE
POLYGLOT
PERSISTENCE!
STOP
RDBMS ABUSE!
KNOW YOUR
USE CASE
Parse
Extract
Store
Read XML
We don't do rocket
science...
Use Case
Runtime support for
document types
Metadata definition
provided at runtime
Document type names -
max 50 char
Look up content based
on metadata
RA
Challenges
Storage of up to one
million documents of
10KB to 2GB per
document type per year
Write 1MB < x msec
Retrieve 1MB < y msec
...and details
RA
But…the Numbers make it
interesting...
How?
File
System
MongoDB
RDBMS
JCR
Document
Management
If you want to store files,
it's logical to use a file system.
ain't it?
File System
✓ Ease of Use
✓ No special skill-set
✓ Backup and Recovery
✓ It’s free!
How do I name them?
Support for metadata storage?
Performance with too many small
files?
Query - Administration?
High
Availability?
Limitation on
total number of
files?
Relational database
Integrity
Consistency
Durability
Atomicity
Joins
Backups
High Availability
You name it, We have it!
RDBMS
Aggregations
RDBMS
Developer’s Perspective
Challenge #1
RA
We need runtime support
for document type.
DOC_1 DOC_2 DOC_3
DOC_4 DOC_5 DOC_6
Dynamic DDL Generation
String concatenations
are ugly…
DEV
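A minimal sketch of what runtime DDL generation by string concatenation can look like. The `create_doc_type_table` helper, the column names, and the table layout are hypothetical, and SQLite stands in for the RDBMS only so that the sketch runs without a server.

```python
import re
import sqlite3

# Hypothetical helper: creates a table for a document type that is only known at runtime.
def create_doc_type_table(conn, table_name, metadata_columns):
    # Identifiers cannot be bound as SQL parameters, so they are validated
    # and then concatenated into the statement by hand.
    identifier = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
    for name in [table_name, *metadata_columns]:
        if not identifier.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    ddl = "CREATE TABLE " + table_name + " (id INTEGER PRIMARY KEY"
    for column in metadata_columns:
        ddl += ", " + column + " TEXT"
    ddl += ", content BLOB)"
    conn.execute(ddl)
    return ddl

conn = sqlite3.connect(":memory:")
ddl = create_doc_type_table(conn, "DOC_1", ["fname", "lname", "country"])
print(ddl)
```

Every new document type needs its own generated statement, which is where the extra utility code comes from.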
Let's build a utility.
DEV
More Work
Challenge #2
RA
Document type is 50 char
long
TABLE NAME LIMITS
Wait…
SQL-92 says 128 Char
?
We rule. Let's support only
30 char.
DOC_TYPE_MAPPING
Let's create a mapping
table.
DEV
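A rough sketch of the mapping-table idea: long document type names are swapped for short generated aliases that fit the 30-char table name limit. The `DocTypeMapping` class is hypothetical, a dict stands in for the DOC_TYPE_MAPPING table, and the DOC_n alias scheme is assumed from the table names shown earlier.

```python
MAX_TABLE_NAME = 30  # table name limit quoted on the slide

class DocTypeMapping:
    """In-memory stand-in for a DOC_TYPE_MAPPING table (hypothetical layout)."""

    def __init__(self):
        self._alias_by_type = {}

    def alias_for(self, doc_type):
        # Reuse the alias if this document type was registered before.
        if doc_type not in self._alias_by_type:
            alias = f"DOC_{len(self._alias_by_type) + 1}"
            if len(alias) > MAX_TABLE_NAME:
                raise ValueError("alias exceeds the table name limit")
            self._alias_by_type[doc_type] = alias
        return self._alias_by_type[doc_type]

mapping = DocTypeMapping()
long_name = "patient_health_record_discharge_summary_v2"  # hypothetical document type
print(mapping.alias_for(long_name))       # DOC_1
print(mapping.alias_for("lab_report"))    # DOC_2
print(mapping.alias_for(long_name))       # DOC_1 again
```

The aliases fit the limit, but nobody can tell from `DOC_1` which document type the table holds.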
Ugly unreadable table
names!
So...finally...
Read XML
DocumentType defined?
No: Dynamic DDL generation, Document Type Alias
Yes: continue
Extract Metadata
Store Metadata
Store Content
Simple use case
becomes complex...
Remember...
Our Challenge
QA
Let's see if we are in spec
for response time.
Aah..what about
performance now?
DEV
MongoDB
Document Based
GridFS
B-Tree
Dynamic Schema
JSON
BSON
Query
Scalable
http://www.10gen.com/presentations/storage-engine-internals
Joins
Complex
Transaction
[Diagram: documents ID1 to ID5 in one collection, each carrying a different set of fields (F1 to F9, Fx)]
Concepts
[Diagram: one database containing several collections]
Table = Collection
Column = Field
Row = Document
Database = Database
GridFS
MongoDB divides the
large content into
chunks
Stores Metadata
and Chunks separately
http://docs.mongodb.org/manual/core/gridfs/
> mybucket.files
{ "_id" : ObjectId("514d5cb8c2e6ea4329646a5c"),
"chunkSize" : NumberLong(262144),
"length" : NumberLong(103015),
"md5" : "34d29a163276accc7304bd69c5520e55",
"filename" : "health_record_2.xml",
"contentType" : "application/xml",
"uploadDate" : ISODate("2013-03-23T07:41:44.907Z"),
"aliases" : null,
"metadata" : { "fname" : "Aparna", "lname" : "Chaudhary", "country" : "Netherlands" }
}
ObjectId - 12 Byte BSON:
4 Byte - Seconds since Epoch
3 Byte - Machine Id
2 Byte - Process Id
3 Byte - Counter
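To make the layout concrete, here is a small sketch that splits the `_id` from the files document above into its four parts using only the standard library. The `split_object_id` helper is hypothetical; machine id and process id are kept as raw hex because their interpretation is up to the driver.

```python
from datetime import datetime, timezone

# Hypothetical helper: slices a 24-character hex ObjectId into the 4/3/2/3 byte layout.
def split_object_id(hex_id):
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is 12 bytes (24 hex characters)")
    return {
        "timestamp": int.from_bytes(raw[0:4], "big"),  # seconds since epoch
        "machine_id": raw[4:7].hex(),
        "process_id": raw[7:9].hex(),
        "counter": int.from_bytes(raw[9:12], "big"),
    }

parts = split_object_id("514d5cb8c2e6ea4329646a5c")
created = datetime.fromtimestamp(parts["timestamp"], tz=timezone.utc)
print(created.isoformat())  # 2013-03-23T07:41:44+00:00
```

The decoded timestamp matches the `uploadDate` of the file to the second, so the `_id` alone already says when the document was created.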
> mybucket.chunks
{ "_id" :
ObjectId("514d5cb8c2e6ea4329646a5d"),
"files_id" :
ObjectId("514d5cb8c2e6ea4329646a5c"),
"n" : 0,
"data" : BinData(0,...)
}
?
I'm storing a 10KB file, but
would it use 256KB on disk?
Last Chunk = FileSize % 256KB + Metadata overhead (x)
1128KB file: 256 + 256 + 256 + 256 + 104 + x
10KB file: 10 + x
Chunk is as
big as it
needs to be...
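The chunk arithmetic for the 1128KB and 10KB examples can be sketched as follows. The `chunk_sizes` helper is hypothetical; it assumes the 256KB chunk size shown in the files document and ignores the per-chunk metadata overhead (the "x").

```python
CHUNK_SIZE_KB = 256  # chunk size from the files document (262144 bytes)

# Hypothetical helper: payload size of each chunk for a file of the given size.
def chunk_sizes(file_size_kb, chunk_size_kb=CHUNK_SIZE_KB):
    full_chunks, last_chunk = divmod(file_size_kb, chunk_size_kb)
    sizes = [chunk_size_kb] * full_chunks
    if last_chunk:
        # The last chunk only holds the remainder, not a full 256KB.
        sizes.append(last_chunk)
    return sizes

print(chunk_sizes(1128))  # [256, 256, 256, 256, 104]
print(chunk_sizes(10))    # [10]
```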
Challenge #1
DEV
MongoDB supports Dynamic
Schema.
You can use collection per
docType and they are
created dynamically.
RA
We need runtime support
for document type.
Challenge #2
RA
Document type is 50 char
long
DEV
MongoDB namespace can
be up to 123 char.
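A quick check of that headroom, assuming one GridFS bucket per document type so that each type yields a `.files` and a `.chunks` collection as in the `mybucket` example. The database name `documents` is hypothetical, and names are assumed to be ASCII so characters and bytes coincide.

```python
MAX_NAMESPACE = 123  # namespace limit quoted on the slide
MAX_DOC_TYPE = 50    # longest document type name in the requirements

# Namespaces created for one GridFS bucket: <database>.<bucket>.files / .chunks
def gridfs_namespaces(database, bucket):
    return [f"{database}.{bucket}.files", f"{database}.{bucket}.chunks"]

def fits(database, bucket):
    return all(len(ns) <= MAX_NAMESPACE for ns in gridfs_namespaces(database, bucket))

longest_doc_type = "x" * MAX_DOC_TYPE
print(fits("documents", longest_doc_type))  # True
```

Even the longest allowed document type name leaves room for the database name and the GridFS suffixes, so no alias table is needed.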
So...finally...
Simple use case
remains simple...well becomes
simpler...
Read XML
Extract Metadata
Store Metadata &
Content
Remember...
Our Challenge
QA
Let's see if we are in spec
for response time.
DEV
Performance test is part of
our definition of 'DONE'
Because seeing is believing!
Demo
‣ GridFS 2.4.0
‣ PostgreSQL 9.2
‣ Spring Data
‣ JMeter 2.7
‣ Mac OS X 10.8.3 2.3GHz
Quad-Core Intel Core i7,
16GB RAM
https://github.com/aparnachaudhary/nosql-matters-demo
EMBRACE
POLYGLOT
PERSISTENCE!
STOP
RDBMS ABUSE!
KNOW YOUR
USE CASE
@aparnachaudhary
Java Developer, Data Lover
Eindhoven, Netherlands
http://blog.aparnachaudhary.com/
Thank You!
