Strongly Typed Languages and Flexible Schemas
3
Agenda
Strongly Typed Languages
Flexible Schema Databases
Change Management
Strategies
Tradeoffs
Strongly Typed Languages
"A programming language that requires a variable to be defined as well as the [type of] variable it is"
Flexible Schema Databases
7
Traditional RDBMS
create table users (id int, firstname text, lastname text);
Table definition
Column structure
8
Traditional RDBMS
Table with checks
create table cat_pictures(
id int not null,
size int not null,
picture blob not null,
user_id int,
primary key (id),
foreign key (user_id) references users(id));
Null checks
Foreign and primary key checks
9
Traditional RDBMS
[Diagram: users 1 : N cat_pictures (one user has many cat pictures)]
10
Is this Flexible?
• What happens when we need to change the schema?
– Add new fields
– Add new relations
– Change data types
• What happens when we need to scale out our data structure?
11
Flexible Schema Database
[Diagram: Document, Graph, and Key-Value databases]
12
Flexible Schema
• No mandatory schema definition
• No structure restrictions
• No schema validation process
13
We start from code
public class CatPicture {
int size;
byte[] blob;
}
public class User {
int id;
String firstname;
String lastname;
CatPicture[] cat_pictures;
}
14
Document Structure
{
_id: 1234,
firstname: 'Juan',
lastname: 'Olivo',
cat_pictures: [ {
size: 10,
picture: BinData("0x133334299399299432"),
}
]
}
Rich Data Types
Embedded Documents
15
Flexible Schema Databases
• Challenges
– Different Versions of Documents
– Different Structures of Documents
– Different Value Types for Fields in Documents
16
Different Versions of Documents
Over time, the same document changes how it represents data
First Version:
{ "_id" : 174, "firstname": "Juan" }
Second Version:
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }
Third Version:
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" , "cat_pictures":
[{"size": 10, "picture": BinData("0x133334299399299432")}]
}
17
Different Versions of Documents
Over time, the same document changes how it represents data
{ "_id" : 174, "firstname": "Juan" }
{ "_id" : 174, "name": { "first": "Juan", "last": "Olivo"} }
Different Structure
18
Different Structures of Documents
Different documents coexisting in the same collection
{ "_id" : 175, "brand": "Ford", "model": "Mustang", "date": ISODate("XXX") }
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }
Within same collection
19
Different Data Types for Fields
Different documents coexisting in the same collection
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "bdate": 1224234312}
{ "_id" : 175, "firstname": "Paco", "lastname": "Hernan", "bdate": "2015-06-27"}
{ "_id" : 176, "firstname": "Tomas", "lastname": "Marce", "bdate": ISODate("2015-06-27")}
Same field, different data type
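Strongly typed code still has to read whatever is stored. A minimal sketch of a hypothetical `BirthDates.toLocalDate` helper that folds the three `bdate` variants above into one `java.time.LocalDate`; it assumes numbers are epoch milliseconds in UTC and strings are ISO-8601 dates, neither of which the slide states.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;

// Hypothetical helper: normalizes whatever type "bdate" holds into a LocalDate.
final class BirthDates {
    private BirthDates() {}

    static LocalDate toLocalDate(Object raw) {
        if (raw instanceof Number) {
            // Assumption: numeric values are milliseconds since the epoch, in UTC
            long millis = ((Number) raw).longValue();
            return Instant.ofEpochMilli(millis).atZone(ZoneOffset.UTC).toLocalDate();
        }
        if (raw instanceof String) {
            // Assumption: strings use the ISO-8601 "yyyy-MM-dd" layout
            return LocalDate.parse((String) raw);
        }
        if (raw instanceof Date) {
            // ISODate values typically arrive as java.util.Date in the Java driver
            return ((Date) raw).toInstant().atZone(ZoneOffset.UTC).toLocalDate();
        }
        // Fail loudly instead of guessing, so unexpected data stays visible
        throw new IllegalArgumentException("Unsupported bdate type: "
                + (raw == null ? "null" : raw.getClass().getName()));
    }
}
```

If the stored numbers were seconds rather than milliseconds (as a value like 1224234312 might suggest), the numeric branch would need to multiply by 1000 first; that decision belongs in one place like this rather than scattered across the code base.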
Change Management
21
Change Management
Versioning | Class Loading
How do we version data formats correctly?
What mechanisms are out there to make this work?
Strategies
23
Strategies
• Decoupled Architectures
• ODMs
• Versioning
• Data Migrations
Decoupled Architectures
25
Strongly Coupled
26
Becomes a mess in your hair…
Coupled Architectures
[Diagram: Applications A, B, and C all connect directly to the Database; Application B announces "Let me perform some schema changes!"]
Decoupled Architecture
[Diagram: Applications A, B, and C reach the Database only through an API layer]
29
Decoupled Architectures
• Allows the business logic to evolve independently of the data layer
• Decouples the underlying storage / persistence option from the business service
• Changes are "requested", not imposed across all applications
• Better version control of each request and its mapping
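The decoupling idea can be sketched in plain Java: applications depend only on a small API (`UserApi`, a hypothetical name), and only its implementation knows how documents are laid out. This minimal sketch uses an in-memory map as a stand-in for the database (an assumption of the example) and accepts both structures shown earlier: flat `firstname`/`lastname` and nested `name.first`/`name.last`.

```java
import java.util.HashMap;
import java.util.Map;

// Domain object the applications see; it does not change when storage does.
final class UserView {
    final int id;
    final String firstName;
    final String lastName;

    UserView(int id, String firstName, String lastName) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

// The contract applications call instead of talking to the database directly.
interface UserApi {
    UserView findUser(int id);
}

// Hypothetical implementation; the map stands in for the real data layer.
final class InMemoryUserApi implements UserApi {
    private final Map<Integer, Map<String, Object>> docs = new HashMap<>();

    void store(Map<String, Object> doc) {
        docs.put((Integer) doc.get("_id"), doc);
    }

    @Override
    public UserView findUser(int id) {
        Map<String, Object> doc = docs.get(id);
        if (doc == null) {
            return null;
        }
        // Knowledge of storage layouts lives only here.
        Object name = doc.get("name");
        if (name instanceof Map) {
            Map<?, ?> n = (Map<?, ?>) name;   // nested layout
            return new UserView(id, (String) n.get("first"), (String) n.get("last"));
        }
        // flat layout
        return new UserView(id, (String) doc.get("firstname"), (String) doc.get("lastname"));
    }
}
```

When the storage layout changes again, only `InMemoryUserApi` (or its real counterpart) needs a new branch; callers of `findUser` are unaffected.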
ODMs
31
ODM
• Reduce the impedance mismatch between code and databases
• Data management facilitator
• Hides the complexity of database operators
• Tries to decouple business complexity using "magic" recipes
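To make the "magic" a little less magical, here is a stripped-down sketch of what an ODM does when it maps a POJO to a document: reflect over the fields and honor a field-name override. `@FieldName` and `TinyMapper` are hypothetical stand-ins, not Spring Data or Morphia APIs, and the sketch ignores nesting, `_id` mapping, and type conversion.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical annotation playing the role of an ODM's field-name override.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface FieldName {
    String value();
}

final class TinyMapper {
    private TinyMapper() {}

    // Converts a POJO into a document-like Map, honoring @FieldName overrides.
    static Map<String, Object> toDocument(Object pojo) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Field f : pojo.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // only instance state is persisted
            }
            f.setAccessible(true);
            FieldName override = f.getAnnotation(FieldName.class);
            String key = (override != null) ? override.value() : f.getName();
            try {
                doc.put(key, f.get(pojo));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + f.getName(), e);
            }
        }
        return doc;
    }
}

// Example POJO mirroring the User class used on the Spring Data slide.
class MappedUser {
    int id;
    @FieldName("first_name")
    String firstname;
    String lastname;
}
```

Real ODMs add caching of the reflected metadata, nested documents, and type converters on top of this basic loop.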
32
Spring Data
• POJO-centric model
• MongoTemplate or CrudRepository extensions connect the application to the repositories
• Uses annotations to override default field names and even data types (data type mapping)
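// UserRepository.java
import org.springframework.data.mongodb.repository.MongoRepository;

public interface UserRepository extends MongoRepository<User, Integer> {
}

// User.java
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Field;

public class User {
    @Id
    int id;
    @Field("first_name") // stored under "first_name" instead of "firstname"
    String firstname;
    String lastname;
    // ...
}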
33
Spring Data Document Structure
{
"_id": 1,
"first_name": "first",
"lastname": "last",
"catpictures": [
{
"size": 10,
"blob": BinData(0, "Kr3AqmvV1R9TJQ==")
}
]
}
34
Spring Data Considerations
• Data formats, versions, and types still need to be managed
• Does not solve issues such as type validation out of the box
• Can make things more complicated, but also more "controllable"
@Field("first_name")
String firstname;
35
Morphia
• Data-source-centric
• Discovers all POJOs in a given package
• Also uses annotations to perform overrides and handle object mapping
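import com.mongodb.MongoClient;
import org.mongodb.morphia.Datastore;
import org.mongodb.morphia.Morphia;
import org.mongodb.morphia.annotations.Entity;
import org.mongodb.morphia.annotations.Id;

@Entity("users") // documents are stored in the "users" collection
public class User {
    @Id
    int id;
    String firstname;
    String lastname;
}

// Map every entity in the package, then save through a Datastore
Morphia morphia = new Morphia();
morphia.mapPackage("examples.odms.morphia.pojos");
Datastore datastore = morphia.createDatastore(new MongoClient(), "morphia_example");
datastore.save(user); // user: a populated User instance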
36
Morphia Document Structure
{
"_id": 1,
"className": "examples.odms.morphia.pojos.User",
"firstname": "first",
"lastname": "last",
"catpictures": [
{
"size": 10,
"blob": BinData(0, "Kr3AqmvV1R9TJQ==")
}
]
}
Class Definition
37
Morphia Considerations
• Enables better control over class loading
• Like Spring Data, facilitates field overriding (annotations that define field keys)
• Better support for object polymorphism
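Storing the concrete class name next to the data, as the `className` field in the Morphia document above does, is what makes polymorphic loading possible. A minimal sketch of the idea with a hypothetical `Pet` hierarchy and plain reflection; it illustrates the concept and is not Morphia's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical polymorphic hierarchy.
abstract class Pet {
    String name;
    abstract String sound();
}

class Cat extends Pet {
    String sound() { return "meow"; }
}

class Dog extends Pet {
    String sound() { return "woof"; }
}

final class PolymorphicCodec {
    private PolymorphicCodec() {}

    // Writes the concrete class name alongside the data (a "className"-style discriminator).
    static Map<String, Object> encode(Pet pet) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("className", pet.getClass().getName());
        doc.put("name", pet.name);
        return doc;
    }

    // Uses the stored class name to rebuild the right subclass.
    static Pet decode(Map<String, Object> doc) {
        try {
            Class<?> cls = Class.forName((String) doc.get("className"));
            Pet pet = cls.asSubclass(Pet.class).getDeclaredConstructor().newInstance();
            pet.name = (String) doc.get("name");
            return pet;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot decode " + doc.get("className"), e);
        }
    }
}
```

Because the class name comes from stored data, a production version would check it against an allow-list before calling `Class.forName`; renaming or moving a class also breaks old documents unless an alias is kept.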
Versioning
39
Versioning
Versioning data structures (especially documents) can be very helpful for:
Recreate documents over time
Flow Control
Data / Field Multiversion Requirements
Archiving and History Purposes
40
Versioning – Option 0
Change the existing document on each write, incrementing a version number stored inside it
{ "_id" : 174, "v" : 1, "firstname": "Juan" }
{ "_id" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }
{ "_id" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }
> db.users.update( {"_id": 174}, {"$set": { ... }, "$inc": { "v": 1 }} )
Increment field value
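The speaker notes stress generating version numbers correctly under concurrency. One common approach for Option 0 is optimistic concurrency: put the expected version in the filter, roughly `db.users.update({"_id": 174, "v": 2}, {"$set": {...}, "$inc": {"v": 1}})`, and treat "no document matched" as a conflict. This minimal sketch models that in memory with `ConcurrentHashMap.replace(key, oldValue, newValue)`; `VersionedStore` is a hypothetical name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a collection whose documents carry a "v" field.
final class VersionedStore {
    private final ConcurrentHashMap<Integer, Map<String, Object>> docs = new ConcurrentHashMap<>();

    void insert(int id, Map<String, Object> fields) {
        Map<String, Object> doc = new HashMap<>(fields);
        doc.put("v", 1);
        docs.put(id, Collections.unmodifiableMap(doc));
    }

    Map<String, Object> find(int id) {
        return docs.get(id);
    }

    // Applies the changes only if the stored version still equals expectedVersion,
    // and bumps "v" by one. Returns false when another writer got there first.
    boolean updateIfVersion(int id, int expectedVersion, Map<String, Object> changes) {
        Map<String, Object> current = docs.get(id);
        if (current == null || !Integer.valueOf(expectedVersion).equals(current.get("v"))) {
            return false;
        }
        Map<String, Object> next = new HashMap<>(current);
        next.putAll(changes);
        next.put("v", expectedVersion + 1);
        // Atomic compare-and-swap: fails if the entry changed since it was read.
        return docs.replace(id, current, Collections.unmodifiableMap(next));
    }
}
```

A caller that gets `false` re-reads the document, reapplies its change, and retries with the new version.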
41
Versioning – Option 1
Store a full new document on each write, with a monotonically increasing version number inside
{ "docId" : 174, "v" : 1, "firstname": "Juan" }
{ "docId" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }
{ "docId" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }
> db.users.insert( {"docId":174 …})
> db.users.find({"docId":174}).sort({"v":-1}).limit(1);
Always find the latest version
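For Option 1 the write path must pick the next version number for a `docId`, and the read path must return the highest one. A minimal in-memory sketch (hypothetical `AppendOnlyVersions` class); here `synchronized` prevents two writers from picking the same `v`, a role that a unique index on `{docId: 1, v: 1}` could play in MongoDB by rejecting the second insert.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for a collection holding one full document per version.
final class AppendOnlyVersions {
    private final List<Map<String, Object>> docs = new ArrayList<>();

    // Latest version of docId, like find({docId}).sort({v: -1}).limit(1).
    synchronized Optional<Map<String, Object>> latest(int docId) {
        return docs.stream()
                .filter(d -> Integer.valueOf(docId).equals(d.get("docId")))
                .max(Comparator.comparingInt(d -> (Integer) d.get("v")));
    }

    // Appends a complete new version and returns its version number.
    synchronized int save(int docId, Map<String, Object> fields) {
        int next = latest(docId).map(d -> (Integer) d.get("v") + 1).orElse(1);
        Map<String, Object> doc = new HashMap<>(fields);
        doc.put("docId", docId);
        doc.put("v", next);
        docs.add(doc);
        return next;
    }
}
```

Every write stores the full document, which is why the comparison table rates "Fetch Many" as slow for this option: a query across many entities must pick the latest version of each.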
42
Versioning – Option 2
Store all versions of a document inside a single document.
> db.users.update( {"_id": 174 },
{"$set": { "current": { ... } },
"$addToSet": {"prev": { ... }}} )
The new "current" carries the incremented "v": a separate $inc on "current.v" in the same update would conflict with the $set on "current".
Current value
{ "_id" : 174, "current" : { "v" :3, "attr1": 184, "attr2" : "A-1" },
"prev" : [
{ "v" : 1, "attr1": 165 },
{ "v" : 2, "attr1": 165, "attr2": "A-1" }
]
}
Previous values
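Option 2 keeps history and current state together, so a single read returns both. A minimal sketch of the update step (hypothetical `EmbeddedHistory` class): archive the current state into `prev`, then build the next state with `v + 1`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one document holding its current state plus all previous states.
final class EmbeddedHistory {
    private Map<String, Object> current;
    private final List<Map<String, Object>> prev = new ArrayList<>();

    EmbeddedHistory(Map<String, Object> initial) {
        current = new HashMap<>(initial);
        current.put("v", 1);
    }

    // Archives the current state, then replaces it with the changes applied and v + 1.
    synchronized void update(Map<String, Object> changes) {
        prev.add(Collections.unmodifiableMap(new HashMap<>(current)));
        Map<String, Object> next = new HashMap<>(current);
        next.putAll(changes);
        next.put("v", (Integer) current.get("v") + 1);
        current = next;
    }

    synchronized Map<String, Object> current() {
        return Collections.unmodifiableMap(current);
    }

    synchronized List<Map<String, Object>> previous() {
        return Collections.unmodifiableList(prev);
    }
}
```

Starting from `{attr1: 165}`, setting `attr2` to `"A-1"` and then `attr1` to `184` leaves `current` at version 3 with two entries in `prev`, matching the document on the slide. Because every version lives in one document, long histories eventually run into MongoDB's 16 MB document size limit.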
43
Versioning – Option 3
Keep one collection for the current version and another for past versions
> db.users.find( {"_id": 174 })
> db.users_past.find( {"pid": 174 })
Previous versions collection (users_past):
{ "pid" : 174, "v" : 1, "firstname": "Juan" }
{ "pid" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }
Current collection (users):
{ "_id" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }
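Option 3 needs two writes per update, which is why recovery after a failure is harder. A minimal in-memory sketch (hypothetical `SplitHistoryStore` class) that writes the history copy first: if the process dies between the two writes, the current document is still intact, and the extra history entry is detectable because its `v` equals the current `v`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory "collections": users (current) and users_past (history keyed by "pid").
final class SplitHistoryStore {
    final Map<Integer, Map<String, Object>> users = new HashMap<>();
    final List<Map<String, Object>> usersPast = new ArrayList<>();

    synchronized void insert(int id, Map<String, Object> fields) {
        Map<String, Object> doc = new HashMap<>(fields);
        doc.put("_id", id);
        doc.put("v", 1);
        users.put(id, doc);
    }

    synchronized void update(int id, Map<String, Object> changes) {
        Map<String, Object> current = users.get(id);
        if (current == null) {
            throw new IllegalArgumentException("no user " + id);
        }
        // 1) Archive a copy, referencing the user through "pid" instead of "_id".
        Map<String, Object> archived = new HashMap<>(current);
        archived.remove("_id");
        archived.put("pid", id);
        usersPast.add(archived);
        // 2) Then overwrite the current document with the next version.
        Map<String, Object> next = new HashMap<>(current);
        next.putAll(changes);
        next.put("v", (Integer) current.get("v") + 1);
        users.put(id, next);
    }

    // Like db.users_past.find({"pid": id}).
    synchronized List<Map<String, Object>> history(int id) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> d : usersPast) {
            if (Integer.valueOf(id).equals(d.get("pid"))) {
                out.add(d);
            }
        }
        return out;
    }
}
```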
44
Versioning
Schema                    | Fetch 1       | Fetch Many     | Update | Recover if Fail
0) Increment Version      | Easy, Fast    | Fast, Easy     | Medium | N/A
1) New Document           | Easy, Fast    | Not Easy, Slow | Medium | Hard
2) Embedded in Single Doc | Easy, Fastest | Easy, Fastest  | Medium | N/A
3) Separate Collection    | Easy, Fastest | Easy, Fastest  | Medium | Medium, Hard
Migrations
46
Migrations
Several types of "Migrations":
Add/Remove Fields
Change Field Names
Change Field Data Type
Extract Embedded Document into Collection
47
Add / Remove Fields
For flexible schema databases, this is our bread and butter
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "newfield": "value" }
> db.users.update( {"_id": 174}, {"$set": { "newfield": "value" }, "$unset": {"gender":""} })
48
Change Field Names
Again, this can be done programmatically
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }
{ "_id" : 174, "first": "Juan", "last": "Olivo" }
> db.users.update( {"_id": 174}, {"$rename": { "firstname": "first", "lastname": "last"} })
49
Change Field Data Type
Align with a new code change: move from a numeric (Int) value to a String
{..."bdate": 1435394461522} {..."bdate": "2015-06-27"}
1) Batch Process
2) Aggregation Framework
3) Change based on usage
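Approach 3, changing documents based on usage, converts a document only when the application reads it. A minimal sketch, assuming numeric `bdate` values are epoch milliseconds in UTC and using a hypothetical in-memory map in place of the collection; against MongoDB the write-back would be a `$set` filtered on `_id` (and ideally on the old type, so two concurrent readers do not both rewrite it).

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

// Hypothetical lazy migration: numeric bdate values are rewritten on first read.
final class LazyBdateMigration {
    final Map<Integer, Map<String, Object>> users = new HashMap<>();
    int migratedCount = 0;

    // Not thread-safe; this is a single-threaded illustration.
    Map<String, Object> findUser(int id) {
        Map<String, Object> doc = users.get(id);
        if (doc != null && doc.get("bdate") instanceof Number) {
            // Assumption: numeric values are epoch milliseconds in UTC
            long millis = ((Number) doc.get("bdate")).longValue();
            String iso = Instant.ofEpochMilli(millis)
                    .atOffset(ZoneOffset.UTC)
                    .format(DateTimeFormatter.ISO_LOCAL_DATE); // "yyyy-MM-dd"
            doc.put("bdate", iso); // the write-back step
            migratedCount++;
        }
        return doc;
    }
}
```

Documents that are never read stay in the old format, so readers must keep handling both types until a final batch pass (approach 1) cleans up the remainder.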
50
Change Field Data Type
1) Batch Process – bulk api
public void migrateBulk(){
    // "dd" is day of month ("DD" would be day of year)
    DateFormat df = new SimpleDateFormat("yyyy-MM-dd");
    ...
    List<UpdateOneModel<Document>> toUpdate =
        new ArrayList<UpdateOneModel<Document>>();
    for (Document doc : coll.find()){
        // bdate holds epoch milliseconds, which can exceed the int range
        long millis = ((Number) doc.get("bdate")).longValue();
        String dateAsString = df.format(new Date(millis));
        Document filter = new Document("_id", doc.get("_id"));
        Document value = new Document("bdate", dateAsString);
        Document update = new Document("$set", value);
        toUpdate.add(new UpdateOneModel<Document>(filter, update));
    }
    if (!toUpdate.isEmpty()) { // bulkWrite rejects an empty request list
        coll.bulkWrite(toUpdate);
    }
}
51
Change Field Data Type
1) Batch Process – bulk api
public void migrateBulk(){
...
for (Document doc : coll.find()){
...
}
coll.bulkWrite(toUpdate);
Is there any problem with this?
52
Change Field Data Type
1) Batch Process – bulk api
public void migrateBulk(){
...
// BSON type 16 is int32 (use 18 for int64 values, e.g. values stored as Java longs)
Document query = new Document("bdate", new Document("$type", 16));
for (Document doc : coll.find(query)){
...
}
coll.bulkWrite(toUpdate);
More efficient filtering!
53
Extract Document into Collection
Normalize your schema
Before (users):
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" , "cat_pictures":
[{"size": 10, "picture": BinData(0, "m/lhLlLmoNiUKQ==")}]
}
> db.users.aggregate( [
{$unwind: "$cat_pictures"},
{$project: { "_id":0, "uid":"$_id", "size": "$cat_pictures.size",
"picture": "$cat_pictures.picture"}},
{$out:"cats"}])
After (cats):
{"uid": 174, "size": 10, "picture": BinData(0, "m/lhLlLmoNiUKQ==")}
After (users, once cat_pictures is removed):
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }
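The same extraction can be done in application code, which helps when the data needs extra transformation along the way. A minimal in-memory sketch (hypothetical `ExtractCatPictures` class) of `$unwind` plus `$project`, followed by the removal of `cat_pictures` from each user that the final `users` document implies (in the shell, for example, with `$unset`).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory equivalent of $unwind + $project + $out, then removing the array.
final class ExtractCatPictures {
    private ExtractCatPictures() {}

    static List<Map<String, Object>> extract(List<Map<String, Object>> users) {
        List<Map<String, Object>> cats = new ArrayList<>();
        for (Map<String, Object> user : users) {
            Object pics = user.get("cat_pictures");
            if (!(pics instanceof List)) {
                continue; // users without a cat_pictures list are left untouched
            }
            for (Object p : (List<?>) pics) {
                Map<?, ?> pic = (Map<?, ?>) p;
                Map<String, Object> cat = new HashMap<>();
                cat.put("uid", user.get("_id"));      // link back to the owning user
                cat.put("size", pic.get("size"));
                cat.put("picture", pic.get("picture"));
                cats.add(cat);
            }
            user.remove("cat_pictures");              // the follow-up removal step
        }
        return cats;
    }
}
```

Writing the new collection before removing the embedded array means a failure part-way leaves duplicated (not lost) data, which is easier to repair.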
Tradeoffs
55
Tradeoffs
Approach | Positives | Penalties
Decoupled Architecture | Should be your default approach; clean solution; scalable | N/A
Data Structures Variability | Reflects today's data structures; you can push decisions for later | More complex code base
Data Structures Strictness | Simple to maintain; always aligned with your code base | Will eventually need migrations; restricts your code iterations
Recap
57
Recap
• Flexible and Dynamic Schemas are a great tool
– Use them wisely
– Make sure you understand the tradeoffs
– Make sure you understand the different strategies and
options
• Works well with Strongly Typed Languages
58
Free Education
https://university.mongodb.com/courses/M101J/about
Thank you!
Norberto Leite
Technical Evangelist
http://www.mongodb.com/norberto
norberto@mongodb.com
@nleite
Webinar: Strongly Typed Languages and Flexible Schemas


Editor's Notes

  • #6 Do not confuse strongly typed languages with statically typed languages; the two are different concepts. Definitions found on the internet vary, both in what these terms mean and in the aspects used to categorize languages.
  • #8 Definition
  • #9 Definition
  • #10 Once you have this structure you can now start building up your application
  • #12 A few examples of flexible schema databases: document-oriented databases, graph databases, and key-value stores.
  • #13 No mandatory schema definition: if a collection does not exist, one will be created; if a database does not exist, one will be created. No structure restrictions: no forced fields or data types.
  • #14 Definition
  • #15 Definition
  • #34 Definition
  • #37 Definition
  • #40 You must correctly generate the new version number in a multithreaded system. You must return only the current version of each document when there is a query. You must "update" correctly by including all current attributes in addition to newly provided attributes. If the system fails at any point, you must either have a consistent state of the data, or it must be possible on restart to infer the state of the data and clean it up, or otherwise bring it to a consistent state.
  • #59 Next Session starting on Aug 04