-
1.
Building Java Applications on MongoDB
Aveek Bushan
aveekshith.bushan@mongodb.com
Solutions Architect – APAC Lead
• @aveekshith
-
2.
Contents
• From Sampling to n=all
• Implications from a Data Standpoint
• Building an Application in Java
• Core Features of MongoDB
• Connecting the Dots
-
3.
Random Sample
Image Source: SurveyMonkey
Sampling
• Based on random sampling
• Used in a variety of fields: opinion polls, bug estimation, etc.
Issues
• Loss of detail: 3% margin of error
• Are the samples truly random?
• Outliers might hold very interesting information
• Black Swan events have a massive impact that cannot be captured in a Normal Distribution
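The 3% margin of error mentioned above matches the usual worst-case formula for a simple random sample, z * sqrt(p(1 - p) / n) with p = 0.5. The following is a minimal sketch of that arithmetic, assuming a 95% confidence level (z of about 1.96), which the slide does not state. The class and method names are illustrative and not part of the deck.

```java
/** Illustrative helper (not from the slides): margin of error of a simple random sample. */
final class SamplingMargin {
    // z-score for a 95% confidence level (an assumption; the slide does not give the level)
    static final double Z_95 = 1.96;

    /** Worst-case margin of error (p = 0.5) for a sample of size n. */
    static double marginOfError(int n) {
        return Z_95 * Math.sqrt(0.25 / n);
    }

    /** Smallest sample size whose worst-case margin of error is at most the target. */
    static int requiredSampleSize(double targetMargin) {
        return (int) Math.ceil(0.25 * Z_95 * Z_95 / (targetMargin * targetMargin));
    }

    public static void main(String[] args) {
        System.out.println("Margin for n=1000: " + marginOfError(1000));
        System.out.println("Sample size for a 3% margin: " + requiredSampleSize(0.03));
    }
}
```

Under these assumptions roughly a thousand respondents are enough for a 3% margin, which is why sampling was attractive when collecting all the data was expensive.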
-
4.
N=All
Causation to Correlation
From Why to What
Ever more ubiquitous access to the digital world
Cost of storage has plummeted over the years
Ability to process unstructured and semi-structured information
Software tools that can process the data in real time
Source: Big Data, Viktor Mayer-Schönberger and Kenneth Cukier
-
5.
Data Implications
-
6.
Data Implications
Rich Data
Data Variety
Fast Processing
Data Availability
Data Volume
Geo-Spatial
Real-time Access
Data Durability
-
7.
MongoDB - Nexus Architecture
Relational + NoSQL
Relational: Expressive Query Language, Strong Consistency, Secondary Indexes
NoSQL: Flexibility, Scalability, Performance
-
8.
Example Application Requirements
• Skillsets of Employees
• Certification and Skill level
• Dashboard View of real-time data
• Scalable, Reliable and Performant Database
-
9.
Design the Schema
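Embedded Information
Sub-documents, Arrays etc. Natively Supported
Differing Data
Requirement badges used on the following slides: RD = Rich Data, DVa = Data Variety, FP = Fast Processing, DA = Data Availability, DVo = Data Volume, GS = Geo-Spatial, RTA = Real-time Access, DD = Data Durability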
-
10.
Preparing the Java Application
• Add the driver libraries to the classpath
3.0 New Features
– Generic MongoCollection Interface
– New Asynchronous API
– New Codec Infrastructure
– New Core Driver
• Start the MongoDB instance. Let's start with a standalone instance. For a write-performant storage engine, start the mongod with --storageEngine wiredTiger
-
11.
Build the Java Object
public class DataObject {
    private int id;
    private String name;
    private List<SkillObject> obj;
    private InfoObject info;
    // getters and setters omitted
}
public class SkillObject {
    private String skill;
    private int level;
    private String version;
    private boolean certified;
}
public class InfoObject {
    private String dept;
    private int experience;
    private List<Double> gps;
    private String location;
    private boolean reviewed;
}
Or use an Object-Document Mapper such as Morphia
@Entity
public class coll {
    @Id
    private int id;
    private String name;
    @Embedded
    private List<SkillsPOJO> skills;
    @Embedded
    private InfoPOJO info;
}
@Embedded
public class SkillsPOJO {
    private String skill;
    private int level;
    private String version;
    private boolean certified;
}
// Similarly for InfoPOJO
-
12.
Connect to MongoDB
Diagram: Java Client -> Driver -> mongod (DB Tier)
import java.util.ArrayList;
import java.util.List;
import org.bson.Document;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;

// client, database and collection are fields of the enclosing class
public void MongoConnect(String[] hosts) {
    List<ServerAddress> seeds = new ArrayList<ServerAddress>();
    for (String h : hosts) {
        // MongoDB server address and port
        seeds.add(new ServerAddress(h));
    }
    // MongoDB client with internal connection pooling
    client = new MongoClient(seeds);
    // The database to connect to
    database = client.getDatabase("mydb");
    // The collection to connect to
    collection = database.getCollection("coll");
}
-
13.
Or Use an ODM
import java.util.ArrayList;
import java.util.List;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;
import org.mongodb.morphia.Datastore;
import org.mongodb.morphia.Morphia;

// client, morphia and ds are fields of the enclosing class
public void MorphiaConnect(String[] hosts) {
    List<ServerAddress> seeds = new ArrayList<ServerAddress>();
    for (String h : hosts) {
        seeds.add(new ServerAddress(h));
    }
    client = new MongoClient(seeds);
    morphia = new Morphia();
    // Map the Morphia objects
    morphia.map(coll.class).map(SkillsPOJO.class).map(InfoPOJO.class);
    // Create a datastore to interact with MongoDB using POJOs
    ds = morphia.createDatastore(client, "mydb");
}
-
14.
Authentication
String dbName = "testdb";
String userName = "user1";
char[] password = {'p', 'w', 'd'};
// Argument order is (userName, database, password)
MongoCredential credential = MongoCredential.createMongoCRCredential(
    userName, dbName, password);
// With the appropriate credential
client = new MongoClient(seeds, Arrays.asList(credential));
-
15.
Perform some Inserts
import org.bson.Document;
import com.mongodb.client.MongoCollection;

Document doc = new Document("_id", emplList.get(i).getId())
    .append("name", emplList.get(i).getName())
    .append("skills", skillBOList)
    .append("info", new Document("dept", info.getDept())
        .append("yearsexp", info.getExperience())
        .append("gps", info.getGPS())
        .append("location", info.getLocation()));
collection.insertOne(doc);
Using Morphia
public void insert(List<coll> emplList) throws InterruptedException {
    ds.save(emplList);
}
-
16.
Async Operations
import com.mongodb.async.SingleResultCallback;
import com.mongodb.async.client.*;

// Factory of MongoClient instances
client = MongoClients.create("mongodb://localhost");
database = client.getDatabase("mydb");
collection = database.getCollection("coll");
…
// Methods that cause network IO take a SingleResultCallback<T> and return immediately
collection.insertOne(doc, new SingleResultCallback<Void>() {
    @Override
    public void onResult(final Void result, final Throwable t) {
        // t is non-null when the operation failed
        if (t != null) {
            System.out.println("Insert failed: " + t.getMessage());
        } else {
            System.out.println("Inserted!");
        }
    }
});
…
-
17.
Retrieve the Data
import static com.mongodb.client.model.Filters.*;
…
public void read(int id) {
    Document myDoc = collection.find(eq("_id", id)).first();
    System.out.println("Read Document with id: " + id + "\n" + myDoc.toJson() + "\n");
    …
}
Using Morphia
List<coll> empl = ds.createQuery(coll.class).filter("id =", id).asList();
-
18.
Retrieving a Datapoint
{
  "_id" : 5,
  "name" : "John Snow",
  "skills" : [
    { "name" : "java", "level" : 3, "certified" : true },
    { "name" : "mongo", "level" : 5 }
  ],
  "info" : {
    "dept" : "A91",
    "yearsexp" : 3,
    "gps" : [-74.00597, 40.71427],
    "location" : "New York"
  }
}
-
19.
Geo-Location Query
import static com.mongodb.client.model.Filters.*;
…
public void read(List<Double> gps, Double maxDistance, Double minDistance) {
    double longitude = gps.get(0);
    double latitude = gps.get(1);
    collection.createIndex(new Document("info.gps", "2dsphere"));
    MongoCursor<Document> cursor = collection.find(
        near("info.gps",
            new Point(new Position(longitude, latitude)),
            maxDistance, minDistance)).iterator();
    while (cursor.hasNext()) {
        …
    }
    …
}
-
20.
Geo-Location - Output
• Query to get all employees in and around Boston (GPS coordinates: latitude 42.35843, longitude -71.05977), within a maxDistance of 400,000 meters
{
  "_id" : 5,
  "name" : "John Snow",
  "skills" : [
    { "name" : "java", "level" : 3, "certified" : true },
    { "name" : "mongo", "level" : 5 }
  ],
  "info" : {
    "dept" : "A91",
    "yearsexp" : 3,
    "gps" : [-74.00597, 40.71427],
    "location" : "New York"
  }
}
{
  "_id" : 45,
  "name" : "Jack Kingsley",
  "skills" : [
    { "name" : "c++", "level" : 4 },
    { "name" : "mongo", "level" : 2, "version" : "3.0" }
  ],
  "info" : {
    "dept" : "A83",
    "yearsexp" : 18,
    "gps" : [-71.05977, 42.35843],
    "location" : "Boston"
  }
}
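To see why the New York document is returned for a 400,000 meter radius around Boston, the distance between the two coordinate pairs can be checked by hand. The following is a rough sketch using the haversine formula. The class name and the mean Earth radius of 6,371 km are assumptions for illustration, and the server's own spherical calculation may differ slightly.

```java
/** Illustrative great-circle distance check (not part of the original deck). */
final class GeoDistance {
    // Mean Earth radius in meters (a common approximation, assumed here)
    static final double EARTH_RADIUS_M = 6_371_000.0;

    /** Haversine distance in meters between two (longitude, latitude) points given in degrees. */
    static double haversineMeters(double lon1, double lat1, double lon2, double lat2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Coordinates taken from the two documents above, in (longitude, latitude) order
        double d = haversineMeters(-71.05977, 42.35843, -74.00597, 40.71427);
        System.out.println("Boston to New York in meters: " + Math.round(d));
        System.out.println("Within 400,000 m: " + (d <= 400_000));
    }
}
```

The result is a little over 300 km, so both employees fall inside the query radius. Note the longitude-first order, which matches the GeoJSON position used in the query.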
-
21.
Update the Data
import static com.mongodb.client.model.Filters.*;
…
Map<String, Object> updateOps = new HashMap<String, Object>();
updateOps.put("$inc", new Document("info.yearsexp", 1));
updateOps.put("$set", new Document("info.reviewed", true));
result = collection.updateOne(eq("_id", id), new Document(updateOps));
Using Morphia
Query<coll> query = ds.createQuery(coll.class).field("id").equal(id);
UpdateOperations<coll> ops = ds.createUpdateOperations(coll.class)
    .inc("info.experience", 1)
    .set("info.reviewed", true);
ds.update(query, ops);
-
22.
Update - Output
• Data point has been reviewed after 1 more year of employment
Before:
{
  "_id" : 5,
  "name" : "John Snow",
  "skills" : [
    { "name" : "java", "level" : 3, "certified" : true },
    { "name" : "mongo", "level" : 5 }
  ],
  "info" : {
    "dept" : "A91",
    "yearsexp" : 3,
    "gps" : [-74.00597, 40.71427],
    "location" : "New York"
  }
}
After:
{
  "_id" : 5,
  "name" : "John Snow",
  "skills" : [
    { "name" : "java", "level" : 3, "certified" : true },
    { "name" : "mongo", "level" : 5 }
  ],
  "info" : {
    "dept" : "A91",
    "yearsexp" : 4,
    "gps" : [-74.00597, 40.71427],
    "location" : "New York",
    "reviewed" : true
  }
}
-
23.
Delete Data
import static com.mongodb.client.model.Filters.*;
…
public void delete(int id) {
    collection.deleteOne(eq("_id", id));
    System.out.println("Deleted Document with id: " + id + "\n");
    …
}
Using Morphia
public void delete(int id) {
    Query<coll> query = ds.createQuery(coll.class)
        .field("id").equal(id);
    ds.delete(query);
    …
}
-
24.
High Availability
Diagram: Java Client -> Driver -> Replica Set (one Primary and two Secondaries, all healthy, exchanging heartbeats)
• Automated failover
• Rolling upgrades
• Multi-data-center support
• Data durability and strong consistency
-
25.
MongoDB set up
Use MongoDB OpsManager or Cloud Manager Automation to set up the cluster
(or)
sudo mongod --port 27017 --dbpath /data/rs1 --replSet rs --logpath /logs/rs1.log --fork
sudo mongod --port 27018 --dbpath /data/rs2 --replSet rs --logpath /logs/rs2.log --fork
sudo mongod --port 27019 --dbpath /data/rs3 --replSet rs --logpath /logs/rs3.log --fork
mongo --port 27017
> config = { "_id" : "rs", "members" : [
... {"host":"localhost:27017", "_id":0},
... {"host":"localhost:27018", "_id":1},
... {"host":"localhost:27019", "_id":2}
... ]
... }
> rs.initiate(config)
In the Java program, pass the addresses and ports of the replica set members as part of the connection string
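As a small illustration of the last point, the connection string for the three local members started above could be assembled as follows. This is a sketch only: the helper class is hypothetical, while the `mongodb://host1,host2/?replicaSet=name` shape is the standard connection string format.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative helper (not from the slides) that assembles a replica set connection string. */
final class ConnectionStrings {
    /** Builds mongodb://host1,host2,.../?replicaSet=name from "host:port" entries. */
    static String replicaSetUri(List<String> hostsWithPorts, String replicaSetName) {
        if (hostsWithPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        return "mongodb://" + String.join(",", hostsWithPorts) + "/?replicaSet=" + replicaSetName;
    }

    public static void main(String[] args) {
        // The three members and the set name "rs" come from the setup commands above
        List<String> members = Arrays.asList("localhost:27017", "localhost:27018", "localhost:27019");
        System.out.println(replicaSetUri(members, "rs"));
    }
}
```

With the 3.x driver, a string like this can be wrapped in a `MongoClientURI` and handed to the `MongoClient` constructor. Listing several members lets the driver discover the set even if one seed is down.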
-
26.
Ensuring Durability
• By default, WriteConcern is Acknowledged => the server has received the write operation and applied the change in memory
• A Primary server crash means that the data might be lost
• Use a stricter WriteConcern such as Majority or w:2
for (int retry = 0; retry < 3; retry++) {
    try {
        collection.withWriteConcern(WriteConcern.MAJORITY)
            .insertOne(doc);
        break;
    } catch (Exception e) {
        // Report the failure, then wait before retrying
        System.err.println(e.getMessage());
        Thread.sleep(5000);
    }
}
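The retry loop above can be factored into a reusable helper so that the final failure is surfaced to the caller rather than lost. The following is one possible sketch using only the standard library. `Retry` and `withRetries` are hypothetical names, not part of the MongoDB driver.

```java
import java.util.concurrent.Callable;

/** Illustrative retry helper (not part of the MongoDB driver). */
final class Retry {
    /**
     * Runs the task up to maxAttempts times, sleeping delayMillis between failed attempts.
     * Returns the first successful result, or rethrows the last failure.
     */
    static <T> T withRetries(int maxAttempts, long delayMillis, Callable<T> task) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // no sleep after the final attempt
                }
            }
        }
        throw last;
    }
}
```

A call such as `Retry.withRetries(3, 5000, () -> { collection.withWriteConcern(WriteConcern.MAJORITY).insertOne(doc); return null; })` would mirror the loop on the slide. Retrying an insert is safest when the document carries its own `_id`, since a retry after an unnoticed success then fails with a duplicate key instead of writing twice.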
-
27.
Eventual Consistency
Diagram: Reporting Application -> Driver -> Replica Set (P, S, S); the Reporting Application and a Secondary member are in the same DC
• Read from the nearest node for lower latency
• Read-only applications where eventual consistency is OK, e.g. reporting applications
• Can be achieved using ReadPreference in MongoDB
• Modes of Primary, PrimaryPreferred, Secondary, SecondaryPreferred and Nearest
myDoc = collection
    .withReadPreference(ReadPreference.nearest())
    .find(eq("_id", id)).first();
-
28.
HA Best Practices
• HA against DC failures and active-active => 5 nodes across 3 DCs
• For writes => a majority of nodes needs to be in an active state
• For reads => secondary reads can continue
• Majority inactive => force reconfig to continue writes
rs:SECONDARY> config = { "_id" : "rs", "members" : [
... {"host":"localhost:27018", "_id":1}
... ]
... }
rs:SECONDARY> rs.reconfig(config, {force:true})
{ "ok" : 1 }
rs:PRIMARY>
Diagram: Java Client -> Driver -> Replica Set (the surviving member becomes Primary, the other two members are removed)
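The "5 nodes across 3 DCs" recommendation follows from simple majority arithmetic: a primary can only be elected while more than half of the voting members are reachable. The following is a small sketch of that reasoning. The class name and the 2-2-1 split are assumptions, as the slide does not say how the five nodes are distributed.

```java
/** Illustrative arithmetic (not from the slides) for replica set majorities. */
final class ReplicaMajority {
    /** Number of voting members that must be reachable to elect a primary. */
    static int majority(int votingMembers) {
        return votingMembers / 2 + 1;
    }

    /** True if the members remaining after losing one data center still form a majority. */
    static boolean survivesDcLoss(int[] membersPerDc, int lostDc) {
        int total = 0;
        for (int m : membersPerDc) {
            total += m;
        }
        return total - membersPerDc[lostDc] >= majority(total);
    }

    public static void main(String[] args) {
        int[] layout = {2, 2, 1}; // assumed split of 5 nodes over 3 DCs
        for (int dc = 0; dc < layout.length; dc++) {
            System.out.println("Lose DC " + dc + ", majority kept: " + survivesDcLoss(layout, dc));
        }
    }
}
```

With a 2-2-1 layout, any single DC can fail and three of five members remain. With five nodes in only two DCs (3-2), losing the larger DC leaves a minority, which is the situation where the forced reconfig shown above becomes necessary.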
-
29.
Aggregation of Data
import java.util.Arrays;
import org.bson.Document;
import com.mongodb.client.AggregateIterable;
import static com.mongodb.client.model.Accumulators.avg;
import static com.mongodb.client.model.Accumulators.sum;
import static com.mongodb.client.model.Aggregates.group;
import static com.mongodb.client.model.Aggregates.sort;
import static com.mongodb.client.model.Aggregates.unwind;
import static com.mongodb.client.model.Aggregates.out;
…
public void deptForSkills() {
    // Group key: one bucket per (skill, department) pair
    Document group = new Document();
    group.append("skills", "$skills.name");
    group.append("dept", "$info.dept");
    AggregateIterable<Document> iter = collection.aggregate(Arrays.asList(
        unwind("$skills"),
        group(group, avg("avgLevel", "$skills.level"), sum("count", 1)),
        sort(new Document().append("_id.skills", 1).append("avgLevel", -1)),
        out("skills")));
    // The pipeline runs when the iterable is consumed
    for (Document d : iter) {
        System.out.println(d.toJson());
    }
}
Sample input document:
{
  "_id" : 5,
  "name" : "John Snow",
  "skills" : [
    { "name" : "java", "level" : 3, "certified" : true },
    { "name" : "mongo", "level" : 5 }
  ],
  "info" : {
    "dept" : "A91",
    "yearsexp" : 3,
    "gps" : [-74.00597, 40.71427],
    "location" : "New York"
  }
}
-
30.
Aggregation - Output
{ "_id" : { "skills" : "c++", "dept" : "A75" }, "avgLevel" : 5, "count" : 10 }
{ "_id" : { "skills" : "c++", "dept" : "A83" }, "avgLevel" : 4.666666666666667, "count" : 30 }
{ "_id" : { "skills" : "c++", "dept" : "A91" }, "avgLevel" : 3, "count" : 10 }
{ "_id" : { "skills" : "java", "dept" : "A75" }, "avgLevel" : 4, "count" : 10 }
{ "_id" : { "skills" : "java", "dept" : "A83" }, "avgLevel" : 3.5, "count" : 10 }
{ "_id" : { "skills" : "java", "dept" : "A91" }, "avgLevel" : 3, "count" : 40 }
{ "_id" : { "skills" : "mongo", "dept" : "A91" }, "avgLevel" : 5, "count" : 40 }
{ "_id" : { "skills" : "mongo", "dept" : "A83" }, "avgLevel" : 2, "count" : 10 }
{ "_id" : { "skills" : "mongo", "dept" : "A75" }, "avgLevel" : 1, "count" : 10 }
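For readers new to the aggregation pipeline, the unwind and group stages can be pictured as an in-memory group-by. The following sketch reproduces that logic with Java streams on four rows taken from the two sample employees shown earlier. It is an analogy under stated assumptions, not driver code: `SkillStats` and `Row` are hypothetical names, and the results are ordered by key rather than by the pipeline's sort stage.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Illustrative in-memory equivalent (not driver code) of the unwind and group stages. */
final class SkillStats {
    /** One row per (employee, skill) pair, which is what the unwind stage produces. */
    static final class Row {
        final String skill;
        final String dept;
        final int level;

        Row(String skill, String dept, int level) {
            this.skill = skill;
            this.dept = dept;
            this.level = level;
        }
    }

    /** Groups rows by "skill/dept" and collects average level and row count per group. */
    static Map<String, DoubleSummaryStatistics> bySkillAndDept(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                (Row r) -> r.skill + "/" + r.dept,
                TreeMap::new,
                Collectors.summarizingDouble((Row r) -> r.level)));
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("java", "A91", 3), new Row("mongo", "A91", 5),  // John Snow
                new Row("c++", "A83", 4), new Row("mongo", "A83", 2));  // Jack Kingsley
        bySkillAndDept(rows).forEach((key, stats) ->
                System.out.println(key + " avgLevel=" + stats.getAverage() + " count=" + stats.getCount()));
    }
}
```

The real pipeline does the same work on the server, close to the data, and writes the result to the `skills` collection instead of returning it to the client.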
-
31.
Sharding
Diagram: Client Tier (Java Client -> Driver) -> Routers -> DB Tier (Shard 1, Shard 2, … Shard n, each a replica set of P, S, S), with three Config Servers
• Scale as you grow
• Redundancy is built in at all levels
• 3 types of sharding: Range, Hashed or Tag-Aware
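The difference between range and hashed sharding can be illustrated with a toy router. The following is a deliberately simplified sketch: the class is hypothetical, and the modulo-based hash is only a stand-in, since MongoDB's hashed sharding assigns ranges of hashed key values to chunks rather than taking a modulo. Tag-aware sharding is not covered here.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative shard routing (a simplification, not MongoDB's implementation). */
final class ShardRouting {
    /** Range-based: each entry maps the lowest key of a range to the shard that owns it. */
    static String byRange(TreeMap<Integer, String> lowerBounds, int key) {
        Map.Entry<Integer, String> owner = lowerBounds.floorEntry(key);
        if (owner == null) {
            throw new IllegalArgumentException("no range covers key " + key);
        }
        return owner.getValue();
    }

    /** Hash-based stand-in: spreads keys across shards regardless of their order. */
    static String byHash(String[] shards, int key) {
        return shards[Math.floorMod(Integer.hashCode(key), shards.length)];
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> ranges = new TreeMap<>();
        ranges.put(Integer.MIN_VALUE, "shard1"); // keys below 100
        ranges.put(100, "shard2");               // keys 100 and above
        String[] shards = {"shard1", "shard2"};
        for (int id : new int[] {5, 6, 150}) {
            System.out.println(id + ": range=" + byRange(ranges, id) + ", hash=" + byHash(shards, id));
        }
    }
}
```

Range routing keeps neighbouring `_id` values together, which suits range queries but can concentrate monotonically increasing inserts on one shard. Hash routing spreads them out at the cost of scatter-gather range queries.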
-
32.
MongoDB set up
Use MongoDB OpsManager or Cloud Manager Automation to set up the cluster
(or)
sudo mongod --port 37017 --dbpath /data/shard1 --logpath /logs/shard1.log --fork
sudo mongod --port 37018 --dbpath /data/shard2 --logpath /logs/shard2.log --fork
sudo mongod --port 47017 --dbpath /data/cfg --configsvr --logpath /logs/cfg.log --fork
sudo mongos --port 57017 --configdb localhost:47017
sudo mongos --port 57018 --configdb localhost:47017
mongo --port 57017
> sh.addShard("localhost:37017")
> sh.addShard("localhost:37018")
> sh.enableSharding("mydb")
> sh.shardCollection("mydb.coll",{"_id":1})
In the Java program, pass the router IP addresses and ports as part of the connection string
-
33.
MongoDB for a Big Data World
Rich Data
Data Variety
Fast Processing
Data Availability
Data Volume
Geo-Spatial
Real-time Access
Data Durability
-
34.
MongoDB for a Big Data World
Requirements:
Rich Data
Data Variety
Fast Processing
Data Availability
Data Volume
Geo-Spatial
Real-time Access
Data Durability
MongoDB features:
Flexible Data Model and Dynamic Schema
Embedded Data
Native Replication Across Data Centers
Appropriate WriteConcern
Rich Query Model and Aggregation
Native Geo-Spatial Features
Horizontal Scalability as you grow
Sub-documents, Arrays etc.
-
35.
More Information – Java/MongoDB
Resource | Location
MongoDB Java Driver | http://docs.mongodb.org/ecosystem/drivers/java/
Java API to connect to MongoDB | http://api.mongodb.org/java/3.0/
Driver Download | http://mongodb.github.io/mongo-java-driver/
Morphia Project | https://github.com/mongodb/morphia
Hadoop Driver for MongoDB | http://docs.mongodb.org/ecosystem/tools/hadoop/
University Course | https://university.mongodb.com/courses/M101J/about?jmp=docs&_ga=1.249916550.1866581253.1440492145
-
36.
More Information – MongoDB
Resource | Location
Case Studies | mongodb.com/customers
Presentations | mongodb.com/presentations
Free Online Training | university.mongodb.com
Webinars and Events | mongodb.com/events
Documentation | docs.mongodb.org
MongoDB Downloads | mongodb.com/download
Additional Info | info@mongodb.com
-
37.
Thank You!
You can reach me at aveekshith.bushan@mongodb.com
Core Driver – alternative API
MongoDB Async Driver - A new asynchronous API that can leverage either Netty or Java 7’s AsynchronousSocketChannel for fast and non-blocking IO.
Netty is a non-blocking I/O (NIO) client-server framework for the development of Java network applications such as protocol servers and clients.
Pool of connections to the database, shared even across multiple threads
MongoClientOptions.Builder()
connectionsPerHost
heartbeatConnectTimeout
heartbeatFrequency
maxConnectionIdleTime
To create a capped collection -> createCollection (maxDocuments, usePowerOf2Sizes, capped); getCollection defers creation until data is written
MongoDB Challenge Response Mechanism
X509
SCRAM SASL
Kerberos
LDAP
insertMany
Lambda function as well – Java 8
SingleResultCallback<T> - An interface to describe the completion of an asynchronous operation.
QueryFilter
A position is the fundamental geometry construct. The "coordinates" member of a geometry object is composed of one position (in the case of a Point geometry), an array of positions (LineString or MultiPoint geometries), an array of arrays of positions (Polygons, MultiLineStrings), or a multidimensional array of positions (MultiPolygon)
updateMany
deleteMany