From Oracle to MongoDB, a real use case at Telefónica R&D
The talk will cover the use case of the Personalisation Server (PS), a master customer-profile store for the companies of the Telefónica Group (Telefónica, O2…). It provides real-time (ReST API) and batch interfaces to update, retrieve and share customer profiles. Initially the PS used Oracle, but due to scalability and cost issues we implemented a new version with MongoDB.
In the talk we will see the problems that made us move to MongoDB and all the benefits that we obtained (with real performance figures, of course).
Right now the Oracle version is being used in the UK and Ireland (approx. 30M user profiles stored), and the NoSQL version is being deployed in Mexico (18M customers) and other Latam countries.
2. Content
01 Introduction
• Telefónica PDI. Who?
• Personalisation Server. Why? What?
02 The SQL version
• Data model and architecture
• Integrations, problems and improvements
03 The NoSQL version
• Data model and architecture
• Performance boost
• The bad
04 Conclusions
• Conclusions
• Personal thoughts
4. 01 Telefónica PDI. Who?
• Telefónica
§ Fifth largest telecommunications company in the world
§ Operations in Europe (7 countries), the United States and Latin America (15 countries)
• Telefónica Digital
§ Web and mobile digital contents and services division
• Product Development and Innovation unit
§ Formerly Telefónica R&D
§ Product & service development, platforms development, research, technology strategy, user experience and deployment & operation
§ Around 70 different ongoing projects at all times
6. 01 Opt-in and profile module. Why?
• User data (profile and permissions) was scattered across different storages:
§ IPTV service: gender; film and music preferences
§ Mobile service: permission to contact by SMS?; gender
§ Music tickets service: address; music preferences
§ Location-based offers: address; permission to contact by SMS?
(Customer: "So you want to know my address… AGAIN?!")
8. 01 Opt-in and profile module. Why?
• Provide a module to become the master storage for customer data:
§ Gender
§ Film and music preferences
§ Permission to contact by SMS?
§ Address
• Shared by the IPTV service, the mobile service, the music tickets service and the location-based offers
9. 01
Opt-in and profile module. What?
• Features:
§ Flexible profile definition, classified in services
§ Profile sharing options between different services
§ Real time API
§ Supplementary offline batch interface
§ Authorization system
§ High availability
§ Inexpensive solution & hardware
11. 02
Data model
Services, users and their profile
• Services defined a set of attributes (their profile), each with a default value and a data type
• Users were registered in services
• Users defined values for some of the service's attributes
• Each attribute value carried an update date, to avoid overwriting newer changes through batch loads (see the sketch below)
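A hedged sketch of that update-date guard (illustrative Python only; the real PS logic and field names are not shown in the deck):

    from datetime import datetime

    # Each stored attribute value keeps the timestamp of its last change.
    stored = {"gender": {"value": "F", "updated": datetime(2013, 5, 1)}}

    def merge_value(profile, attr, value, updated):
        """Apply an incoming (batch) value only if it is newer."""
        current = profile.get(attr)
        if current is None or updated > current["updated"]:
            profile[attr] = {"value": value, "updated": updated}

    # An older batch record must not clobber a newer online update:
    merge_value(stored, "gender", "M", datetime(2013, 4, 1))
    print(stored["gender"]["value"])   # still 'F'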
12. 02
Data model
Services profile sharing matrix
• Services could access attributes declared by other services
• Sharing rights were either read-only or read and write
• The user had to be registered in both services (see the sketch below)
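A hedged sketch of how such a check might look (illustrative only; service names and the matrix layout are hypothetical):

    # (consumer service, owner service) -> set of granted rights
    sharing_matrix = {
        ("music_tickets", "iptv"): {"read"},
        ("location_offers", "mobile"): {"read", "write"},
    }

    def can_access(consumer, owner, mode, user_services):
        """A grant is needed AND the user must be registered in both."""
        granted = sharing_matrix.get((consumer, owner), set())
        return mode in granted and {consumer, owner} <= user_services

    print(can_access("music_tickets", "iptv", "read",
                     {"music_tickets", "iptv"}))    # True
    print(can_access("music_tickets", "iptv", "write",
                     {"music_tickets", "iptv"}))    # False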
13. 02
Data model
Authorization system
• Everything that could be accessed in the PS was a resource
• Roles defined access rights (read, or read and write) on resources
• Auth users had roles
• Roles could include other roles (see the sketch below)
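Because roles can include other roles, resolving a user's effective rights is a small graph traversal. A hedged, illustrative sketch (role and resource names are hypothetical):

    # role -> (direct rights per resource, included roles)
    roles = {
        "reader": ({"profiles": {"read"}}, []),
        "writer": ({"profiles": {"read", "write"}}, []),
        "admin":  ({"services": {"read", "write"}}, ["writer"]),
    }

    def effective_rights(role, seen=None):
        """Flatten a role and its inclusions into resource -> rights."""
        seen = set() if seen is None else seen
        if role in seen:          # guard against inclusion cycles
            return {}
        seen.add(role)
        rights, included = roles[role]
        merged = {res: set(r) for res, r in rights.items()}
        for sub in included:
            for res, r in effective_rights(sub, seen).items():
                merged.setdefault(res, set()).update(r)
        return merged

    # 'admin' ends up with its own rights plus everything from 'writer'.
    print(effective_rights("admin"))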
14. 02
Data model
Bonus features!
• Multiple IDs (see the sketch below):
§ A user's profile could be accessed with different equivalent IDs depending on the service
§ Each user ID was defined by an ID type (phone number, email, portal ID, hash…) and the ID value
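Illustratively (a hedged sketch; the actual schema is not shown in the deck), equivalent IDs can be modelled as (type, value) pairs hanging off a single profile:

    # Hypothetical shape of one user's equivalent identifiers
    user = {
        "profile": {"gender": "F"},
        "ids": [
            {"type": "phone_number", "value": "+353861234567"},
            {"type": "email", "value": "jane@example.com"},
            {"type": "portal_id", "value": "jane.doe"},
        ],
    }

    def matches(user, id_type, id_value):
        """Any equivalent ID resolves to the same underlying profile."""
        return any(i["type"] == id_type and i["value"] == id_value
                   for i in user["ids"])

    print(matches(user, "email", "jane@example.com"))   # True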
15. 02
High level logical architecture
§ Everything running on Red Hat EL 5.4 64 bits
17. 02
Integration
Planned integration
• The PS replaces all customer profile and permissions DBs
• All systems access this data through the PS real-time API
• In special cases, some PS consumers could use the batch interface
• In the same way, new services could be added quite easily
18. 02
Integration
Problems arise
• Budget restrictions: adapting all services to use the API was too expensive
• Keep the independent systems' DBs and synchronize the PS through batch
• Use the DBs' built-in massive extraction features to generate daily batch files
• However… in most cases those DBs were not able to generate Delta (changes-only) extractions
§ They provide full daily snapshots instead! (see the sketch below)
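When a source system can only hand over full daily snapshots, the Delta has to be derived downstream. A hedged sketch of the idea (our illustration, not the PS implementation):

    def compute_delta(yesterday, today):
        """Diff two full snapshots (user_id -> profile) into a Delta."""
        delta = {}
        for user_id, profile in today.items():
            old = yesterday.get(user_id)
            if old is None:
                delta[user_id] = profile        # new user
            else:
                changed = {k: v for k, v in profile.items()
                           if old.get(k) != v}
                if changed:
                    delta[user_id] = changed    # only modified attributes
        return delta

    old = {"u1": {"gender": "F"}, "u2": {"address": "Cork"}}
    new = {"u1": {"gender": "F"}, "u2": {"address": "Dublin"},
           "u3": {"gender": "M"}}
    print(compute_delta(old, new))
    # {'u2': {'address': 'Dublin'}, 'u3': {'gender': 'M'}}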
19. 02
First version performance
Ireland
• 1.8M customers, 180 profile attributes, 6 services
• Sizes
§ Tables + indexes size: 65Gb
§ 30% of the size were indexes
• Batch
§ Full DWH customer’s profile import: > 24 hours
§ Delta extractions: 4 - 6 hours
§ Loads and extractions performance proportional to data size
• API:
§ Response time with average traffic: 110ms
21. 03
Second version
High level logical architecture
• New approach: batch processes access the DB directly
22. 03
Second version
Batch processes
• Batch processes had to
§ Validate authentication and authorization
§ Verify user, service and attribute existence
§ Check equivalent IDs
§ Validate sharing matrix rights
§ Validate values data type
§ Check the update date of the existing values
23. 03
Second version
DB Batch processing
[Meme: "Our DBAs"]
24. 03
Second version
New DB-based batch loading process
• Preprocess the incoming batch file in the BE servers (a sketch of this step follows the list)
§ Validate the format, the existence of services and attributes, and the values' data types
§ Generate an intermediate file with a structure like the target DB table
• Load the intermediate file (Oracle's SQL*Loader) into a temporal table
• Switch the DB to "deferred writing", storing all incoming modifications
• Merge the temporal table into the final table, checking the values' update dates
• Replace the old user attribute values table with the merge result
• Apply the deferred writing operations
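A hedged sketch of the preprocessing step only (illustrative Python; the record layout and validators are hypothetical):

    import csv

    VALIDATORS = {"int": str.isdigit, "str": lambda v: True}
    SERVICE_ATTRS = {"iptv": {"gender": "str"}, "mobile": {"sms_optin": "int"}}

    def preprocess(batch_path, out_path):
        """Validate records and emit rows shaped like the target table."""
        with open(batch_path, newline="") as src, \
             open(out_path, "w", newline="") as dst:
            writer = csv.writer(dst)
            for user_id, service, attr, value, updated in csv.reader(src):
                attrs = SERVICE_ATTRS.get(service)
                if attrs is None or attr not in attrs:
                    continue   # unknown service/attribute (real code would log)
                if not VALIDATORS[attrs[attr]](value):
                    continue   # wrong data type
                writer.writerow([user_id, service, attr, value, updated])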
25. 03
Second version
New batch extraction process
• Generate a temporal DB table with a format similar to the final batch file. Two loops over the user attribute values table are required:
§ Select the format of the table: number and order of columns / attributes
§ Fill the new table
• Loop over the whole temporal table for final formatting (empty fields…)
• From the batch side, loop across the whole table (SELECT * FROM …)
• Write each retrieved row as a line in the resulting file (see the sketch below)
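That last step is a plain cursor-to-file loop. A hedged, driver-agnostic sketch (any Python DB-API connection would do; names are hypothetical):

    def export_table(connection, table, out_path, sep="|"):
        """Stream every row of the temporal table into the batch file."""
        cursor = connection.cursor()
        cursor.execute("SELECT * FROM " + table)   # internal, trusted name
        with open(out_path, "w") as out:
            for row in cursor:                     # DB-API cursors iterate rows
                out.write(sep.join("" if v is None else str(v)
                                   for v in row) + "\n")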
26. 03
Second version performance
Ireland performance requirements
• Batch time window: 3:30 hours
§ Full DWH load
§ Two Delta loads
§ Three Delta extractions
• API:
§ Ireland requirement: < 500ms
27. 03
Second version performance
Ireland
• 1.8M customers, 180 profile attributes, 6 services
• Sizes
§ Tables + indexes size: 65Gb
§ 30% of the size were indexes
§ Temporal table sizes grew almost exponentially: 15Gb and above
§ Intermediate file size: from 700Mb to 7Gb
• Batch
§ Full DWH customer's profile import: 2:30 hours
§ Delta extractions: 1:00 hour
§ Load performance degraded quickly (almost exponentially): up to 6:00 hours
§ Extraction performance proportional to data size
§ Concurrent batch processes could halt the DB
• API:
§ Response time with average traffic: 80ms
§ Response time while loading was unpredictable: > 300ms
29. 04
Third version
Speed up DB Batch processes
[Meme: "Our DBAs (again)"]
30. 04
Third version
New (second) DB-based batch loading process
• Minor preprocessing of the incoming batch file in the BE servers
§ Just validate the format
§ No intermediate file needed!
• Load the validated file (Oracle's SQL*Loader) into a temporal table
• Loop over the temporal table, merging the values into the final table while checking the values' update dates and data types
§ Use several concurrent writing jobs (see the sketch below)
• Store results in the real table: no need to replace it!
• No "deferred writing"!
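The concurrent writing jobs can be pictured as a worker pool, one worker per partition of the temporal table. A hedged sketch (our illustration; the partitioning and merge details are hypothetical):

    from concurrent.futures import ThreadPoolExecutor

    def merge_partition(partition_id):
        """Merge one slice of the temporal table into the final table,
        honouring each value's update date (details elided)."""
        ...

    # Several concurrent jobs, one per partition of the temporal table.
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(merge_partition, range(4)))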
31. 04
Third version
Enhancements to the extraction process
• Optimized the loops that generate the temporal output table
§ Use several concurrent writing jobs
§ We achieved a speed-up of between 1.5x and 2x
• Loop over the whole temporal table for final formatting (empty fields…)
• Download and write the lines directly inside Oracle's sqlplus
• No SELECT * FROM … query from the batch side!
32. 04
Third version performance
Ireland
• 1.8M customers, 180 profile attributes, 6 services
• Sizes
§ Tables + indexes size: 65Gb
§ 30% of the size were indexes
§ Temporal tables: 15Gb
• Batch
§ Full DWH customer’s profile import: 1:10 hours (vs. 2:30 hours)
§ Three Delta extractions: 2:15 hours (vs. 3:00 hours)
§ Loads and extractions performance proportional to data size
§ Concurrent batch processes not so harmful
[Meme: "Our DBAs: F**K YEAH"]
• API:
§ Response time with average traffic: 110ms
§ Response time while loading: 400ms
33. 04
Third version performance
United Kingdom
• 25M customers, 150 profile attributes, 15 services
• Sizes
§ Tables + indexes size: 700Gb
§ 40% of the size were indexes
• Batch
§ Two Delta imports: < 2:00 hours
§ Two Delta extractions: < 2:00 hours
§ Loads and extractions performance proportional to data size
• API:
§ Response time with average traffic: 90ms
[Meme: "Our DBAs: F**K YEAH"]
34. 04
Third version performance
Ireland: 3rd version vs. 2nd version
• DB size: 65Gb + 15Gb (temp) vs. 65Gb + > 15Gb
• Full DWH load: 1:10 hours vs. 2:30 hours
• Three Delta exports: 2:15 hours vs. 3:00 hours
• Batch stability: stable, linear vs. unstable, exponential
• API response time: 110ms vs. 110ms
• API while loading: 400ms vs. unpredictable
United Kingdom: 3rd version
• DB size: 700Gb
• Two Delta loads: < 2:00 hours
• Three Delta exports: < 2:00 hours
• API response time: 90ms
[Meme: "Our DBAs: F**K YEAH"]
35. 04
Third version performance
DB stats
• 20 database tables
• API: several queries with up to 35 joins and even some unions
• Authorization: 5 joins to validate auth users access
• Batch:
§ Load: 1,700 lines of PL/SQL
§ Extraction: 1,200 lines of PL/SQL
37. 04
Third version performance
Mexico
• 20M customers, 200 profile attributes, 10 services
• Mexico time window: 4:00 hours
§ Full DWH load!
§ Additional Delta feeds loads
§ At least two Delta extractions
[Meme: "Our DBAs"]
42. 05
MongoDB Data Model
DB stats
• Only 5 collections
• API: typically 2 accesses (services and users collections; see the sketch below)
• Authorization: access only 1 collection to grant access
• Batch: all processing done outside DB
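For illustration only (a hedged pymongo sketch; the real schema and field names are not shown in the deck), a profile read in this model boils down to two lookups:

    from pymongo import MongoClient

    db = MongoClient()["ps"]   # hypothetical database name

    def get_profile(user_id, service_id):
        # Access 1: the service definition (attributes, defaults, sharing)
        service = db.services.find_one({"_id": service_id})
        # Access 2: the user document, embedding all attribute values
        user = db.users.find_one({"ids.v": user_id})
        if service is None or user is None:
            return None
        return {a: user.get("attrs", {}).get(a)
                for a in service.get("attrs", [])}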
43. 05
NoSQL version
High level logical architecture
§ Everything running on Red Hat EL 6.2 64 bits
44. 05
NoSQL version performance
Ireland (at PDI lab)
• 1.8M customers, 180 profile attributes, 6 services
• Sizes
§ Collections + indexes size: 20Gb (vs. 65Gb)
§ < 5% of the size are indexes (vs. 30%)
• Batch
§ Full DWH customer’s profile import: 0:12 hours (vs. 1:10 hours)
§ Three Delta extractions: 0:40 hours (vs. 2:15 hours)
§ Loads and extractions performance proportional to data size
§ Concurrent batch processes ran without affecting performance
• API:
§ Response time with average traffic: < 10ms (vs. 110ms)
§ Response time while loading: the same
§ High load (600 TPS) response time while loading: 300ms
45. 05
NoSQL version performance
United Kingdom (at PDI lab)
• 25M customers, 150 profile attributes, 15 services
• Sizes
§ Collections + indexes size: 210Gb (vs. 700Gb)
§ < 5% of the size were indexes
• Batch
§ Two Delta imports: < 0:40 hours (vs. 2:00 hours)
§ Loads and extractions performance proportional to data size
46. 05
NoSQL version performance
Mexico
• 20M customers, 200 profile attributes, 15 services
• Sizes
§ Collections + indexes size: 320Gb
§ Indexes size: 1.2Gb
• Batch
§ Initial Full import (20M, 40 attributes): 2:00 hours
§ Small Full import (20M, 6 attributes): 0:40 hours
• API:
§ Response time with average traffic: < 10ms (vs. 90ms)
§ Response time while loading: the same
§ High load (500 TPS) response time while loading: 270ms
47. 05
NoSQL version performance
Ireland: NoSQL version vs. SQL version
• DB size: 20Gb vs. 80Gb
• Full DWH load: 0:12 hours vs. 1:10 hours
• Three Delta exports: 0:40 hours vs. 2:15 hours
• API while loading: < 10ms vs. 400ms
• API 600TPS + loading: 300ms vs. timeout / failure
United Kingdom: NoSQL version vs. SQL version
• DB size: 210Gb vs. 700Gb
• Two Delta loads: < 0:40 hours vs. < 2:00 hours
Mexico: NoSQL version
• DB size: 320Gb
• Initial Full load (40 attr): 2:00 hours
• Daily Full load (6 attr): 0:40 hours
• API while loading: < 10ms
• API 500TPS + loading: 270ms
[Meme: "Our DBAs"]
49. 05
The bad
• The batch load process was too fast
§ To keep secondary nodes in sync we needed an oplog of 16 or 24Gb
§ We had to disable journaling for the first migrations
• Document field labels take up disk space
§ We reduced them to just 2 chars: “attribute_id” -> “ai”
• Respect the unwritten law of keeping at least 70% of the data size in RAM
• Take care with compound indexes: order matters (see the sketch below)
§ You can save one index… or you can have problems
§ Put the most important key (never nullable) first
• DBAs whining and complaining about NoSQL
§ “If we had enough RAM for all data, Oracle would outperform MongoDB”
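A hedged pymongo sketch of the compound index tip (the 2-char field names follow the convention above but are otherwise hypothetical):

    from pymongo import ASCENDING, MongoClient

    users = MongoClient()["ps"]["users"]   # hypothetical names

    # Leading with the most important, never-null key ("ui", user id)
    # lets this one compound index also serve queries on "ui" alone,
    # saving a separate single-field index.
    users.create_index([("ui", ASCENDING), ("si", ASCENDING)])

    users.find_one({"ui": "u123"})                  # served (prefix match)
    users.find_one({"ui": "u123", "si": "iptv"})    # served (full match)
    # A query on "si" alone is NOT served and would need its own index.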
50. 05
The ugly
• A second migration once the PS is already running:
§ Full import adding 30 new attribute values: 10:00 hours
§ Full import adding 150 new attribute values: 40:00 hours
• Considerably increasing document size (i.e. adding lots of new values to the users) makes MongoDB rearrange the documents, performing around 5 times slower
§ That's a problem when you are updating 10k documents per second
• Solutions?
§ Avoid this situation at all costs. Run away!
§ Normalize user values; move them to a new, individual collection
§ Preallocate the size with a faux field (see the sketch below)
• You could waste space!
§ Load into a new collection, then merge and swap, like we did in Oracle
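A hedged pymongo sketch of the faux-field trick (our illustration; field names are hypothetical): insert the document padded to roughly its future size, then drop the padding so later growth happens in place (this matters on the MMAPv1-era storage engine, where growing documents get moved).

    from pymongo import MongoClient

    users = MongoClient()["ps"]["users"]   # hypothetical names

    # Insert with a throwaway padding field so space is reserved...
    users.insert_one({"ui": "u123", "attrs": {}, "pad": "x" * 4096})
    # ...then remove the padding; later growth can reuse that space
    # instead of forcing the document to be rewritten elsewhere.
    users.update_one({"ui": "u123"}, {"$unset": {"pad": ""}})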
52. 06
Conclusions & personal thoughts
• Awesome performance boost
§ But not all use cases fit in a MongoDB / NoSQL solution!
• New technology, different limitations
• Fear of the unknown
§ SSDs performance?
§ Long term performance and stability?
• Python + MongoDB + pymongo = fast development
§ I mean, really fast
• MongoDB Monitoring Service (MMS)
• 10gen people were very helpful
55. 0X
SQL Physical architecture
§ Scales horizontally by adding more BE or DB servers, or disks in the SAN
§ Virtualized or physical servers depending on the deployment
56. 0X
MongoDB Physical architecture
§ MongoDB arbiters run on the BE servers
§ Scales horizontally by adding more BE servers or disks in the SAN
§ Sharding may already be configured to scale by adding more replica sets