From Oracle to MongoDB

From Oracle to MongoDB, a real use case at Telefónica R&D
The talk covers the use case of the Personalisation Server (PS), a master customer profile storage for the companies of the Telefónica Group (Telefónica, O2…). It provides real-time (REST API) and batch interfaces to update, retrieve and share customer profiles. Initially the PS used Oracle, but due to scalability and cost issues we implemented a new version with MongoDB.
In the talk we will see the problems that made us move to MongoDB and the benefits we obtained (with real performance figures, of course).
Right now the Oracle version is in use in the UK and Ireland (approx. 30M user profiles stored) and the NoSQL version is being deployed in Mexico (18M customers) and other Latin American countries.

3 Comments
  • @pablito56 Can you contact me directly? My email is my last name followed by pythian domain, which is com. If you don't mind, I would like to ask you a few more questions on the types of attributes you stored.
  • The server specs were different in each country, from servers with 16 GB of RAM and a 4-core Intel Xeon to machines with 72 GB of RAM, 2 x 8-core Intel Xeons and shared storage over fiber optics. The good thing for us was that the MongoDB version fitted in servers sized for Oracle.
  • Hi Pablito - do you have the server specs for Oracle solution and MongoDB?

From Oracle to MongoDB

  1. 1. From Oracle to MongoDB. A real use case at Telefónica PDI. Pablo Enfedaque, pev@tid.es, 06.10.2012
  2. 2. Content
       01 Introduction
         • Telefónica PDI. Who?
         • Personalisation Server. Why? What?
       02 The SQL version
         • Data model and architecture
         • Integrations, problems and improvements
       03 The NoSQL version
         • Data model and architecture
         • Performance boost
         • The bad
       04 Conclusions
         • Conclusions
         • Personal thoughts
  3. 3. 01 Introduction
  4. 4. 01 Telefónica PDI. Who?
       • Telefónica
         § Fifth largest telecommunications company in the world
         § Operations in Europe (7 countries), the United States and Latin America (15 countries)
       • Telefónica Digital
         § Web and mobile digital contents and services division
       • Product Development and Innovation unit
         § Formerly Telefónica R&D
         § Product & service development, platforms development, research, technology strategy, user experience and deployment & operation
         § Around 70 different ongoing projects at any time
  5. 5. 01 Personalisation Server. What?
       • User profiling system
       • Machine learning
       • Recommendations
       • Customer profile storage
  6. 6. 01 Opt-in and profile module. Why?
       • Users' data (profile and permissions) was scattered across different storages:
         § IPTV service: gender; film and music preferences
         § Mobile service: permission to contact by SMS?; gender
         § Music tickets service: address; music preferences
         § Location based offers: address; permission to contact by SMS?
       [Speech bubble: "So you want to know my address… AGAIN?!"]
  8. 8. 01 Opt-in and profile module. Why?
       • Provide a module to become the master customer data storage
         § Data held in the module: gender; film and music preferences; permission to contact by SMS?; address
         § Services around it: IPTV service, mobile service, music tickets service, location based offers
  9. 9. 01 Opt-in and profile module. What?
       • Features:
         § Flexible profile definition, classified in services
         § Profile sharing options between different services
         § Real time API
         § Supplementary offline batch interface
         § Authorization system
         § High availability
         § Inexpensive solution & hardware
  10. 10. 02 The SQL solution
  11. 11. 02 Data model: services, users and their profile
       • Services defined a set of attributes (their profile), each with a default value and data type
       • Users were registered in services
       • Users defined values for some of the services' attributes
       • Each attribute value had an update date to avoid overwriting newer changes through batch loads
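The update-date rule on this slide can be sketched in a few lines of Python. This is only an illustration: the record shape (`value`, `update_date`) and the function name `merge_value` are assumptions, not the PS schema.

```python
from datetime import datetime


def merge_value(stored, incoming):
    """Keep the incoming attribute value only if it is newer than the stored one."""
    if stored is None or incoming["update_date"] > stored["update_date"]:
        return incoming
    return stored


# Hypothetical sample records: one stored value and two batch candidates.
stored = {"value": "Open", "update_date": datetime(2011, 9, 27)}
stale = {"value": "Closed", "update_date": datetime(2011, 9, 1)}
fresh = {"value": "Closed", "update_date": datetime(2011, 10, 8)}

print(merge_value(stored, stale)["value"])  # Open: the batch line is older, so it is ignored
print(merge_value(stored, fresh)["value"])  # Closed: the batch line is newer, so it wins
```

With this rule a late batch file cannot undo a change that arrived through the real-time API in the meantime.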
  12. 12. 02 Data model: services profile sharing matrix
       • Services could access attributes declared inside other services
       • There were sharing rights for read or for read and write
       • The user had to be registered in both services
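A minimal sketch of the sharing check described on this slide, assuming a dictionary-based matrix. The service names, attribute names and the `can_access` helper are hypothetical sample data, not taken from the PS.

```python
READ, READ_WRITE = "r", "rw"

# (accessing service, owning service, attribute) -> sharing right (hypothetical data)
SHARING = {
    ("music_tickets", "iptv", "music_preferences"): READ,
    ("location_offers", "mobile", "sms_permission"): READ_WRITE,
}

# user -> services the user is registered in (hypothetical data)
REGISTRATIONS = {
    "user1": {"music_tickets", "iptv"},
    "user2": {"location_offers"},
}


def can_access(user, accessor, owner, attribute, write=False):
    """Check the three rules: registration in both services, a sharing entry, and its mode."""
    registered = REGISTRATIONS.get(user, set())
    if accessor not in registered or owner not in registered:
        return False
    if accessor == owner:
        return True  # a service always reaches its own attributes
    right = SHARING.get((accessor, owner, attribute))
    if right is None:
        return False
    return right == READ_WRITE or not write
```

For example, `can_access("user1", "music_tickets", "iptv", "music_preferences")` is allowed for reading but not with `write=True`, and `user2` is refused access to the mobile service's attribute because that user is not registered there.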
  13. 13. 02 Data model: authorization system
       • Everything that could be accessed in the PS was a resource
       • Roles defined access rights (read or read and write) on resources
       • Auth users had roles
       • Roles could include other roles
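Because roles could include other roles, granting access means walking the inclusion graph. The sketch below shows one way to do that, with made-up role and resource names; the strongest right found for a resource wins.

```python
# role -> roles it includes (hypothetical sample data)
ROLE_INCLUDES = {"admin": ["operator"], "operator": ["viewer"], "viewer": []}

# role -> {resource: right}, where the right is "r" or "rw" (hypothetical sample data)
ROLE_RIGHTS = {
    "admin": {"users": "rw"},
    "operator": {"stats": "rw"},
    "viewer": {"stats": "r", "users": "r"},
}


def effective_roles(roles):
    """Return the given roles plus every role they include, directly or indirectly."""
    seen = set()
    stack = list(roles)
    while stack:
        role = stack.pop()
        if role in seen:
            continue  # also protects against inclusion cycles
        seen.add(role)
        stack.extend(ROLE_INCLUDES.get(role, []))
    return seen


def effective_rights(roles):
    """Merge the rights of all effective roles, keeping "rw" over "r"."""
    rights = {}
    for role in effective_roles(roles):
        for resource, right in ROLE_RIGHTS.get(role, {}).items():
            if rights.get(resource) != "rw":
                rights[resource] = right
    return rights
```

In a relational model each hop of this walk tends to become a join, which is in line with the join counts reported later in the talk.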
  14. 14. 02 Data model: bonus features!
       • Multiple IDs:
         § A user's profile could be accessed with different equivalent IDs depending on the service
         § Each user ID was defined by an ID type (phone number, email, portal ID, hash…) and the ID value
  15. 15. 02 High level logical architecture
       § Everything running on Red Hat EL 5.4, 64 bits
  17. 17. 02 Integration: planned integration
       • The PS replaces all customer profile and permissions DBs
       • All systems access this data through the PS real time API
       • In special cases, some PS consumers could use the batch interface
       • In the same way, new services could be added quite easily
  18. 18. 02 Integration: problems arise
       • Budget restrictions: adapting all services to use the API was too expensive
       • Keep the independent systems' DBs and synchronize the PS through batch
       • Use the DBs' built-in bulk extraction feature to generate daily batch files
       • However… in most cases those DBs were not able to generate Delta (only changes) extractions
         § Provide full daily snapshots!
  19. 19. 02 First version performance: Ireland
       • 1.8M customers, 180 profile attributes, 6 services
       • Sizes
         § Tables + indexes size: 65 GB
         § 30% of the size were indexes
       • Batch
         § Full DWH customer profile import: > 24 hours
         § Delta extractions: 4 - 6 hours
         § Loads and extractions performance proportional to data size
       • API:
         § Response time with average traffic: 110 ms
  20. 20. 03 The SQL solution. Second version
  21. 21. 03 Second version: high level logical architecture
       • New approach: batch processes access the DB directly
  22. 22. 03 Second version: batch processes
       • Batch processes had to
         § Validate authentication and authorization
         § Verify user, service and attribute existence
         § Check equivalent IDs
         § Validate sharing matrix rights
         § Validate values data type
         § Check the update date of the existing values
  23. 23. 03 Second version: DB batch processing
       [Image caption: "Our DBAs"]
  24. 24. 03 Second version: new DB-based batch loading process
       • Preprocess the incoming batch file in the BE servers
         § Validate format, services and attributes existence and values data types
         § Generate an intermediate file with a structure like the target DB table
       • Load the intermediate file (Oracle's SQL*Loader) into a temporary table
       • Switch the DB to "deferred writing", storing all incoming modifications
       • Merge the temporary table and the final table, checking the values' update date
       • Replace the old users attributes values table with the merge result
       • Apply the deferred writing operations
  25. 25. 03 Second version: new batch extraction process
       • Generate a temporary DB table with a format similar to the final batch file. Two loops over the users attributes values table are required:
         § Select the format of the table: number and order of columns / attributes
         § Fill the new table
       • Loop over the whole temporary table for final formatting (empty fields…)
       • From the batch side, loop across the whole table (SELECT * FROM …)
       • Write each retrieved row as a line in the resulting file
  26. 26. 03 Second version performance: Ireland performance requirements
       • Batch time window: 3:30 hours
         § Full DWH load
         § Two Delta loads
         § Three Delta extractions
       • API:
         § Ireland requirement: < 500 ms
  27. 27. 03 Second version performance: Ireland
       • 1.8M customers, 180 profile attributes, 6 services
       • Sizes
         § Tables + indexes size: 65 GB
         § 30% of the size were indexes
         § Temporary tables size increases almost exponentially: 15 GB and above
         § Intermediate file size: from 700 MB to 7 GB
       • Batch
         § Full DWH customer profile import: 2:30 hours
         § Delta extractions: 1:00 hour
         § Loads performance worsened quickly (almost exponentially): 6:00 hours
         § Extractions performance proportional to data size
         § Concurrent batch processes may halt the DB
       • API:
         § Response time with average traffic: 80 ms
         § Response time while loading was unpredictable: > 300 ms
  28. 28. 04 The SQL solution. Third version
  29. 29. 04 Third version: speed up DB batch processes
       [Image caption: "Our DBAs (again)"]
  30. 30. 04 Third version: new (second) DB-based batch loading process
       • Minor preprocessing of the incoming batch file in the BE servers
         § Just validate format
         § No intermediate file needed!
       • Load the validated file (Oracle's SQL*Loader) into a temporary table
       • Loop over the temporary table merging the values into the final table, checking the values' update date and data types
         § Use several concurrent writing jobs
       • Store results in the real table, no need to replace it!
       • No "deferred writing"!
  31. 31. 04 Third version: enhancements to the extraction process
       • Optimized loops to generate the temporary output table
         § Use several concurrent writing jobs
         § We achieved a speed-up of between 1.5 and 2
       • Loop over the whole temporary table for final formatting (empty fields…)
       • Download and write lines directly inside Oracle's sqlplus
       • No SELECT * FROM … query from the batch side!
  32. 32. 04 Third version performance: Ireland
       • 1.8M customers, 180 profile attributes, 6 services
       • Sizes
         § Tables + indexes size: 65 GB
         § 30% of the size were indexes
         § Temporary tables: 15 GB
       • Batch
         § Full DWH customer profile import: 1:10 hours (vs. 2:30 hours)
         § Three Delta extractions: 2:15 hours (vs. 3:00 hours)
         § Loads and extractions performance proportional to data size
         § Concurrent batch processes not so harmful
       • API:
         § Response time with average traffic: 110 ms
         § Response time while loading: 400 ms
       [Image caption: "Our DBAs: F**K YEAH"]
  33. 33. 04 Third version performance: United Kingdom
       • 25M customers, 150 profile attributes, 15 services
       • Sizes
         § Tables + indexes size: 700 GB
         § 40% of the size were indexes
       • Batch
         § Two Delta imports: < 2:00 hours
         § Two Delta extractions: < 2:00 hours
         § Loads and extractions performance proportional to data size
       • API:
         § Response time with average traffic: 90 ms
       [Image caption: "Our DBAs: F**K YEAH"]
  34. 34. 04 Third version performance
       Ireland                3rd version            2nd version
       DB size                65 GB + 15 GB (temp)   65 GB + > 15 GB
       Full DWH load          1:10 hours             2:30 hours
       Three Delta exports    2:15 hours             3:00 hours
       Batch stability        Stable, linear         Unstable, exponential
       API response time      110 ms                 110 ms
       API while loading      400 ms                 Unpredictable

       United Kingdom         3rd version
       DB size                700 GB
       Two Delta loads        < 2:00 hours
       Three Delta exports    < 2:00 hours
       API response time      90 ms
       [Image caption: "Our DBAs: F**K YEAH"]
  35. 35. 04 Third version performance: DB stats
       • 20 database tables
       • API: several queries with up to 35 joins and even some unions
       • Authorization: 5 joins to validate auth users' access
       • Batch:
         § Load: 1700 lines of PL/SQL
         § Extraction: 1200 lines of PL/SQL
  36. 36. 04 Mission completed?
  37. 37. 04 Third version performance: Mexico
       • 20M customers, 200 profile attributes, 10 services
       • Mexico time window: 4:00 hours
         § Full DWH load!
         § Additional Delta feed loads
         § At least two Delta extractions
       [Image caption: "Our DBAs"]
  38. 38. 05 The NoSQL solution
  39. 39. 05 MongoDB Data Model: services and their profile + sharing matrix
       {
         _id : 7,
         service_name : "root",
         id_type : 1,
         default_values : false,
         owned_attribs : [
           {
             attrib_id : 70005,
             attrib_name : "marketing.consent",
             attrib_data_type : 1,
             attrib_def_value : "no",
             attrib_status : 1
           },
           ...
         ],
         shared_attribs : [
           { attrib_id : 20144, sharing_mode : 0 },
           ...
         ]
       }
       Note: attrib_id = service_id * 10000 + num attribs + 1
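The attribute ID rule on this slide is plain arithmetic, sketched below in Python. The function names are hypothetical, and decoding the owning service by integer division is an inference from the formula, not something the slide states.

```python
ATTRIBS_PER_SERVICE = 10000  # multiplier taken from the slide's formula


def next_attrib_id(service_id, num_attribs):
    """attrib_id = service_id * 10000 + num attribs + 1, as written on the slide."""
    if not 0 <= num_attribs < ATTRIBS_PER_SERVICE - 1:
        raise ValueError("service has no free attribute IDs left")
    return service_id * ATTRIBS_PER_SERVICE + num_attribs + 1


def owning_service(attrib_id):
    """Recover the service ID from an attribute ID (assumption: inverse of the rule above)."""
    return attrib_id // ATTRIBS_PER_SERVICE


print(next_attrib_id(7, 4))   # 70005, the owned attribute in the slide's example
print(owning_service(20144))  # 2, so the shared attribute would belong to service 2
```

Encoding the service in the ID means a shared attribute such as 20144 can be traced back to its owner without an extra lookup, at the cost of capping each service at fewer than 10000 attributes.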
  40. 40. 05 MongoDB Data Model: users and their profile + multiple IDs
       User document:
       {
         _id : "011234",
         services_list : [
           {
             service_id : 1,
             reg_date : {"$date" : 1318040693000}
           },
           ...
         ],
         user_values : [
           {
             attrib_id : 10140,
             attrib_value : "Open",
             update_date : {"$date" : 1317110161000}
           },
           ...
         ]
       }
       Equivalent ID document:
       {
         _id : "05abcd",
         ue : "011234"
       }
       Notes: _id = "id type" + "user ID"; attrib_id = service_id * 10000 + num attribs + 1
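The equivalent ID lookup implied by this slide can be sketched with plain dictionaries standing in for the stored documents. Two assumptions are made here: the ID type is a two-digit prefix (suggested by "011234" and "05abcd"), and the `ue` field points at the `_id` of the main user document. Whether both kinds of document share one collection is not stated on the slide.

```python
# In-memory stand-ins for the stored documents, copied from the slide's examples.
users = {
    "011234": {
        "_id": "011234",
        "services_list": [{"service_id": 1}],
        "user_values": [{"attrib_id": 10140, "attrib_value": "Open"}],
    }
}
equivalent_ids = {"05abcd": {"_id": "05abcd", "ue": "011234"}}


def make_key(id_type, user_id):
    """Build _id as "id type" + "user ID" (assumption: the type is zero-padded to 2 digits)."""
    return f"{id_type:02d}{user_id}"


def find_user(id_type, user_id):
    """Resolve a user by primary ID, or by an equivalent ID via its ue reference."""
    key = make_key(id_type, user_id)
    if key in users:
        return users[key]
    alias = equivalent_ids.get(key)
    return users.get(alias["ue"]) if alias else None


print(find_user(1, "1234")["_id"])  # 011234, found directly
print(find_user(5, "abcd")["_id"])  # 011234, found through the equivalent ID document
```

Either way the profile is reached in at most two key lookups, with no join involved.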
  41. 41. 05 MongoDB Data Model: authorization system
       Roles collection:
       {
         _id : "PS_ADMIN_ROLE",
         roles_resources : [
           { resource_id : "admin.**", method : "R" },
           { resource_id : "stats.**", method : "IMPORT" },
           ...
         ]
       }
       Resources collection:
       {
         _id : "admin.**",
         role_uri : "/**"
       }
       Auth users collection:
       {
         _id : "admin",
         auth_pswd : "XXX",
         auth_roles : ["PS_ADMIN_ROLE", …],
         auth_uris : [
           { uri_path : "/**", method : "R" },
           { uri_path : "/stats/**", method : "RW" },
           { uri_path : "/kpis/**", method : "IMPORT" },
           ...
         ]
       }
       Note: auth_uris replicates the URIs (from resources) and the methods (from roles)
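Since the auth user document replicates URIs and methods, a request can be checked against that single document. The sketch below assumes that a trailing "/**" matches any path under the prefix and that "RW" covers both reading ("R") and writing ("W"); the slide shows the data but does not spell out either rule.

```python
# Requested operation allowed by each stored method (assumption: RW covers R and W).
ALLOWED = {"R": {"R"}, "RW": {"R", "W"}, "IMPORT": {"IMPORT"}}

# Auth user document as shown on the slide.
admin = {
    "_id": "admin",
    "auth_uris": [
        {"uri_path": "/**", "method": "R"},
        {"uri_path": "/stats/**", "method": "RW"},
        {"uri_path": "/kpis/**", "method": "IMPORT"},
    ],
}


def uri_matches(pattern, path):
    """Match a path against a pattern; a trailing "/**" is treated as a prefix wildcard."""
    if pattern.endswith("/**"):
        return path.startswith(pattern[:-2])  # "/stats/**" -> prefix "/stats/"
    return pattern == path


def is_authorized(auth_user, path, operation):
    """Grant access if any replicated URI entry matches the path and allows the operation."""
    return any(
        uri_matches(entry["uri_path"], path) and operation in ALLOWED[entry["method"]]
        for entry in auth_user["auth_uris"]
    )


print(is_authorized(admin, "/users/011234", "R"))  # True, through "/**"
print(is_authorized(admin, "/users/011234", "W"))  # False, "/**" is read-only
print(is_authorized(admin, "/stats/daily", "W"))   # True, through "/stats/**"
```

The price of this denormalization is that a change to a role or resource has to be propagated to every auth user document that replicates it.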
  42. 42. 05 MongoDB Data Model: DB stats
       • Only 5 collections
       • API: typically 2 accesses (services and users collections)
       • Authorization: access only 1 collection to grant access
       • Batch: all processing done outside the DB
  43. 43. 05 NoSQL version: high level logical architecture
       § Everything running on Red Hat EL 6.2, 64 bits
  44. 44. 05 NoSQL version performance: Ireland (at PDI lab)
       • 1.8M customers, 180 profile attributes, 6 services
       • Sizes
         § Collections + indexes size: 20 GB (vs. 65 GB)
         § < 5% of the size are indexes (vs. 30%)
       • Batch
         § Full DWH customer profile import: 0:12 hours (vs. 1:10 hours)
         § Three Delta extractions: 0:40 hours (vs. 2:15 hours)
         § Loads and extractions performance proportional to data size
         § Concurrent batch processes with no performance impact
       • API:
         § Response time with average traffic: < 10 ms (vs. 110 ms)
         § Response time while loading: the same
         § High load (600 TPS) response time while loading: 300 ms
  45. 45. 05 NoSQL version performance: United Kingdom (at PDI lab)
       • 25M customers, 150 profile attributes, 15 services
       • Sizes
         § Collections + indexes size: 210 GB (vs. 700 GB)
         § < 5% of the size were indexes
       • Batch
         § Two Delta imports: < 0:40 hours (vs. 2:00 hours)
         § Loads and extractions performance proportional to data size
  46. 46. 05 NoSQL version performance: Mexico
       • 20M customers, 200 profile attributes, 15 services
       • Sizes
         § Collections + indexes size: 320 GB
         § Indexes size: 1.2 GB
       • Batch
         § Initial Full import (20M, 40 attributes): 2:00 hours
         § Small Full import (20M, 6 attributes): 0:40 hours
       • API:
         § Response time with average traffic: < 10 ms (vs. 90 ms)
         § Response time while loading: the same
         § High load (500 TPS) response time while loading: 270 ms
  47. 47. 04 NoSQL version performance
       Ireland                        NoSQL version    SQL version
       DB size                        20 GB            80 GB
       Full DWH load                  0:12 hours       1:10 hours
       Three Delta exports            0:40 hours       2:15 hours
       API while loading              < 10 ms          400 ms
       API 600 TPS + loading          300 ms           Timeout / failure

       United Kingdom                 NoSQL version    SQL version
       DB size                        210 GB           700 GB
       Two Delta loads                < 0:40 hours     < 2:00 hours

       Mexico                         NoSQL version
       DB size                        320 GB
       Initial Full load (40 attr)    2:00 hours
       Daily Full load (6 attr)       0:40 hours
       API while loading              < 10 ms
       API 500 TPS + loading          270 ms
       [Image caption: "Our DBAs"]
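The Ireland batch figures in this comparison can be turned into speed-up factors with a little arithmetic. The helper names below are hypothetical; the durations are the ones quoted on the slide.

```python
def to_minutes(hhmm):
    """Convert an "H:MM" duration, as written on the slides, into minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def speedup(sql, nosql):
    """How many times faster the NoSQL run was than the SQL run."""
    return to_minutes(sql) / to_minutes(nosql)


# (SQL third version, NoSQL version) durations for Ireland
IRELAND = {
    "Full DWH load": ("1:10", "0:12"),
    "Three Delta exports": ("2:15", "0:40"),
}

for name, (sql, nosql) in IRELAND.items():
    print(f"{name}: {speedup(sql, nosql):.1f}x")
# Full DWH load: 5.8x
# Three Delta exports: 3.4x
```

On the same figures the database shrinks from 80 GB to 20 GB, a factor of 4.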
  48. 48. 05 Mission completed?
  49. 49. 05 The bad
       • The batch load process was too fast
         § To keep the secondary nodes in sync we needed an oplog of 16 or 24 GB
         § We had to disable journaling for the first migrations
       • Document field names take up disk space
         § We reduced them to just 2 chars: "attribute_id" -> "ai"
       • Respect the unwritten law of having at least 70% of the data size in RAM
       • Take care with compound indexes, order matters
         § You can save one index… or you can have problems
         § Put the most important key (never nullable) first
       • DBAs whining and complaining about NoSQL
         § "If we had enough RAM for all data, Oracle would outperform MongoDB"
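The field-name shortening mentioned on this slide can be sketched as a mapping applied just before storing and reversed after reading. Only "attribute_id" -> "ai" comes from the slide; the other aliases are hypothetical, and the JSON length used below is only a rough proxy for the stored size.

```python
import json

# Long name -> short name. Only the first pair is from the slide; the rest are made up.
FIELD_ALIASES = {"attribute_id": "ai", "attribute_value": "av", "update_date": "ud"}


def shorten(doc, aliases=FIELD_ALIASES):
    """Recursively rename keys in nested dicts and lists according to the alias table."""
    if isinstance(doc, dict):
        return {aliases.get(key, key): shorten(value, aliases) for key, value in doc.items()}
    if isinstance(doc, list):
        return [shorten(item, aliases) for item in doc]
    return doc


def expand(doc, aliases=FIELD_ALIASES):
    """Reverse of shorten(): restore the long, readable field names."""
    reverse = {short: long for long, short in aliases.items()}
    return shorten(doc, reverse)


doc = {"user_values": [{"attribute_id": 10140, "attribute_value": "Open"}]}
short = shorten(doc)
print(short)  # {'user_values': [{'ai': 10140, 'av': 'Open'}]}
print(len(json.dumps(doc)), "->", len(json.dumps(short)))  # serialized size before and after
```

The saving is repeated for every value of every user, which is why it matters at tens of millions of profiles.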
  50. 50. 05 The ugly
       • Second migration once the PS is already running
         § Full import adding 30 new attribute values: 10:00 hours
         § Full import adding 150 new attribute values: 40:00 hours
       • Considerably increasing the documents' size (i.e. adding lots of new values to the users) makes MongoDB relocate the documents, performing around 5 times slower
         § That's a problem when you are updating 10k documents per second
       • Solutions?
         § Avoid this situation at all cost. Run away!
         § Normalize users' values; move them to a new individual collection
         § Preallocate the size with a faux field
           • You could waste space!
         § Load into a new collection, merge and swap, like we did in Oracle
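The "preallocate the size with a faux field" idea can be illustrated on plain dictionaries. This sketch only shows the document shapes: in a real deployment the padded document would be inserted and the padding later removed by an update, and whether the freed space is actually reused depends on the MongoDB version and storage engine. The field name `_pad` and both helpers are hypothetical.

```python
PADDING_FIELD = "_pad"  # hypothetical name for the faux field


def with_padding(doc, pad_bytes):
    """Return a copy of the document with a filler string reserving room for growth."""
    padded = dict(doc)
    padded[PADDING_FIELD] = "x" * pad_bytes
    return padded


def fill(doc, new_values):
    """Return a copy with the padding dropped and the new user values appended."""
    filled = {key: value for key, value in doc.items() if key != PADDING_FIELD}
    filled["user_values"] = list(doc.get("user_values", [])) + list(new_values)
    return filled


user = with_padding({"_id": "011234", "user_values": []}, pad_bytes=512)
user = fill(user, [{"ai": 10140, "av": "Open"}])
print(user)  # {'_id': '011234', 'user_values': [{'ai': 10140, 'av': 'Open'}]}
```

As the slide warns, padding that never gets used is simply wasted space, so the size has to be estimated from the attributes expected in later migrations.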
  51. 51. 06 Conclusions
  52. 52. 06 Conclusions & personal thoughts
       • Awesome performance boost
         § But not all use cases fit in a MongoDB / NoSQL solution!
       • New technology, different limitations
       • Fear of the unknown
         § SSDs performance?
         § Long term performance and stability?
       • Python + MongoDB + pymongo = fast development
         § I mean, really fast
       • MongoDB Monitoring Service (MMS)
       • 10gen people were very helpful
  53. 53. 06 Questions?
  54. 54. 0X SQL physical architecture
       § Scale horizontally by adding more BE or DB servers or disks in the SAN
       § Virtualized or physical servers depending on the deployment
  55. 55. 0X MongoDB physical architecture
       § MongoDB arbiters running on BE servers
       § Scale horizontally by adding more BE servers or disks in the SAN
       § Sharding may already be configured to scale by adding more replica sets
