A Morning with MongoDB Barcelona: From Oracle to MongoDB

Published on http://www.10gen.com/events/MongoDB-Morning-Barcelona

  1. From Oracle to MongoDB: A real use case at Telefónica PDI. Pablo Enfedaque, pev@tid.es, 06.10.2012
  2. Content
     01 Introduction
        • Telefónica PDI. Who?
        • Personalisation Server. Why? What?
     02 The SQL version
        • Data model and architecture
        • Integrations, problems and improvements
     03 The NoSQL version
        • Data model and architecture
        • Performance boost
        • The bad
     04 Conclusions
        • Conclusions
        • Personal thoughts
  3. 01 Introduction
  4. 01 Telefónica PDI. Who?
     • Telefónica
       § Fifth largest telecommunications company in the world
       § Operations in Europe (7 countries), the United States and Latin America (15 countries)
     • Telefónica Digital
       § Web and mobile digital content and services division
     • Product Development and Innovation unit
       § Formerly Telefónica R&D
       § Product & service development, platform development, research, technology strategy, user experience and deployment & operations
       § Around 70 different ongoing projects at any given time
  5. 01 Personalisation Server. What?
     • User profiling system
     • Machine learning
     • Recommendations
     • Customer profile storage
  6. 01 Opt-in and profile module. Why?
     • User data (profile and permissions) was scattered across different storages:
       § IPTV service: gender, film and music preferences, permission to contact by SMS?
       § Mobile service: gender, address
       § Music tickets service: address, music preferences
       § Location service: address-based offers, permission to contact by SMS?
     • Customer: "So you want to know my address… AGAIN?!"
  7. 01 Opt-in and profile module. Why?
     • User data (profile and permissions) was scattered across different storages:
       § IPTV service: gender, film and music preferences, permission to contact by SMS?
       § Mobile service: gender, address
       § Music tickets service: address, music preferences
       § Location service: address-based offers, permission to contact by SMS?
  8. 01 Opt-in and profile module. Why?
     • Provide a module that becomes the master customer data storage
     • (Diagram: the same attributes — gender, film and music preferences, permission to contact by SMS, address, location-based offers — now served to the IPTV, Mobile, Music tickets and Location services from a single store)
  9. 01 Opt-in and profile module. What?
     • Features:
       § Flexible profile definition, classified in services
       § Profile sharing options between different services
       § Real-time API
       § Supplementary offline batch interface
       § Authorization system
       § High availability
       § Inexpensive solution & hardware
  10. 02 The SQL solution
  11. 02 Data model: Services, users and their profile
      • Services defined a set of attributes (their profile), with default value and data type
      • Users were registered in services
      • Users defined values for some of the services' attributes
      • Each attribute value had an update date to avoid overwriting newer changes through batch loads
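The slides do not show the actual schema, but a minimal sketch of the kind of entity-attribute-value layout this description implies might look as follows. All table and column names are hypothetical (the real schema had around 20 tables, as noted in the DB stats slide further on).

    # Hypothetical DDL illustrating the EAV-style layout described above;
    # table and column names are assumptions, not the real Telefónica schema.
    import cx_Oracle

    DDL = [
        """CREATE TABLE services (
               service_id   NUMBER PRIMARY KEY,
               service_name VARCHAR2(64) NOT NULL)""",
        """CREATE TABLE service_attribs (
               attrib_id     NUMBER PRIMARY KEY,
               service_id    NUMBER REFERENCES services,
               attrib_name   VARCHAR2(64) NOT NULL,
               data_type     NUMBER NOT NULL,
               default_value VARCHAR2(256))""",
        """CREATE TABLE user_services (          -- users registered in services
               user_id    VARCHAR2(32),
               service_id NUMBER REFERENCES services,
               reg_date   DATE,
               PRIMARY KEY (user_id, service_id))""",
        """CREATE TABLE user_values (            -- one row per user/attribute value
               user_id      VARCHAR2(32),
               attrib_id    NUMBER REFERENCES service_attribs,
               attrib_value VARCHAR2(256),
               update_date  DATE,                -- newer batch/API writes win
               PRIMARY KEY (user_id, attrib_id))"""
    ]

    def create_schema(conn):
        cur = conn.cursor()
        for stmt in DDL:
            cur.execute(stmt)
        conn.commit()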
  12. 02 Data model: Services profile sharing matrix
      • Services could access attributes declared inside other services
      • There were sharing rights for read or read and write
      • The user had to be registered in both services
  13. 02 Data model: Authorization system
      • Everything that could be accessed in the PS was a resource
      • Roles defined access rights (read or read and write) over resources
      • Auth users had roles
      • Roles could include other roles
  14. 02 Data model: Bonus features!
      • Multiple IDs:
        § A user's profile could be accessed with different equivalent IDs depending on the service
        § Each user ID was defined by an ID type (phone number, email, portal ID, hash…) and the ID value
  15. 02 High level logical architecture
      § Everything running on Red Hat EL 5.4 64 bits
  16. 02 High level logical architecture
      § Everything running on Red Hat EL 5.4 64 bits
  17. 02 Integration: Planned integration
      • PS replaces all customer profile and permissions DBs
      • All systems access this data through the PS real-time API
      • In special cases, some PS consumers could use the batch interface
      • In the same way, new services could be added quite easily
  18. 02 Integration: Problems arise
      • Budget restrictions: adapting all services to use the API was too expensive
      • Keep the independent systems' DBs and synchronize the PS through batch
      • Use the DBs' built-in massive extraction feature to generate daily batch files
      • However… in most cases those DBs were not able to generate delta (changes-only) extractions
        § They provide full daily snapshots!
  19. 02 First version performance: Ireland
      • 1.8M customers, 180 profile attributes, 6 services
      • Sizes
        § Tables + indexes size: 65 GB
        § 30% of the size were indexes
      • Batch
        § Full DWH customer profile import: > 24 hours
        § Delta extractions: 4–6 hours
        § Load and extraction performance proportional to data size
      • API
        § Response time with average traffic: 110 ms
  20. 03 The SQL solution: Second version
  21. 03 Second version: High level logical architecture
      • New approach: batch processes access the DB directly
  22. 03 Second version: Batch processes
      • Batch processes had to
        § Validate authentication and authorization
        § Verify user, service and attribute existence
        § Check equivalent IDs
        § Validate sharing matrix rights
        § Validate the values' data types
        § Check the update date of the existing values
  23. 03 Second version: DB batch processing
      • (Our DBAs…)
  24. 03 Second version: New DB-based batch loading process
      • Preprocess the incoming batch file in the BE servers
        § Validate format, service and attribute existence, and the values' data types
        § Generate an intermediate file with a structure like the target DB table
      • Load the intermediate file (Oracle's SQL*Loader) into a temporal table
      • Switch the DB to "deferred writing", storing all incoming modifications
      • Merge the temporal table and the final table, checking the values' update date
      • Replace the old user attribute values table with the merge result
      • Apply the deferred writing operations
  25. 03 Second version: New batch extraction process
      • Generate a temporal DB table with a format similar to the final batch file. Two loops over the user attribute values table are required:
        § Select the format of the table: number and order of columns / attributes
        § Fill the new table
      • Loop over the whole temporal table for final formatting (empty fields…)
      • From the batch side, loop across the whole table (SELECT * FROM …)
      • Write each retrieved row as a line in the resulting file
  26. 03 Second version performance: Ireland performance requirements
      • Batch time window: 3:30 hours
        § Full DWH load
        § Two delta loads
        § Three delta extractions
      • API
        § Ireland requirement: < 500 ms
  27. 03 Second version performance: Ireland
      • 1.8M customers, 180 profile attributes, 6 services
      • Sizes
        § Tables + indexes size: 65 GB
        § 30% of the size were indexes
        § Temporal tables size increases almost exponentially: 15 GB and above
        § Intermediate file size: from 700 MB to 7 GB
      • Batch
        § Full DWH customer profile import: 2:30 hours
        § Delta extractions: 1:00 hour
        § Load performance worsened quickly (almost exponentially): 6:00 hours
        § Extraction performance proportional to data size
        § Concurrent batch processes may halt the DB
      • API
        § Response time with average traffic: 80 ms
        § Response time while loading was unpredictable: > 300 ms
  28. 04 The SQL solution: Third version
  29. 04 Third version: Speed up DB batch processes
      • (Our DBAs, again…)
  30. 04 Third version: New (second) DB-based batch loading process
      • Minor preprocessing of the incoming batch file in the BE servers
        § Just validate the format
        § No intermediate file needed!
      • Load the validated file (Oracle's SQL*Loader) into a temporal table
      • Loop over the temporal table merging the values into the final table, checking the values' update date and data types (see the sketch below)
        § Use several concurrent writing jobs
      • Store the results in the real table: no need to replace it!
      • No "deferred writing"!
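As a rough illustration of the merge-by-date step marked above, here is a hedged Python/cx_Oracle sketch with hypothetical table and column names; the real implementation was about 1700 lines of PL/SQL running several concurrent jobs, not a single statement like this.

    # Hedged sketch: merge the SQL*Loader-populated temporal table into the
    # final table, letting the update date decide which value wins. Table and
    # column names are assumptions, not the real Telefónica schema.
    import cx_Oracle

    MERGE_SQL = """
        MERGE INTO user_values t
        USING tmp_user_values s
           ON (t.user_id = s.user_id AND t.attrib_id = s.attrib_id)
         WHEN MATCHED THEN UPDATE
              SET t.attrib_value = s.attrib_value,
                  t.update_date  = s.update_date
            WHERE s.update_date > t.update_date
         WHEN NOT MATCHED THEN
              INSERT (user_id, attrib_id, attrib_value, update_date)
              VALUES (s.user_id, s.attrib_id, s.attrib_value, s.update_date)
    """

    def merge_batch(dsn, user, password):
        # The temporal table is assumed to be already loaded by SQL*Loader.
        with cx_Oracle.connect(user, password, dsn) as conn:
            cur = conn.cursor()
            cur.execute(MERGE_SQL)   # older rows never overwrite newer values
            conn.commit()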
  31. 04 Third version: Enhancements to the extraction process
      • Optimized the loops that generate the temporal output table
        § Use several concurrent writing jobs
        § We achieved a speed-up of between 1.5 and 2
      • Loop over the whole temporal table for final formatting (empty fields…)
      • Download and write the lines directly inside Oracle's sqlplus
      • No SELECT * FROM … query from the batch side!
  32. 04 Third version performance: Ireland
      • 1.8M customers, 180 profile attributes, 6 services
      • Sizes
        § Tables + indexes size: 65 GB
        § 30% of the size were indexes
        § Temporal tables: 15 GB
      • Batch
        § Full DWH customer profile import: 1:10 hours (vs. 2:30 hours)
        § Three delta extractions: 2:15 hours (vs. 3:00 hours)
        § Load and extraction performance proportional to data size
        § Concurrent batch processes not so harmful
      • API
        § Response time with average traffic: 110 ms
        § Response time while loading: 400 ms
      • (Our DBAs: "F**K YEAH")
  33. 04 Third version performance: United Kingdom
      • 25M customers, 150 profile attributes, 15 services
      • Sizes
        § Tables + indexes size: 700 GB
        § 40% of the size were indexes
      • Batch
        § Two delta imports: < 2:00 hours
        § Two delta extractions: < 2:00 hours
        § Load and extraction performance proportional to data size
      • API
        § Response time with average traffic: 90 ms
      • (Our DBAs: "F**K YEAH")
  34. 04 Third version performance
      Ireland                3rd version            2nd version
      DB size                65 GB + 15 GB (temp)   65 GB + > 15 GB
      Full DWH load          1:10 hours             2:30 hours
      Three delta exports    2:15 hours             3:00 hours
      Batch stability        Stable, linear         Unstable, exponential
      API response time      110 ms                 110 ms
      API while loading      400 ms                 Unpredictable

      United Kingdom         3rd version
      DB size                700 GB
      Two delta loads        < 2:00 hours
      Three delta exports    < 2:00 hours
      API response time      90 ms
      (Our DBAs: "F**K YEAH")
  35. 04 Third version performance: DB stats
      • 20 database tables
      • API: several queries with up to 35 joins and even some unions
      • Authorization: 5 joins to validate auth users' access
      • Batch
        § Load: 1700 lines of PL/SQL
        § Extraction: 1200 lines of PL/SQL
  36. 04 Mission completed?
  37. 04 Third version performance: Mexico
      • 20M customers, 200 profile attributes, 10 services
      • Mexico time window: 4:00 hours
        § Full DWH load!
        § Additional delta feed loads
        § At least two delta extractions
      • (Our DBAs…)
  38. 05 The NoSQL solution
  39. 05 MongoDB data model: Services and their profile + sharing matrix
      • Note: attrib_id = service_id * 10000 + num attribs + 1
      {
          _id : 7,
          service_name : "root",
          id_type : 1,
          default_values : false,
          owned_attribs : [
              { attrib_id : 70005,
                attrib_name : "marketing.consent",
                attrib_data_type : 1,
                attrib_def_value : "no",
                attrib_status : 1 },
              ...
          ],
          shared_attribs : [
              { attrib_id : 20144, sharing_mode : 0 },
              ...
          ]
      }
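A minimal pymongo sketch of how a service document like the one above could be read to answer "may this service access this attribute?". The database name "ps", the collection name "services" and the helper itself are illustrative assumptions, not taken from the deck.

    # Hedged sketch: look up a service and check whether it may read an
    # attribute, either as owner or through the embedded sharing matrix.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    services = client["ps"]["services"]          # collection name is an assumption

    def can_read(service_id, attrib_id):
        svc = services.find_one({"_id": service_id})
        if svc is None:
            return False
        if any(a["attrib_id"] == attrib_id for a in svc.get("owned_attribs", [])):
            return True                          # the service owns the attribute
        return any(s["attrib_id"] == attrib_id   # or it is shared with it
                   for s in svc.get("shared_attribs", []))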
  40. 05 MongoDB data model: Users and their profile + multiple IDs
      • Notes: _id = "id type" + "user ID"; attrib_id = service_id * 10000 + num attribs + 1
      User document:
      {
          _id : "011234",
          services_list : [
              { service_id : 1,
                reg_date : {"$date" : 1318040693000} },
              ...
          ],
          user_values : [
              { attrib_id : 10140,
                attrib_value : "Open",
                update_date : {"$date" : 1317110161000} },
              ...
          ]
      }
      Equivalent ID document:
      {
          _id : "05abcd",
          ue : "011234"
      }
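To make the multiple-ID and update-date mechanics concrete, here is a hedged pymongo sketch; the "equiv_ids" and "users" collection names and the update pattern are assumptions based on the document layout shown above.

    # Hedged sketch: resolve an equivalent ID to the canonical user document,
    # then update one attribute value only if the incoming change is newer.
    from datetime import datetime, timezone
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["ps"]

    def get_user(any_id):
        equiv = db["equiv_ids"].find_one({"_id": any_id})
        canonical_id = equiv["ue"] if equiv else any_id
        return db["users"].find_one({"_id": canonical_id})

    def set_value(user_id, attrib_id, value, when):
        # Positional update of the matching array element, guarded by update_date
        # so an older batch row never overwrites a newer API write.
        db["users"].update_one(
            {"_id": user_id,
             "user_values": {"$elemMatch": {"attrib_id": attrib_id,
                                            "update_date": {"$lt": when}}}},
            {"$set": {"user_values.$.attrib_value": value,
                      "user_values.$.update_date": when}})

    set_value("011234", 10140, "Closed", datetime.now(timezone.utc))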
  41. 05 MongoDB data model: Authorization system
      ROLES COLLECTION:
      {
          _id : "PS_ADMIN_ROLE",
          roles_resources : [
              { resource_id : "admin.**", method : R },
              { resource_id : "stats.**", method : IMPORT },
              ...
          ]
      }
      AUTH USERS COLLECTION:
      {
          _id : "admin",
          auth_pswd : "XXX",
          auth_roles : ["PS_ADMIN_ROLE", ...],
          auth_uris : [
              { uri_path : "/**", method : R },
              { uri_path : "/stats/**", method : RW },
              { uri_path : "/kpis/**", method : IMPORT },
              ...
          ]
      }
      RESOURCES COLLECTION:
      {
          _id : "admin.**",
          role_uri : "/**"
      }
      • The auth_uris replicate the URIs (from resources) and the methods (from roles)
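The point of replicating URIs and methods into the auth user document is that a request can be authorized with a single lookup. A hedged pymongo sketch, with illustrative names and deliberately simplified glob matching (the real pattern semantics are not described in the deck):

    # Hedged sketch: because URIs and methods are denormalized into each auth
    # user document, one query against one collection is enough to authorize.
    import fnmatch
    from pymongo import MongoClient

    auth_users = MongoClient("mongodb://localhost:27017")["ps"]["auth_users"]

    def _grants(rule_method, wanted):
        # "RW" implies read access; other methods must match exactly.
        return rule_method == wanted or (wanted == "R" and rule_method == "RW")

    def is_allowed(username, uri, wanted_method):
        user = auth_users.find_one({"_id": username})   # single collection access
        if user is None:
            return False
        return any(fnmatch.fnmatch(uri, rule["uri_path"].replace("**", "*"))
                   and _grants(rule["method"], wanted_method)
                   for rule in user.get("auth_uris", []))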
  42. 05 MongoDB data model: DB stats
      • Only 5 collections
      • API: typically 2 accesses (services and users collections)
      • Authorization: access only 1 collection to grant access
      • Batch: all processing done outside the DB
  43. 05 NoSQL version: High level logical architecture
      § Everything running on Red Hat EL 6.2 64 bits
  44. 05 NoSQL version performance: Ireland (at PDI lab)
      • 1.8M customers, 180 profile attributes, 6 services
      • Sizes
        § Collections + indexes size: 20 GB (vs. 65 GB)
        § < 5% of the size are indexes (vs. 30%)
      • Batch
        § Full DWH customer profile import: 0:12 hours (vs. 1:10 hours)
        § Three delta extractions: 0:40 hours (vs. 2:15 hours)
        § Load and extraction performance proportional to data size
        § Concurrent batch processes with no performance impact
      • API
        § Response time with average traffic: < 10 ms (vs. 110 ms)
        § Response time while loading: the same
        § High load (600 TPS) response time while loading: 300 ms
  45. 05 NoSQL version performance: United Kingdom (at PDI lab)
      • 25M customers, 150 profile attributes, 15 services
      • Sizes
        § Collections + indexes size: 210 GB (vs. 700 GB)
        § < 5% of the size were indexes
      • Batch
        § Two delta imports: < 0:40 hours (vs. 2:00 hours)
        § Load and extraction performance proportional to data size
  46. 05 NoSQL version performance: Mexico
      • 20M customers, 200 profile attributes, 15 services
      • Sizes
        § Collections + indexes size: 320 GB
        § Indexes size: 1.2 GB
      • Batch
        § Initial full import (20M, 40 attributes): 2:00 hours
        § Small full import (20M, 6 attributes): 0:40 hours
      • API
        § Response time with average traffic: < 10 ms (vs. 90 ms)
        § Response time while loading: the same
        § High load (500 TPS) response time while loading: 270 ms
  47. 05 NoSQL version performance
      Ireland                    NoSQL version     SQL version
      DB size                    20 GB             80 GB
      Full DWH load              0:12 hours        1:10 hours
      Three delta exports        0:40 hours        2:15 hours
      API while loading          < 10 ms           400 ms
      API 600 TPS + loading      300 ms            Timeout / failure

      United Kingdom             NoSQL version     SQL version
      DB size                    210 GB            700 GB
      Two delta loads            < 0:40 hours      < 2:00 hours

      Mexico                     NoSQL version
      DB size                    320 GB
      Initial full load (40 attr)  2:00 hours
      Daily full load (6 attr)     0:40 hours
      API while loading          < 10 ms
      API 500 TPS + loading      270 ms
      (Our DBAs…)
  48. 05 Mission completed?
  49. 05 The bad
      • The batch load process was too fast
        § To keep secondary nodes synced we needed an oplog of 16 or 24 GB
        § We had to disable journaling for the first migrations
      • Labels of document fields take up disk space
        § Reduced them to just 2 chars: "attribute_id" -> "ai"
      • Respect the unwritten law of keeping at least 70% of the data size in RAM
      • Take care with compound indexes: order matters (see the sketch below)
        § You can save one index… or you can have problems
        § Put the most important key (never nullable) first
      • DBAs whining and complaining about NoSQL
        § "If we had enough RAM for all the data, Oracle would outperform MongoDB"
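A short pymongo sketch of two of the mitigations above: 2-character field labels and a compound index led by the key that is never null. Collection and field names are illustrative assumptions, not the deck's exact choices.

    # Hedged sketch: shortened field labels ("ai", "av", "ud") and a compound
    # index whose first key is never null, so it can also serve queries on that
    # key alone and one extra single-field index can be saved.
    from pymongo import MongoClient, ASCENDING

    users = MongoClient("mongodb://localhost:27017")["ps"]["users"]

    users.insert_one({
        "_id": "011234",
        "uv": [                      # user_values, with 2-char labels
            {"ai": 10140,            # attrib_id
             "av": "Open",           # attrib_value
             "ud": 1317110161000}    # update_date
        ]
    })

    # _id is indexed by default; the compound index on the embedded values is
    # led by the never-null attribute id, then the update date.
    users.create_index([("uv.ai", ASCENDING), ("uv.ud", ASCENDING)])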
  50. 05 The ugly
      • Second migration once the PS is already running
        § Full import adding 30 new attribute values: 10:00 hours
        § Full import adding 150 new attribute values: 40:00 hours
      • Increasing document size considerably (e.g. adding lots of new values to the users) makes MongoDB rearrange the documents, performing around 5 times slower
        § That's a problem when you are updating 10k documents per second
      • Solutions?
        § Avoid this situation at all costs. Run away!
        § Normalize user values; move them to a new individual collection
        § Preallocate the size with a faux field (see the sketch below)
          • You could waste space!
        § Load into a new collection, merge and swap, like we did in Oracle
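A minimal pymongo sketch of the preallocation idea referenced above: documents are inserted with a throw-away padding field so that later growth can happen in place. Field names and the padding size are assumptions, not measurements from the deck.

    # Hedged sketch: reserve space with a faux "_pad" field at insert time, then
    # drop it when the later migration adds many values, so the document does
    # not outgrow its allocated space and MongoDB does not have to move it.
    from pymongo import MongoClient

    users = MongoClient("mongodb://localhost:27017")["ps"]["users"]

    def insert_with_padding(user_id, expected_extra_bytes=4096):
        users.insert_one({
            "_id": user_id,
            "uv": [],
            "_pad": "x" * expected_extra_bytes   # reserves room for future values
        })

    def add_values(user_id, new_values):
        # Remove the padding in the same update that pushes the new values, so
        # the overall document size stays roughly constant.
        users.update_one({"_id": user_id},
                         {"$push": {"uv": {"$each": new_values}},
                          "$unset": {"_pad": ""}})

    insert_with_padding("022345")
    add_values("022345", [{"ai": 10150, "av": "yes", "ud": 1317110161000}])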
  51. 06 Conclusions
  52. 06 Conclusions & personal thoughts
      • Awesome performance boost
        § But not all use cases fit a MongoDB / NoSQL solution!
      • New technology, different limitations
      • Fear of the unknown
        § SSD performance?
        § Long-term performance and stability?
      • Python + MongoDB + pymongo = fast development
        § I mean, really fast
      • MongoDB Monitoring Service (MMS)
      • 10gen people were very helpful
  53. 06 Questions?
  54. 0X SQL physical architecture
      § Scale horizontally by adding more BE or DB servers, or disks in the SAN
      § Virtualized or physical servers depending on the deployment
  55. 0X MongoDB physical architecture
      § MongoDB arbiters running on the BE servers
      § Scale horizontally by adding more BE servers or disks in the SAN
      § Sharding may already be configured to scale by adding more replica sets
