Breaking the Oracle Tie; High Performance OLTP and Analytics Using MongoDB

This talk is the story of the design and implementation of the Marketing Communication Suite at Persado. The Marketing Communication Suite is a platform that serves tens of customers, ranging from telecoms to finance and web properties, with persuasion-marketing language messaging. Our platform uses a range of technologies, the most important being MongoDB for the online transactional and analytical processing of messages. Topics covered: MongoDB aggregation vs. MapReduce, data modeling, deployment architecture, migration scenarios, and hybrid solutions.

  • The dataset was growing very fast while its value per KB was fairly low. The estimated monthly cost of our Oracle RAC would have been equivalent to the image below.
  • Explain the asynchronous invocation that runs every 3 minutes: it reads from the OLTP cluster, performs a custom transformation, and writes to the OLAP cluster.
  • Breaking the Oracle Tie; High Performance OLTP and Analytics Using MongoDB

    1. Breaking the Oracle tie; High Performance OLTP and analytics using MongoDB. Alexandros Giamas, Senior Software Engineer
    2. Can you afford to leave half the opportunity on the table? Three variants of the same message, each shown with the percentage printed under it on the slide:
       • Variant 1 (1.11%): You won't believe it. They dial, you answer on VoIP! Why you'll love your Online Number: 1. Family & friends without VoIP can call you 2. You answer on VoIP 3. And you can use it from anywhere in the world. I like that!
       • Variant 2 (1.42%): You won't believe it. Pick an Online Number! Why get an Online Number: 1. Your friends without VoIP can call you 2. You answer on VoIP 3. You also have voicemail included. I like that!
       • Variant 3 (2.07%): You won't believe it. Pick an Online Number! Why you'll love your Online Number: 1. Your friends without VoIP can call you 2. You answer on VoIP 3. You also have voicemail included. I like that!
       …another 16 Million + combinations
    3. The Marketing Communication Suite: We generate the marketing messages that work best. For any customer, any product, at any time.
    4. Persado History: Oracle shop
    5. Persado History
    6. Persado History: • Exponentially growing dataset • Data value/KB?
    7. Persado History: Not anymore...
    8. Persado History: Transactional Data and Analytics
    9. Transaction (Re)-defined: Social, Mobile, Email, Web, Display, Search. Which one stands out?
    10. Conversational and Transactional Properties: Web based channels
    11. Conversational and Transactional Properties: Mobile Text Messaging
    12. Flexi-structured data: One User across campaigns and mediums { "_id" : ObjectId("511e3cbea9f1fd01
    13. Overall Architecture - Data flow
    14. Sizing transactional data: ☛ User Terminated data ☛ User Originated data ☛ Metadata (state for User per campaign and globally) ☛ Must hold data in memory, or at least indexes
    15. ETL for OLAP: Offline / Online processing • Going online is mostly simpler • Offline must take into account data irregularities (data validation policy driven by business needs)
    16. ETL for OLAP: ☛ Custom data transformation ☛ Custom “continueOnError” implementation
    17. Analytics: First cut - Custom JS server-side using $where
    18. Analytics: GWL (Global Write Lock)
    19. Analytics in the real world
    20. Your own mini transactions: Break down Spring Batch steps into idempotent and non-idempotent ones • For idempotent steps, just replay them • For non-idempotent steps, replace the current state with the last known good state from before the latest Spring Batch step invocation (undo log) and retry the step
    21. Your own mini transactions, Issues: • 16MB document size limit... • Slow to replay • Hard to test using Selenium
    22. Analytics in the real world: Map Reduce implementation
    23. Analytics in the real world: Caching layers ✓ Caching in collections
    24. Analytics in the real world: Caching layers ✓ Caching in ehcache
    25. Analytics using the Aggregation Framework: {$project: {"rdd": {$isoDate: {year: {$year: "$_id.receivedDateHour"}, month: {$month: "$_id.receivedDateHour"}, dayOfMonth: {$dayOfMonth: "$_id.receivedDateHour"}, hour: {$hour: "$_id.receivedDateHour"}}}, "value.diffDaysSum.0": 1, "value.diffDaysSum.1": 1, "value.diffDaysSum.2": 1}}, {$project: {rdd: 1, diffDaysSum: {$add: ["$value.diffDaysSum.0", "$value.diffDaysSum.1", "$value.diffDaysSum.2"]}}}, {$group: {_id: "$rdd", totalSumPerDay: {$sum: "$diffDaysSum"}}}
    26. Analytics using the Aggregation Framework: Double project phase, followed by grouping results
    27. Analytics using the Aggregation Framework: Pros: ✓ More flexible than it sounds ✓ Rapid development ✓ Easy debugging. Cons: ✘ No custom JS supported ✘ Memory limitation ✘ API still evolving
    28. Fine grained write semantics and asynchronous magic: Fine grained write semantics • WriteConcern.SAFE for most writes • WriteConcern.REPLICAS_SAFE for writes that are costly to recompute in case of failure. ReactiveMongo • Asynchronous and non-blocking Scala driver for MongoDB • Async writes with WriteConcern.SAFE and a callback retry policy in case of error
    29. Lessons Learned: Use: replica sets, journaling, Aggregation Framework, MMS. Don't use: development versions across the team, unbounded datasets that can't fit in memory, MapReduce if you don't need to
    30. MongoDB on EC2: 4 nodes with 6 mongod processes
    31. MongoDB on EC2, Using LVMs: For high performance, use LVMs with RAID 0 or 10. Have your guerilla team ready: http://goo.gl/8NbV7
    32. MongoDB on EC2, Lesson Learned: Unix level tweaks: • Raise ulimit • Raise TCP timeout • noatime, nodiratime • Use XFS or ext4 • Use LVM for snapshotting. Use journaling
    33. Questions? alexandergiamas@yahoo.com alexandros.giamas@persado.com
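
The "mini transactions" idea from slide 20 can be sketched in a few lines. This is a minimal in-memory illustration, not the Spring Batch implementation from the talk: `Step`, `run_step`, `fail_once`, and the dictionary used as state are all hypothetical names chosen for this example. Idempotent steps are simply replayed after a failure, while non-idempotent steps first have the state rolled back to a snapshot taken before the step ran (the undo log).

```python
import copy
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, int]  # hypothetical stand-in for the persisted job state


@dataclass
class Step:
    name: str
    run: Callable[[State], None]
    idempotent: bool


def run_step(step: Step, state: State, max_attempts: int = 3) -> int:
    """Run a step with retries and return the number of attempts used."""
    # Undo log entry: the last known good state before this invocation.
    snapshot = copy.deepcopy(state)
    for attempt in range(1, max_attempts + 1):
        try:
            step.run(state)
            return attempt
        except Exception:
            if not step.idempotent:
                # Non-idempotent: restore the last known good state first.
                state.clear()
                state.update(copy.deepcopy(snapshot))
            if attempt == max_attempts:
                raise
            # Idempotent steps need no rollback; the loop just replays them.
    raise ValueError("max_attempts must be at least 1")


def fail_once(action: Callable[[State], None]) -> Callable[[State], None]:
    """Wrap an action so it crashes once, after its side effect has happened."""
    calls = {"n": 0}

    def wrapped(state: State) -> None:
        action(state)
        calls["n"] += 1
        if calls["n"] == 1:
            raise RuntimeError("simulated crash after partial work")

    return wrapped


def increment_sent(state: State) -> None:
    state["sent"] = state.get("sent", 0) + 1  # not idempotent


def mark_done(state: State) -> None:
    state["done"] = 1  # idempotent: replaying gives the same result
```

If `increment_sent` were wrongly declared idempotent, the replay would count the message twice, which is why the split between the two kinds of steps matters.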

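
To make the "double project phase, followed by grouping" from slides 25 and 26 easier to follow, here is a plain-Python sketch of what that pipeline appears to compute. It does not use MongoDB at all. The document shape is an assumption inferred from the field paths on the slide: `_id.receivedDateHour` is taken to be a datetime, and `value.diffDaysSum` is taken to be a subdocument with keys "0", "1", and "2". The function name `total_per_hour` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

HourKey = Tuple[int, int, int, int]  # (year, month, dayOfMonth, hour)


def total_per_hour(docs: Iterable[dict]) -> Dict[HourKey, int]:
    totals: Dict[HourKey, int] = defaultdict(int)
    for doc in docs:
        received: datetime = doc["_id"]["receivedDateHour"]
        # First $project: reduce the timestamp to year/month/day/hour ("rdd").
        rdd = (received.year, received.month, received.day, received.hour)
        # Second $project: $add the three diffDaysSum values into one number.
        diffs = doc["value"]["diffDaysSum"]
        diff_days_sum = diffs["0"] + diffs["1"] + diffs["2"]
        # $group: sum the per-document totals for each distinct "rdd" key.
        totals[rdd] += diff_days_sum
    return dict(totals)
```

Documents received within the same hour collapse into one key, so the result has one total per distinct hour.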