Leveraging MongoDB as a Data Store for Security Data

Published in: Technology
  • Working set
    Only one database is “heated” with inserts
    Only one database must be in RAM, others may go in and out with queries/map reduces.

  • Again, in a perfect world, the rate at which I can throw documents at MongoDB would not be related in any way to total database size.


    As the total data size goes up, and the capped collection kicks in:
    <click>

    Remember, this is real disk backing this, so we must be absolutely 100% sure the database size always fits in the allocated disk space.

    We don’t have room to “accidentally” raise the blue line.


    Same from before. It would be really nice if the documents per second would stay constant.
    <click>

    Remember, we settled for the diminished throughput because we couldn’t afford to scale up the node until all of our data fit in RAM, which would have made the graph look more like this
    <click>

    Then we agreed that we really like the capped collection behavior, which made our graph look more like this.
    <click>

    Apply a little bit of customization and we help Mongo keep the most recent, and dare I say the most important, data in RAM.

    We helped Mongo expire data not at the per-document level, where it is forced to manage its giant B-tree index, but at the “bucket” level, which lets Mongo hand its data to the OS for removal.
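
    The bucket-level expiry described in these notes can be sketched roughly as follows. This is a minimal illustration, not the MongoR source: the function name, the bucket names, and the sizes are all hypothetical, and it only decides which whole buckets to remove. In a real deployment each returned name might be passed to pymongo's `MongoClient.drop_database`, which is an assumption about wiring rather than something the talk states.

```python
from collections import OrderedDict

GB = 1024 ** 3


def buckets_to_drop(bucket_sizes, disk_limit_bytes):
    """Pick the oldest buckets to drop until the total size fits the limit.

    bucket_sizes maps bucket (database) name -> size in bytes, oldest first.
    The newest bucket is treated as active and is never selected.
    """
    total = sum(bucket_sizes.values())
    doomed = []
    for name in list(bucket_sizes)[:-1]:  # skip the newest (active) bucket
        if total <= disk_limit_bytes:
            break
        doomed.append(name)
        total -= bucket_sizes[name]
    return doomed


# Hypothetical bucket names and sizes, oldest first.
sizes = OrderedDict([
    ("events_0001", 30 * GB),
    ("events_0002", 30 * GB),
    ("events_0003", 30 * GB),
    ("events_0004", 12 * GB),  # active bucket, still filling
])

# Dropping a whole database removes its files in one step, with no
# per-document deletes and no index maintenance for the removed data.
print(buckets_to_drop(sizes, disk_limit_bytes=80 * GB))
```

    Because whole buckets are removed, the decision only needs the bucket sizes and their order, not any per-document bookkeeping.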
  • It’s noisy, but in general it maintains performance regardless of data size and regardless of whether it is working to prune data.

    With only a little bit of effort, we scaled-out a single mongod, without scaling-up any of the physical resources.

  • The system hums along happily when each bucket is sized somewhere between one-quarter and one-third of RAM.

    This is MongoR implementation specific!

    Follow standard Mongo practices, such as fstab parameters and NUMA control, first.

    Assumes the client application itself isn’t a HEAVY user of RAM.



    Our implementation has the client application poll MongoR, checking if there is a “new” database handle.

    If there are many client applications on each box, their polling may not be aligned, so whenever a rotation happens there may be two “active” buckets for a period of minutes or hours, depending on how often each client polls for a new database.

    There needs to be headroom for queries / aggregation / map reduce for the historical data to be pulled up into RAM without kicking the “warm” data out of RAM.
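
    The polling behavior described in these notes, and the window in which two buckets are “active” at once, can be illustrated with a small simulation. This is a sketch under stated assumptions: `BucketClient`, the lookup callable, the bucket names, and the 600-second interval are hypothetical and are not taken from python-mongor.

```python
class BucketClient:
    """Caches the active bucket name and re-checks it every poll_interval seconds."""

    def __init__(self, lookup_active_bucket, poll_interval, now):
        # lookup_active_bucket is a hypothetical callable that returns the
        # name of the bucket currently accepting writes.
        self._lookup = lookup_active_bucket
        self._poll_interval = poll_interval
        self._bucket = lookup_active_bucket()
        self._last_poll = now

    def bucket_for_write(self, now):
        # Only ask for a new handle once the poll interval has elapsed.
        if now - self._last_poll >= self._poll_interval:
            self._bucket = self._lookup()
            self._last_poll = now
        return self._bucket


active = {"name": "events_0007"}


def lookup():
    return active["name"]


# Two clients on the same box whose polling is not aligned.
a = BucketClient(lookup, poll_interval=600, now=0)
b = BucketClient(lookup, poll_interval=600, now=300)

active["name"] = "events_0008"  # rotation happens at t=400

at_t600 = (a.bucket_for_write(600), b.bucket_for_write(600))
at_t900 = (a.bucket_for_write(900), b.bucket_for_write(900))
print(at_t600)  # the clients disagree: two buckets receive writes
print(at_t900)  # both clients have picked up the new bucket
```

    The gap between the two prints is the period in which both buckets must stay warm in RAM, which is one reason to leave headroom when sizing buckets.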

  • Remember the sawtooth pattern

    We set the rotation to occur overnight to have minimal impact around the possibility of 2 active buckets.

    MongoDB allocates disk space in files of up to 2GB at a time, so if the system absorbs more than 2GB between rotation checks, it could go “over” the allocated space.

    We found no limit to the number of buckets. We run about 100 30GB buckets per pizza-box.
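
    The sizing rules of thumb in these notes reduce to a little arithmetic, sketched below. The helper names, the 96GB RAM figure, the 4000GB disk, and the ingest rates are hypothetical values chosen for illustration; only the one-quarter to one-third ratio, the 2GB allocation step, the 30GB bucket size, and the 85-90% limit come from the talk.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def bucket_size_range(ram_bytes):
    """Bucket size between one-quarter and one-third of RAM."""
    return ram_bytes // 4, ram_bytes // 3


def max_bucket_count(disk_bytes, bucket_size_bytes, fill_percent=85):
    """How many full buckets fit below the disk fill limit."""
    return (disk_bytes * fill_percent // 100) // bucket_size_bytes


def may_overshoot(ingest_bytes_per_s, check_interval_s, allocation_step_bytes=2 * GB):
    """True if more than one allocation step can arrive between rotation checks."""
    return ingest_bytes_per_s * check_interval_s > allocation_step_bytes


low, high = bucket_size_range(96 * GB)         # hypothetical 96GB box
print(low // GB, high // GB)                   # bucket size bounds in GB

print(max_bucket_count(4000 * GB, 30 * GB))    # hypothetical 4000GB of disk

print(may_overshoot(5 * MB, 600))              # 5 MB/s, checked every 10 minutes
print(may_overshoot(1 * MB, 600))              # 1 MB/s, checked every 10 minutes
```

    Integer percentages are used for the fill limit so the arithmetic stays exact for large byte counts.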
  • What do we want?
    I don’t think MongoR is the end-all solution to this problem. When I built this system, I got the feeling that a lot of people had this problem, but everyone dealt with it separately.

    We formed a great relationship with our MongoDB contact who gave us enough hints that we should be concerned with our working set.

    This behavior is valuable to us, and I hope it is valuable to others. The best-case scenario is that we convince MongoDB to build a behavior set like this directly into MongoDB so I can abandon my implementation.
  • Transcript

    • 1. MongoDB as a Data Store for Security Data: Scaling out the mongod node. Daniel Bauman, Sr. Cyber Intelligence Analyst, LM-CIRT. © 2012 Lockheed Martin Corporation. All Rights Reserved.
    • 2. Contexts: Information, 01101100 01101101 01100011 01101111, Influence (Application), Intelligence. © 2014 Lockheed Martin Corporation. All Rights Reserved.
    • 3. Key Brick Walls: 1. Isolation 2. Retention 3. Access
    • 4. Isolated Information [Figure: four separate copies of the bit string 01101100 01101101 01100011 01101111]
    • 5. Isolated Information [Figure: four separate copies of the bit string 01101100 01101101 01100011 01101111]
    • 6. Pizza Boxes ✔
    • 7. Single Pizza Box Throughput ✔
    • 8. Pizza Boxes ✔
    • 9. 2. Retention
    • 10. The Dream – MongoD Standard Install [Chart: Data Size vs Documents/sec; data size and documents per second plotted over time]
    • 11. The Reality – MongoD Standard Install [Chart: Data Size vs Documents/sec (File size vs Inserts); data size and documents per second plotted over time]
    • 12. The Dream – Data Retention [Chart: Data Size vs Documents/sec; data size and documents per second plotted over time]
    • 13. Single Pizza Box Data Retention [Diagram labels: Mongo Database, Disk Is FULL, Trash]
    • 14. The Reality – MongoD Capped Collection [Chart: File size vs Inserts; data size and documents per second plotted over time]
    • 15. 3. Access
    • 16. The Dream - Querying the Cloud [Diagram labels: Query, Response, and a block of binary data]
    • 17. And now for something less technical
    • 18. Information Retrieval: 172.100.178.247 [Figure: grid of IP addresses in the 172.100.x.x range, which includes 172.100.178.247] “1.0 second is about the limit for the user’s flow of thought to stay uninterrupted” – J. Nielsen, "Response times: the three important limits," 1993
    • 19. Information Retrieval – 10 seconds: “response delays of a standard ten seconds will not permit the kind of thinking continuity essential to sustained problem solving” – R. Miller, "Response time in man-computer conversational transactions," 1968
    • 20. Diving Back In
    • 21. Random Data Access [Diagram: Documents on a timeline from past to recent]
    • 22. Python-MongoR (R for Retention): Distributed database expansion to MongoDB designed to optimize scale-out, write-intensive document storage
    • 23. Data Buckets [Diagram: Documents on a timeline from past to recent]
    • 24. MongoR Buckets [Diagram: a row of DB buckets from past to recent]
    • 25. MongoR Automated Segmenting [Diagram: DB buckets from past to recent, with a Generator]
    • 26. MongoR Retention [Diagram labels: Mongo, Disk Is Full, Trash]
    • 27. MongoR “Capped Collection” [Diagram labels: MongoR, Mongo]
    • 28. MongoR Destructor [Diagram: DB buckets from past to recent, with a Generator and a Destructor]
    • 29. MongoR Destructor [Diagram: many DB buckets from past to recent, with a Generator]
    • 30. The Real [Chart: Data Size vs Documents/sec; data size and documents per second plotted over time]
    • 31. MongoR Production Behavior.
    • 32. Best Practices – Bucket Size: Bucket size = ¼ RAM size [Diagram: System RAM divided among Mongo buckets]
    • 33. Best Practices – Bucket Limit: Bucket Limit = 85-90% Capacity [Diagram: System Drive Capacity]
    • 34. Python-mongor In Production: MIT Licensed – https://github.com/lmco/python-mongor
    • 35. Questions
