Lean & Agile with MongoDB
MongoMunich 2012 #MongoDBMunich @comsysto
The slide deck of my talk at MongoMunich 2012.

Speaker notes:
  • Slide 6 (Lean?): Build, measure, learn.
  • Slide 7 (Continuous Innovation): Set measurement points.
  • Slide 10 (Agile?): Always have a potentially shippable piece of software.
  • Slide 11 (SCRUM): Sprints, backlog, roles; minimizes risk.
  • Slide 12 (Kanban): Based on completed units of work; status must be visible; Kanban board.
  • Slide 13 (Ken Schwaber): Scrum Day 2012, Walldorf, SAP.
  • Slide 16 (Dogmatic Slumber): Immanuel Kant. Transition: agile projects are about communication, breaking down boundaries and dissolving silo thinking.
  • Slide 23 (Appreciation for simplicity): In summary, agile development means dissolving complexity (communication, meetings, overhead).
  • Slide 32 (AWS storage): PIOPS; EBS is mirrored. Replica sets? Additional redundancy. Saves cost, saves complexity.
  • Slide 41 (Pre-Aggregation best practices): Example: web log files.
  • Slide 42 (Batch): Single-threaded SpiderMonkey JS engine; V8?
  • Slide 49 (Real Time Twitter Heatmap): SSEs open a single unidirectional channel between server and client.
  • Slide 57 (Operational Intelligence): NBA: phone call, banner targeting, email. Next step: recommendation engine.
  • Slide 76 (Big Data Project): Data in arrays; queries spanning more than one collection.
  • Slide 80 (Geo Spatial Features): Bounding box -> map viewport; $near -> the nearest 1000 cells that get loaded, plus POIs.
Slide transcript:

    1. Lean & Agile with MongoDB
       MongoMunich 2012 #MongoDBMunich @comsysto
    2. About us
    3. About us
       • first partner of 10gen in Germany (January 2012)
    4. About me
       • Lead DevOps Engineer at comsysto
       • @loomit
       • Data Nerd
       • 3 years of high performance web ops
       • joined comSysto in March 2012
    5. Questions
       • Please ask during the presentation!
    6. Lean?
    7. Lean?
       Continuous Innovation
    8. Lean?
       • Instant feedback from customers about features
       • eliminate waste
    9. Eliminate waste
    10. Agile?
        • Iterative and incremental
    11. SCRUM
        • Scrum is a framework for developing and sustaining complex products
    12. Kanban
        • Pull from a work queue
        • originated at Toyota in the 1950s
    13. Agile Adoption
        • Ken Schwaber
    14. Agile Adoption
        • “There is no SCRUM police”
    15. Agile Adoption
        • “Use your intelligence”
    16. Agile Adoption
        • Dogmatic Slumber
    17. Don’t be the little girl
    18. Don’t be the Joker
    19. Cross-functional teams
    20. Cross-functional teams
    21. 8 hats
    22. Co-location
    23. Appreciation for simplicity
        • “Everything should be as simple as possible, but not simpler”
        • paraphrasing Albert Einstein
    24. Look familiar?
    25. NoSQL
    26. Schema Free
        “Your data schema is a direct corollary with how you view your business’ direction and tech goals. When you pivot, especially if it’s a significant one, your data may no longer make sense in the context of that change. Give yourself room to breathe. A schema-less data model is MUCH easier to adapt to rapidly changing requirements than a highly structured, rigidly enforced schema.”
        from: http://www.cleverkoala.com/2010/08/why-your-startup-should-be-using-mongodb/
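The quote's point can be sketched in Python, with documents shown as the plain dicts a driver such as PyMongo returns. This is a minimal illustration, not code from the talk: the `name`/`tier` fields and the `"standard"` default are hypothetical. Documents written before and after a schema change live in the same collection, and the reading code supplies a default instead of running a migration.

```python
from typing import Dict, List

# Hypothetical customer documents: "tier" was introduced after a pivot,
# so older documents simply do not have it. No migration was run.
customers: List[Dict[str, str]] = [
    {"_id": "c1", "name": "Alice"},                   # written before the change
    {"_id": "c2", "name": "Bob", "tier": "premium"},  # written after the change
]

def customer_tier(doc: Dict[str, str]) -> str:
    """Tolerant reader: fall back to a default when the field is missing."""
    return doc.get("tier", "standard")

for c in customers:
    print(c["name"], customer_tier(c))
```

The trade-off is that the application, rather than the database, now has to know which document shapes exist.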
    27. Emergent Architectures
    28. Move fast and break things
    29. NoSQL
    30. Scale out
    31. AWS
        • MongoDB is mostly I/O-bound
        • Storage matters
    32. AWS
        • EBS (anywhere from 70 to 300 ops/sec)
        • EBS provisioned IOPS (stable)
        • Ephemeral
        • SSD (much higher ops/sec but costly)
        • use RAID on EC2 (or not?)
    33. MongoDB AWS Storage
    34. AWS
        • Naming really matters
          – combine with Route 53
          – ec2-174-129-227-92.compute-1.amazonaws.com?
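One way to read "naming really matters": give every MongoDB node a stable DNS name that says what it is, and use that name both in the replica set configuration and in the application's connection string, so that replacing an instance only means updating a DNS record. The sketch assumes a replica set called `rs0` and a zone `db.example.com` (both hypothetical); the `mongodb://host:port,.../?replicaSet=` layout is MongoDB's standard connection string format.

```python
from typing import List

def member_hostnames(replica_set: str, count: int, zone: str) -> List[str]:
    """Stable, role-describing DNS names such as 'rs0-1.db.example.com'.

    In practice these would be records maintained in DNS (for example
    Route 53); the naming scheme here is hypothetical.
    """
    return [f"{replica_set}-{i}.{zone}" for i in range(1, count + 1)]

def replica_set_uri(replica_set: str, hosts: List[str], port: int = 27017) -> str:
    """MongoDB connection string listing every member as a seed host."""
    seeds = ",".join(f"{host}:{port}" for host in hosts)
    return f"mongodb://{seeds}/?replicaSet={replica_set}"

hosts = member_hostnames("rs0", 3, "db.example.com")
print(replica_set_uri("rs0", hosts))
```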
    35. Sharded Setup
    36. MongoDB on AWS
    37. Infrastructure as code
    38. Use Cases
        • Real-Time Analytics Software
        • Operational Intelligence
        • High Volume Data Feeds
        • Hadoop
    39. Patterns
        • Pre-Aggregation
        • Batch
          – Hadoop
          – MapReduce (in MongoDB)
          – Aggregation Framework
    40. Pre-Aggregation
        • Problem:
          – You require up-to-the-minute data, or up-to-the-second if possible
          – The queries for ranges of data (by time) must be as fast as possible
    41. Pre-Aggregation
        • Best practices
          – $inc and upsert are your friends
          – pre-allocate documents
          – use REST interface
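A minimal sketch of these practices in Python, assuming one document per metric per day with a daily total, hourly counters and per-minute counters (the layout, the `_id` scheme and the metric name are hypothetical). The functions only build documents; with PyMongo they would go to `collection.insert_one(...)` once per day for pre-allocation and to `collection.update_one(filter, update, upsert=True)` for every event. `$inc` on dotted paths updates the counters atomically within the single document, and `upsert=True` creates the document if pre-allocation has not happened yet.

```python
from datetime import datetime
from typing import Dict, Tuple

def day_id(metric: str, ts: datetime) -> str:
    # One document per metric and day (hypothetical _id scheme).
    return f"{metric}/{ts:%Y-%m-%d}"

def preallocated_day(metric: str, ts: datetime) -> Dict:
    """Zero-filled day document: later $inc updates change values, not size."""
    return {
        "_id": day_id(metric, ts),
        "daily": 0,
        "hourly": {str(h): 0 for h in range(24)},
        "minute": {str(h): {str(m): 0 for m in range(60)} for h in range(24)},
    }

def increment(metric: str, ts: datetime, n: int = 1) -> Tuple[Dict, Dict]:
    """Filter and update document for update_one(..., upsert=True)."""
    update = {"$inc": {
        "daily": n,
        f"hourly.{ts.hour}": n,
        f"minute.{ts.hour}.{ts.minute}": n,
    }}
    return {"_id": day_id(metric, ts)}, update
```

Pre-allocation mattered with the storage engine of that era (MMAPv1): a document that outgrew its allocated space had to be moved on disk, whereas incrementing fields that already exist does not grow it.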
    42. Batch
        • MapReduce
        • Aggregation Framework
        • Mongo-Hadoop Connector
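MongoDB's MapReduce runs JavaScript `map` and `reduce` functions on the server. The rule that bites most often is that `reduce` may be called again on its own output (for example when partial results from several shards or batches are combined), so it must return the same shape that `map` emits. The sketch below is a plain-Python emulation of that model, not MongoDB's API; the `cell` and `value` fields and the per-cell average are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Value = Dict[str, float]

def map_fn(doc: Dict) -> Iterable[Tuple[str, Value]]:
    # Emit the same shape that reduce_fn returns, so reduce can be re-applied.
    yield doc["cell"], {"count": 1, "total": doc["value"]}

def reduce_fn(key: str, values: List[Value]) -> Value:
    return {"count": sum(v["count"] for v in values),
            "total": sum(v["total"] for v in values)}

def group_and_reduce(pairs: Iterable[Tuple[str, Value]],
                     reducer: Callable[[str, List[Value]], Value]) -> Dict[str, Value]:
    groups: Dict[str, List[Value]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

def map_reduce(docs: Iterable[Dict]) -> Dict[str, Value]:
    return group_and_reduce((kv for doc in docs for kv in map_fn(doc)), reduce_fn)

def merge_partials(partials: List[Dict[str, Value]]) -> Dict[str, Value]:
    # Re-reduce: combine partial results, e.g. from different shards or batches.
    return group_and_reduce((kv for part in partials for kv in part.items()), reduce_fn)

def finalize(results: Dict[str, Value]) -> Dict[str, float]:
    # Divide only at the end; an average of averages would be wrong.
    return {key: v["total"] / v["count"] for key, v in results.items()}
```

From MongoDB 2.2 on, the same per-cell average can be written for the Aggregation Framework as `[{"$group": {"_id": "$cell", "avg": {"$avg": "$value"}}}]`, which avoids JavaScript entirely.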
    43. Mongo-Hadoop Connector
        (diagram: Data Storage, Data Processing)
    44. Projects
        • What we have done so far...
    45. Real Time Twitter Heatmap
    46. Real Time Twitter Heatmap
        • The bubbles in the sea? Friendly Floatees!
    47. Friendly Floatees
    48. Flow
    49. Real Time Twitter Heatmap
        • MongoDB Capped Collections
        • Flask
        • Redis
        • Google Maps
        • heatmap.js
        • Server-Sent Events
        • http://bit.ly/Ou5SsP
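The slide only lists the building blocks. One plausible way they fit together: tweets land in a capped collection, a tailable cursor keeps returning new documents as they are inserted, and the web server forwards each one to the browser as a Server-Sent Event. A minimal, dependency-free sketch of that last step, assuming tweet documents carry `lat`/`lng` fields and using a hypothetical event name `tweet`; the wire format (`event:` and `data:` lines, terminated by a blank line) is the one SSE defines.

```python
import json
from typing import Dict, Iterable, Iterator, Optional

def sse_frame(payload: Dict, event: Optional[str] = None) -> str:
    """Encode one Server-Sent Event: optional 'event:' line, 'data:' line, blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # Compact JSON contains no newlines, so a single data: line is enough.
    lines.append(f"data: {json.dumps(payload, separators=(',', ':'))}")
    return "\n".join(lines) + "\n\n"

def tweet_stream(docs: Iterable[Dict]) -> Iterator[str]:
    # docs would be a tailable cursor in the real setup; any iterable works here.
    for doc in docs:
        yield sse_frame({"lat": doc["lat"], "lng": doc["lng"]}, event="tweet")
```

With PyMongo and Flask, the input could be `db.tweets.find(cursor_type=CursorType.TAILABLE_AWAIT)` on a collection created with `db.create_collection("tweets", capped=True, size=<bytes>)`, and the generator could be returned as `Response(tweet_stream(cursor), mimetype="text/event-stream")`.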
    50. Pizza Quattro Shardoni
    51. Quattro Shardoni
        • Technology Showcase Product
        • Complete End2End stack
        • Real Time Charting
        • Batch Reporting based on Hadoop
    52. Quattro Shardoni
    53. Quattro Shardoni
    54. Quattro Shardoni
    55. Quattro Shardoni
        • Talk today at 12:15, Ballroom A
          Tom Zorc, Bernd Zuther
    56. Operational Intelligence
    57. Operational Intelligence
        • Analyze behavior of users in web shop
        • Recommend NBA (next best activity) for business
        • Real Time Analytics
    58. Online Shop
        (diagram label: REST)
    59. Operational Intelligence
        • Next best activity for support/call center
        • interpret user session
        • e.g. “RaspberryPi - strong interest”
        • expected: 2,000 events per second
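The slide does not say how sessions were interpreted. As a purely illustrative sketch, a rule-based version could count product views per session and flag products that are viewed often; the event shape, the threshold of three views and the labels are hypothetical. The last lines work out what the expected event rate means per day.

```python
from collections import Counter
from typing import Dict, List

STRONG_INTEREST_VIEWS = 3  # hypothetical threshold, not taken from the talk

def interpret_session(events: List[Dict[str, str]]) -> Dict[str, str]:
    """Label each viewed product in one user session by its view count."""
    views = Counter(e["product"] for e in events if e["type"] == "view")
    return {
        product: "strong interest" if n >= STRONG_INTEREST_VIEWS else "some interest"
        for product, n in views.items()
    }

# Expected load from the slide: 2,000 events per second, sustained for a day.
EVENTS_PER_SECOND = 2_000
events_per_day = EVENTS_PER_SECOND * 60 * 60 * 24  # 172,800,000
```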
    60. Operational Intelligence
    61. Operational Intelligence
    62. It’s Real Time!
    63. Big Data Project
        • “which analyzes and visualizes data of mobile networks”
    64. Big Data Project
    65. Big Data Project
    66. Big Data Project
    67–74. Big Data Project
        • started as prototype, in production now ;-)
        • “beyond agile”
        • going from
          – fetch all, calculate in service layer
          – use MongoDB MapReduce on a single node
          – use MongoDB MapReduce on 5 shards
          – use MongoDB MapReduce on 24 shards (2 hi1.4xlarge instances)
          – use EMR (Amazon Elastic MapReduce, around 10 m2.4xlarge instances)
    75. Big Data Project
    76. Big Data Project
    77. Big Data Project
        • why not use the Aggregation Framework?
          – we started with 2.0.6 (the Aggregation Framework arrived in 2.2)
          – would have had to change the data model
          – M/R seemed the way to go (data size)
    78. Big Data Project
        • Numbers
          – data comes in weekly increments
          – 2 TB raw data
          – 14 GB / week (into MongoDB)
          – data grows in direct proportion to polygon count
          – currently 1 replica set of 3 m2.4xlarge instances
    79. MongoDB on AWS
    80. Big Data Project
        • Geospatial Features
          – $within queries (bounding box)
          – $near queries
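On a map, `$within` with `$box` fetches the cells inside the visible rectangle, and `$near` returns cells ordered by distance from a point. A minimal sketch that only builds the query documents in PyMongo's dict form, assuming a field `loc` that holds legacy `[longitude, latitude]` pairs under a `2d` index; the field name and coordinates are hypothetical. `$within` is the MongoDB 2.x operator named on the slide; from 2.4 on, the documented name is `$geoWithin`.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]  # legacy coordinate pair: (longitude, latitude)

def viewport_query(south_west: Point, north_east: Point) -> Dict:
    """$within/$box: all documents inside the map's visible rectangle."""
    return {"loc": {"$within": {"$box": [list(south_west), list(north_east)]}}}

def nearest_query(center: Point) -> Dict:
    """$near: documents sorted by distance from center (needs a 2d index on loc)."""
    return {"loc": {"$near": list(center)}}
```

With PyMongo these would be used roughly as `cells.create_index([("loc", GEO2D)])`, then `cells.find(viewport_query(sw, ne))` and `cells.find(nearest_query(center)).limit(1000)`.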
    81. Big Data Project
    82. Big Data Project
        (diagram: Raw Data, MapReduce)
    83. Big Data Project
        • more polygons -> more data
          – key length can become an issue
        • using polygons to display cell metrics
        • tried different types of visualizations
    84. Big Data Project
        • key size per doc: 1.8 KB
          – bad: {very_descriptive_long_key : “yay”}
          – good: { v : “yay”}
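BSON stores every field name inside every document, so with millions of documents long keys add up. A small calculator for flat documents whose values are all strings, following the BSON specification's layout: a 4-byte length prefix, one element per field (type byte, NUL-terminated key, length-prefixed NUL-terminated string) and a closing NUL byte. It is a sketch for this one case, not a general encoder.

```python
from typing import Dict

def bson_size_of_strings(doc: Dict[str, str]) -> int:
    """Byte size of a flat BSON document whose values are all UTF-8 strings."""
    size = 4 + 1  # int32 document length prefix + trailing 0x00
    for key, value in doc.items():
        key_bytes = len(key.encode("utf-8")) + 1          # cstring: key + 0x00
        value_bytes = 4 + len(value.encode("utf-8")) + 1  # int32 length + bytes + 0x00
        size += 1 + key_bytes + value_bytes               # 0x02 type byte + name + value
    return size

long_keys = bson_size_of_strings({"very_descriptive_long_key": "yay"})  # 40 bytes
short_keys = bson_size_of_strings({"v": "yay"})                          # 16 bytes
```

The long key costs 24 extra bytes per document for a single field; over one million documents that is about 24 MB for that field alone. In recent PyMongo versions, `len(bson.encode(doc))` should give the same sizes.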
    85. Big Data Project
        (chart: storage per year by polygon count; 100,000 polygons: 62 GB / year; 500,000 polygons: 308 GB / year)
    86. Big Data Project
    87. Big Data Project
        • 308 GB of EBS storage => $332 per year
          – backups / snapshots not considered
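A quick check using only the figures on slides 85 and 87: five times as many polygons gives almost exactly five times the storage, which matches the "direct proportion" claim on slide 78. The implied EBS rate assumes the $332 covers storage alone; the slide does not break the figure down.

```python
POLYGONS = (100_000, 500_000)  # polygon counts from the chart
GB_PER_YEAR = (62, 308)        # storage per year from the chart
EBS_DOLLARS_PER_YEAR = 332     # yearly cost quoted for 308 GB

polygon_ratio = POLYGONS[1] / POLYGONS[0]                    # 5.0
storage_ratio = GB_PER_YEAR[1] / GB_PER_YEAR[0]              # ~4.97, i.e. ~linear growth
dollars_per_gb_year = EBS_DOLLARS_PER_YEAR / GB_PER_YEAR[1]  # ~1.08
dollars_per_gb_month = dollars_per_gb_year / 12              # ~0.09

print(f"{storage_ratio:.2f}x storage for {polygon_ratio:.0f}x polygons, "
      f"~${dollars_per_gb_month:.2f} per GB-month")
```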
    88. Big Data Project
        • Future Plans
          – new use case
          – expecting about 1 TB of data / week
    89. Conclusion
        • rapidly changing business needs
        • ease of collecting huge amounts of data
        • infrastructure as part of code
        • MongoDB provides flexibility
    90. Comments?
        • @comsysto
        • #MongoMunich2012
        • http://blog.comsysto.com
        • Don’t forget the hallway track
        • Mongo User Group Munich
          – http://www.meetup.com/Muenchen-MongoDB-User-Group/
    91. We are hiring!
        • http://careers.comsysto.com
    92. Lean & Agile with MongoDB
        MongoMunich 2012 #MongoDBMunich @comsysto
