This document discusses how MongoDB supports lean and agile development approaches. It describes key lean concepts like eliminating waste and continuous innovation. Agile frameworks like Scrum and Kanban are also covered. Several case studies are presented that demonstrate how MongoDB was used for real-time analytics, operational intelligence, and large-scale big data projects involving terabytes of mobile network data. The document concludes by emphasizing MongoDB's flexibility for rapidly changing business needs and collecting huge amounts of data.
Scale up - How to build adaptive data systems in the age of virality - Johannes Brandstetter
In this talk we share details about glomex's award-winning data management infrastructure and show how a serverless approach can scale automatically to the demands of a highly unpredictable industry, where video clips can go viral at any moment. What is the best architecture for real-time data processing? How does a batch-driven BI workflow fit in? What are the key benefits of moving to the cloud? Which AWS services should you use?
MongoDB World 2016: Get MEAN and Lean with MongoDB and Kubernetes - MongoDB
1) The document discusses using MongoDB and Kubernetes to reduce impedance mismatches in software stacks and deployment processes.
2) It proposes using a MEAN stack with MongoDB as the database to align the client, server, and data layers. Docker is used to package the application and Kubernetes manages deploying containers across a cluster.
3) The presentation includes demos of deploying a MEAN app to Kubernetes and running MongoDB on Kubernetes, including recovering from node failures through replication and services.
If you implement a microservice architecture correctly, you will end up with a proliferation of different microservices, with multiple instances of each for redundancy. Find out how to get microservices to automatically discover each other and share configuration with real-time updates. See how to eliminate server management altogether with "serverless" microservice frameworks.
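The node-failure recovery the demos rely on can be sketched in miniature. This is a hedged illustration of the routing idea, not the talk's actual code: the names (`pick_primary`, `mongo-0`, the `healthy` flag) are assumptions for the example.

```python
def pick_primary(nodes):
    """Return the first healthy node's host, mimicking how a replica set
    (or a Kubernetes Service) routes traffic around failed members."""
    for node in nodes:
        if node.get("healthy"):
            return node["host"]
    raise RuntimeError("no healthy replica available")

# Example: after mongo-0 fails, traffic moves to mongo-1.
replicas = [
    {"host": "mongo-0", "healthy": False},
    {"host": "mongo-1", "healthy": True},
    {"host": "mongo-2", "healthy": True},
]
```

In practice the MongoDB driver and Kubernetes Services perform this selection for you; the sketch only makes the mechanism visible.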
Google Cloud Platform - Introduction & Certification Path 2018 - Pavan Dikondkar
This document provides an overview of Google Cloud Platform career paths and services. It discusses Google Cloud Platform certification programs and tracks for associate cloud engineer, professional cloud architect, and data engineer. The document then summarizes key Google Cloud Platform services including Compute Engine, Container Engine, App Engine, Load Balancing, Cloud DNS, Cloud Storage, Cloud Datastore, and Cloud SQL. It concludes with an invitation for questions.
RedisConf18 - Transforming Vulnerability Telemetry with Redis Enterprise - Redis Labs
This document summarizes Malwarebytes' transformation to leverage big data and AI by building a data and AI team over 18 months. It discusses challenges around scaling to big data volumes and varieties, handling stateful and real-time data, and massive data caching needs. It then outlines architectural goals and solutions developed using Redis Enterprise to build advanced data visualizations and analytics around malware detections, including infection maps, AV failure rates over time, and malware velocity trends. Redis Enterprise helped scale to large data volumes and processing needs cost effectively with high performance and availability.
MongoDB World 2016: NOW TV and Linear Streaming: Scaling MongoDB for High Loa... - MongoDB
The document discusses improvements made to NOW TV's streaming platform to better handle unpredictable load from live linear streaming events. Key issues addressed include:
- Heartbeats were changed to not terminate streams on non-OK responses to be more resilient during outages.
- Concurrency tracking was improved by tracking playout slots by device ID rather than just ID, to reclaim slots after app crashes.
- Product data storage was optimized by storing entitlements rather than duplicating product documents.
- Viewing history APIs were improved by merging viewings and bookmark collections and adding indexes.
- MongoDB indexing was optimized to improve performance of queries for viewing history and other APIs.
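The entitlement and indexing changes above can be sketched concretely. This is an illustrative example only: the field names (`userId`, `productId`, `viewedAt`) are assumptions, not taken from the NOW TV schema.

```python
def entitlement_doc(user_id, product_id, expires):
    """A small entitlement record stored per user, instead of duplicating
    the full product document into every user's data."""
    return {"userId": user_id, "productId": product_id, "expires": expires}

# A pymongo-style compound index spec for viewing-history queries:
# all viewings for a user, newest first.
VIEWING_HISTORY_INDEX = [("userId", 1), ("viewedAt", -1)]
# With a live cluster: db.viewing_history.create_index(VIEWING_HISTORY_INDEX)
```

Storing a slim entitlement instead of a copied product document keeps writes small and avoids the update fan-out when product details change.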
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion... - Codemotion
Once you start working with big data systems, you discover a whole bunch of problems you won't find in monolithic systems. Monitoring all of the components becomes a big data problem itself. In the talk, we'll cover the aspects you should take into consideration when monitoring a distributed system built with tools like web services, Spark, Cassandra, MongoDB, and AWS. Beyond the tools, what should you monitor about the actual data that flows through the system? We'll cover the simplest solution built from your day-to-day open source tools; the surprising thing is that it comes not from an ops guy.
This document discusses eBay's private cloud and journey with OpenStack over the past 6 years. It outlines the challenges of developing OpenStack at scale to support eBay's needs, including network design, security, onboarding, CI/CD, configuration management, high availability, monitoring, logging, and customer support. It discusses lessons learned around the difficulty of turning OpenStack into an enterprise-grade cloud, and future directions including enabling containers/microservices, programmable application security, and software-defined networks and data centers to create an automated, efficient, and secure cloud infrastructure.
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017 - Codemotion
Apache Ignite is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies.
Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C... - Codemotion
In a world of connected devices it is really important to be prepared to receive and manage a huge volume of messages. In this context, what makes the real difference is a backend able to handle every request safely and in real time. In this talk we will show how its broad spectrum of highly scalable services makes Google Cloud Platform the perfect habitat for such workloads.
Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana) - Cohesive Networks
Slides from the Chicago AWS user group on May 5th, 2016. Asaf Yigal, Co-Founder and VP Product at Logz.io, presented on using Elasticsearch, Logstash, and Kibana in Amazon Web Services.
"Setting up the increasingly-popular open-source ELK Stack (Elasticsearch, Logstash, and Kibana) on AWS might seem like an easy task, but we have gone through several iterations in our architecture and have made some mistakes in our deployments that have turned out to be common in the industry. In this talk, we will go through what we did and explain what worked and what failed -- and why. We will also provide a complete blueprint of how to set up ELK for production on AWS." ~ @asafyigal
Leonard Austin (Ravelin) - DevOps in a Machine Learning World - Outlyer
As machine learning moves from niche to mainstream tech stacks, how do DevOps engineers prepare for a very different set of problems? A brief look at the new issues that arise from machine learning, an overview of cutting-edge "old school" solutions, and how to drag data science (kicking and screaming) into a world of automation.
Video: https://www.youtube.com/watch?v=KHxZCRajRiA
Join DevOps Exchange London here: http://meetup.com/DevOps-Exchange-London/
Follow DOXLON on twitter http://www.twitter.com/doxlon
Chicago AWS user group meetup - May 2014 at Cohesive - CloudCamp Chicago
All slides from the May 2014 Meetup. Talks included:
• "Mining crypto currency on AWS spot instance" - Scott VanDenPlas, Engineer at el el see @scottvdp
• "HA for healthcare" - Ryan Koop, Director of Products & Marketing, Cohesive @ryankoop
• "Using AWS for HA at BrightTag" - Matt Kemp, Engineer of Things™ at BrightTag @mattkemp
• So nice, he's talking twice. - Scott VanDenPlas, Engineer at el el see @scottvdp
Join us again June 24 at Mediafly and in July back at Cohesive!
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O... - HostedbyConfluent
Embracing open source software for critical platform operations is a tough organizational evolution for a company of any size. This is particularly daunting for technology teams accustomed to a fully supported managed service. Come learn about how we are using OSS to modernize Health Care at UnitedHealth Group as a roadmap to adopt and offer OSS in your own organization!
Over the last three years, Kafka as a Service within UnitedHealth Group has gone from non-existent to being centrally managed and utilized by over 200 internal application teams as an essential component to our ecosystem. In this session, I will share how to tactically implement a Kafka as a Service platform offering within any organization with a very lean team and how to get broad adoption from engineers and leadership.
I'll discuss the engineering cultural changes needed, both on the DevOps team as well as more broadly, to adopt OSS. Spoiler: Documentation is the key to success. I will talk about some of our "aha" moments, including the importance of internal Terms of Service and how to encourage teams to "Google first." I will include things that haven't worked as well, such as requiring manual review of all topic creation PRs (this doesn't scale!).
Attendees will learn how to both stand up their own OSS offering as well as how to be a good internal consumer of other such offerings. Come ready to learn and laugh about my journey to offering OSS to thousands of people!
Kafka Summit SF 2017 - Providing Reliability Guarantees in Kafka at One Trill... - confluent
In this presentation, I will talk about my firsthand experience dealing with the unique challenges of running Kafka at a massive scale. If you ever thought that running Kafka is difficult, this talk may change your mind and provide you with valuable insights into how to configure a Kafka cluster efficiently, how to manage Kafka for enterprise customers, and how to measure, monitor, and maintain the quality of the Kafka service. Our production Kafka cluster runs on more than 1,500 VMs and serves over 10 GBPS of data spread across hundreds of topics for multiple teams across Microsoft. We built a self-serve Kafka management service to make the process manageable and scalable across many teams. In this talk, I will also share insights about running Kafka in private vs. multi-tenant mode, supporting failover and disaster recovery requirements, and how to make Kafka compliant with regulatory certifications such as ISO, SOC, and FedRAMP.
Presented by Nitin Kumar, Microsoft
Systems Track
Building Scalable Real-Time Data Pipelines with the Couchbase Kafka Connector... - HostedbyConfluent
Many organizations use Apache Kafka to facilitate the flow of data between multiple applications or data sources. Thanks to Kafka’s distributed architecture, it is easy to set up a scalable and reliable broker, but doing the same with producers or consumers is quite often a fine art. This session provides a quick overview of Couchbase, describes the Couchbase Kafka Connector, and showcases a demo of how it can be used as both a source and a sink for building real-time data processing pipelines for mission-critical applications.
Distributed architecture in a cloud native microservices ecosystem - Zhenzhong Xu
This document summarizes key aspects of distributed architecture in a cloud native microservices ecosystem. It discusses Netflix's transition to microservices running in the cloud, key characteristics of microservices and cloud computing like scalability and availability, challenges of operating in the cloud like unpredictable failures and latency, Netflix's open source tools for discovery, circuit breaking, resilience, continuous delivery, and more. It also provides an overview of how to develop, integrate, operate, and optimize microservices in terms of embracing failures, caching, operations, and using a data-driven approach.
The document discusses Google Cloud Platform and its capabilities for building applications and storing and analyzing data in the cloud. It highlights key services including Compute Engine, App Engine, Cloud Storage, Cloud Datastore, Cloud SQL, BigQuery, and Cloud Endpoints. The platform offers scalable, reliable, and secure computing resources, with infrastructure, platform, and software services available as a utility.
Webinar: Gaining Insights into MongoDB with MongoDB Cloud Manager and New Relic - MongoDB
In this session, we’ll show how to use MongoDB Cloud Manager to monitor the performance of your cluster. Next, we’ll dive into New Relic and demonstrate how you can view the same database specific metrics from within the APM tool.
Big data at AWS Chicago User Group - 2014 - AWS Chicago
Big data at AWS Chicago User Group
Most of the slides from the Sept 23rd 2014 AWS User Group in Chicago.
Talks:
"AWS Storage Options" Ben Blair, CTO at MarkITx @stochastic_code
"APIs and Big Data in AWS" - Kin Lane, API Evangelist @kinlane
[coming soon] "Democratizing Data Analysis with Amazon Redshift" - Bill Wanjohi @billwanjohi and Michelangelo D'Agostino @MichelangeloDA, Civis Analytics
Sponsored by Cohesive and CivisAnalytics.
Project Sherpa: How RightScale Went All in on Docker - RightScale
We just finished a 7 week project at RightScale to migrate 48 services and 650+ cloud instances to Docker. As a result we’ve been able to accelerate our development processes and cut our cloud costs (a lot). Here we share lessons learned about our experience migrating to Docker and introduce our new Container Manager we added to the RightScale platform to help manage containerized environments.
SPCA2014: 7 tenets of highly scalable applications - Kapic - NCCOMMS
The document discusses 7 tenets of highly scalable applications:
1. Effective caching mechanisms to avoid roundtrips
2. Using content delivery networks and BLOB storage for the same reason
3. Employing NoSQL storage to avoid bottlenecks
4. Sharding data across multiple databases or storages
5. Using queues to avoid bottlenecks from simultaneous requests
6. Acting asynchronously to optimize server throughput
7. Ensuring redundant design to avoid single points of failure
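Tenet 4 (sharding) can be made concrete with a few lines. This is a hedged sketch of the general hash-routing technique, not code from the presentation; the function name and shard count are illustrative.

```python
import hashlib

def shard_for(key, shard_count):
    """Route a record to a shard by hashing its key, so load spreads
    evenly and no single database becomes a bottleneck."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count
```

Because the mapping is deterministic, every request for the same key lands on the same shard; a stable hash (rather than Python's built-in `hash`, which varies between runs) keeps the routing consistent across processes.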
Going Reactive in the Land of No, or How to build modern reactive systems for the modern world
Sean Walsh, co-author of “Reactive Application Development” and Field CTO at Lightbend and former CEO of Reactibility, shares lessons learned in helping large enterprises convert their monoliths into distributed microservices.
In the Melbourne edition of a 4-city Technology Radar roadshow, ThoughtWorks Australia's Head of Technology Scott Shaw and senior consultant Jen Smith cover topics from all 4 quadrants of the latest edition of the ThoughtWorks Technology Radar. This presentation covers Reactive Architectures, Hamms, Spring Boot vs. Nancy, and Impala.
Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring... - confluent
This document discusses building microservices for data streaming and processing using Spring Cloud and Kafka. It provides an overview of Spring Cloud Stream and how it can be used to build event-driven microservices that connect to Kafka. It also discusses how Spring Cloud Data Flow can be used to orchestrate and deploy streaming applications and topologies. The document includes code samples that build a basic Kafka Streams processor application using Spring Cloud Stream and deploy it as part of a streaming data flow, and concludes by proposing a demonstration of these techniques.
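The core of the Kafka Streams processor described above is "group by key and count." As a language-neutral stand-in (plain Python, not the Spring Cloud Stream API itself), the processing step reduces to:

```python
from collections import Counter

def count_by_key(events):
    """Minimal stand-in for a Kafka Streams 'group by key and count'
    processor: consume (key, value) events and emit per-key counts."""
    return dict(Counter(key for key, _ in events))
```

In the real application this logic runs continuously over a Kafka topic and emits updates to a changelog; the sketch only captures the per-batch transformation.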
What Does Big Data Mean and Who Will Win - BigDataCloud
Michael Ralph Stonebraker is a computer scientist specializing in database research. He is currently an adjunct professor at MIT, where he has been involved in the development of the Aurora, C-Store, H-Store, Morpheus, and SciDB systems. Through a series of academic prototypes and commercial startups, Stonebraker's research and products are central to many relational database systems on the market today. He is also the founder of a number of database companies, including Ingres, Illustra, Cohera, StreamBase Systems, Vertica, VoltDB, and Paradigm4. He was previously the Chief Technical Officer (CTO) of Informix and a Professor of Computer Science at the University of California, Berkeley. He is also an editor of the book "Readings in Database Systems".
Big data, Hadoop, NoSQL and graph DB - Ramazan Fırın
This document discusses big data, Hadoop, NoSQL databases, and graph databases. It provides an overview of these topics and outlines potential uses for a telecommunications company, such as using big data to prevent customer churn, offer customer-specific campaigns, and get more customers. The document includes definitions and examples of key concepts like Hadoop, MapReduce, NoSQL databases, and the graph database Neo4j. It also summarizes trends in big data and provides examples of how telecom companies can analyze call detail records, model networks, and manage master customer data using these technologies.
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017Codemotion
Apache Ignite is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies.
Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C...Codemotion
In a world of connected devices it is really important to be prepared receiving and managing a huge amount of messages. In this context what is making the real difference is the backend that has to be able to handle safely every request in real time. In this talk we will show how the broad spectrum of highly scalable services makes Google Cloud Platform the perfect habitat for such as workloads.
Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)Cohesive Networks
Slides from the Chicago AWS user group on May 5th, 2016. Asaf Yigal, Co-Founder and VP Product at Logz.io, presented on using Elasticsearch, Logstash, and Kibana in Amazon Web Services.
"Setting up the increasingly-popular open-source ELK Stack (Elasticsearch, Logstash, and Kibana) on AWS might seem like an easy task, but we have gone through several iterations in our architecture and have made some mistakes in our deployments that have turned out to be common in the industry. In this talk, we will go through what we did and explain what worked and what failed -- and why. We will also provide a complete blueprint of how to set up ELK for production on AWS." ~ @asafyigal
Leonard Austin (Ravelin) - DevOps in a Machine Learning WorldOutlyer
As machine learning moves from niche to mainstream tech stacks how do DevOps engineers prepare for a very different set of problems. A brief look at the new issues that arise from machine learning, an overview of cutting-edge "old school" solutions and how to drag data science (kicking and screaming) into a world of automation.
Video: https://www.youtube.com/watch?v=KHxZCRajRiA
Join DevOps Exchange London here: http://meetup.com/DevOps-Exchange-London/
Follow DOXLON on twitter http://www.twitter.com/doxlon
Chicago AWS user group meetup - May 2014 at CohesiveCloudCamp Chicago
All slides from the May 2014 Meetup. Talks included:
• "Mining crypto currency on AWS spot instance" - Scott VanDenPlas, Engineer at el el see @scottvdp
• "HA for healthcare" - Ryan Koop, Director of Products & Marketing, Cohesive @ryankoop
• "Using AWS for HA at BrightTag" - Matt Kemp, Engineer of Things™ at BrightTag @mattkemp
• So nice, he's talking twice. - Scott VanDenPlas, Engineer at el el see @scottvdp
Join us again June 24 at Mediafly and in July back at Cohesive!
Evolving the Engineering Culture to Manage Kafka as a Service | Kate Agnew, O...HostedbyConfluent
Embracing open source software for critical platform operations is a tough organizational evolution for a company of any size. This is particularly daunting for technology teams accustomed to a fully supported managed service. Come learn about how we are using OSS to modernize Health Care at UnitedHealth Group as a roadmap to adopt and offer OSS in your own organization!
Over the last three years, Kafka as a Service within UnitedHealth Group has gone from non-existent to being centrally managed and utilized by over 200 internal application teams as an essential component to our ecosystem. In this session, I will share how to tactically implement a Kafka as a Service platform offering within any organization with a very lean team and how to get broad adoption from engineers and leadership.
I'll discuss the engineering cultural changes needed, both on the DevOps team as well as more broadly, to adopt OSS. Spoiler: Documentation is the key to success. I will talk about some of our "aha" moments, including the importance of internal Terms of Service and how to encourage teams to "Google first." I will include things that haven't worked as well, such as requiring manual review of all topic creation PRs (this doesn't scale!).
Attendees will learn how to both stand up their own OSS offering as well as how to be a good internal consumer of other such offerings. Come ready to learn and laugh about my journey to offering OSS to thousands of people!
Kafka Summit SF 2017 - Providing Reliability Guarantees in Kafka at One Trill...confluent
In this presentation, I will talk about my firsthand experience dealing with the unique challenges of running Kafka at a massive scale. If you ever thought that running Kafka is difficult, this talk may change your mind and provide you with valuable insights into how to configure a Kafka cluster efficiently, how to manage Kafka for enterprise customers and how to measure, monitor and maintain the Quality of Kafka Service. Our production Kafka cluster runs over 1500+ VMs, and serves over 10 GBPS data spread across hundreds of topics for multiple teams across Microsoft. We built a self-serve Kafka management service to make the process manageable and scalable across many teams. In this talk, I will also share insights about running Kafka in Private vs multi-tenant mode, supporting failover and disaster recovery requirements, and how to make Kafka Compliant with regulatory certifications such as ISO, SOC, FEDRAMP, etc.
Presented by Nitin Kumar, Microsoft
Systems Track
Building Scalable Real-Time Data Pipelines with the Couchbase Kafka Connector...HostedbyConfluent
Many organizations use Apache Kafka to facilitate the flow of data between multiple applications or data sources. Thanks to Kafka’s distributed architecture, it is easy to set up a scalable and reliable broker, but doing the same with producers or consumers is quite often a fine art. This session provides a quick overview of Couchbase, describes the Couchbase Kafka Connector, and showcases a demo of how it can be used as both a source and a sink for building real-time data processing pipelines for mission-critical applications.
Distributed architecture in a cloud native microservices ecosystemZhenzhong Xu
This document summarizes key aspects of distributed architecture in a cloud native microservices ecosystem. It discusses Netflix's transition to microservices running in the cloud, key characteristics of microservices and cloud computing like scalability and availability, challenges of operating in the cloud like unpredictable failures and latency, Netflix's open source tools for discovery, circuit breaking, resilience, continuous delivery, and more. It also provides an overview of how to develop, integrate, operate, and optimize microservices in terms of embracing failures, caching, operations, and using a data-driven approach.
The document discusses Google Cloud Platform and its capabilities for building, storing, and analyzing IT infrastructure in the cloud. It highlights key services including Compute Engine, App Engine, Cloud Storage, Cloud Datastore, Cloud SQL, BigQuery, and Cloud Endpoints. The platform offers scalable, reliable and secure computing resources with options for infrastructure, platform and software services as a utility.
Webinar: Gaining Insights into MongoDB with MongoDB Cloud Manager and New RelicMongoDB
In this session, we’ll show how to use MongoDB Cloud Manager to monitor the performance of your cluster. Next, we’ll dive into New Relic and demonstrate how you can view the same database specific metrics from within the APM tool.
Big data at AWS Chicago User Group - 2014AWS Chicago
Big data at AWS Chicago User Group
Most of the slides from the Sept 23rd 2014 AWS User Group in Chicago.
Talks:
"AWS Storage Options" Ben Blair, CTO at MarkITx @stochastic_code
"APIs and Big Data in AWS" - Kin Lane, API Evangelist @kinlane
[coming soon] "Democratizing Data Analysis with Amazon Redshift" - Bill Wanjohi @billwanjohi and Michelangelo D'Agostino @MichelangeloDA, Civis Analytics
Sponsored by Cohesive and CivisAnalytics.
Project Sherpa: How RightScale Went All in on DockerRightScale
We just finished a 7 week project at RightScale to migrate 48 services and 650+ cloud instances to Docker. As a result we’ve been able to accelerate our development processes and cut our cloud costs (a lot). Here we share lessons learned about our experience migrating to Docker and introduce our new Container Manager we added to the RightScale platform to help manage containerized environments.
Spca2014 7 tenets of highly scalable applications kapicNCCOMMS
The document discusses 7 tenets of highly scalable applications:
1. Effective caching mechanisms to avoid roundtrips
2. Using content delivery networks and BLOB storage for the same reason
3. Employing NoSQL storage to avoid bottlenecks
4. Sharding data across multiple databases or storages
5. Using queues to avoid bottlenecks from simultaneous requests
6. Acting asynchronously to optimize server throughput
7. Ensuring redundant design to avoid single points of failure
Going Reactive in the Land of No or How to build modern reactive systems for the modern
world
Sean Walsh, co-author of “Reactive Application Development” and Field CTO at Lightbend and former CEO of Reactibility, shares lessons learned in helping large enterprises convert their monoliths into distributed microservices.
In the Melbourne edition of a 4-city Technology Radar roadshow, ThoughtWorks Australia's Head of Technology Scott Shaw and senior consultant Jen Smith cover topics from all 4 quadrants of the latest edition of the ThoughtWorks Technology Radar. This presentation covers Reactive Architectures, Hamms, Spring Boot vs. Nancy, and Impala.
Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...confluent
This document discusses building microservices for data streaming and processing using Spring Cloud and Kafka. It provides an overview of Spring Cloud Stream and how it can be used to build event-driven microservices that connect to Kafka. It also discusses how Spring Cloud Data Flow can be used to orchestrate and deploy streaming applications and topologies. The document includes code samples of building a basic Kafka Streams processor application using Spring Cloud Stream and deploying it as part of a streaming data flow. It concludes with proposing a demonstration of these techniques.
What Does Big Data Mean and Who Will WinBigDataCloud
Michael Ralph Stonebraker is a computer scientist specializing in database research. He is currently an adjunct professor at MIT, where he has been involved in the development of the Aurora, C-Store, H-Store, Morpheus, and SciDB systems. Through a series of academic prototypes and commercial startups, Stonebraker's research and products are central to many relational database systems on the market today. He is also the founder of a number of database companies, including Ingres, Illustra, Cohera, StreamBase Systems, Vertica, VoltDB, and Paradigm4. He was previously the Chief Technical Officer (CTO) of Informix and a Professor of Computer Science at the University of California, Berkeley. He is also an editor of the book "Readings in Database Systems".
Big data, Hadoop, NoSQL and graph databases (ramazan fırın)
This document discusses big data, Hadoop, NoSQL databases, and graph databases. It provides an overview of these topics and outlines potential uses for a telecommunications company, such as using big data to prevent customer churn, offer customer-specific campaigns, and get more customers. The document includes definitions and examples of key concepts like Hadoop, MapReduce, NoSQL databases, and the graph database Neo4j. It also summarizes trends in big data and provides examples of how telecom companies can analyze call detail records, model networks, and manage master customer data using these technologies.
This document discusses lessons learned from building and growing a software startup. It describes how the company quickly built their initial product but ran into scaling issues. It outlines the technical infrastructure changes they made to improve stability, such as moving to the cloud, adding Redis, Resque, and MongoDB. The document also provides recommendations on performance testing, libraries, tools, and localization. Overall it advocates for just starting to build the product now rather than overplanning.
Spil Games: outgrowing an internet startupart-spilgames
This presentation will explain how Spil Games has grown in a short time from an internet startup to a global online gaming company and how we currently are building a global cross datacenter storage solution with MySQL as its backend.
The first part will contain a short summary of where we started with our database engineering department (look ahead at most one week in time), to a more professionalized department (look ahead and plan three to four months in time) to currently growing out of the startup phase (look ahead and plan more than one year in time). This will be illustrated with some examples of the growing pains we encountered with scaling, replication and high availability and leading up to the conclusion that we need to acknowledge our problems and shortcomings to actually be able to overcome them.
The second part of the presentation will contain a comparison of our old architecture against the new architecture. In this new architecture we take into account that failure of a complete datacenter is certain to occur sometime and strive to give our users the best possible experience, even in worst case when data is inaccessible. We also introduce asynchronous calls which enable us to fire and forget most of our writes. The architecture is being built with MySQL, handler sockets, Erlang and Memcache as its building blocks.
This document provides an introduction to big data and NoSQL databases. It begins with an introduction of the presenter. It then discusses how the era of big data came to be due to limitations of traditional relational databases and scaling approaches. The document introduces different NoSQL data models including document, key-value, graph and column-oriented databases. It provides examples of NoSQL databases that use each data model. The document discusses how NoSQL databases are better suited than relational databases for big data problems and provides a real-world example of Twitter's use of FlockDB. It concludes by discussing approaches for working with big data using MapReduce and provides examples of using MongoDB and Azure for big data.
This document provides an overview of NoSQL databases and CouchDB. It discusses how NoSQL databases are a better fit than relational databases for large datasets and real-time applications. It then describes CouchDB, an open-source document-oriented NoSQL database, covering its features like schema-free documents, robustness, concurrency, REST API, views, replication, and deployment in the cloud. The document concludes with a discussion of Erlang and eventually demos CouchDB.
Adoption of MongoDB has accelerated tremendously among developers in the past 18 months, and many large enterprises have now deployed MongoDB in reliable and large scale production environments. However, for many developers, it remains a challenge to convince production teams and business stakeholders to adopt an open source technology that has not been certified yet by their IT teams. This session will provide you with the compelling arguments to reassure business and production teams such as:
Public customer references and real-world case studies (migration, and adoption stories)
Deployment support and practices for robustness
How MongoDB contributes to your company’s business value
Brad Anderson presented on NOSQL databases and CouchDB. He discussed how relational databases do not scale well and are rigid. NOSQL databases like CouchDB are a better fit for large, growing datasets. CouchDB is a document oriented database written in Erlang that uses a REST API and supports views and incremental replication. It can be deployed on a cloud platform to improve scalability, redundancy and query distribution.
Getting Started with MongoDB at Oracle Open World 2012MongoDB
The document provides an overview of getting started with MongoDB. It discusses the benefits of MongoDB, common use cases, and how to get stakeholder buy-in for MongoDB projects. It also addresses execution of MongoDB projects, operational aspects like replica sets and sharding, and the economic advantages in terms of developer, hardware, and software savings compared to relational databases. Finally, it discusses why 10gen is a leader in MongoDB and provides commercial support, upcoming MongoDB features, and free online training. The document concludes by advertising upcoming sessions on a MongoDB use case at Apollo Group.
Architecture to Scale. DONN ROCHETTE at Big Data Spain 2012Big Data Spain
Session presented at Big Data Spain 2012 Conference
16th Nov 2012
ETSI Telecomunicacion UPM Madrid
www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/architecture-to-scale/donn-rochette
The relational database model was designed to solve the problems of yesterday’s data storage requirements. The massively connected world of today presents different problems and new challenges. We’ll explore the NoSQL philosophy, before comparing and contrasting the strengths and weaknesses of the relational model versus the NoSQL model. While stepping through real-world scenarios, we’ll discuss the reasons for choosing one solution over the other.
To complete this session, let’s demonstrate our findings with an application written with a NoSQL storage layer and explain the advantages that accrue from that decision. By taking a look at the new challenges we face with our data storage needs, we’ll examine why the principles behind NoSQL make it a better candidate as a solution, than yesterday’s relational model.
Mapping Life Science Informatics to the CloudChris Dagdigian
This document discusses strategies for mapping informatics to the cloud. It provides 9 tips for doing so effectively. Tip 1 advises that high-performance computing and clouds require a new model where resources are dedicated to each application. Tip 2 recommends hybrid cloud approaches but cautions they are less usable than claimed and practical only sometimes. The document emphasizes the need to handle legacy codes in addition to new "big data" approaches.
The document summarizes the development of Scripted, a lightweight browser-based code editor. It discusses observations that heavy IDEs are not ideal for JavaScript development and speed is essential. Two prototypes were created - Orion and Scripted. Scripted focused on speed, code awareness through static analysis, and module system comprehension. Near term goals include improved content assistance and a plugin model. Long term goals include debugging integration and support for additional languages.
RightScale User Conference: Why RightScale?Erik Osterman
RightScale provides a framework for operations that standardizes infrastructure management and allows operations to evolve alongside engineering. It treats infrastructure like software development with reusable components, simplifying operations and reducing technical debt. This framework allows organizations to build infrastructure consistently across clouds, commoditize resources, and empower engineers to take on operational roles through a modern DevOps approach.
This talk, given at the VA Smalltalk Forum Europe 2010 in Stuttgart, gives an overview of techniques and tools to get existing Smalltalk projects back to speed and productivity.
The talk included some demos of tools we created for some of our customers to make their project life much easier.
This document discusses why MongoDB is easier to develop, operate, and scale compared to relational databases. It provides examples of how MongoDB allows for flexible data models, easy administration through packages and shell helpers, and horizontal scaling by adding servers and sharding data across multiple machines. Resources for getting help from the MongoDB community and commercial support options are also listed.
During Kylin OLAP development, we set up many engineering principles in the team. These principles are very important to delivering Kylin with high quality and on schedule.
This document summarizes a presentation about monitoring MySQL at AOL. It discusses the tools used at AOL for MySQL monitoring including a MySQL webpage, Argus for metrics and alerts, and Nagios for fault detection. It covers the goals of monitoring, challenges faced, and resources for learning about MySQL monitoring. It also announces the creation of a new MySQL meetup group for the DC/Baltimore area.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring & observability to the purview of ops, infra, and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover test automation with generative AI and OpenAI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and OpenAI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features available on those devices, but many of the features that provide convenience and capability sacrifice security. This best practices guide outlines steps users can take to better protect personal devices and information.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The climate impact / sustainability of software testing is discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability, which can then be measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
26. Schema Free
“Your data schema is a direct corollary of how you view your business’ direction and tech goals. When you pivot, especially if it’s a significant one, your data may no longer make sense in the context of that change. Give yourself room to breathe. A schema-less data model is MUCH easier to adapt to rapidly changing requirements than a highly structured, rigidly enforced schema.”
from: http://www.cleverkoala.com/2010/08/why-your-startup-should-be-using-mongodb/
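The quoted point can be shown in a few lines: in a schema-less store, documents of different shapes coexist in one collection, so a pivot adds fields instead of forcing a migration. A Python sketch with plain dicts standing in for BSON documents (the field names are made up):

```python
# Before the pivot: a simple one-off product.
before_pivot = {"_id": 1, "name": "Widget", "price": 9.99}

# After the pivot: the business now sells subscriptions. The new shape
# simply carries an extra sub-document; the old document stays valid.
after_pivot = {"_id": 2, "name": "Widget Pro", "price": 19.99,
               "subscription": {"interval": "monthly", "trial_days": 14}}

collection = [before_pivot, after_pivot]  # stands in for db.products

# Readers tolerate missing fields instead of failing a schema check.
for doc in collection:
    interval = doc.get("subscription", {}).get("interval", "one-off")
    print(doc["name"], interval)
```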
40. Pre-Aggregation
• Problem:
– You require up-to-the-minute data, or up-to-the-second if possible
– Queries over ranges of data (by time) must be as fast as possible
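The standard MongoDB answer to this problem is pre-aggregation: increment counters inside time-bucketed documents at write time, so a range query reads a handful of ready-made totals instead of scanning raw events. A Python sketch of the pattern (in MongoDB each record() call would be a single upsert with $inc; the bucket layout is illustrative):

```python
from collections import defaultdict
from datetime import datetime

# One counter document per hour, holding per-minute sub-counters.
buckets = defaultdict(lambda: {"total": 0, "minutes": defaultdict(int)})

def record(event_time: datetime) -> None:
    """Pre-aggregate one event at write time."""
    doc = buckets[event_time.strftime("%Y-%m-%dT%H")]
    doc["total"] += 1
    doc["minutes"][event_time.minute] += 1

record(datetime(2012, 10, 16, 9, 5))
record(datetime(2012, 10, 16, 9, 5))
record(datetime(2012, 10, 16, 9, 7))

# A by-time range query now reads pre-computed counters, not raw events.
print(buckets["2012-10-16T09"]["total"])
```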
59. Operational Intelligence
• Next best activity for support / call center
• Interpret the user session
• e.g. “Raspberry Pi - strong interest”
• Expected ~2,000 events per second
68. Big Data Project
• started as a prototype, in production now ;-)
• “beyond agile”
• going from
– fetch all, calculate in the service layer
– use MongoDB MapReduce on a single node
– use MongoDB MapReduce on 5 shards
– use MongoDB MapReduce on 24 shards (2 hi1.4xlarge instances)
– use EMR (around 10 m2.4xlarge instances)
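The map/reduce shape the project converged on can be simulated locally. In MongoDB the equivalent map and reduce functions would be JavaScript run server-side; the record fields here (polygon_id, signal) are invented for illustration:

```python
from collections import defaultdict

# Raw measurements; one document per sample (field names are made up).
records = [
    {"polygon_id": "p1", "signal": -70},
    {"polygon_id": "p1", "signal": -80},
    {"polygon_id": "p2", "signal": -60},
]

def map_fn(doc):
    # emit(key, value), as MongoDB's map stage would.
    yield doc["polygon_id"], {"count": 1, "sum": doc["signal"]}

def reduce_fn(values):
    # Fold all emitted values for one key into a single document.
    out = {"count": 0, "sum": 0}
    for v in values:
        out["count"] += v["count"]
        out["sum"] += v["sum"]
    return out

grouped = defaultdict(list)
for doc in records:
    for key, value in map_fn(doc):
        grouped[key].append(value)

result = {key: reduce_fn(values) for key, values in grouped.items()}
print(result)
```

Sharding parallelizes exactly this: each shard runs the map and a partial reduce over its own chunk, and the partial results are reduced again.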
77. Big Data Project
• Why not use the Aggregation Framework?
– we started on 2.0.6 (the framework only arrived in 2.2)
– we would have had to change the data model
– Map/Reduce seemed the way to go given the data size
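For comparison, the same per-polygon rollup expressed as an aggregation pipeline, which only became available with MongoDB 2.2 and so was not an option on 2.0.6. The field names are illustrative:

```python
# A pipeline is just a list of stage documents handed to
# collection.aggregate(); shown here as plain Python data.
pipeline = [
    # Restrict to one weekly increment of data.
    {"$match": {"week": "2012-W40"}},
    # Roll up measurements per polygon.
    {"$group": {"_id": "$polygon_id",
                "count": {"$sum": 1},
                "signal_sum": {"$sum": "$signal"}}},
]
print(pipeline)
```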
78. Big Data Project
• Numbers
– data arrives in weekly increments
– 2 TB of raw data
– 14 GB / week (into MongoDB)
– data grows in direct proportion to the polygon count
– currently one replica set of 3 m2.4xlarge instances
83. Big Data Project
• more polygons -> more data
– key length can become an issue
• using polygons to display cell metrics
• tried different types of visualizations
84. Big Data Project
• key size per doc: 1.8 KB
– bad: { very_descriptive_long_key : “yay” }
– good: { v : “yay” }
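The reason short keys matter is that BSON repeats every key name in every document. A simplified estimate of the per-document overhead, counting key bytes only and ignoring BSON type and length bytes:

```python
def key_bytes(doc: dict) -> int:
    """Rough bytes spent on key names in one document (key + NUL each)."""
    total = 0
    for key, value in doc.items():
        total += len(key.encode("utf-8")) + 1
        if isinstance(value, dict):
            total += key_bytes(value)
    return total

bad = {"very_descriptive_long_key": "yay"}
good = {"v": "yay"}

# Multiplied across millions of documents, the difference is real storage.
savings_per_doc = key_bytes(bad) - key_bytes(good)
print(savings_per_doc)
```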
85. Big Data Project
(chart: storage growth per year by polygon count - 100,000 polygons: 62 GB/year; 500,000 polygons: 308 GB/year)
87. Big Data Project
• 308 GB of EBS storage => $332 per year
– backups / snapshots not considered
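The slide's figure checks out as simple arithmetic if we assume the then-standard EBS price of roughly $0.09 per GB-month (an assumption; actual pricing varies by region and volume type):

```python
# Back-of-envelope EBS cost check; the per-GB price is an assumption.
gb = 308
price_per_gb_month = 0.09  # USD, assumed standard EBS rate circa 2012
annual_cost = gb * price_per_gb_month * 12
print(round(annual_cost, 2))
```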
88. Big Data Project
• Future plans
– new use case
– expecting about 1 TB of data / week
89. Conclusion
• rapidly changing business needs
• ease of collecting huge amounts of data
• infrastructure as part of code
• MongoDB provides flexibility
90. Comments?
• @comsysto
• #MongoMunich2012
• http://blog.comsysto.com
• Don’t forget the hallway track
• Mongo User Group Munich
– http://www.meetup.com/Muenchen-MongoDB-User-Group/