Mongodb beijingconf yottaa_3.3

Yottaa mongodb production in Mongodb Beijing 2011.3.3 conference.

Transcript

  • 1. MongoDB In Production:
    Yottaa Practice
    XiangJun Wu
    System Engineer
    xwu@yottaa.com
    Yottaa Inc.
    2 Canal Park 5th Floor
    Cambridge MA 02141
    http://www.yottaa.com
  • 2. Overview
    • About Yottaa
    • Engineering challenges
    • System architecture
    • Collection design
    • Production environment
    • Lessons learned
    • Q&A
  • 9. What is Yottaa?
  • 10. We Monitor More Sites Than Anyone Else
  • 11. Demo
    We are recruiting!
    http://www.yottaa.com/about#jobs
  • 12. Engineering Challenges
    • We collect lots of data
    • 27,000+ URLs monitored
    • ~300 samples per URL per day
    • Some samples are >1 MB (Firebug)
    • Missing a sample isn’t a big deal
    • We collect over 10 kinds of metrics: DNS lookup, time to display, time to interactive, Firebug, YSlow, and so forth
    • We try to make everything real time
    • No batch jobs; everything is displayed as it happens
    • The “Check Now” button runs tests on demand
  • 21. Engineering Challenges
    • Small engineering team
    • Started with a team of 2
    • Must be Agile
    • We didn’t know exactly what features we’d need
    • Requirements change daily
    • Limited operations budget
    • No full-time operations staff
    • 100% in the cloud: EC2, Voxel, Linode, Rackspace, and other cloud providers
  • 29. Sharding!
    [Architecture diagram: User → Load Balancer → Nginx → App Server (Passenger, “Easy as Rails!”) for collection and reporting → Mongos → multiple MongoD shards, with external data sources feeding the collection tier. Goals: high concurrency and scale-out.]
  • 30. Database Architecture
    The primary data store is broken into 5 parts:
    • Users - user-related data.
    • Web metrics - stores DNS lookup, HTTP connection, Firebug, etc. Web metrics are kept at different scales (daily, monthly) to speed up frontend reports; raw data is kept for detailed queries.
    • Alerts - monitors whether a website/URL has a performance downgrade.
    • Summary - stores the most frequently read URL information; also used as a message queue for workers to fetch URL access tasks.
    • URL optimization logic - stores optimization switches: enable CDN, enable compression, CSS minify, and so forth.
  • 35. Database Architecture
    MongoDB has other use cases:
    • System metrics - CPU/memory/network
    • Application metrics - cache hit, process speed, health
    • All log information - we use logstash
    (http://code.google.com/p/logstash/)
    to feed and store logs for the different components in MongoDB. Logs are searched via Rails; we plan to apply a Sinatra interface for both log feeding and querying.
  • 38. Database Architecture
  • 39. Thinking in rows
    Row schema: URL, Location, Connect, First Byte, Last Byte, Timestamp
    { url: 'www.google.com',
      location: 'Beijing',
      connect: 23,
      first_byte: 123,
      last_byte: 245,
      timestamp: 1234 }
    { url: 'www.google.com',
      location: 'Shanghai',
      connect: 23,
      first_byte: 123,
      last_byte: 245,
      timestamp: 2345 }
  • 40. Thinking in rows
    Row schema: URL, Location, Connect, First Byte, Last Byte, Timestamp
    What was the average connect time for google on Friday?
    From Beijing? From Shanghai? Between 1AM-2AM?
  • 41. Thinking in rows
    Up to 100s of samples per URL per day!!
    Row schema: URL, Location, Connect, First Byte, Last Byte, Timestamp
    With a 30-day average query range, an “average” chart had to hit 3000 rows.
    [Diagram: Day 1 AVG + Day 2 AVG + Day 3 AVG + … → Result]
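To make the cost concrete, here is a minimal JavaScript sketch (illustrative only, not Yottaa’s actual code) of averaging in a row-per-sample schema: every chart query has to touch every raw sample row for the URL.

```javascript
// Row-per-sample schema: one object per measurement, as on the slide.
// Averaging must scan every matching row.
function averageConnect(rows, url) {
  let sum = 0, count = 0;
  for (const row of rows) {
    if (row.url === url) {
      sum += row.connect;
      count += 1;
    }
  }
  return count === 0 ? null : sum / count;
}

// ~100 samples per day for 30 days => ~3000 rows scanned per chart.
const rows = [];
for (let day = 0; day < 30; day++) {
  for (let s = 0; s < 100; s++) {
    rows.push({ url: 'www.google.com', connect: 20 + (s % 10), timestamp: day * 86400 + s });
  }
}
console.log(rows.length);                            // 3000
console.log(averageConnect(rows, 'www.google.com')); // 24.5
```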
  • 42. Thinking in Documents
    This document contains all data for www.google.com collected during 9/20/2010.
    [Diagram of one daily document: url www.google.com, a last_byte aggregate (Sum 2312) that gives the average value for this metric for this URL/time period, and per-location aggregates (e.g. SFO, Sum 1200; Sum 1112) that give the average value from each location, such as Beijing and Shanghai.]
  • 43. More efficient charts
    1 document per URL per day: { URL, Day, <data> }
    30 days == 30 documents, so an average chart hits 30 documents - 100x fewer.
    [Diagram: Day 1 AVG + Day 2 AVG + Day 3 AVG + … → Result]
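Under the document-per-day layout, the same average reads one pre-aggregated document per day. A sketch, assuming the sum/count field shapes shown on the surrounding slides:

```javascript
// Each daily document carries running sum/count per metric,
// so a 30-day average only reads 30 documents.
function averageFromDailies(dailies) {
  let sum = 0, count = 0;
  for (const doc of dailies) {
    sum += doc.connect.sum;
    count += doc.connect.count;
  }
  return count === 0 ? null : sum / count;
}

// 30 documents instead of ~3000 raw rows.
const dailies = [];
for (let day = 1; day <= 30; day++) {
  dailies.push({ url: 'www.google.com', day: day, connect: { sum: 2450, count: 100 } });
}
console.log(dailies.length);              // 30
console.log(averageFromDailies(dailies)); // 24.5
```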
  • 44. Storing a sample
    Atomically update the document: the query selects which document we’re updating, $inc updates both the aggregate value and the location-specific value, and the upsert flag creates the document if it doesn’t already exist.
    db.metrics.dailies.update(
      { url: 'www.google.com',
        day: new Date(2010, 9, 2) },
      { '$inc': {
          'connect.sum': 1234,
          'connect.count': 1,
          'connect.bj.sum': 1234,
          'connect.bj.count': 1 } },
      true // upsert
    );
  • 45. Putting it together
    One incoming sample:
    { url: 'www.google.com',
      location: 'Beijing',
      connect: 23,
      first_byte: 123,
      last_byte: 245,
      timestamp: 1234 }
    1. Atomically update the daily data
    2. Atomically update the weekly data
    3. Atomically update the monthly data
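The three granularities can share one $inc spec built from the sample. This sketch assumes the connect.* field names from the earlier upsert slide; the weekly/monthly collection names in the comment are hypothetical:

```javascript
// Build the $inc document for one sample. The same spec is applied as an
// upsert against the daily, weekly, and monthly collections.
function incSpec(sample, locationKey) {
  return {
    'connect.sum': sample.connect,
    'connect.count': 1,
    ['connect.' + locationKey + '.sum']: sample.connect,
    ['connect.' + locationKey + '.count']: 1,
  };
}

const sample = { url: 'www.google.com', location: 'Beijing', connect: 23 };
const spec = incSpec(sample, 'bj');
console.log(spec['connect.sum']);      // 23
console.log(spec['connect.bj.count']); // 1

// In the mongo shell this would then be applied three times, e.g.:
//   db.metrics.dailies.update({url: sample.url, day: ...},     {$inc: spec}, true);
//   db.metrics.weeklies.update({url: sample.url, week: ...},   {$inc: spec}, true);
//   db.metrics.monthlies.update({url: sample.url, month: ...}, {$inc: spec}, true);
```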
  • 46. MongoDB In Production
    • EC2-based large server: 2 CPUs, 8 GB memory
    • 4 MongoDB servers in 3 DB clusters
    • Master/slave setup in the same datacenter
    • One master and one slave for the core database
    • Back up the entire database every day
    • Restore the entire data set to a new MongoDB server to verify data integrity
    • Save MongoDB logs for slow query/ops analysis
    • After 120 days, we have > 500 GB of data
    • Adding about 5 GB/day today
    • 101 reads/s, 70.96 writes/s
    • Global lock rate 34.9%
  • 57. Production: Sharding
    [Diagram: Collection Server and Reporting Server in front of Shards 1-4, sharded by URL. Write load is evenly distributed; most reads hit a single shard.]
    • Scale-out architecture: Mongo auto-sharding allows us to “just add servers” at the Rails & DB tiers. Right now, no sharding is used.
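Why sharding by URL balances writes while keeping reads local can be sketched with a toy routing function (mongos does the real routing in a deployed cluster; the hash here is purely illustrative):

```javascript
// Toy shard router: every sample for a given URL lands on the same shard,
// so a report for one URL reads from a single shard, while many distinct
// URLs spread write load across shards.
function shardFor(url, numShards) {
  let h = 0;
  for (const ch of url) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % numShards;
}

const urls = ['www.google.com', 'www.yottaa.com', 'www.example.org', 'news.ycombinator.com'];

// Deterministic: repeated samples for one URL route identically.
console.log(shardFor('www.google.com', 4) === shardFor('www.google.com', 4)); // true

// Different URLs map into the shard range [0, numShards).
for (const u of urls) {
  console.log(u, '-> shard', shardFor(u, 4));
}
```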
  • 58. Production: Monitor
    • We apply a RESTful API to send mongostat metrics to our own monitoring system, so we can watch MongoDB performance in real time.
  • 59. Lessons Learned
    • Consider collection sharding from day one
    • Preallocate the oplog before starting MongoDB if you are using an ext3/ext2 file system; ext4/xfs performs better and doesn’t need oplog preallocation
    • Review all slow queries and add proper indexes in the staging environment
    • Be careful adding indexes in production; try to add indexes in the background or during “off” time
    • Avoid slow write operations and holding the lock too long
    • Watch MongoDB logs after a new deployment
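The background-index tip above can be written in the mongo shell like this (a config fragment; the collection and key names are assumptions, not Yottaa’s actual schema):

```javascript
// Build the index in the background so it does not block reads/writes
// on a live server (foreground index builds hold the lock).
db.metrics.dailies.ensureIndex({ url: 1, day: -1 }, { background: true });
```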
  • 64. We Are Hiring!
    MongoDB, Ruby, Web, and Java talent
    http://www.yottaa.com/about#jobs
    Thank you for viewing
