2014 05-07-fr - add dev series - session 6 - deploying your application-2

Speaker notes
  • Basic explanation: two or more nodes form the set; a quorum is required.
  • Initialize -> election. Primary elected, then data replication from primary to secondaries. Heartbeat every 2 seconds, timeout 10 seconds.
  • Primary down / network failure: automatic election of a new primary if a majority exists. Failover usually takes a couple of seconds. Depending on your application code and configuration, this can be seamless/transparent.
  • New primary elected; replication established from the new primary.
  • Down node comes back up, rejoins the set: recovery, then secondary.
  • Note that replication doesn’t always need to pull from the primary. It will pull from a secondary if that is faster (lower ping time).
  • Primary: data member. Secondary: hot standby. Arbiter: voting member only.
  • Priority: floating-point number between 0 and 1000. The highest-priority member that is up to date wins (up to date == within 10 seconds of the primary). If a higher-priority member catches up, it will force an election and win. Slave delay: lags behind the master by a configurable time delay and is automatically hidden from clients. Protects against operator errors (fat fingering) and an application corrupting data.
  • Consistency: write concerns and read preferences.
  • Using 'someDCs' so that in the event of an outage, at least a majority of the DCs would receive the change. This favors availability over durability.
  • Indexes should be contained in the working set.
  • From mainframes to Oracle RAC servers, people solved problems by adding more resources to a single machine.
  • Large-scale operation can be combined with high performance on commodity hardware through horizontal scaling. Build: a document-oriented database maps naturally to object-oriented languages. Scale: MongoDB presents a clear path to scalability that isn’t ops-intensive, and provides the same interface for a sharded cluster as for a single instance.
  • _id is only guaranteed unique across shards if it is used as the shard key. More generally, uniqueness of an attribute can only be guaranteed if it is (part of) the shard key and the index is created with unique set to true.
  • Cardinality: can your data be broken down enough? Query isolation: query targeting to a specific shard. Reliability: impact of shard outages. A good shard key can optimize routing, minimize (unnecessary) traffic, and allow the best scaling.
  • Don’t use this setup in production! Only one config server (no fault tolerance), shard not in a replica set (low availability), only one mongos and one shard (no performance improvement). Useful for development or for demonstrating configuration mechanics.
  • MongoDB 2.2 and later only needs the <host> and <port> of one member of the replica set.
  • This can be skipped for the intro talk, but might be good to include if you’re doing the combined sharding talk. Totally optional, you don’t really have enough time to do this topic justice but it might be worth a mention.
  • The mongos does not have to load the whole set into memory since each shard sorts locally. The mongos can just getMore from the shards as needed and incrementally return the results to the client.

    1. 1. Tugdual Grall (@tgrall) Alain Hélaïli (@AlainHelaili) #MongoDBBasics @MongoDB Building an Application with MongoDB – Deploying the Application
    2. 2. 2 • Recap of previous episodes • Replication • Sharding Agenda
    3. 3. 3 • Virtual Genius Bar – Use the chat window – Tug & Alain available during, and after… • MUGs in Paris, Toulouse, Bordeaux, Rennes, Lyon • « MongoDB France » group on LinkedIn and « MongoDB » group on Viadeo Q & A @tgrall, tug@mongodb.com - @AlainHelaili, alain.helaili@mongodb.com
    4. 4. Recap of previous episodes…
    5. 5. 5 • Data aggregation… – Map Reduce – Hadoop – Pre-aggregated reports – Aggregation Framework • Tuning with Explain • Compute on the fly vs. compute and store • Geospatial • Text Search Recap
    6. 6. Replication
    7. 7. Why Replication? • How many have faced node failures? • How many have been woken up from sleep to perform a failover? • How many have experienced issues due to network latency? • Different uses for data – Normal processing – Simple analytics
    8. 8. Why Replication? • Replication is designed for – High Availability (HA) – Disaster Recovery (DR) • Not designed for scaling reads – You can but there are drawbacks: eventual consistency, etc. – Use sharding for scaling!
    9. 9. Replica Set – Creation
    10. 10. Replica Set – Initialize
    11. 11. Replica Set – Failure
    12. 12. Replica Set – Failover
    13. 13. Replica Set – Recovery
    14. 14. Replica Set – Recovered
    15. 15. Replica Set Roles & Configuration
    16. 16. Replica Set Roles Example with 2 data nodes + 1 arbiter
    17. 17. > conf = { // 5 data nodes _id : "mySet", members : [ {_id : 0, host : "A", priority : 3}, {_id : 1, host : "B", priority : 2}, {_id : 2, host : "C"}, {_id : 3, host : "D", hidden : true}, {_id : 4, host : "E", hidden : true, slaveDelay : 3600} ] } > rs.initiate(conf) Configuration Options
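        A minimal sketch of how to verify the set once rs.initiate(conf) has run (mongo shell, connected to any member):
        > rs.status()    // state of every member (PRIMARY, SECONDARY, ARBITER, ...)
        > rs.conf()      // the configuration document currently in use
        > rs.isMaster()  // quick check of which member is the primary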
    18. 18. Developing with Replica Sets
    19. 19. Strong Consistency
    20. 20. Delayed / Eventual Consistency
    21. 21. Write Concerns • Network acknowledgement (w = 0) • Wait for return info/error (w = 1) • Wait for journal sync (j = 1) • Wait for replication (w >=2)
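        A sketch of what these levels look like from the shell, assuming MongoDB 2.6-style writeConcern options and a hypothetical blogs collection (the deck itself uses getLastError, shown on slide 23):
        > db.blogs.insert({ title : "Hello" })                                                // default: w = 1
        > db.blogs.insert({ title : "Hello" }, { writeConcern : { w : 0 } })                  // network ack only, no error check
        > db.blogs.insert({ title : "Hello" }, { writeConcern : { w : 1, j : true } })        // wait for the journal sync
        > db.blogs.insert({ title : "Hello" }, { writeConcern : { w : 2, wtimeout : 5000 } }) // wait for at least one secondary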
    22. 22. Tagging • Control where data is written to, and read from • Each member can have one or more tags – tags: {dc: "ny"} – tags: {dc: "ny",
 subnet: "192.168",
 rack: "row3rk7"} • Replica set defines rules for write concerns • Rules can change without changing app code
    23. 23. { _id : "mySet", members : [ {_id : 0, host : "A", tags : {"dc": "ny"}}, {_id : 1, host : "B", tags : {"dc": "ny"}}, {_id : 2, host : "C", tags : {"dc": "sf"}}, {_id : 3, host : "D", tags : {"dc": "sf"}}, {_id : 4, host : "E", tags : {"dc": "cloud"}}], settings : { getLastErrorModes : { allDCs : {"dc" : 3}, someDCs : {"dc" : 2}} } } > db.blogs.insert({...}) > db.runCommand({getLastError : 1, w : "someDCs"}) Tagging Example
    24. 24. Read Preference Modes • 5 modes – primary (only) - Default – primaryPreferred – secondary – secondaryPreferred – nearest • When more than one node is eligible, the closest node is used for reads (all modes except primary)
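        As an illustration (shell syntax; a sketch, assuming a hypothetical blogs collection), the mode can be set per query or per connection:
        > db.blogs.find({ author : "tug" }).readPref("secondaryPreferred")   // this cursor only
        > db.getMongo().setReadPref("nearest")                               // default for this shell connection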
    25. 25. Tagged Read Preference • Custom read preferences • Control where you read from by (node) tags – E.g. { "disk": "ssd", "use": "reporting" } • Use in conjunction with standard read preferences – Except primary
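        A sketch combining a mode with the tag set from the slide (the tags must already be assigned to members in the replica set configuration):
        > db.blogs.find().readPref("secondaryPreferred", [ { "disk" : "ssd", "use" : "reporting" }, {} ])
        // the trailing empty document {} is an optional fallback: read from any eligible member if no tagged one is available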
    26. 26. Sharding
    27. 27. Read/Write Throughput Exceeds I/O
    28. 28. Working Set Exceeds Physical Memory
    29. 29. Vertical Scalability (Scale Up)
    30. 30. Horizontal Scalability (Scale Out)
    31. 31. Partitioning • User defines shard key • Shard key defines range of data • Key space is like points on a line • Range is a segment of that line (chunk), smaller than 64 MB • Chunks are migrated from one shard to another to maintain a balanced state
    32. 32. Shard Key • Shard key is immutable • Shard key values are immutable • Shard key must be indexed • Shard key limited to 512 bytes in size • Shard key used to route queries – Choose a field commonly used in queries • Only shard key can be unique across shards
    33. 33. Shard Key Considerations • Cardinality • Write Distribution • Query Isolation • Reliability • Index Locality
    34. 34. Initially 1 chunk Default max chunk size: 64 MB MongoDB automatically splits & migrates chunks when max reached Data Distribution
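        For testing, the cluster-wide maximum chunk size can be lowered so that splits are visible with small data sets (a sketch; leave the 64 MB default in production):
        > use config
        > db.settings.save({ _id : "chunksize", value : 32 })   // max chunk size in MB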
    35. 35. Queries routed to specific shards MongoDB balances cluster MongoDB migrates data to new nodes Routing and Balancing
    36. 36. Partitioning (diagram: key space from - ∞ to + ∞, distributed across shards)
    37. 37. Partitioning (diagram: a single chunk covering { x : 1 }, { x : 3 } … { x : 99 })
    38. 38. Partitioning (diagram: the chunk is split into { x : 1 } … { x : 55 } and { x : 56 } … { x : 110 })
    39. 39. Partitioning (diagram build continues)
    40. 40. Partitioning (diagram build continues)
    41. 41. MongoDB Auto-Sharding • Minimal effort required – Same interface as single mongod • Two steps – Enable Sharding for a database – Shard collection within database
    42. 42. Architecture
    43. 43. What is a Shard? • Shard is a node of the cluster • Shard can be a single mongod or a replica set
    44. 44. Meta Data Storage • Config Server – Stores cluster chunk ranges and locations – Can have only 1 or 3 (production must have 3) – Not a replica set
    45. 45. Routing and Managing Data • Mongos – Acts as a router / balancer – No local data (persists to config database) – Can have 1 or many
    46. 46. Sharding infrastructure
    47. 47. Configuration
    48. 48. Example Cluster
    49. 49. mongod --configsvr Starts a configuration server on the default port (27019) Starting the Configuration Server
    50. 50. mongos --configdb <hostname>:27019 For 3 configuration servers: mongos --configdb <host1>:<port1>,<host2>:<port2>,<host3>:<port3> This is always how to start a new mongos, even if the cluster is already running Start the mongos Router
    51. 51. mongod --shardsvr Starts a mongod with the default shard port (27018) Shard is not yet connected to the rest of the cluster Shard may have already been running in production Start the shard database
    52. 52. On mongos: – sh.addShard('<host>:27018') Adding a replica set: – sh.addShard('<rsname>/<seedlist>') Add the Shard
    53. 53. db.runCommand({ listshards : 1 }) { "shards" : [ { "_id" : "shard0000", "host" : "<hostname>:27018" } ], "ok" : 1 } Verify that the shard was added
    54. 54. Enabling Sharding • Enable sharding on a database sh.enableSharding("<dbname>") • Shard a collection with the given key sh.shardCollection("<dbname>.people", {"country" : 1}) • Use a compound shard key to prevent duplicates sh.shardCollection("<dbname>.cars", {"year" : 1, "uniqueid" : 1})
    55. 55. Tag Aware Sharding • Tag aware sharding allows you to control the distribution of your data • Tag a range of shard keys – sh.addTagRange(<collection>,<min>,<max>,<tag>) • Tag a shard – sh.addShardTag(<shard>,<tag>)
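        A sketch with illustrative values (shard name, collection, tag, and key range are all hypothetical; assumes the collection is sharded on { country : 1 }):
        > sh.addShardTag("shard0000", "EU")                                          // tag a shard
        > sh.addTagRange("mydb.people", { country : "FR" }, { country : "FS" }, "EU")
        // keys >= "FR" and < "FS" (i.e. country "FR") will be kept on shards tagged "EU"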
    56. 56. Routing Requests
    57. 57. Cluster Request Routing • Targeted Queries • Scatter Gather Queries • Scatter Gather Queries with Sort
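        One way to see the difference from the shell, assuming people is sharded on { country : 1 } (a sketch; explain() output varies by version, but through mongos it reports which shards served the query):
        > db.people.find({ country : "FR" }).explain()                      // targeted: only the shard(s) owning "FR"
        > db.people.find({ name : "Alice" }).explain()                      // scatter-gather: every shard is queried
        > db.people.find({ name : "Alice" }).sort({ age : 1 }).explain()    // each shard sorts locally, mongos merges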
    58. 58. Cluster Request Routing: Targeted Query
    59. 59. Routable request received
    60. 60. Request routed to appropriate shard
    61. 61. Shard returns results
    62. 62. Mongos returns results to client
    63. 63. Cluster Request Routing: Non-Targeted Query
    64. 64. Non-Targeted Request Received
    65. 65. Request sent to all shards
    66. 66. Shards return results to mongos
    67. 67. Mongos returns results to client
    68. 68. Cluster Request Routing: Non-Targeted Query with Sort
    69. 69. Non-Targeted request with sort received
    70. 70. Request sent to all shards
    71. 71. Query and sort performed locally
    72. 72. Shards return results to mongos
    73. 73. Mongos merges sorted results
    74. 74. Mongos returns results to client
    75. 75. Recap
    76. 76. 76 • Replica sets for high availability • Sharding for scaling out • Write concerns • Shard key Recap
    77. 77. 77 – Backup – Disaster recovery Next session – June 3
