Life After Sharding: Monitoring and Management of a Complex Data Cloud

Slides from Boris Livshutz' presentation at OSCON 2012.

  • Objective of Slide Thank you and introductions. Time check. Script Thank you for your time today. I look forward to an interactive discussion today on your application performance needs and the chance to present the AppDynamics solution to you. I had this meeting booked from x – y am/pm. Are you still available until then?
  • Objective of Slide: Show them you've been listening or that you've done some homework. Get them to confirm whether we've "written down" the information correctly. Give them the opportunity to expand and say more. Script: From our previous conversation, I captured what I heard you say in that conversation about your application environment, your current challenges and what you may be looking for. Can you let me know if I captured these correctly? Anything else you'd like to add?

Life After Sharding: Monitoring and Management of a Complex Data Cloud Presentation Transcript

  • 1. Life After Sharding: Managing a Complex Data Cloud
      • Boris Livshutz, AppDynamics
  • 2. Why are you here?
      • You already shard, plan to shard, or need to shard your data
      • You're considering a NoSQL solution for production
    Copyright © AppDynamics. All rights reserved.
  • 3. About AppDynamics
      • Distributed application monitoring for enterprise applications
      • The data layer is part of any enterprise app, so we monitor it too
      • We collect massive amounts of metrics from our customers and store it all in MySQL
  • 4. About Me
      • 2 decades of experience building DB kernels, OLAP, server side development
      • 4 years at AppDynamics scaling our server and helping our largest customers
  • 5. What is a Data Cloud?
      • Distinct set of data distributed across multiple nodes
      • Multiple nodes work together to manage data
      • Common examples:
          • Sharded RDBMS
          • NoSQL
      • Data nodes can be part of a rented cloud or on-premise
  • 6. Before: The Monolithic DB
      • Monitoring Tools
          • Cacti, Nagios, MySQL Enterprise, Enterprise Manager, Foglight
          • Both open source and commercial systems
          • Alerting: Emails to NOC and DBAs, regarding one database in trouble
      • Management
          • Query one database: SQL shell, Toad, etc.
          • Backup: Hot backup tools for each database
          • Schema upgrades: Connect to one database and run upgrade script
  • 7. Why We Need a Data Cloud
      • The limits of vertical scale
          • One Dell box: 256GB RAM, 32 cores, 36 disks in RAID-60
          • MySQL wasn't able to use more than 12-16 cores
          • 8 TB of data is hard to back up and copy
          • Alter table almost impossible on largest tables
          • No more growth option, no 256 core CPU!
          • Hardware very expensive ($50K), cannot duplicate in test env
          • Replication cannot keep up
      • Advantages to horizontal scale
          • Commodity hardware, easy to buy and expand
          • $4K per box, 8 cores, 48GB RAM, 5 disks
          • MySQL is able to fully leverage the hardware, easier to tune
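The hardware figures on slide 7 can be turned into a quick back-of-the-envelope comparison. The sketch below uses only the prices and specs quoted on the slide; the "spend the same $50K on commodity boxes" framing is an illustrative assumption, not something the talk states, and it ignores networking, power, and operational costs.

```python
# Specs and prices as quoted on slide 7; the equal-budget comparison is an assumption.
BIG_BOX = {"price_usd": 50_000, "cores": 32, "ram_gb": 256}
COMMODITY_BOX = {"price_usd": 4_000, "cores": 8, "ram_gb": 48}


def commodity_fleet_for_budget(budget_usd):
    """Return (boxes, total_cores, total_ram_gb) purchasable for budget_usd."""
    boxes = budget_usd // COMMODITY_BOX["price_usd"]
    return boxes, boxes * COMMODITY_BOX["cores"], boxes * COMMODITY_BOX["ram_gb"]


boxes, cores, ram_gb = commodity_fleet_for_budget(BIG_BOX["price_usd"])
print(f"{boxes} boxes, {cores} cores, {ram_gb} GB RAM")
# prints "12 boxes, 96 cores, 576 GB RAM"
```

Under that assumption the commodity fleet offers three times the cores of the single large box, and the slide notes that MySQL could only use 12-16 of the large box's 32 cores anyway.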
  • 8. Choosing a Data Cloud
      • Shard existing RDBMS
          • Change application logic to be shard-aware (lots of code changes!)
          • Use a proxy (ScaleBase, DbShards, Spock, HiveDB)
      • NoSQL
          • You are brave!
          • Give up on ACID, decades of stability, etc.
          • Gain failover, auto-resharding, etc. out of the box
  • 9. Dev Complete - Now What?
      • Can you just throw it over the wall to Ops?
      • Almost no off-the-shelf tools to monitor and manage the data cloud
      • DIY: only choice is to do it yourself. Sorry
  • 10. What did we do?
      • We had one MySQL that kept growing and growing
      • Sharded MySQL into 7 replica sets, 2 replicas each
      • We couldn't release it until Ops was ready to keep it up 24x7
      • Built our own "glue" to manage and monitor this beast
      • We ate our own dog food
      • We partnered and didn't re-invent the wheel
  • 11. Managing the Data Cloud
      • ScaleBase
          • Central point of management for data cloud
          • The only source of truth: keeps track of each replica, location, naming, heartbeat, load
  • 12. Instant access to data in the Data Cloud
      • Access DB data through the ScaleBase LoadBalancer
      • Can set mode to send both queries and DML to all replicas, to a subset, or to just one
      • We send SQL to a specific replica without knowing its location
          • The only location we connect to is the ScaleBase LoadBalancer
      • Other 3rd party tools can also connect to the ScaleBase LoadBalancer without knowing about our Data Cloud
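The routing modes on slide 12 (all replicas, a subset, or one) can be illustrated with a small sketch. Everything here is hypothetical: `ReplicaRegistry`, its methods, and the `rs1`..`rs7` / `a`, `b` naming are invented for illustration and are not the ScaleBase API, and in-memory SQLite connections stand in for the MySQL replicas so the example runs on its own.

```python
import sqlite3


class ReplicaRegistry:
    """Hypothetical single entry point that knows where every replica lives."""

    def __init__(self):
        self._replicas = {}  # (replica_set, replica) -> connection

    def register(self, replica_set, replica, conn):
        self._replicas[(replica_set, replica)] = conn

    def execute(self, sql, replica_set=None, replica=None):
        """Run sql on every replica matching the filters (no filter = all).

        Returns {(replica_set, replica): rows}.
        """
        results = {}
        for (rs, r), conn in self._replicas.items():
            if replica_set is not None and rs != replica_set:
                continue
            if replica is not None and r != replica:
                continue
            results[(rs, r)] = conn.execute(sql).fetchall()
            conn.commit()
        return results


# 7 replica sets with 2 replicas each, matching the layout on slide 10.
registry = ReplicaRegistry()
for i in range(1, 8):
    for r in ("a", "b"):
        registry.register(f"rs{i}", r, sqlite3.connect(":memory:"))

registry.execute("CREATE TABLE metrics (name TEXT, value REAL)")      # all replicas
registry.execute("INSERT INTO metrics VALUES ('cpu', 0.5)",
                 replica_set="rs3")                                    # a subset
counts = registry.execute("SELECT COUNT(*) FROM metrics")              # all replicas
one = registry.execute("SELECT COUNT(*) FROM metrics",
                       replica_set="rs3", replica="a")                 # exactly one
```

The caller only ever names a replica set or replica; the physical location stays inside the registry, which is the property the slide attributes to connecting through the load balancer.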
  • 13. Measure performance across your data cloud
  • 14. Measure performance: Replica deep dive
  • 15. Unified Alerting
      • System-wide alerts all come from a single source: ScaleBase
      • Alerts go to PagerDuty to reach the right people on duty
      • Alerts clearly identify replica set and replica node
          • Allows quick resolutions by pinpointing problems in the data cloud
      • NOC Response: SQL connection to troubleshoot via ScaleBase
          • Only need to know the replica and replica set from the alert and can immediately investigate with SQL queries
      • NOC Response: Use monitoring tool for deep dive investigation into the replica
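Slide 15's point that every alert should name the replica set and replica node can be sketched as follows. This is a minimal illustration under stated assumptions: the `ReplicaHealth` fields, the threshold values, and the alert dictionary shape are all hypothetical, and no PagerDuty or ScaleBase call is made; a real setup would hand the resulting alerts to whatever paging integration is in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaHealth:
    """Hypothetical health report for one replica node."""
    replica_set: str
    replica: str
    seconds_since_heartbeat: float
    replication_lag_seconds: float


def build_alerts(reports, max_heartbeat_age=30.0, max_lag=60.0):
    """Return one alert per unhealthy replica, tagged with its exact location.

    The thresholds are illustrative defaults, not values from the talk.
    """
    alerts = []
    for h in reports:
        problems = []
        if h.seconds_since_heartbeat > max_heartbeat_age:
            problems.append(f"no heartbeat for {h.seconds_since_heartbeat:.0f}s")
        if h.replication_lag_seconds > max_lag:
            problems.append(f"replication lag {h.replication_lag_seconds:.0f}s")
        if problems:
            alerts.append({
                "replica_set": h.replica_set,
                "replica": h.replica,
                "summary": f"[{h.replica_set}/{h.replica}] " + "; ".join(problems),
            })
    return alerts


reports = [
    ReplicaHealth("rs1", "a", seconds_since_heartbeat=2, replication_lag_seconds=0),
    ReplicaHealth("rs4", "b", seconds_since_heartbeat=95, replication_lag_seconds=0),
    ReplicaHealth("rs6", "a", seconds_since_heartbeat=1, replication_lag_seconds=400),
]
for alert in build_alerts(reports):
    print(alert["summary"])
```

Because the replica set and node are carried in each alert, whoever is on call can go straight to a SQL session against that replica instead of first working out which machine is affected.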
  • 16. Synchronized maintenance tasks
      • Backups
          • Synchronized
          • Backup is just a "job" in the ScaleBase engine; ScaleBase runs it on every replica
          • ScaleBase tracks the status of each job execution on each replica
      • Schema upgrades: the upgrade program doesn't need to know where things are in the data cloud
          • Upgrader just connects to ScaleBase and the upgrade SQL is sent to the whole data cloud automatically
      • Configuration Changes
          • Global changes can be done in SQL by connecting to ScaleBase and executing the same change on ALL replicas
          • One SQL statement can be sent to all replicas by ScaleBase. Any errors will be logged
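The "run one job on every replica, track status per replica, log any errors" pattern from slide 16 can be sketched generically. The function name, the `rsN-x` replica names, and the use of in-memory SQLite databases as stand-ins for MySQL replicas are assumptions made so the example is runnable; it does not reproduce how ScaleBase itself schedules or tracks jobs.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("data_cloud_jobs")


def run_job_everywhere(replicas, sql):
    """Apply sql to each replica; return {name: "ok" or "failed: <error>"}."""
    status = {}
    for name, conn in replicas.items():
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(sql)
            status[name] = "ok"
        except sqlite3.Error as exc:
            # One bad replica is logged and recorded; the job continues elsewhere.
            log.error("job failed on %s: %s", name, exc)
            status[name] = f"failed: {exc}"
    return status


# 7 replica sets x 2 replicas, as on slide 10; names are hypothetical.
replicas = {f"rs{i}-{r}": sqlite3.connect(":memory:")
            for i in range(1, 8) for r in "ab"}
run_job_everywhere(replicas, "CREATE TABLE metrics (name TEXT)")

# Simulate schema drift: one replica already has the new column.
replicas["rs2-b"].execute("ALTER TABLE metrics ADD COLUMN value REAL")

status = run_job_everywhere(replicas, "ALTER TABLE metrics ADD COLUMN value REAL")
failed = sorted(name for name, s in status.items() if s != "ok")
print("failed replicas:", failed)
```

Returning a per-replica status map, rather than stopping at the first error, is what lets an operator see exactly which nodes need follow-up after a schema upgrade or configuration change.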
  • 17. Conclusions
      • Lessons Learned
          • Development, test and Ops need to work together
          • Educate more of the team
          • Most problems that arise are operational, not code bugs
          • The right vendors really make it easier than doing everything yourself
      • Future
          • Automate failback with hot spare
          • Try new technologies like XtraDB Cluster
  • 18. Vendors
  • 19. Questions?