Life After Sharding:
Managing a Complex Data Cloud
     Boris Livshutz, AppDynamics
Why are you here?

      • You already shard, plan to shard, or need to shard your data
      • You’re considering a NoSQL solution for production




2   Copyright © AppDynamics. All rights reserved.
About AppDynamics

• Distributed application monitoring for enterprise applications
• The data layer is part of every enterprise app, and we monitor it too
• We collect massive amounts of metrics from our customers and store them
  all in MySQL




About Me

• 2 decades of experience
  building DB kernels,
  OLAP, server side
  development
• 4 years at AppDynamics
  scaling our server and
  helping our largest
  customers




4   Copyright © 2010 AppDynamics. All rights reserved.
What is a Data Cloud?
      • Distinct set of data distributed across multiple nodes
      • Multiple nodes work together to manage data
      • Common examples:
                  • Sharded RDBMS
                  • NoSQL

      • Data nodes can be part of a rented cloud or on-premise




Before: The Monolithic DB

        • Monitoring Tools
                    • Cacti, Nagios, MySQL Enterprise, Enterprise Manager, Foglight
                    • Both open source and commercial systems
                    • Alerting: emails to the NOC and DBAs about a single database in trouble

        • Management
                    • Query one database: SQL shell, Toad, etc.
                    • Backup: Hot backup tools for each database
                    • Schema upgrades: Connect to one database and run upgrade script




Why We Need a Data Cloud

    • The limits of vertical scale
               • One Dell box – 256GB RAM, 32 cores, 36 disks in RAID 60
               • MySQL wasn’t able to use more than 12-16 cores
               • 8 TB of data was hard to back up and copy
               • ALTER TABLE was almost impossible on the largest tables
               • No more growth option, no 256-core CPU!
               • Hardware was very expensive ($50K) and could not be duplicated in a test environment
               • Replication could not keep up

    • Advantages to horizontal scale
               • Commodity hardware, easy to buy and expand
                           • $4K per box: 8 cores, 48GB RAM, 5 disks

               • MySQL is able to fully leverage the hardware, easier to tune


Choosing a Data Cloud

      • Shard existing RDBMS
                  • Change application logic to be shard-aware (lots of code changes!)
                  • Use a proxy (ScaleBase, dbShards, Spock, HiveDB)

      • NoSQL
                  • You are brave!
                  • Give up on ACID, decades of stability, etc.
                  • Gain failover, auto-resharding, etc. out of the box
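The first option means the application itself decides which shard owns each row. A minimal sketch, assuming a hypothetical customer-ID sharding key and simple hash-modulo placement; `shard_for`, `route`, and the DSNs are illustrative names, and this is not how ScaleBase or the other proxies listed actually route queries:

```python
import hashlib
from typing import List


def shard_for(key: str, num_shards: int) -> int:
    """Map a sharding key to a shard index with a stable hash (hypothetical scheme)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # SHA-256 is used only to spread keys evenly, not for security.
    # Python's built-in hash() is salted per process for str, so it would
    # send the same key to a different shard after every restart.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(customer_id: str, shard_dsns: List[str]) -> str:
    """Return the connection string of the shard that owns customer_id."""
    return shard_dsns[shard_for(customer_id, len(shard_dsns))]


if __name__ == "__main__":
    # Seven hypothetical shard endpoints.
    dsns = [f"mysql://shard{i}.example.internal:3306/metrics" for i in range(7)]
    print(route("customer-42", dsns))
```

Hash-modulo placement is the simplest scheme, but changing the shard count remaps most keys, which is one reason a proxy or a NoSQL store with automatic resharding is attractive.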




Dev Complete: Now What?

      • Can you just throw it over the wall to Ops?
      • Almost no off-the-shelf tools to monitor and manage the data
        cloud
      • DIY: the only choice is to do it yourself. Sorry.




What did we do?

     • We had one MySQL database that kept growing and growing
     • Sharded MySQL into 7 replica sets, with 2 replicas each
     • We couldn’t release it until Ops was ready to keep it up 24x7
     • Built our own “glue” to manage and monitor this beast.
     • We ate our own dog food
     • We partnered and didn’t reinvent the wheel




Managing the Data Cloud

• ScaleBase
      • Central point of management for the data cloud
      • The single source of truth: keeps track of each replica's
        location, name, heartbeat, and load
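To make the "single source of truth" idea concrete, a registry only needs to know each replica's set, name, address, last heartbeat, and load, and to flag replicas whose heartbeat has gone quiet. This is a minimal sketch, not ScaleBase's internals: `Replica`, `Registry`, the 30-second timeout, and the host names are all hypothetical, and the usage mirrors the 7 replica sets with 2 replicas each described earlier.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Replica:
    """One MySQL node as seen by the registry (hypothetical fields)."""
    replica_set: str
    name: str
    host: str
    port: int = 3306
    last_heartbeat: float = 0.0  # epoch seconds of the last heartbeat received
    load: float = 0.0            # whatever load figure the heartbeat reports


class Registry:
    """Tracks where every replica lives and whether it is still alive."""

    def __init__(self, heartbeat_timeout: float = 30.0) -> None:
        self.heartbeat_timeout = heartbeat_timeout
        self._replicas: Dict[str, Replica] = {}

    def register(self, replica: Replica) -> None:
        self._replicas[replica.name] = replica

    def heartbeat(self, name: str, load: float, now: Optional[float] = None) -> None:
        replica = self._replicas[name]  # raises KeyError for unknown replicas
        replica.last_heartbeat = time.time() if now is None else now
        replica.load = load

    def members(self, replica_set: str) -> List[Replica]:
        return [r for r in self._replicas.values() if r.replica_set == replica_set]

    def stale(self, now: Optional[float] = None) -> List[Replica]:
        """Replicas whose last heartbeat is older than the timeout."""
        now = time.time() if now is None else now
        return [r for r in self._replicas.values()
                if now - r.last_heartbeat > self.heartbeat_timeout]


if __name__ == "__main__":
    reg = Registry(heartbeat_timeout=30.0)
    for s in range(1, 8):            # 7 replica sets...
        for suffix in ("a", "b"):    # ...with 2 replicas each
            reg.register(Replica(f"rs{s}", f"rs{s}-{suffix}",
                                 f"db-rs{s}{suffix}.example.internal"))
    reg.heartbeat("rs1-a", load=120.0)
    print([r.name for r in reg.stale()])
```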




Instant access to data in the Data Cloud

• Access DB data through the
  ScaleBase LoadBalancer
• The mode can be set to send both queries
  and DML to all replicas, to a subset,
  or to just one
• We send SQL to a specific replica
  without knowing its location
     • The only location we connect to is the
       ScaleBase LoadBalancer

• Other third-party tools can also
  connect to the ScaleBase
  LoadBalancer without knowing
  about our data cloud
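The three routing modes (all replicas, a subset, or one) can be modeled as a small fan-out function. A sketch under stated assumptions: `select_targets`, `dispatch`, and the `execute` callback are hypothetical names, not ScaleBase's API, and the callback stands in for whatever driver call actually runs the SQL on a named replica.

```python
from typing import Callable, Dict, Iterable, List, Optional

# (replica_name, sql) -> result; a hypothetical stand-in for a real DB call.
ExecuteFn = Callable[[str, str], object]


def select_targets(replicas: List[str], mode: str,
                   targets: Optional[Iterable[str]] = None) -> List[str]:
    """Resolve a routing mode ("all", "subset", "one") to replica names."""
    if mode == "all":
        return list(replicas)
    wanted = list(targets or [])
    unknown = [t for t in wanted if t not in replicas]
    if unknown:
        raise ValueError(f"unknown replicas: {unknown}")
    if mode == "subset":
        return wanted
    if mode == "one":
        if len(wanted) != 1:
            raise ValueError("mode 'one' needs exactly one target")
        return wanted
    raise ValueError(f"unknown mode: {mode!r}")


def dispatch(sql: str, replicas: List[str], execute: ExecuteFn, mode: str = "all",
             targets: Optional[Iterable[str]] = None) -> Dict[str, object]:
    """Send the same statement to every selected replica; collect results by name."""
    return {name: execute(name, sql) for name in select_targets(replicas, mode, targets)}
```

The caller names replicas, never hosts, which is the property the slide describes: only the load balancer needs to know where each replica lives.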

Measure performance across your data cloud




Measure performance – Replica deep dive




Unified Alerting

     • System-wide alerts all come from a single source: ScaleBase
             • Alerts go to PagerDuty to reach the right people on duty

     • Alerts clearly identify the replica set and replica node
             • Allows quick resolution by pinpointing problems in the data cloud

     • NOC response: troubleshoot over a SQL connection via ScaleBase
             • Only the replica and replica set from the alert are needed
               to start investigating immediately with SQL queries

     • NOC response: use the monitoring tool for a
       deep-dive investigation into the replica
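Because every alert names the replica set and node, the NOC can act on it without a topology lookup. A minimal sketch of such an alert, assuming the PagerDuty Events API v2 trigger format (`routing_key`, `event_action`, `dedup_key`, and a `payload` with `summary`, `source`, `severity`, `group`, `component`, `custom_details`); `build_alert`, the routing key, and the check names are placeholders, and the sketch only builds a JSON-ready dict without sending it anywhere.

```python
from typing import Any, Dict

# Severity values accepted by the PagerDuty Events API v2 payload.
VALID_SEVERITIES = {"critical", "error", "warning", "info"}


def build_alert(replica_set: str, replica: str, check: str, detail: str,
                severity: str = "error",
                routing_key: str = "YOUR_ROUTING_KEY") -> Dict[str, Any]:
    """Build a trigger event that names the replica set and node (hypothetical helper)."""
    if severity not in VALID_SEVERITIES:
        raise ValueError(f"invalid severity: {severity!r}")
    return {
        "routing_key": routing_key,  # placeholder integration key
        "event_action": "trigger",
        # One incident per (replica set, node, check), so repeats collapse together.
        "dedup_key": f"{replica_set}/{replica}/{check}",
        "payload": {
            "summary": f"[{replica_set}/{replica}] {check}: {detail}",
            "source": replica,
            "severity": severity,
            "group": replica_set,   # replica set
            "component": replica,   # replica node
            "custom_details": {"check": check, "detail": detail},
        },
    }
```

Putting the replica set and node in both the summary and the structured fields lets the on-call engineer read the location straight off the page and go directly to a SQL session on that replica.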




Synchronized maintenance tasks

     • Backups
             • Synchronized
             • Backup is just a “job” in the ScaleBase engine; ScaleBase runs it on every
               replica
             • ScaleBase tracks the status of each job execution on each replica
     • Schema upgrades: the upgrade program doesn't need to know where
       things are in the data cloud
             • The upgrader just connects to ScaleBase, and the upgrade SQL is sent to
               the whole data cloud automatically
     • Configuration changes
             • Global changes can be made in SQL by connecting to ScaleBase and
               executing the same change on ALL replicas
             • ScaleBase can send one SQL statement to all replicas; any errors are
               logged
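Treating a backup, schema upgrade, or configuration change as a job that runs on every replica and records a per-replica status might look like the sketch below. `run_everywhere`, `JobResult`, and the `action` callback are hypothetical names, not the ScaleBase job engine; the callback stands in for the real work on one replica, such as invoking a hot backup tool or executing upgrade SQL.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

log = logging.getLogger("datacloud.jobs")


@dataclass
class JobResult:
    """Outcome of one job on one replica."""
    replica: str
    ok: bool
    error: Optional[str] = None


def run_everywhere(job_name: str, replicas: List[str],
                   action: Callable[[str], None]) -> Dict[str, JobResult]:
    """Run action on every replica, logging and recording failures instead of stopping."""
    results: Dict[str, JobResult] = {}
    for replica in replicas:
        try:
            action(replica)
            results[replica] = JobResult(replica, ok=True)
        except Exception as exc:  # any failure is recorded for this replica only
            log.error("job %s failed on %s: %s", job_name, replica, exc)
            results[replica] = JobResult(replica, ok=False, error=str(exc))
    return results
```

This version visits replicas one after another and keeps going after a failure, so one bad node does not block the rest of the data cloud; a production engine would more likely run replicas in parallel and persist the status table.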

Conclusions

 • Lessons Learned
          • Development, test, and Ops need to work together
          • Educate more of the team
          • Most problems that arise are operational, not code bugs
          • The right vendors really make it easier than
            doing everything yourself
 • Future
          • Automate failback with a hot spare
          • Try new technologies like XtraDB Cluster.




Vendors




Questions?

More Related Content

What's hot

January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
Yahoo Developer Network
 

What's hot (20)

MySQL as a Document Store
MySQL as a Document StoreMySQL as a Document Store
MySQL as a Document Store
 
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
 
How jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStaxHow jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStax
 
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreAzure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
 
20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 20150716 introduction to apache spark v3
20150716 introduction to apache spark v3
 
Real-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stackReal-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stack
 
Scalablity and benchmark in mysql performance
Scalablity and benchmark in mysql performanceScalablity and benchmark in mysql performance
Scalablity and benchmark in mysql performance
 
Trusted advisory on technology comparison --exadata, hana, db2
Trusted advisory on technology comparison --exadata, hana, db2Trusted advisory on technology comparison --exadata, hana, db2
Trusted advisory on technology comparison --exadata, hana, db2
 
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document StoreConnector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store
Connector/J Beyond JDBC: the X DevAPI for Java and MySQL as a Document Store
 
keyvi the key value index @ Cliqz
keyvi the key value index @ Cliqzkeyvi the key value index @ Cliqz
keyvi the key value index @ Cliqz
 
Backup and Recovery in MySQL Cluster
Backup and Recovery in MySQL ClusterBackup and Recovery in MySQL Cluster
Backup and Recovery in MySQL Cluster
 
Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax Enterprise
 
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
 
Reporting from the Trenches: Intuit & Cassandra
Reporting from the Trenches: Intuit & CassandraReporting from the Trenches: Intuit & Cassandra
Reporting from the Trenches: Intuit & Cassandra
 
Cassandra at eBay - Cassandra Summit 2013
Cassandra at eBay - Cassandra Summit 2013Cassandra at eBay - Cassandra Summit 2013
Cassandra at eBay - Cassandra Summit 2013
 
Cassandra Development Nirvana
Cassandra Development Nirvana Cassandra Development Nirvana
Cassandra Development Nirvana
 
MOUG17: DB Security; Secure your Data
MOUG17: DB Security; Secure your DataMOUG17: DB Security; Secure your Data
MOUG17: DB Security; Secure your Data
 
DataStax C*ollege Credit: What and Why NoSQL?
DataStax C*ollege Credit: What and Why NoSQL?DataStax C*ollege Credit: What and Why NoSQL?
DataStax C*ollege Credit: What and Why NoSQL?
 
2019 - OOW - Database Migration Methods from On-Premise to Cloud
2019 - OOW - Database Migration Methods from On-Premise to Cloud2019 - OOW - Database Migration Methods from On-Premise to Cloud
2019 - OOW - Database Migration Methods from On-Premise to Cloud
 
Review Oracle OpenWorld 2015 - Overview, Main themes, Announcements and Future
Review Oracle OpenWorld 2015 - Overview, Main themes, Announcements and FutureReview Oracle OpenWorld 2015 - Overview, Main themes, Announcements and Future
Review Oracle OpenWorld 2015 - Overview, Main themes, Announcements and Future
 

Viewers also liked

张平:JavaScript引擎实现
张平:JavaScript引擎实现张平:JavaScript引擎实现
张平:JavaScript引擎实现
taobao.com
 
完颜:移动网站的兼容性探索
完颜:移动网站的兼容性探索完颜:移动网站的兼容性探索
完颜:移动网站的兼容性探索
taobao.com
 
Html5环保小游戏
Html5环保小游戏Html5环保小游戏
Html5环保小游戏
taobao.com
 
Kind editor设计思路
Kind editor设计思路Kind editor设计思路
Kind editor设计思路
taobao.com
 
阅读类Web应用前端技术探索
阅读类Web应用前端技术探索阅读类Web应用前端技术探索
阅读类Web应用前端技术探索
taobao.com
 
高力:19楼现有前端架构
高力:19楼现有前端架构高力:19楼现有前端架构
高力:19楼现有前端架构
taobao.com
 
百度前端性能监控与优化实践
百度前端性能监控与优化实践百度前端性能监控与优化实践
百度前端性能监控与优化实践
taobao.com
 
前端Mvc探讨及实践
前端Mvc探讨及实践前端Mvc探讨及实践
前端Mvc探讨及实践
taobao.com
 
编辑器设计U editor
编辑器设计U editor编辑器设计U editor
编辑器设计U editor
taobao.com
 

Viewers also liked (9)

张平:JavaScript引擎实现
张平:JavaScript引擎实现张平:JavaScript引擎实现
张平:JavaScript引擎实现
 
完颜:移动网站的兼容性探索
完颜:移动网站的兼容性探索完颜:移动网站的兼容性探索
完颜:移动网站的兼容性探索
 
Html5环保小游戏
Html5环保小游戏Html5环保小游戏
Html5环保小游戏
 
Kind editor设计思路
Kind editor设计思路Kind editor设计思路
Kind editor设计思路
 
阅读类Web应用前端技术探索
阅读类Web应用前端技术探索阅读类Web应用前端技术探索
阅读类Web应用前端技术探索
 
高力:19楼现有前端架构
高力:19楼现有前端架构高力:19楼现有前端架构
高力:19楼现有前端架构
 
百度前端性能监控与优化实践
百度前端性能监控与优化实践百度前端性能监控与优化实践
百度前端性能监控与优化实践
 
前端Mvc探讨及实践
前端Mvc探讨及实践前端Mvc探讨及实践
前端Mvc探讨及实践
 
编辑器设计U editor
编辑器设计U editor编辑器设计U editor
编辑器设计U editor
 

Similar to Life After Sharding: Monitoring and Management of a Complex Data Cloud

Storage Systems For Scalable systems
Storage Systems For Scalable systemsStorage Systems For Scalable systems
Storage Systems For Scalable systems
elliando dias
 
NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5
UniFabric
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
Qian Lin
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introduction
Scott Miao
 

Similar to Life After Sharding: Monitoring and Management of a Complex Data Cloud (20)

Database as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance PlatformDatabase as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance Platform
 
Moving Windows Applications to the Cloud
Moving Windows Applications to the CloudMoving Windows Applications to the Cloud
Moving Windows Applications to the Cloud
 
Oracle big data appliance and solutions
Oracle big data appliance and solutionsOracle big data appliance and solutions
Oracle big data appliance and solutions
 
ScaleBase Webinar: Scaling MySQL - Sharding Made Easy!
ScaleBase Webinar: Scaling MySQL - Sharding Made Easy!ScaleBase Webinar: Scaling MySQL - Sharding Made Easy!
ScaleBase Webinar: Scaling MySQL - Sharding Made Easy!
 
Embracing Database Diversity: The New Oracle / MySQL DBA - UKOUG
Embracing Database Diversity: The New Oracle / MySQL DBA -   UKOUGEmbracing Database Diversity: The New Oracle / MySQL DBA -   UKOUG
Embracing Database Diversity: The New Oracle / MySQL DBA - UKOUG
 
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...
 
My sql tutorial-oscon-2012
My sql tutorial-oscon-2012My sql tutorial-oscon-2012
My sql tutorial-oscon-2012
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications
 
MySQL Options in OpenStack
MySQL Options in OpenStackMySQL Options in OpenStack
MySQL Options in OpenStack
 
Storage Systems For Scalable systems
Storage Systems For Scalable systemsStorage Systems For Scalable systems
Storage Systems For Scalable systems
 
NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5NGENSTOR_ODA_P2V_V5
NGENSTOR_ODA_P2V_V5
 
Architecture to Scale. DONN ROCHETTE at Big Data Spain 2012
Architecture to Scale. DONN ROCHETTE at Big Data Spain 2012Architecture to Scale. DONN ROCHETTE at Big Data Spain 2012
Architecture to Scale. DONN ROCHETTE at Big Data Spain 2012
 
End of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationEnd of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph Replication
 
YARN
YARNYARN
YARN
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
 
OpenStack Days East -- MySQL Options in OpenStack
OpenStack Days East -- MySQL Options in OpenStackOpenStack Days East -- MySQL Options in OpenStack
OpenStack Days East -- MySQL Options in OpenStack
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introduction
 
CON5451_Brydon-OOW2014_Brydon_CON5451 (1).pptx
CON5451_Brydon-OOW2014_Brydon_CON5451 (1).pptxCON5451_Brydon-OOW2014_Brydon_CON5451 (1).pptx
CON5451_Brydon-OOW2014_Brydon_CON5451 (1).pptx
 

More from OSCON Byrum

Big Data for each one of us
Big Data for each one of usBig Data for each one of us
Big Data for each one of us
OSCON Byrum
 
Declarative web data visualization using ClojureScript
Declarative web data visualization using ClojureScriptDeclarative web data visualization using ClojureScript
Declarative web data visualization using ClojureScript
OSCON Byrum
 

More from OSCON Byrum (20)

OSCON 2013 - Planning an OpenStack Cloud - Tom Fifield
OSCON 2013 - Planning an OpenStack Cloud - Tom FifieldOSCON 2013 - Planning an OpenStack Cloud - Tom Fifield
OSCON 2013 - Planning an OpenStack Cloud - Tom Fifield
 
Protecting Open Innovation with the Defensive Patent License
Protecting Open Innovation with the Defensive Patent LicenseProtecting Open Innovation with the Defensive Patent License
Protecting Open Innovation with the Defensive Patent License
 
Using Cascalog to build an app with City of Palo Alto Open Data
Using Cascalog to build an app with City of Palo Alto Open DataUsing Cascalog to build an app with City of Palo Alto Open Data
Using Cascalog to build an app with City of Palo Alto Open Data
 
Finite State Machines - Why the fear?
Finite State Machines - Why the fear?Finite State Machines - Why the fear?
Finite State Machines - Why the fear?
 
Open Source Automotive Development
Open Source Automotive DevelopmentOpen Source Automotive Development
Open Source Automotive Development
 
How we built our community using Github - Uri Cohen
How we built our community using Github - Uri CohenHow we built our community using Github - Uri Cohen
How we built our community using Github - Uri Cohen
 
The Vanishing Pattern: from iterators to generators in Python
The Vanishing Pattern: from iterators to generators in PythonThe Vanishing Pattern: from iterators to generators in Python
The Vanishing Pattern: from iterators to generators in Python
 
Distributed Coordination with Python
Distributed Coordination with PythonDistributed Coordination with Python
Distributed Coordination with Python
 
An overview of open source in East Asia (China, Japan, Korea)
An overview of open source in East Asia (China, Japan, Korea)An overview of open source in East Asia (China, Japan, Korea)
An overview of open source in East Asia (China, Japan, Korea)
 
Oscon 2013 Jesse Anderson
Oscon 2013 Jesse AndersonOscon 2013 Jesse Anderson
Oscon 2013 Jesse Anderson
 
US Patriot Act OSCON2012 David Mertz
US Patriot Act OSCON2012 David MertzUS Patriot Act OSCON2012 David Mertz
US Patriot Act OSCON2012 David Mertz
 
OSCON 2012 US Patriot Act Implications for Cloud Computing - Diane Mueller, A...
OSCON 2012 US Patriot Act Implications for Cloud Computing - Diane Mueller, A...OSCON 2012 US Patriot Act Implications for Cloud Computing - Diane Mueller, A...
OSCON 2012 US Patriot Act Implications for Cloud Computing - Diane Mueller, A...
 
Big Data for each one of us
Big Data for each one of usBig Data for each one of us
Big Data for each one of us
 
BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking
BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking
BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking
 
Declarative web data visualization using ClojureScript
Declarative web data visualization using ClojureScriptDeclarative web data visualization using ClojureScript
Declarative web data visualization using ClojureScript
 
Using and Building Open Source in Google Corporate Engineering - Justin McWil...
Using and Building Open Source in Google Corporate Engineering - Justin McWil...Using and Building Open Source in Google Corporate Engineering - Justin McWil...
Using and Building Open Source in Google Corporate Engineering - Justin McWil...
 
A Look at the Network: Searching for Truth in Distributed Applications
A Look at the Network: Searching for Truth in Distributed ApplicationsA Look at the Network: Searching for Truth in Distributed Applications
A Look at the Network: Searching for Truth in Distributed Applications
 
Faster! Faster! Accelerate your business with blazing prototypes
Faster! Faster! Accelerate your business with blazing prototypesFaster! Faster! Accelerate your business with blazing prototypes
Faster! Faster! Accelerate your business with blazing prototypes
 
Comparing open source private cloud platforms
Comparing open source private cloud platformsComparing open source private cloud platforms
Comparing open source private cloud platforms
 
State of the Art Web Mapping with Open Source
State of the Art Web Mapping with Open SourceState of the Art Web Mapping with Open Source
State of the Art Web Mapping with Open Source
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Recently uploaded (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Life After Sharding: Monitoring and Management of a Complex Data Cloud

  • 1. Life After Sharding: Managing a Complex Data Cloud Boris Livshutz, AppDynamics
  • 2. Why are you here? • You already shard, plan to shard, or need to shard your data • You’re considering a NoSQL solution for production 2 Copyright © AppDynamics. All rights reserved.
  • 3. About AppDynamics • Distributed application monitoring for enterprise applications • Data layer part of any enterprise app, monitored by us too • Collecting massive amounts of metrics from our customers, store it all on MySQL 3 Copyright © AppDynamics. All rights reserved.
  • 4. About Me • 2 decades of experience building DB kernels, OLAP, server side development • 4 years at AppDynamics scaling our server and helping our largest customers 4 Copyright © 2010 AppDynamics. All rights reserved.
  • 5. What is a Data Cloud? • Distinct set of data distributed across multiple nodes • Multiple nodes work together to manage data • Common examples: • Sharded RDBMS • NoSQL • Data nodes can be part of a rented cloud or on-premise 5 Copyright © AppDynamics. All rights reserved.
  • 6. Before: The Monolithic DB • Monitoring Tools • Cacti, Nagios, MySQL Enterprise, Enterprise Manager, Foglight • Both open source and commercial systems, • Alerting: Emails to NOC and DBAs, regarding one database in trouble • Management • Query one database: SQL shell, Toad, etc. • Backup: Hot backup tools for each database • Schema upgrades: Connect to one database and run upgrade script 6 Copyright © AppDynamics. All rights reserved.
  • 7. Why We Need a Data Cloud • The limits of vertical scale • One Dell box – 256GB RAM, 32 cores, 36 disks in raid-60 • MySQL wasn’t able to use more then 12-16 cores • 8 TB of data hard to backup, copy. • Alter table almost impossible on largest tables • No more growth option, no 256 core CPU! • Hardware very expensive ($50K), cannot duplicate in test env • Replication cannot keep up • Advantages to horizontal scale • Commodity hardware, easy to buy and expand • $4k per box, 8 core, 48GB Ram, 5 disks • MySQL is able to fully leverage the hardware, easier to tune 7 Copyright © AppDynamics. All rights reserved.
  • 8. Choosing a Data Cloud • Shard existing RDBMS • Change application logic to be shard-aware (lots of code changes!) • Use a proxy (Scalebase, DbShards, Spock, HiveDB) • NoSQL • You are brave! • Give up on ACID, decades of stability, etc • Gain failover, auto-resharding, etc OOTB 8 Copyright © AppDynamics. All rights reserved.
  • 9. Dev Complete - Now What ?? • Can you just throw it over the wall to Ops? • Almost no off the shelf tools to monitor and manage the data cloud • DIY: only choice is to do it yourself. Sorry  9 Copyright © AppDynamics. All rights reserved.
  • 10. What did we do? • We had one MySQL that kept growing and growing • Sharded MySql into 7 replica sets, 2 replicas each. • We couldn’t release it until Ops was ready to keep it up 24x7 • Built our own “glue” to manage and monitor this beast. • We ate our own dog food • We partnered and didn’t re-invent the wheel. 10 Copyright © AppDynamics. All rights reserved.
  • 11. Managing the Data Cloud • ScaleBase • Central point of management for data cloud • The only source of truth: keeps track of each replica, location, naming, heartbeat, load 11 Copyright © AppDynamics. All rights reserved.
  • 12. Instant access to data in the Data Cloud • Access DB data through the Scalebase LoadBalancer • Can set mode to send both query and DML to all replicas or just a subset or one • We send sql to specific replica without knowing its location • The only location we connect to is the Scalebase LoadBalancer • Other 3rd party tools can also connect to the Scalebase LoadBalancer without knowing about our Data Cloud 12 Copyright © AppDynamics. All rights reserved.
  • 13. Measure performance across your data cloud 13 Copyright © AppDynamics. All rights reserved.
  • 14. Measure performance – Replica deep dive 14 Copyright © AppDynamics. All rights reserved.
  • 15. Unified Alerting • System wide alerts all come from single source - Scalebase • Alerts go to PagerDuty to reach the right people on duty • Alerts clearly identify replica set and replica node • Allows quick resolutions by pinpointing problems in the data cloud • NOC Response: SQL connection to troubleshoot via Scalebase • Only need to know the replica and replica set from alert and can immediately investigate with SQL queries • NOC Response: Use monitoring tool for deep dive investigation into the replica 15 Copyright © AppDynamics. All rights reserved.
  • 16. Synchronized maintenance tasks • Backups • Synchronized • Backup is just a “job” in Scalebase engine, Scalebase runs it on every replica • Scalebase tracks the status of each job execution on each replica • Schema upgrades: upgrade program doesn't need to know about where things are in the data cloud • Upgrader just connects to Scalebase and upgrade sql will be sent to the whole data cloud automatically • Configuration Changes • global changes can be done in sql by just connecting to Scalebase and executing same change on ALL replicas. • One sql can be sent to all Replicas by Scalebase. Any errors will be logged 16 Copyright © AppDynamics. All rights reserved.
  • 17. Conclusions
      • Lessons learned
          • Development, test, and Ops need to work together
          • Educate more of the team
          • Most problems that arise are operational, not code bugs
          • The right vendors really make it easier than doing everything yourself
      • Future
          • Automate failback with a hot spare
          • Try new technologies like XtraDB Cluster
  • 18. Vendors
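Slide 11 describes ScaleBase as the single source of truth for each replica's location, name, heartbeat, and load. The talk does not show how ScaleBase stores this, so the following is only a minimal sketch of that kind of registry, assuming a fixed heartbeat timeout. The `Replica` and `ReplicaRegistry` names, their fields, and the "least-loaded live replica" rule are all hypothetical illustrations, not ScaleBase's API.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Replica:
    """One MySQL replica as a registry might see it (hypothetical fields)."""
    name: str                   # logical name, e.g. "rs1-a"
    replica_set: str            # the replica set (shard) it belongs to
    address: str                # "host:port"; clients never need to know this
    last_heartbeat: float = 0.0
    load: float = 0.0           # assumed 0.0-1.0 utilization figure


class ReplicaRegistry:
    """Hypothetical single source of truth: name -> location, health, load."""

    def __init__(self, heartbeat_timeout: float = 10.0) -> None:
        self.heartbeat_timeout = heartbeat_timeout
        self._replicas: Dict[str, Replica] = {}

    def register(self, replica: Replica) -> None:
        self._replicas[replica.name] = replica

    def heartbeat(self, name: str, load: float, now: Optional[float] = None) -> None:
        """Record a heartbeat together with the load the replica reported."""
        replica = self._replicas[name]
        replica.last_heartbeat = time.time() if now is None else now
        replica.load = load

    def locate(self, name: str) -> str:
        """Resolve a logical replica name to its network address."""
        return self._replicas[name].address

    def stale(self, now: Optional[float] = None) -> List[str]:
        """Names of replicas whose last heartbeat is older than the timeout."""
        now = time.time() if now is None else now
        return sorted(
            r.name for r in self._replicas.values()
            if now - r.last_heartbeat > self.heartbeat_timeout
        )

    def least_loaded(self, replica_set: str, now: Optional[float] = None) -> str:
        """Pick the least-loaded replica in a set, skipping stale ones."""
        stale = set(self.stale(now))
        live = [r for r in self._replicas.values()
                if r.replica_set == replica_set and r.name not in stale]
        if not live:
            raise LookupError(f"no live replica in {replica_set}")
        return min(live, key=lambda r: r.load).name
```

Passing `now` explicitly keeps the health checks deterministic in tests; in production the default wall-clock time would be used.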
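Slide 12 says the LoadBalancer can send a statement to all replicas, a subset, or just one, while the client only names replicas and never their locations. As a rough sketch of that routing decision (not ScaleBase's actual logic), assume the balancer holds a map from logical replica name to address; the `Mode` enum and the `route` function are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import Dict, Iterable, List, Optional, Tuple


class Mode(Enum):
    ALL = "all"        # broadcast to every replica
    SUBSET = "subset"  # only the named replicas
    ONE = "one"        # exactly one named replica


def route(
    sql: str,
    replicas: Dict[str, str],  # hypothetical: logical name -> "host:port"
    mode: Mode,
    targets: Optional[Iterable[str]] = None,
) -> List[Tuple[str, str, str]]:
    """Return (name, address, sql) dispatch triples.

    The caller supplies only logical names; addresses are resolved here,
    which is the property the slide describes.
    """
    if mode is Mode.ALL:
        names = sorted(replicas)
    else:
        names = sorted(set(targets or []))
        if mode is Mode.ONE and len(names) != 1:
            raise ValueError("Mode.ONE needs exactly one target")
        if mode is Mode.SUBSET and not names:
            raise ValueError("Mode.SUBSET needs at least one target")
        unknown = [n for n in names if n not in replicas]
        if unknown:
            raise KeyError(f"unknown replicas: {unknown}")
    return [(n, replicas[n], sql) for n in names]
```

Keeping routing a pure function of (statement, topology, mode) makes it easy to test without any database connections.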
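Slide 15 stresses that every alert names both the replica set and the replica node, so the NOC can go straight to a SQL session on the right machine. The talk does not show the alert format. The sketch below assumes the alert is shaped like a PagerDuty Events API v2 trigger event; mapping the replica set to `group` and the node to `source`, and the `build_alert` helper itself, are hypothetical choices made for illustration. The event is only built here, not sent.

```python
from typing import Any, Dict

SEVERITIES = {"critical", "error", "warning", "info"}


def build_alert(
    replica_set: str,
    replica: str,
    problem: str,
    severity: str = "critical",
    routing_key: str = "YOUR_ROUTING_KEY",  # placeholder, not a real key
) -> Dict[str, Any]:
    """Build an Events-API-v2-style trigger event for one replica (hypothetical)."""
    if severity not in SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # Same replica + problem gives the same dedup_key, so repeated
        # alerts update one incident instead of paging again.
        "dedup_key": f"{replica_set}/{replica}/{problem}",
        "payload": {
            # The summary alone tells on-call exactly where to look.
            "summary": f"[{replica_set}/{replica}] {problem}",
            "source": replica,
            "severity": severity,
            "group": replica_set,
            "component": "mysql",
            "custom_details": {"replica_set": replica_set, "replica": replica},
        },
    }
```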
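Slide 16 describes backups, schema upgrades, and configuration changes as one statement or job that ScaleBase runs on every replica, tracking per-replica status and logging errors. A minimal sketch of that fan-out pattern follows, assuming a caller-supplied `execute(replica_name, sql)` function; `broadcast` and `execute` are hypothetical names, not part of ScaleBase. The key design point is that one failing replica is logged and recorded rather than aborting the run on the others.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

log = logging.getLogger("broadcast")


def broadcast(
    sql: str,
    replicas: List[str],
    execute: Callable[[str, str], None],  # hypothetical: runs sql on a replica
    max_workers: int = 8,
) -> Dict[str, str]:
    """Run the same statement on every replica and return per-replica status.

    Status is "ok" or "error: <message>", so the caller can see exactly
    which replicas still need attention after the run.
    """
    def run_one(name: str) -> Tuple[str, str]:
        try:
            execute(name, sql)
            return name, "ok"
        except Exception as exc:  # log it and keep going with the other replicas
            log.error("%r failed on %s: %s", sql, name, exc)
            return name, f"error: {exc}"

    # Replicas run in parallel; pool.map yields results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, replicas))
```

For example, a global configuration change such as `SET GLOBAL max_connections = 500` could be passed as `sql`, and the returned map shows which replicas applied it.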

Editor's Notes

  1. Objective of slide: Thank you and introductions. Time check. Script: Thank you for your time today. I look forward to an interactive discussion on your application performance needs and the chance to present the AppDynamics solution to you. I had this meeting booked from x – y am/pm. Are you still available until then?
  2. Objective of slide: Show them you've been listening or that you've done some homework. Get them to confirm whether we've "written down" the information correctly. Give them the opportunity to expand and say more. Script: From our previous conversation, I captured what I heard you say about your application environment, your current challenges, and what you may be looking for. Can you let me know if I captured these correctly? <Read what they told you> Anything else you'd like to add?