Data Modeling and Scale Out - ScaleBase + 451-Group webinar 30.4.2015


Data Modeling and Scale Out

451 Research:
- Key challenges in the data landscape
- Evolution of distributed database environments

ScaleBase:
- Pros and cons of abstracting complex database topology
- Top strategies of distributed data modeling
- Advanced data modeling and “what-if” simulations with ScaleBase Analysis Genie
- Scaling real apps: from need to deployment


  1. Data Modeling and Scale Out
     Jason Stamper, 451 Research; Vladi Vexler and Paul Campaniello, ScaleBase
  2. Agenda: Data Modeling and Scale Out
     1. 451 Research
        • Key challenges in the data landscape
        • Evolution of distributed database environments
     2. ScaleBase
        • Pros and cons of abstracting complex database topology
        • Top strategies of distributed data modeling
        • Advanced data modeling and “what-if” simulations with Analysis Genie
        • Scaling real apps: from need to deployment
        • Demo
     3. Q&A (please type questions directly into the GoToWebinar side panel)
  3. Today’s Presenters
     • Jason Stamper, Analyst, Data Management and Analytics, 451 Research: over 20 years of experience in IT; formerly Editor of Computer Business Review and Technology Editor at The New Statesman
     • Vladi Vexler, Vice President, Tech. & Product Marketing, ScaleBase: over 15 years of experience in software development and product management; author of patents in the fields of database innovation, dynamic data caching and machine learning analytics
     • Paul Campaniello, Vice President, Worldwide Marketing, ScaleBase: over 25 years of software marketing and sales experience; held senior marketing and sales positions at Mendix, Lumigent, ESI, ComBrio, Savantis and Precise Software
  4. About 451 Research
     • Founded in 2000
     • 210+ employees, including over 100 analysts
     • 1,000+ clients: technology and service providers, corporate advisory, finance, professional services, and IT decision makers
     • 10,000+ senior IT professionals in our research community
     • Over 52 million data points each quarter
     • Headquartered in New York, with offices in Boston, San Francisco, Washington, London…
     • Research & Data, Advisory Services, Events
  5. The Challenge
     Businesses and their users are facing what one might call a perfect storm: decision-makers need insight faster than ever, yet IT is struggling to avoid becoming a bottleneck.
  6. The Facts Speak for Themselves…
     A recent survey by the trade magazine Computer Business Review found that 98% of 200 UK CIOs admit to a “significant gap” between what the business expects and what IT can deliver.
  7. So What Does the Business Want?
     Speed; information, not data; flexibility; ease of use; mobility; new ways of working; self-service; scale; collaboration.
  8. What Causes IT to Become a Bottleneck?
     Governance; control; security; budget; legacy; staff.
  9. What Have We Learned So Far?
     • So far, the emergence of so-called ‘hot’ data platform and analytics technologies has not solved the IT information bottleneck.
     • Hadoop isn’t going to save the world (and neither is NoSQL).
     • The ability to analyze large data sets in real or near-real time is only set to grow in the era of the Internet of Things.
     • IT is still critical, but it needs to enable the business to help itself. The question is how to achieve the right blend of usability, value for money and scalability.
  10. A Word or Two on Hadoop Adoption
     [Chart: average total storage capacity (TBs) by workload, 2012 vs. 2013. Workloads: DW and DBMS, unstructured file, virtualized server/OS, backup, archive, other, big data/Hadoop.]
     Average total storage capacity and the total storage footprint by workload illustrate the low level of Hadoop adoption today.
  11. 451 Research’s View of the ‘Total Data Approach’
  12. What Is Driving the Change?
     • Developers: agile, REST, JSON, schemaless, schema-on-read, flexible
     • Applications: web, social, mobile, always-on, interactive, local
     • Architecture: cloud, scalable, elastic, virtual, distributed, flexible
     These reinforce one another in a cycle:
        - New applications require distributed architecture.
        - Distributed architecture encourages new development approaches.
        - New development approaches demand new architecture.
        - Distributed architecture enables new applications.
        - New app requirements demand new development approaches.
        - New development approaches enable new lightweight apps.
  13. The Database Challenge
     • The traditional relational database has been stretched beyond its normal capacity limits by the needs of high-volume, highly distributed or highly complex applications.
     • There are workarounds, such as DIY sharding, but manual, homegrown efforts can stretch database administrators beyond their capacity to manage the resulting complexity.
     • Drivers of an increased willingness to look for emerging alternatives: scalability, performance, relaxed consistency, agility, intricacy, necessity.
  14. Scalability, and Other Challenges
     • As usage of MySQL and MariaDB has grown, so has usage of the applications that depend on them: games, social, customer-facing, web, and business apps such as ad networks.
     • This has highlighted a number of challenges:
        - Scalability of master-slave architecture
        - Performance and predictability at scale
        - Lower latency, greater throughput, richer apps
        - Rising user expectations
        - Manageability of increasing database/app sprawl
     • External factors are driving greater complexity:
        - Distributed computing architectures
        - Proliferation of cloud and elasticity requirements
        - Geo-distributed application requirements
        - Viral success, which means growth can come very quickly
  15. Conclusions
     • The success of MySQL and MariaDB has led to complications in the form of scalability concerns.
     • Distributed computing, the proliferation of cloud, and geo-distributed applications are adding to the complexity.
     • Manual sharding techniques transfer the strain from the database to the database administrator.
     • Neither MySQL nor MySQL administrators have ever been under so much strain.
     • Database scalability software enables users to move beyond the limitations and complexity of DIY sharding. Precisely how data is managed with a distributed database, in the cloud or on premises, is key.
  16. Scale Out Designs
  17. About ScaleBase
     Distributed Database Management System, architected for the cloud. Simple. Reliable. Powerful.
  18. Designs of Distributed MySQL Environments
     • Quick Scale Out (medium scale needs): multiple database replicas performing load balancing with read/write splitting
     • Massive Scale Out (high scale needs): a complete distributed database environment with policy-based data sharding/distribution
  19. Quick Scale-Out: Read/Write Splitting and Continuous Availability
     Diagram: the application is redirected (by IP/port). The MySQL master handles reads and writes (R/W), and the MySQL replicas handle reads (R).
  20. Massive Scale-Out
     Diagram: shards 0, 1, 2, etc., each consisting of a master and its replicas.
  21. The Right Solution for You Depends on Your Goals
     • Scale (mostly) reads
     • Scale (mostly) writes
     • Performance of reads: affected by joins and by scans of big tables
     • Performance of writes: affected by I/O reads/writes, CPU and table indexes (a growing overhead)
     • Locks
     • CPU/I/O/RAM issues
     • Load peaks
     • Data growth
     • Geo-distribution and special data distribution needs
  22. Pros and Cons of Abstracting Complex Database Topology
  23. Pros of Abstracting Complex Database Topology
     • Development agility: accelerates your speed of innovation
     • Simplifies application code
     • Simplifies maintenance and reduces its cost
     • Operational efficiency: zero downtime for applications
     • Reduces operational costs
     • Better monitoring, analytics, HA, scale, elasticity, etc.
  24. Cons of Abstracting Complex Database Topology
     • An additional technology component may increase complexity
     • An additional layer to monitor and manage
     • Additional machines to monitor and manage (possibly increased opex)
     • Less control at the application code level (the layer is transparent to the application)
  25. Scale Out Methodologies Comparison
  26. Characteristics & Modeling in a Distributed Database System
  27. Characteristics of Distributed Table Types
     • MASTER: on the master shard (0) only. Examples: site settings, admin data tables
     • GLOBAL: a full copy on all shards. Examples: lookups, frequently joined tables, slow-growing tables
     • DISTRIBUTED-ROOT: distributed based on a key column. Example: User.Id
     • DISTRIBUTED-CASCADED (child): distributed based on the parent row. Examples: User_Photos and User_Photos_Likes, which depend on Users
     Diagram (shards 0 to 3): a full table on shard 0 only, a full table on every shard, and a table split so that ¼ of it sits on each shard.
  28. Characteristics of Distributed Queries
     • ONE-DB: 1 shard, 1 node. Most optimal.
        - Any call where the data is known to be in one shard (distributed or master tables)
        - A call to a global table (load balanced)
     • ALL-DB: all shards, 1 node.
        - Aggregated reads (like map-reduce)
        - DML (writes) on global tables
        - DDL (create, drop, alter schema)
     • FULL-DB: all shards, all nodes. Session calls (USE, SET)
     • CROSS-DB: n shards, 1 node. Least optimal, but critical: cross-shard conflict resolution
     Note: not all sharding platforms support all distributed query types.
  29. Why Is Data Modeling Important?
     • Data and load: efficient distribution of data (all or main tables), reads and writes
     • Queries: handle ALL-DB queries (the map-reduce concept); minimize (but support!) CROSS-DB queries, for higher performance and scale
     • Optimize development with SQL analytics: insight into real database usage
  30. Data Relationships Can Be Extremely Complex
     Scale out is usually applied to growing or mature apps. How do you define an optimal data distribution policy?
  31. Analysis Genie: MySQL Visual Analysis & Optimal Distribution Policy Configuration
  32. ScaleBase Analysis Genie
     • A tool for MySQL visual analysis and for building an optimal data distribution policy, designed for DBAs, architects and development managers
     • A two-step process:
        - Analysis Assistant: an agent captures app/DB information, including SQL traffic and database metrics, then obfuscates, summarizes and packages the app/DB data
        - Analysis Genie: a SaaS application receives the Analysis Assistant package, presents the visual analysis and details the policy configuration
  33. ScaleBase Analysis Genie
     • Advanced analytics
        - Schemas, data and queries
        - Semantic structure analysis
        - Usage, load and scale analytics
     • Data modeling and scale-out planning
        - Customized for the most complex applications
        - Automatic identification of the optimal data distribution policy
        - Complete policy control
     • Quality assurance
        - Review before production
     • Simulation of results
        - “What-if” analysis
  34. Relationship Identification
     Mapping includes:
     • Schema structures
     • Table and column name matching
     • Query parsing and identification of joined tables and columns
     • Statistics on the size and access of every object
  35. Analyzing Relationships: From Chaos to Order
     Understanding and mapping complex relationships.
  36. ScaleBase Genie Demo
  37. MySQL Visual Analysis Demo
     • Visual analysis
     • Distribution policy identification and configuration
     • Scaling out load via data sharding (massive scale out)
     (ScaleBase Enterprise, Analysis Genie)
  38. Summary
  39. Reading Plus
     • Who: an online education company
     • Problem:
        - The busy season (back-to-school) was approaching, and they needed a solution that could be implemented quickly while guaranteeing uptime
        - With increasing growth, they needed to implement a scale-out solution quickly
     • Alternatives considered:
        - A clustering technology, which proved infeasible due to schema complexity and a lengthy re-architecture requirement
     • Solution:
        - Used visual analysis to determine the best scale-out plan
        - ScaleBase Lite for instant scale-out and continuous availability
        - 35 Tomcat application servers were connected to 3 ScaleBase controllers
        - ScaleBase performed automated read/write splitting and load balancing
  40. Next Gen SaaS ERP Company
     • Who: an inventory management ecommerce company, hosted on Rackspace (a ScaleBase partner)
     • Problem:
        - The largest available hardware could not support the workload
     • Alternatives considered:
        - Initially went with a “black box” solution and encountered many issues
     • Solution:
        - Scaled out a single MySQL instance to 8 clustered shards
        - On-demand growth: the current workload is over 20,000 TPS
        - Plan to double the footprint in the next quarter
        - Support for all production customers during Black Friday and Cyber Monday
  41. ScaleBase Distributed Database Management System
     Scale out to unlimited users; continuous availability; dynamic workload optimization; fast and simple deployment; easily scale out a single MySQL instance; optimized for the cloud; reduces time-to-market; no changes needed to app or database; database usage analytics; intelligent load balancing; centralized data management.
  42. Products and Editions
     • Community: limited by deployment
     • Startup: free for qualified candidates
     • Enterprise: massive scale out
     • Lite: quick scale out
     • Analysis Genie: database performance analytics
     Also available on:
  43. How Can I Learn More?
     • Use visual analysis to plan your scale-out strategy: download the Analysis Genie
     • Read the 451 report about ScaleBase (and the DB market): download Jason’s report (authored last week)
     • Whitepapers
  44. Questions?
     Contact info: Paul Campaniello, Vladi Vexler
     Resources: (617) 630.2800
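Slide 19 (Quick Scale-Out) describes read/write splitting. It can be sketched as a small Python router. This is not ScaleBase’s implementation. The class name `ReadWriteSplitter`, the endpoint strings and the keyword-based read detection are illustrative assumptions. A production router would use a real SQL parser and account for replication lag, since a replica may briefly return stale data.

```python
import itertools
import re

# Assumption: a leading-keyword check stands in for a real SQL parser.
_READ_ONLY = re.compile(r"^\s*(SELECT|SHOW|DESCRIBE|EXPLAIN)\b", re.IGNORECASE)
_LOCKING_READ = re.compile(r"\bFOR\s+UPDATE\b", re.IGNORECASE)


class ReadWriteSplitter:
    """Hypothetical router: writes go to the master, reads rotate across replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # itertools.cycle gives simple round-robin load balancing.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql, in_transaction=False):
        """Return the endpoint that should execute this statement."""
        is_plain_read = bool(_READ_ONLY.match(sql)) and not _LOCKING_READ.search(sql)
        # Writes, locking reads and anything inside an open transaction stay on
        # the master, so a session always sees its own uncommitted changes.
        if not is_plain_read or in_transaction or self._replicas is None:
            return self.master
        return next(self._replicas)


router = ReadWriteSplitter("master:3306", ["replica-1:3306", "replica-2:3306"])
for statement in ["SELECT * FROM users WHERE id = 7",
                  "UPDATE users SET name = 'x' WHERE id = 7",
                  "SELECT COUNT(*) FROM users"]:
    print(router.route(statement), "<-", statement)
```

Keeping reads inside a transaction on the master gives up some read capacity. In exchange, a transaction can always read its own writes.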
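Slide 20 (Massive Scale-Out) shows data split across numbered shards, each made up of a master with replicas. A minimal hash-based placement can be sketched as follows. The three-shard topology, the host names and the choice of MD5-modulo hashing are assumptions for illustration. They are not ScaleBase’s distribution algorithm.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shard:
    """One shard as drawn on the slide: a master plus its replicas."""
    master: str
    replicas: List[str] = field(default_factory=list)


def shard_index(key, num_shards):
    """Map a distribution-key value to a shard number in [0, num_shards)."""
    # A stable digest is used because Python's built-in hash() of str is
    # randomized per process and would move rows between restarts.
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical topology: three shards, each with two read replicas.
SHARDS = [
    Shard(f"shard{i}-master", [f"shard{i}-replica{j}" for j in (1, 2)])
    for i in range(3)
]


def shard_for(key):
    """Return the shard that owns all rows for this key value."""
    return SHARDS[shard_index(key, len(SHARDS))]
```

Writes for a key would go to `shard_for(key).master`, and reads could be spread over that shard’s replicas. One design caveat: with plain modulo, changing the number of shards remaps most keys. That is why schemes such as consistent hashing or a key-to-shard directory are often preferred when shards are added over time.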
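The four table types on slide 27 can be expressed as a placement rule. In this sketch, the user tables follow the slide’s example (Users, User_Photos, User_Photos_Likes). The `site_settings` and `countries` tables, the `POLICY` mapping and the integer-modulo placement are hypothetical. The sketch also assumes that every cascaded child row carries the root key `user_id`; a real system might look up the parent row instead.

```python
from enum import Enum


class TableType(Enum):
    MASTER = "master"                  # whole table on shard 0 only
    GLOBAL = "global"                  # full copy on every shard
    DISTRIBUTED_ROOT = "root"          # rows split by a key column
    DISTRIBUTED_CASCADED = "cascaded"  # rows follow their parent (root) row


# Hypothetical policy for the slide's example schema. Assumption: every
# cascaded child table carries the root key (user_id), so placement can be
# computed without looking up the parent row.
POLICY = {
    "site_settings": (TableType.MASTER, None),
    "countries": (TableType.GLOBAL, None),
    "users": (TableType.DISTRIBUTED_ROOT, "id"),
    "user_photos": (TableType.DISTRIBUTED_CASCADED, "user_id"),
    "user_photos_likes": (TableType.DISTRIBUTED_CASCADED, "user_id"),
}


def shards_for_row(table, row, num_shards):
    """Return the shard numbers that must store `row` of `table`."""
    kind, key_column = POLICY[table]
    if kind is TableType.MASTER:
        return [0]
    if kind is TableType.GLOBAL:
        return list(range(num_shards))
    # Root and cascaded rows are placed by the same root-key value, so a
    # user's photos and likes land on the user's shard and join locally.
    return [row[key_column] % num_shards]
```

Because a user and all of that user’s photos and likes share a shard, a join between them stays ONE-DB.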
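Slide 28’s query classes depend on which shards and nodes a statement must reach. The function below is a simplified, hypothetical classifier. It labels a statement only by the set of shards it targets and reuses integer-modulo placement. It does not model DDL, nor the cross-shard joins and conflict resolution that make CROSS-DB queries expensive.

```python
from typing import List, Optional, Sequence, Tuple


def route_statement(kind: str, is_write: bool,
                    key_values: Optional[Sequence[int]],
                    num_shards: int) -> Tuple[str, List[int]]:
    """Classify one statement and return (query_type, target_shards).

    kind: "master", "global", "distributed" or "session" (USE/SET).
    key_values: distribution-key values pinned by the WHERE clause, or None.
    """
    all_shards = list(range(num_shards))
    if kind == "session":
        # USE/SET must reach every node of every shard, replicas included.
        return "FULL-DB", all_shards
    if kind == "master":
        return "ONE-DB", [0]
    if kind == "global":
        if is_write:
            # Every full copy must be updated.
            return "ALL-DB", all_shards
        # Any copy can answer; a real router would load-balance instead of
        # always choosing shard 0.
        return "ONE-DB", [0]
    if key_values is None:
        # No key predicate: scatter to every shard and merge the results.
        return "ALL-DB", all_shards
    targets = sorted({value % num_shards for value in key_values})
    if len(targets) == 1:
        return "ONE-DB", targets
    if len(targets) == num_shards:
        return "ALL-DB", targets
    return "CROSS-DB", targets


print(route_statement("distributed", False, [7], 4))
print(route_statement("distributed", False, [1, 2], 4))
```

With four shards, a lookup by a single user id touches one shard. A query pinning two ids that hash to different shards becomes CROSS-DB.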
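Slide 29 recommends handling ALL-DB queries with the map-reduce concept. For an aggregated read, each shard computes a partial result and the router merges the partials. The sketch below assumes the values have already been fetched from each shard. The function names and the sample data are hypothetical.

```python
def partial_aggregate(values):
    """Map step, run on each shard: summarize that shard's matching rows."""
    return {
        "count": len(values),
        "sum": sum(values),
        "min": min(values) if values else None,
        "max": max(values) if values else None,
    }


def merge_partials(partials):
    """Reduce step, run by the router: combine the per-shard summaries."""
    count = sum(p["count"] for p in partials)
    total = sum(p["sum"] for p in partials)
    mins = [p["min"] for p in partials if p["min"] is not None]
    maxs = [p["max"] for p in partials if p["max"] is not None]
    return {
        "count": count,
        "sum": total,
        "min": min(mins) if mins else None,
        "max": max(maxs) if maxs else None,
        # AVG is rebuilt from the global SUM and COUNT; averaging the
        # per-shard averages would give small shards too much weight.
        "avg": total / count if count else None,
    }


# Hypothetical per-shard values for one aggregated query (one shard is empty).
shard_rows = [[10, 20], [30], [40, 50, 60], []]
print(merge_partials([partial_aggregate(rows) for rows in shard_rows]))
# {'count': 6, 'sum': 210, 'min': 10, 'max': 60, 'avg': 35.0}
```

Averaging the three non-empty shards’ averages (15, 30 and 50) would give about 31.67 instead of the correct 35.0.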
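Slide 33 lists a “what-if” simulation. A much simplified form scores candidate distribution keys against a captured workload on two measures. The first is the share of queries that pin the key and can therefore be ONE-DB. The second is how evenly sample rows spread across shards. This is not Analysis Genie’s algorithm. The workload format, the `what_if` function and the sample data are hypothetical.

```python
from collections import Counter


def what_if(workload, rows, candidate_keys, num_shards):
    """Score candidate distribution keys against a captured workload.

    workload: one dict per query, mapping column -> value for the equality
              predicates in its WHERE clause (hypothetical capture format).
    rows: sample rows of the table to distribute, with integer key values.
    """
    report = {}
    for key in candidate_keys:
        # A query can be sent to a single shard only if it pins this key.
        one_db = sum(1 for predicate in workload if key in predicate)
        # Counter returns 0 for shards that receive no rows.
        placement = Counter(row[key] % num_shards for row in rows)
        report[key] = {
            "one_db_share": one_db / len(workload),
            "rows_per_shard": [placement[i] for i in range(num_shards)],
        }
    return report


rows = [{"user_id": u, "country_id": 1 if u < 6 else 2} for u in range(8)]
workload = [{"user_id": 3}, {"user_id": 5}, {"country_id": 1},
            {"user_id": 1, "country_id": 2}]
report = what_if(workload, rows, ["user_id", "country_id"], num_shards=4)
for key, score in report.items():
    print(key, score)
```

In this toy workload, `user_id` wins on both measures. The low-cardinality `country_id` would leave two of the four shards empty.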
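Slide 34 lists query parsing and identification of joined tables and columns as one input to relationship mapping. A rough version counts join conditions in a query log. The regular expression below only recognizes `ON table.column = table.column` with unaliased table names. It is an illustrative assumption, not a real SQL parser, and the query log is hypothetical.

```python
import re
from collections import Counter

# Matches "ON a.col = b.col"; aliases and compound conditions are ignored.
_JOIN_ON = re.compile(r"\bON\s+(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)", re.IGNORECASE)


def join_edges(queries):
    """Count how often each pair of (table, column) endpoints is joined."""
    edges = Counter()
    for sql in queries:
        for t1, c1, t2, c2 in _JOIN_ON.findall(sql):
            # Sort the endpoints so "A = B" and "B = A" count as one edge.
            edge = tuple(sorted([(t1.lower(), c1.lower()),
                                 (t2.lower(), c2.lower())]))
            edges[edge] += 1
    return edges


QUERY_LOG = [
    "SELECT * FROM users JOIN user_photos ON users.id = user_photos.user_id WHERE users.id = 7",
    "SELECT * FROM user_photos JOIN users ON user_photos.user_id = users.id",
    "SELECT * FROM user_photos JOIN user_photos_likes ON user_photos.id = user_photos_likes.photo_id",
]
for edge, count in join_edges(QUERY_LOG).most_common():
    print(count, edge)
```

Frequently joined column pairs, such as `users.id` with `user_photos.user_id`, are natural candidates for co-location, with a DISTRIBUTED-CASCADED child following its root. Tables joined from many places may suit the GLOBAL type.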