
Auto Europe's ongoing journey with MariaDB and open source

Tom Girsch, Lead System Architect at Auto Europe, covers the use case that initially brought Auto Europe to MariaDB, as well as additional planned and ongoing projects. He goes on to discuss Auto Europe’s implementation of MariaDB using clustering, traditional replication and MaxScale. Next, he covers some of the problems and pitfalls encountered along the way, as well as some suggestions to further improve the product.

Published in: Software

  1. 1. “How Do You Solve A Problem Like Maria?” AutoEurope’s Ongoing Journey with Open Source Thomas J. Girsch Lead System Architect, AutoEurope
  2. 2. What We’ll Cover 1. About AutoEurope (AE), About Me 2. The AE Environment Before MariaDB 3. Our Original Use Case for MariaDB: Content Management 4. Additional Use Cases: Microservices, Session Tracking 5. Wins, Challenges, Solutions and Open Issues 6. Our MariaDB Wish List
  3. 3. About AutoEurope ● We are the global leader in international car rental services and have been helping travelers find the perfect rental vehicle for their trips around the world for over 60 years ● We have roughly 600 employees worldwide, including approximately 250 call center employees, working in offices and call centers spread across three continents ● Through our car rental partners, we offer car rentals in more than 20,000 locations in over 180 countries worldwide ● Visit https://www.autoeurope.com/about-us/ to learn more
  4. 4. About Me ● Nearly three decades of IT experience ● That hurt to type; I’m getting old! ● Certified in IBM Informix and MongoDB ● Both are a bit dated and need updating ● Proficient in Unix/Linux administration ● Specialize in high availability, DR and OLTP performance ● NOT A PROGRAMMER! ● Outside of work, volunteer as Vice Chair of the Rail Passengers Association
  5. 5. AutoEurope AM (Ante Maria) ● IBM Informix for most critical systems ● Quoting and reservations ● Web session tracking ● Legacy EDS system ● Ancient SAP ● Microsoft SQL Server for many ancillary systems ● Content management (FarCry) ● Additional web session tracking ● Supplier interfaces ● MongoDB for application logging ● Also InfluxDB, REDIS and a little bit of very old MySQL
  6. 6. SQL Server at AE ● One single VM (NO REDUNDANCY!) ● If SQL Server goes down, our websites don’t respond ● Thankfully, this rarely happens ● If SQL Server gets bogged down, websites respond poorly ● This was happening a lot ● Why not add redundancy? Insufficient licensing ● Too expensive to add more ● Lacked in-house expertise ● Desire to reduce Microsoft / Windows footprint
  7. 7. IBM Informix at AE ● Heavily invested, with lots of licenses ● Production: dedicated physical servers in two data centers ● We’ll cover in more detail (with pictures!) in a bit ● Dev/QA/Test: small VMs ● Small “Shopping Engine” VMs ● Why not do everything here? ● Very expensive to add licenses ● Most third-party tools don’t support it ● Overkill for many applications ● Use the right tool for the job
  8. 8. Web Content Management at AE ● Manages content for nearly 70 websites – Text of various pages – Locations of images and other graphic/support files – Translations to multiple other languages ● Resiliency is key! (No web = no bookings = no money) – Must be redundant within the data center – Must be redundant across data centers ● Workload distribution is a nice-to-have
  9. 9. Enter MariaDB ● Web team migrating to new content management platform, MuraCMS ● Initially started the migration on SQL Server, but we want to eliminate MS ● Migrating to IBM Informix not an option ● Mura supports two DB platforms – Microsoft SQL Server – MySQL ● MariaDB = MySQL minus Oracle: MariaDB! ● But is MariaDB resilient enough?
  10. 10. MariaDB Professional Services ● AE had no in-house MySQL or MariaDB knowledge ● Needed to get up and running quickly ● Solution: MariaDB Contractor – Recommend topology to fit our needs – Do initial configuration of MariaDB – Cross-train AE DBAs on MariaDB setup and basic operation – Be available for future questions
  11. 11. Informix Replication ● “High-Availability” Data Replication, or HDR – One primary (required), called PRI – One “full” secondary (near-sync), called HDR – “Shared Disk” secondaries, called SDS – “Remote Standby Secondaries,” called RSS ● At AE we have 1xPRI, 1xHDR and 2xRSS (see next slide)
  12. 12. AE’s Informix Setup (diagram) ● Portland (Main DC): P1 - Primary, P2 - Secondary ● Brunswick (DR DC): B1 - DR, B2 - DR (delay) ● Connection Managers (labeled on diagram)
  13. 13. MariaDB Replication ● Traditional (“Master-Slave” or “Primary Target”) – Simple to set up and administer – One primary, one to many targets – Failover is manual (auto-failover technically supported, but not yet recommended) ● Galera Cluster (Multi-Master*) – Load-balancing and write-anywhere* – Typically three MariaDB servers participate – No “failover” per se, because every server is the master – Requires very fast network connections among all participating nodes (usually 3)
  14. 14. Galera at AE (diagram) ● Portland (Main DC): “P1” Server (NOT USED), “P2” Server ● Brunswick (DR DC): “B1” Server, “B2” Server ● MaxScale VMs (labeled on diagram)
  15. 15. MaxScale at AE ● Three MaxScale VMs – Load balancing – Redundancy ● Used ONLY to route connections – Not filtering or using firewall features – Slows things down a bit, because our apps are very “talky” – A redirect mode would be nice ● Initially configured in “ReadConnRoute” mode
  16. 16. Migrating from SQL Server to MariaDB ● MariaDB consultant set up SymmetricDS – Steep learning curve – Allowed us to run in parallel, but lots of holes – Still better than doing a lengthy outage w/manual migration – Encoding issues were a big problem
  17. 17. Trouble in Paradise, Volume 1 ● Some queries were very slow – Started by enabling query cache, which helped, but there are big downsides – Also enabled slow query log: VERY useful – Used ANALYZE FORMAT=JSON to see query paths and fix issues – Once the worst offenders were fixed, disabled the query cache ● Spoiler: MariaDB seems to have trouble recognizing (and ignoring) BAD indexes, and Mura indexes nearly every column
  18. 18. Trouble in Paradise, Volume 2 ● MariaDB servers would occasionally lock up (Very Bad) ● No web content = no web bookings = no money made ● MuraCMS is usually read-heavy but a “publish” operation is very write-heavy – Problem was caused by deadlocking in Galera Cluster – ReadConnRoute is bad with Galera, unless you can guarantee that multiple sessions won’t update the same table at the same time – Solution: Use “ReadWriteSplit” mode
  19. 19. Trouble in Paradise, Volume 3 ● MaxScale was not playing nicely with JDBC connection pools ● Mystery disconnects: “Could not find master” – The master hadn’t gone anywhere! – The errors being reported were misleading ● Could work around by increasing max_sescmd_history; but why? – Bug: MaxScale was not resetting when user context switched – Fixed in 2.3.4
  20. 20. How’s it Working Now? ● All in all, extremely well ● During peak times, over 600 concurrent user sessions ● Millions of queries per day ● Very few performance complaints – And when we get them, the problem is usually NOT the DB – long_query_time is set to 1 second (very aggressive), and we only see about a dozen slow queries per day ● No unscheduled outages in months
  21. 21. Use Case #2: Text Search ● Complex searches of location information ● Currently being migrated from SQL Server ● Uses FULLTEXT indexes ● Very aggressive: token_size = 2 (default is 4) ● Because of the very small word size, need to set up a “stopwords” table ● Coexists with Mura on our Galera cluster
  22. 22. Use Case #3: Session Booking Information ● Stores huge JSON documents – Web session history – Information needed to turn a quote into a booking ● Extremely write-heavy ● Use PAGE_COMPRESSED and compressed communication ● Highly transient data (12-hour maximum useful life) ● For this, we use traditional replication on separate VMs (not the Galera nodes) ● The JSON can contain sensitive data, so we use at-rest encryption and SSL ● Important to disable dedup and compression on the storage side
  23. 23. Use Case #4: Microservices ● Small databases that can be highly distributed – Supplier information – Locations and hours – Reference data – Holy grail: Rates and availability, to serve quotes ● Containerized via Kubernetes ● Ideally, “master” would live on Galera Cluster and use traditional replication to fan out to many “slaves” around the world and in the cloud.
  24. 24. Big Wins ● Performance for most apps meets or exceeds what we achieved with SQL Server ● Using Galera with MaxScale allows rolling upgrades of MariaDB with minimal or no user-facing downtime ● Intra- and Inter-Data Center redundancy ● Fully supported, yet still costs a fraction of what the same configuration would cost with IBM Informix or Microsoft SQL Server ● Virtually unlimited growth potential, thanks to virtualization and the cloud
  25. 25. Remaining Challenges ● MaxScale adds an expensive layer – “Talky” apps + required additional network hop = higher latency – Redirect mode would be nice, but would bypass most MaxScale features ● User security is a challenge – By default, MaxScale obscures real host of origin for client; from MariaDB, client appears to just be coming in from the MaxScale VM – MaxScale cannot currently handle multiple authentication methods on the same IP/port (so PAM requires a separate listener from traditional MariaDB users or UNIX users) ● I/O performance is a potential issue (no easy way to intelligently split I/O loads)
  26. 26. Wish List ● MaxScale “redirect” mode (Didn’t I just say that? Yep, I did.) ● Easy way to require SSL for ALL users, rather than on a user-by-user basis ● Get session ID for other sessions ● Heterogeneous Replication! ● Non-blocking mariabackup with Galera ● Single-step restore ● More granularity on binary log rotation
  27. 27. This Could Be YOU!
  28. 28. THANK YOU! Tom Girsch tgirsch@autoeurope.com
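
Slide 17 mentions using ANALYZE FORMAT=JSON to see query paths and notes that MariaDB seemed to keep choosing bad indexes. As an illustration only (this is not Auto Europe's tooling), here is a minimal Python sketch of how such output might be scanned for tables where the rows actually read (`r_rows`) far exceed the optimizer's estimate (`rows`). The sample plan, the table name `tcontent`, the index name `idx_active`, and the 10x threshold are all hypothetical, and the field names assume the usual shape of MariaDB's ANALYZE FORMAT=JSON output.

```python
import json

# Hypothetical, abbreviated ANALYZE FORMAT=JSON output.
# Table and index names are invented for illustration.
SAMPLE = """
{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 412.7,
    "table": {
      "table_name": "tcontent",
      "access_type": "ref",
      "key": "idx_active",
      "rows": 12,
      "r_rows": 48210,
      "filtered": 100,
      "r_filtered": 0.4
    }
  }
}
"""


def iter_tables(node):
    """Yield every dict stored under a "table" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "table" and isinstance(value, dict):
                yield value
            yield from iter_tables(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_tables(item)


def suspicious_tables(plan, ratio=10.0):
    """Return (table, key, estimated, actual) tuples for tables where the
    actual row count exceeds the estimate by more than `ratio` times."""
    hits = []
    for table in iter_tables(plan):
        estimated, actual = table.get("rows"), table.get("r_rows")
        if estimated is None or actual is None:
            continue  # node carries no estimate/actual pair to compare
        if actual > ratio * max(estimated, 1):
            hits.append((table.get("table_name"), table.get("key"), estimated, actual))
    return hits


if __name__ == "__main__":
    for name, key, estimated, actual in suspicious_tables(json.loads(SAMPLE)):
        print(f"{name}: index {key} estimated {estimated} rows, read {actual}")
```

A large gap between estimate and reality is only a hint that an index is a poor choice for the query; the threshold would need tuning against real workloads, and the slow query log mentioned on the same slide is the natural place to pick which statements to analyze first.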
