
How QBerg scaled to store data longer, query it faster


As QBerg expands the services it offers and the countries it serves, its resource requirements keep growing. Over the last year QBerg reached a critical point: it stored so much transactional data that standard relational databases could no longer meet the SLAs, or support the features, that customers required. For example, web analytics had to be capped at a maximum of four months of history. Introducing MariaDB ColumnStore alongside the existing MariaDB Server databases not only lets QBerg store multiple years' worth of historical data for analytics, it also cut overall processing time by an order of magnitude right away. The move to a unified platform was incremental, using MariaDB MaxScale as both a router and a replicator. QBerg can now replicate full InnoDB schemas to MariaDB ColumnStore and incrementally update big tables without affecting the performance of ongoing transactions.

Published in: Software


  1. 1. How QBerg scaled to store data longer, query it faster Openworks 2019
  2. 2. QBerg: the company ▪ QBerg is a market research institute ▪ QBerg deals with consumer goods’ price intelligence in Italy, Europe and Latam. ▪ What we do: - Collect price & presence of products in stores, flyers, e-commerce sites and newsletters; - Manage the collected data with automatic and human activity; - Deliver aggregated or raw data to our customers in many ways (portal with analysis and research functions, spreadsheets, alert e-mails, PPTX, CSV, etc.)
  3. 3. QPoint ▪ QBerg launched its new, innovative app in early February 2019: https://vimeo.com/channels/qpointeng/316057717
  4. 4. Data figures ▪ Data Collection: ▪ Web: 1M observations/day ▪ Flyer: 500K observations/month ▪ Store: 500K observations/month ▪ Common Data: ▪ #Products: 2M ▪ #Stores: 120K ▪ MariaDB Master (TX): ▪ #Schemas: 175 (master, datamarts) ▪ #Tables: 11,756 ▪ #Space: 650 GB
  5. 5. Master common schemas ▪ Common data (3GB) ▪ Store observations (3GB) ▪ Flyer observations (14GB) ▪ Web observations (70GB) ▪ User logs/actions (18GB) ▪ Third party catalogues (3GB) ▪ User segmentations (3GB) TOTAL 114 GB (Master schema InnoDB tables)
  6. 6. «One-day» schemas (Datamarts) ▪ Every night, batch procedures produce several «one-day» databases (datamarts). These databases are used by the users’ frontend and by backend procedures that process data and produce outputs in several formats. ▪ A datamart is defined by: ▪ Type: store, web, flyer ▪ Time period: last 2 years, last 6 months, last 36 weeks, etc. ▪ Countries or regions: Italy, Spain, Colombia, etc. ▪ Product families: Flat TV, Washing machines, Bakery and pastries, etc. ▪ The current procedures made massive use of: ▪ CREATE TABLE <DMs> SELECT FROM <MASTER DB> ▪ INSERT INTO <DMs> SELECT FROM <MASTER DB>
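As a rough illustration of the CREATE TABLE ... SELECT pattern on this slide, the sketch below prints (it does not run) one such statement for a datamart defined by country and time period. The function name `datamart_sql`, the table names `observations` and `web_observations`, and the columns `country` and `observed_on` are all hypothetical; the deck does not show the real schema.

```shell
# datamart_sql DM_SCHEMA MASTER_SCHEMA COUNTRY MONTHS
# Prints (does not execute) one CREATE TABLE ... SELECT statement.
# ASSUMPTION: table and column names are hypothetical placeholders.
datamart_sql() {
  printf 'CREATE TABLE %s.observations AS\n' "$1"
  printf 'SELECT * FROM %s.web_observations\n' "$2"
  printf "WHERE country = '%s'\n" "$3"
  printf '  AND observed_on >= CURDATE() - INTERVAL %s MONTH;\n' "$4"
}

# Example: a 24-month web datamart for Italy.
datamart_sql dm_web_it_24m master IT 24
```

Each nightly datamart would need one statement of this kind per table, which is consistent with the slow ETL the deck goes on to describe.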
  7. 7. DB timeline: main activities
  8. 8. Production architecture t0 (diagram) ▪ Components: Applications, MariaDB Server (primary), MariaDB Server (secondary) ▪ Labels: Queries, Replication, Backups
  9. 9. Issues ▪ General issues: ▪ The crawler queue was very heavy (concurrency of 200) ▪ Running OLTP and OLAP operations on the same database machine is not a good idea ▪ Web datamarts: ▪ Creating them with ETL CREATE ... SELECT statements was very slow ▪ Customer queries were slow ▪ The historical time span covered by the datamarts was too short
  10. 10. Targets ▪ Make customer queries faster ▪ Decouple OLTP and OLAP operations ▪ Extend datamart periods (from 4 to 24 months for web prices)
  11. 11. Solution phase 1 ▪ Introduced MariaDB AX using InnoDB and ColumnStore: ▪ InnoDB engine to manage master schemas ▪ ColumnStore engine to manage store and web datamart schemas ▪ Datamart schemas are produced with the current procedure and copied from TX to AX with cpimport ▪ Introduced MariaDB MaxScale: ▪ Routes queries to TX (master / slave) or AX, based on the schema used by the query (using a regex) ▪ Duplicates DDL (Data Definition Language) statements on MariaDB AX
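The schema-based routing rule can be pictured with a small sketch. This only mimics the decision (regex match on the schema name means AX, otherwise TX); it is not MaxScale code or configuration. The schema prefixes `dm_web_` and `dm_store_` and the function name `route_query` are assumptions made for illustration.

```shell
# Illustrative only: mimics the routing decision, not MaxScale itself.
# ASSUMPTION: datamart schemas are named dm_web_* or dm_store_* (hypothetical).
AX_SCHEMA_REGEX='(^|[^A-Za-z0-9_])dm_(web|store)_[A-Za-z0-9_]*\.'

# route_query QUERY -> prints "AX" if the query references a datamart
# schema (schema.table notation), otherwise "TX".
route_query() {
  if printf '%s\n' "$1" | grep -Eq "$AX_SCHEMA_REGEX"; then
    echo AX
  else
    echo TX
  fi
}

route_query "SELECT * FROM dm_web_it.prices WHERE week = 12"
route_query "INSERT INTO master.observations VALUES (1)"
```

Routing on the schema name keeps the applications unchanged: they keep sending the same SQL to a single endpoint.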
  12. 12. Production phase 1 (diagram) ▪ Components: Applications, MariaDB MaxScale, MariaDB Server (primary), MariaDB Server (secondary), MariaDB Server (UM), Storage (PM) ▪ Engines: InnoDB (master), ColumnStore (web/store dm) ▪ Labels: Backups; Reads (current data); Writes (all data) + Reads (historical data); Replication; cpimport; Writes (all data); Reads (current data)
  13. 13. Replicate table on the fly ▪ When data must be merged between TX and AX, it can be copied from TX to AX with a simple script like this: mysql -h $DBSRC -q -e "$QUERY;" -N temp | cpimport -n1 -s '\t' $DBDST $TABLEDST ▪ Notes: ▪ Run it on the AX (UM) server ▪ The destination table must exist in advance ▪ The delimiter is a tab ('\t'), matching the tab-separated output of mysql in batch mode
  14. 14. Replicate schema ▪ An entire schema can be replicated by running the script from the previous slide for every table of the source schema. ▪ Some datatypes may need to be changed: ▪ ENUM is not supported in CS -> CHAR is a possible substitute ▪ TIMESTAMP is not supported in CS -> DATETIME is a possible substitute ▪ MEDIUMINT is not supported in CS -> BIGINT is a possible substitute ▪ BINARY is not supported in CS -> BIGINT is a possible substitute ▪ Note: ▪ 246 tables imported in 1,136 secs (~19’) ▪ One 43M-row table imported in 400 secs (6’40")
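A minimal wrapper around the pipeline on this slide might look as follows. The function name `copy_table` and the `DRY_RUN` switch are assumptions added here so the command can be inspected before it is run; the mysql and cpimport options are the ones used on the slide, with a tab as the delimiter.

```shell
# copy_table SRC_HOST SRC_DB QUERY DST_DB DST_TABLE
# Streams the query result from the TX server into cpimport on AX.
# ASSUMPTION: DRY_RUN=1 (a switch invented for this sketch) prints the
# pipeline instead of executing it.
copy_table() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "mysql -h $1 -q -N -e \"$3;\" $2 | cpimport -n1 -s '\\t' $4 $5"
  else
    # -q: do not buffer the result, -N: no column-name header row.
    # cpimport -n1: treat the string NULL as a null value, -s: delimiter.
    mysql -h "$1" -q -N -e "$3;" "$2" | cpimport -n1 -s '\t' "$4" "$5"
  fi
}

# Example (hypothetical host and table names), printed rather than executed:
DRY_RUN=1 copy_table tx.example.internal temp "SELECT * FROM web_obs" dm_web web_obs
```

As the slide notes, the real pipeline has to run on the AX (UM) server and the destination table must already exist.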
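The datatype substitutions listed on this slide can be captured in a small helper used while preparing the destination tables. This is a sketch: the function name `cs_type` is hypothetical, and the CHAR length of 32 is an arbitrary assumption that should be sized to the longest ENUM value. The mapping reflects the ColumnStore version described in the deck.

```shell
# cs_type COLUMN_TYPE -> prints a ColumnStore-compatible type,
# following the substitutions listed on the slide.
cs_type() {
  t=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$t" in
    enum*)      echo "CHAR(32)" ;;    # ASSUMPTION: length 32 is arbitrary
    timestamp*) echo "DATETIME" ;;
    mediumint*) echo "BIGINT" ;;
    binary*)    echo "BIGINT" ;;
    *)          printf '%s\n' "$1" ;; # all other types pass through unchanged
  esac
}

cs_type "enum('web','store','flyer')"
cs_type "mediumint(9)"
cs_type "varchar(64)"
```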
  15. 15. Master-slave delay ▪ MaxScale implements a policy that routes a query to the slave only if the replication delay is under a threshold (configurable, e.g. 1 s) ▪ MaxScale polls the slave delay every xx seconds (configurable, e.g. 0.5 s) ▪ If you have a classic master-detail interface (a master list plus the details of the currently selected item), there are several ways to retrieve a list that includes the last inserted record: ▪ Insert a sleep delay into the application to wait for the slave to catch up; ▪ Introduce a statement that forces the query to be routed to the master (for example <space>SELECT); ▪ Exclude reads from the slave. ▪ QBerg has a lot of PHP code written over more than 10 years. For now we have chosen the last option. We will route «SELECT only» queries to the slave once the migration to the new application architecture is complete.
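The lag-threshold policy described on this slide reduces to a simple comparison. The sketch below is an illustration of that rule only, not MaxScale's implementation; the function name `choose_target` and the use of integer milliseconds are assumptions.

```shell
# choose_target LAG_MS MAX_LAG_MS -> prints "slave" or "master".
# A read goes to the slave only when its measured replication lag is
# known (non-negative) and strictly under the threshold.
# ASSUMPTION: a negative lag stands for "unknown" in this sketch.
choose_target() {
  if [ "$1" -ge 0 ] && [ "$1" -lt "$2" ]; then
    echo slave
  else
    echo master
  fi
}

choose_target 200 1000    # lag under the 1 s threshold
choose_target 1500 1000   # lag over the threshold
```

Because the lag is only sampled at the polling interval, a read issued right after a write can still reach a slave that has not caught up, which is why the slide lists workarounds for master-detail screens.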
  16. 16. Production phase 2 (diagram) ▪ Components: Applications, MariaDB MaxScale, MariaDB Server (primary), MariaDB Server (secondary), MariaDB Server (UM), Storage (PM) ▪ Engines: InnoDB (master), ColumnStore (web/store dm) ▪ Labels: Backups; Writes (all data), Reads (historical data); Replication; cpimport; Writes (all data), Reads (current data)
  17. 17. Staging (diagram) ▪ Components: MariaDB MaxScale, MariaDB Server (primary), MariaDB Server (UM + PM), Storage (PM), Single server ▪ Engines: InnoDB (master), ColumnStore (web/store dm) ▪ Labels: Writes (all), Reads (historical); cpimport; Writes (all), Reads (current)
  18. 18. A team job ▪ 21 support requests in 5 months ▪ 8 different support engineers helped ▪ Average resolution time for major issues: 4.5 days ▪ Found a bug in MaxScale 2.2.13 (MXS-2103), immediately resolved with a custom fix by Marko Mäkelä
  19. 19. THANKS
