
Giuliana Benedetti - Can Magento handle 1M products?


“Your catalog is way too big, your e-commerce won’t work.” Well... wrong!
We shared our approach to huge catalogs and our experience in running an e-commerce site with 700K products.
Following this unusual case, we went further and created a 1M-product catalog to test how Magento responds to such a large amount of data. We talked about Magento’s potential and analyzed the weak points that emerged during development.


  1. 1. About me • Project Manager @ Webformat • Magento and TYPO3 projects • Requirements analysis • Planning of development and support activities
  2. 2. What’s on the menu? • Huge catalog • What we did • What we are doing
  3. 3. Once upon a time... • The project began as a migration from a proprietary platform to Magento 1 Community • Shoes and accessories e-commerce • We developed the integration with their management software, which handled product master data, warehouse stock and order data • Integration with Amazon and eBay
  4. 4. Products Database • The original products database contained around 150k products • Configurable products • On average, 10 simple products for each configurable • Virtual products
  5. 5. Continued Growth • In just one year we reached 700K products stored in Magento • 66k configurable products
  6. 6. Challenges faced • Alignment between catalog and management software • Updating warehouse stock • Reindexing • Generating images • Server response time • Backoffice operations • Third-party module integration • Export of feeds for Google Shopping & Co. • Marketplace synchronization • Disk space
  7. 7. Updating the catalog (1/2) • Initially, with 150k products, this is what we planned: • Massive initial import • Frequent updates during the day via web service • When the catalog started growing, the data exchange volumes via web service became unsustainable. The exchange procedure needed a redesign.
  8. 8. Updating the catalog (2/2) • Today we have 700k products • Based on Magmi and CSV file exchange (product master data) • Nighttime update (the DIFF) • Full catalog update only in exceptional cases • The client accepted that new products are published with a delay of 1 day
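The nightly "DIFF" described in the two slides above can be illustrated with a minimal sketch. It assumes both exports are CSV files keyed by a `sku` column; the column names and sample rows are hypothetical, and the talk does not say how the real diff is computed.

```python
import csv
import io


def load_rows(csv_text, key="sku"):
    """Index the rows of a CSV export by their key column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def catalog_diff(previous_csv, current_csv, key="sku"):
    """Return the rows that are new or changed since the previous export."""
    previous = load_rows(previous_csv, key)
    current = load_rows(current_csv, key)
    return [row for sku, row in current.items() if previous.get(sku) != row]


# Hypothetical sample exports: A1 changed price, A3 is new, A2 is unchanged.
yesterday = "sku,name,price\nA1,Boot,90\nA2,Sandal,40\n"
today = "sku,name,price\nA1,Boot,85\nA2,Sandal,40\nA3,Sneaker,70\n"
changed = catalog_diff(yesterday, today)
```

Only the rows in `changed` would then need to be handed to the import tool, which keeps the nightly file small even when the full catalog is large.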
  9. 9. Warehouse update (1/2) • No warehouse fully dedicated to the web • Shared with the offline shops • It’s not possible to update stock only at night and rely on that stock during the day • Frequent updates
  10. 10. Warehouse update (2/2) • Every 15 minutes → update from the management software by loading the DIFF • Stock update only • Via CSV file, written directly to the database (Magmi)
  11. 11. Reindex (1/2) • The bigger the catalog, the slower the reindex • Initially, the reindex was launched after each update (15 min) • After a while, the reindex became too time-consuming: a new update cycle was starting while the reindex of the previous update was still running.
  12. 12. Reindex (2/2) • Solution: • All the reindexes have been disabled, except for the stock reindex • All other reindexes are now performed after the nighttime import • Today a full reindex takes around 75 minutes and generates a heavy load on the database
  13. 13. Catalog_url_rewrite (1/2) • Magento 1 has a critical point in the URL rewrite process: • All product URLs are rewritten, including simple products that are «Not visible individually» and exist only to be associated with a configurable. • With a 700k-product catalog, this meant: • Creating millions of rows in the catalog_url_rewrite table • A URL rewrite process that takes hours to complete
  14. 14. Catalog_url_rewrite (2/2) • A patch has been installed to avoid URL generation for simple products that are not visible individually • Module Dnd_Patchindexurl: https://www.magentocommerce.com/magento-connect/dn-d-patch-index-url-1.html • Now the reindex process takes around 20 minutes
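A rough back-of-the-envelope sketch of why skipping hidden simple products helps. The product counts come from the slides; the number of rewrite rows per product is a hypothetical figure (it depends on store views and categories), and the sketch assumes every non-configurable product is a hidden simple product, which ignores the virtual products mentioned earlier.

```python
def rewrite_rows(products, rows_per_product):
    """Estimate the number of URL rewrite rows for a given product count."""
    return products * rows_per_product


TOTAL_PRODUCTS = 700_000   # from the slides
CONFIGURABLES = 66_000     # from the slides
ROWS_PER_PRODUCT = 3       # assumption, not stated in the talk

# Rewrites generated for every product versus only for configurables.
before = rewrite_rows(TOTAL_PRODUCTS, ROWS_PER_PRODUCT)  # 2,100,000
after = rewrite_rows(CONFIGURABLES, ROWS_PER_PRODUCT)    # 198,000
```

Under these assumptions the table shrinks by roughly a factor of ten, which is in line with the reindex time dropping from hours to tens of minutes.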
  15. 15. Images generation (1/2) • One of the main problems we had to face was product thumbnail generation, done by ImageMagick • Every day hundreds of products are published → We verified that the frontend CPUs were often under stress because of the ImageMagick process and the write operations on the database
  16. 16. Images generation (2/2) • Our solution was to generate the thumbnails during the massive import, so ImageMagick could work together with the import procedure • At night, the images are generated and saved on a dedicated server, without interfering with user navigation • Today we have around 881K images saved
  17. 17. Server response time • With such a huge catalog, some categories hold even hundreds of products • The first loading time (when pages are not cached) is high • We activated caching on Redis and Varnish • Not enough: the first loading time was still too high
  18. 18. Solutions 1/2 • Moving the cache clearing process to the night • At 8 in the morning, website navigation started to suffer • We scheduled a job to pre-cache all the critical pages • Minimized cache invalidation • Clear cache only for products whose stock quantity was updated via WS
  19. 19. Solutions 2/2 • Client training to better handle cache clearing • Minimized the number of filters in layered navigation • Each filter increases the reindex time and the number of uncached page combinations
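The pre-caching job mentioned in the solutions above could look roughly like the following sketch: request each critical page once after the nightly cache clear so that the first real visitor hits a warm cache. The function names and URLs are hypothetical; the talk does not describe the actual job.

```python
import urllib.request


def fetch_status(url, timeout=10):
    """Request a page and return its HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def warm_cache(urls, fetch=fetch_status):
    """Request each URL once and return the URLs that could not be warmed."""
    failed = []
    for url in urls:
        try:
            status = fetch(url)
        except OSError:  # covers URLError, HTTPError and socket timeouts
            failed.append(url)
            continue
        if status != 200:
            failed.append(url)
    return failed


# Example call (hypothetical URLs), to be scheduled after the nightly import:
# warm_cache(["https://shop.example/shoes", "https://shop.example/accessories"])
```

Passing `fetch` as a parameter keeps the loop testable without network access and makes it easy to swap in a different HTTP client.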
  20. 20. Backoffice operations • Initially all the catalog update activities were performed from the Magento backoffice • Problems: • Frequent reindexes • Frequent cache updates • Server load (the backoffice product list filters are CPU-demanding and put load on MySQL) • Common operations were slowed down • Several back-end users ended up working concurrently
  21. 21. Solutions • Initially a new backoffice server was introduced • The MySQL load problem was not solved, and neither were reindexing and re-caching • We introduced a new process to handle the catalog, using an Excel file • This improved the efficiency of the people managing the master data • Massive Excel file import performed every 3 days via FTP • Categories are still handled from the backoffice
  22. 22. Third-party modules integration • Critical point • Not all the modules found in the Marketplace are developed in an optimal way • They «simply» load the product collection without pagination • They execute nested queries • There are loops over collections that initialize all products unnecessarily • … • A lot of profiling and optimization work was needed
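The pagination point above can be illustrated with a language-neutral sketch, written here in Python rather than Magento's PHP. `load_page` is a hypothetical stand-in for a query with a limit and offset; the idea is simply to process a large collection page by page instead of loading it whole.

```python
def iter_in_pages(load_page, page_size=500):
    """Yield items page by page instead of loading the whole collection."""
    page = 1
    while True:
        items = load_page(page, page_size)
        if not items:
            break
        yield from items
        if len(items) < page_size:
            break  # a short page means the collection is exhausted
        page += 1


# Hypothetical in-memory catalog standing in for the products table.
CATALOG = [f"SKU-{n}" for n in range(1, 1201)]


def load_page(page, page_size):
    """Return one page of the catalog, like a LIMIT/OFFSET query would."""
    start = (page - 1) * page_size
    return CATALOG[start:start + page_size]


# Only one page is held in memory at a time while counting.
count = sum(1 for _ in iter_in_pages(load_page, page_size=500))
```

With 700K products, the difference between holding one page and holding the entire collection in memory is what separates a working export from an exhausted PHP process.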
  23. 23. Feed export (Google Shopping & Co.) 1/2 • As the catalog grew, the feed export time increased as well • In the very beginning, the exports were handled by a Magento module
  24. 24. Feed export (Google Shopping & Co.) 2/2 • Solution steps: • The module has been replaced with highly optimized ad hoc procedures • The export jobs are executed on the backoffice server during the night, so as not to load the frontend • A MySQL slave was introduced as the data source, so as not to load the master and, consequently, the website
  25. 25. Marketplace synch • We are using M2E Pro • Client side: full check of EAN codes • Tech side: handling the automatic synchronization process • An automatic full synchronization is too heavy. When to synchronize? • What to synchronize? • Magmi
  26. 26. Disk space (1/2) • Well, here we are: even if disk space is quite cheap, using too much of it is not convenient • Very heavy data exchange logs • Frequent data exchange and a huge amount of data • Log files were growing fast • Log rotation was activated hourly • Logs are archived after a few days
  27. 27. Disk space (2/2) • Large and continuously growing number of images • Huge feed exports • Huge CSV import files • … • Solutions applied: • Constant monitoring activated • Activated automatic procedures to clean up logs, old images, expired feeds, etc.
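An automatic clean-up procedure like the one mentioned above might be sketched as follows. The directory layout and retention period are assumptions; the talk does not say how the real procedures work.

```python
import os
import time


def remove_old_files(directory, max_age_days, now=None):
    """Delete regular files older than max_age_days; return their names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    removed = []
    # Materialize the listing first so deletions do not affect iteration.
    for entry in list(os.scandir(directory)):
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)


# Example call (hypothetical path), e.g. from a nightly cron job:
# remove_old_files("/var/www/shop/var/export/feeds", max_age_days=7)
```

The function only looks at the top level of the directory and skips subdirectories, which keeps an accidental misconfiguration from deleting more than intended.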
  28. 28. Challenges to be faced • Elasticsearch integration • Growing catalog, up to 1M products • More sales, more page views • Magento 2 migration
  29. 29. Elasticsearch • For two reasons: • Improve the search functionality offered to the client • Minimize the load produced by the Magento internal search engine • Critical issues to be faced: • Catalog index time • Only configurable products? • What about the sizes?
  30. 30. 1M products • Expected growth: in 1 year we’ll have 1M products • At the moment we are performing tests with fake products • We didn’t detect any other critical aspects • So far, we have only had to further optimize some data exchange and feed generation procedures
  31. 31. More sales, more page views • Sessions are increasing → the number of uncached page views is increasing • Pre-caching extension • Increasing Varnish cache TTL • Minimize products in categories and filters used • Sales are increasing → out-of-stock products are becoming more frequent too • To be evaluated: the impact of new reindex and re-caching policies on the client
  32. 32. What if..? • We’re planning a Magento 2 migration with the client • We started our tests by migrating the current Magento 1 environment (700K products) to a Magento 2 installation • We collected the results and are still performing some other tests
  33. 33. HW specs • All tests were run on a VirtualBox VM with Linux Ubuntu 16.04.1 LTS, 8 GB RAM, 1 x 2.60 GHz CPU • The LAMP configuration featured PHP version 5.6, Apache 2.4.18, MySQL 14.14 • The migration was performed from Magento version 1.9.2.2 to 2.1.3
  34. 34. Magento 2 migration (1/4) • DB migration time: 1h 20' • BE performance:
        BE operation            Magento 1 with cache    Magento 2 with cache
        Access to catalog       almost 5'               7''
        Access to product       3''                     10''
        Access to categories    7''                     6''
        Product searching       1'5''                   3''
  35. 35. Magento 2 migration (2/4) • FE performance for catalog browsing:
        FE operation                     Magento 1 with cache    Magento 2 with cache
        Catalog browsing / categories    30''                    7''
  36. 36. Magento 2 migration – Reindex Times (3/4) • M1 total: 2h 55' 11'' • M2 total: 2h 53' 47''
  37. 37. Magento 2 migration (4/4) • We had some issues with the Catalog Fullsearch reindex (Magento 2) • We had to apply a patch → https://github.com/magento/magento2/issues/5146 • Without the patch, the Catalog Fullsearch reindex takes around 2 hours; with the patch applied it took around 1 hour, so the times are quite comparable • 02:12:37 02:12:37
  38. 38. Catalog URL rewrite • M1 with Dnd_Patchindexurl module: 00:14:34 • M1 without Dnd_Patchindexurl module: 01:03:50 • M2: no catalog URL rewrite reindex. URL rewrites are handled when the product is saved
  39. 39. Tools • Xdebug • New Relic • AOE Profiler
  40. 40. Conclusions
  41. 41. Yes, we can! • It’s possible, but not without effort • Large initial analysis • Special attention to optimization processes • What about Magento 2?
  42. 42. Q & A • Giuliana Benedetti – giuliana.benedetti@webformat.com • WEBFORMAT srl - www.webformat.com
