How to import 1 million SKUs in under 10 minutes

Published in: Technology

  1. 1. How to import 1 million SKUs in under 10 minutes 1/26
  2. 2. Building an import correctly is hard 2/26
  3. 3. 2008: it all started with DataFlow 3/26
  8. 8. DataFlow: stores CSV records into database; imports product by product via AJAX call; uses product model to save data; closing the browser window stops the whole process 4/26
  10. 10. Speed: 2-3 products per second (~20 minutes for 5k products) 5/26
  11. 11. 2011: Import/Export saved us all 6/26
  17. 17. ImportExport: stores batches of data into the database during validation; processes stored data in one HTTP request; validates product data without using the product model; uses multi-row inserts to populate tables; does not run indexers 7/26
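
The multi-row insert idea from the ImportExport slide can be sketched as follows. This is an illustration only, not Magento code: it uses Python with the standard-library `sqlite3` module as a stand-in for PHP and MySQL, and the `product` table and both helper functions are hypothetical names.

```python
import sqlite3


def multi_row_insert_sql(table, columns, row_count):
    """Build one INSERT statement with `row_count` placeholder groups."""
    group = "(" + ", ".join("?" for _ in columns) + ")"
    return "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([group] * row_count)
    )


def insert_rows(conn, table, columns, rows):
    """Insert all rows with a single statement instead of one per row."""
    if not rows:
        return
    sql = multi_row_insert_sql(table, columns, len(rows))
    flat = [value for row in rows for value in row]  # flatten rows for binding
    conn.execute(sql, flat)


# Hypothetical table and data, used only to exercise the helpers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, type TEXT)")
insert_rows(
    conn,
    "product",
    ["sku", "type"],
    [("sku1", "simple"), ("sku2", "simple"), ("sku3", "configurable")],
)
row_count = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
```

One statement carries all three rows, so the database parses and executes a single query rather than three.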
  19. 19. Speed: 41 products per second (~2 minutes for 5k products) 8/26
  22. 22. But there are some drawbacks: high memory usage on large datasets; slow in generating primary keys for new products 9/26
  23. 23. 2015: Magento 2.x Import Export 10/26
  28. 28. ImportExport M2: same base functionality as in M1; more complex file format to edit and parse; slower on complex product data; adds additional single-statement inserts 11/26
  29. 29. 2019: I got an idea and a project to implement it on 12/26
  36. 36. Separate Feeds: Main entity (sku, type, set); Attributes (sku, attribute, store, value); Category (sku, category slug, position); Configurable Options (sku, attribute, label); Images (sku, image); ... 13/26
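
One way to picture the separate feeds is as one small record type per feed, each keyed by `sku`. The sketch below is in Python and rests on assumptions: the record type names, the headerless CSV layout, and the sample rows are all hypothetical. Only the column lists come from the slide.

```python
import csv
import io
from collections import namedtuple

# Column lists follow the slide; the type names are hypothetical.
MainEntity = namedtuple("MainEntity", ["sku", "type", "set"])
AttributeValue = namedtuple("AttributeValue", ["sku", "attribute", "store", "value"])
CategoryLink = namedtuple("CategoryLink", ["sku", "category_slug", "position"])


def read_feed(text, record_type):
    """Parse headerless CSV text into records of the given feed type."""
    return [record_type(*row) for row in csv.reader(io.StringIO(text))]


# Illustrative sample rows, not taken from the talk.
entities = read_feed("sku1,simple,Default\nsku2,simple,Default\n", MainEntity)
attributes = read_feed(
    "sku1,name,0,First product\nsku1,name,1,Erstes Produkt\n", AttributeValue
)
```

Because every feed row carries only a `sku` and its own columns, each feed can be read and written independently of the others.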
  40. 40. Lazy Entity Resolving: reduce memory requirements of the import; cleaner and more readable feed processing; possibility of acquiring entity ids in batches automatically 14/26
  45. 45. Lazy Entity Resolving 15/26

        // Configure table lookup information
        $resolver = $this->resolverFactory->createSingleValueResolver(
            'catalog_product_entity',
            'sku',
            'entity_id'
        );

        // Pass resolver into insert builder
        $insert = InsertOnDuplicate::create(
            'catalog_product_entity_varchar',
            ['entity_id', 'attribute_id', 'store_id', 'value']
        )->withResolver($resolver);

        // Use resolver to create identifier containers;
        // insert on duplicate will skip any unresolved entries
        $insert
            ->withRow($resolver->unresolved('sku1'), 1, 0, 'some value')
            ->withRow($resolver->unresolved('sku2'), 1, 0, 'some value1')
            ->withRow($resolver->unresolved('sku3'), 1, 0, 'some value2');
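
The behaviour described on this slide (placeholders created up front, one batched lookup, unresolved rows skipped) can be approximated with the sketch below. It is not the author's library: it is a Python illustration using `sqlite3` in place of MySQL. The class and function names are hypothetical, loosely mirroring the names on the slide, and the entity ids 10 and 11 are made up.

```python
import sqlite3


class Unresolved:
    """Placeholder for an id that is not known yet (here: keyed by SKU)."""

    def __init__(self, key):
        self.key = key


class SingleValueResolver:
    """Collects lookup keys and resolves all of them with one SELECT."""

    def __init__(self, conn, table, key_column, id_column):
        self.conn, self.table = conn, table
        self.key_column, self.id_column = key_column, id_column
        self.pending = set()

    def unresolved(self, key):
        self.pending.add(key)
        return Unresolved(key)

    def resolve(self):
        keys = sorted(self.pending)
        if not keys:
            return {}
        marks = ", ".join("?" for _ in keys)
        sql = "SELECT {}, {} FROM {} WHERE {} IN ({})".format(
            self.key_column, self.id_column, self.table, self.key_column, marks
        )
        return dict(self.conn.execute(sql, keys).fetchall())


def resolve_rows(rows, resolver):
    """Swap placeholders for ids; drop rows whose key has no id."""
    ids = resolver.resolve()
    resolved = []
    for row in rows:
        keys = [v.key for v in row if isinstance(v, Unresolved)]
        if all(k in ids for k in keys):
            resolved.append(
                tuple(ids[v.key] if isinstance(v, Unresolved) else v for v in row)
            )
    return resolved


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE catalog_product_entity (entity_id INTEGER PRIMARY KEY, sku TEXT UNIQUE)"
)
conn.executemany(
    "INSERT INTO catalog_product_entity (entity_id, sku) VALUES (?, ?)",
    [(10, "sku1"), (11, "sku2")],  # sku3 is deliberately missing
)

resolver = SingleValueResolver(conn, "catalog_product_entity", "sku", "entity_id")
rows = [
    (resolver.unresolved("sku1"), 1, 0, "some value"),
    (resolver.unresolved("sku2"), 1, 0, "some value1"),
    (resolver.unresolved("sku3"), 1, 0, "some value2"),
]
resolved_rows = resolve_rows(rows, resolver)
```

The feed-processing code never looks up an id itself; it only hands out placeholders, which is what keeps it short and lets the lookup happen once per batch.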
  51. 51. Batch auto-increment generation 17/26

        -- Start a transaction
        START TRANSACTION;

        -- Populate table with un-resolved keys
        INSERT INTO catalog_product_entity (sku)
        VALUES ('sku1'), ('sku2'), ('sku3'), ('sku4');

        -- Retrieve new identifiers
        SELECT entity_id, sku
        FROM catalog_product_entity
        WHERE sku IN ('sku1', 'sku2', 'sku3', 'sku4');

        -- Rollback transaction
        ROLLBACK;
  55. 55. Prepared Statements: compile query for constant batch size; send only data instead of generating new queries; reduces query processing on MySQL side by half 18/26
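
The constant-batch-size idea can be sketched as a buffer that always flushes the same statement text, so only the data changes between full batches. This is a Python illustration under assumptions: `sqlite3` stands in for MySQL server-side prepared statements, and `FixedBatchInserter` and the `product` table are hypothetical names.

```python
import sqlite3


class FixedBatchInserter:
    """Buffers rows and flushes them in batches of one constant size.

    The SQL text for a full batch is built once, so every full flush
    sends the identical statement with new data only.
    """

    def __init__(self, conn, table, columns, batch_size):
        self.conn, self.table, self.columns = conn, table, columns
        self.batch_size = batch_size
        self.full_batch_sql = self._sql(batch_size)
        self.buffer = []
        self.statements_sent = []  # kept only to inspect what was executed

    def _sql(self, row_count):
        group = "(" + ", ".join("?" for _ in self.columns) + ")"
        return "INSERT INTO {} ({}) VALUES {}".format(
            self.table, ", ".join(self.columns), ", ".join([group] * row_count)
        )

    def add(self, row):
        self.buffer.append(row)
        if len(self.buffer) == self.batch_size:
            self._send(self.full_batch_sql)

    def finish(self):
        # The final partial batch needs its own, shorter statement.
        if self.buffer:
            self._send(self._sql(len(self.buffer)))

    def _send(self, sql):
        flat = [value for row in self.buffer for value in row]
        self.conn.execute(sql, flat)
        self.statements_sent.append(sql)
        self.buffer = []


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY)")
inserter = FixedBatchInserter(conn, "product", ["sku"], batch_size=3)
for n in range(7):
    inserter.add(("sku%d" % n,))
inserter.finish()

distinct_statements = len(set(inserter.statements_sent))
total = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
```

Seven rows with a batch size of 3 produce three executions but only two distinct statement texts: the reusable full-batch query and one query for the remainder.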
  57. 57. Speed: 450 products per second (~11 seconds for 5k products) 19/26
  59. 59. But it was still not good enough: 45 minutes to import 1 million SKUs 20/26
  60. 60. Sure, because it's a sequential process... 21/26
  61. 61. So I made it asynchronous as a PoC 22/26
  66. 66. Under the hood: each target table receives a separate connection to MySQL; the identity resolver is attached to the source table connection; each feed is processed concurrently using a round-robin strategy; during MySQL query execution PHP prepares the next batch 23/26
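
The scheduling described on this slide can be approximated with the sketch below. It is a Python illustration, not the talk's PHP implementation: threads and bounded queues stand in for asynchronous MySQL connections, and the feed names and batch contents are hypothetical.

```python
import queue
import threading


def round_robin(feeds):
    """Yield (name, batch), taking one batch from each feed in turn."""
    active = {name: iter(batches) for name, batches in feeds.items()}
    while active:
        for name in list(active):
            try:
                yield name, next(active[name])
            except StopIteration:
                del active[name]  # this feed is exhausted


def table_worker(inbox, written):
    """Stand-in for one connection: 'writes' batches until it receives None."""
    while True:
        batch = inbox.get()
        if batch is None:
            break
        written.extend(batch)  # a real worker would run the INSERT here


# Hypothetical feeds: each value is a list of already prepared batches.
feeds = {
    "entity": [["sku1", "sku2"], ["sku3"]],
    "attribute": [["name:sku1"], ["name:sku2"], ["name:sku3"]],
}
order = [name for name, _ in round_robin(feeds)]

# One inbox and one worker per target table, mirroring one connection each.
inboxes = {name: queue.Queue(maxsize=1) for name in feeds}
written = {name: [] for name in feeds}
workers = [
    threading.Thread(target=table_worker, args=(inboxes[name], written[name]))
    for name in feeds
]
for worker in workers:
    worker.start()

for name, batch in round_robin(feeds):
    # put() blocks only while that table's previous batch is still waiting
    # in its inbox, so the producer can keep preparing other batches.
    inboxes[name].put(batch)

for name in feeds:
    inboxes[name].put(None)  # signal each worker to stop
for worker in workers:
    worker.join()
```

The bounded inbox is the design choice that matters: the producer stays at most one batch ahead of each table, which overlaps preparation with writing without buffering a whole feed in memory.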
  68. 68. Speed: 1,850 products per second (~9 minutes for 1m products) 24/26
  69. 69. It is coming this fall as an open source tool for everyone! 25/26
  70. 70. Questions ivan@ecomdev.org IvanChepurnyi.GitHub.io 26/26
