Multithreaded XML Import (San Francisco Magento Meetup)

Author: Fabrizio Branca

1. Multithreaded XML Import …for Magento San Francisco Magento Meetup Group - October 23, 2013
2. Fabrizio Branca Lead System Developer at
3. E-Commerce: Magento; CMS: TYPO3; Portals: ZF, FLOW,…; Global Enterprise Projects; High Performance/Scale; Mobile; Searchperience: SOLR; 120 people in 7 offices world-wide
4. Aoe_Import github.com/AOEmedia/Aoe_Import git clone --recursive …
5. Will Aoe_Import be the fastest product importer around? YES, of course! Well, maybe… Actually, Aoe_Import is only an XML importer “framework”. It’s up to you to decide how to handle the XML snippets…
6. Aoe_Import: for large XML files (XML! Not CSV.); full flexibility in processor implementation; multi-thread support!; subscribe your “Processors” to xpaths; stream processing (XMLReader); “event” driven
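The idea on slide 6 (stream the file with XMLReader and hand matching snippets to subscribed processors) can be sketched roughly as follows. This is a minimal illustration, not the Aoe_Import API: the function name `streamXml`, the path syntax (plain element paths rather than full XPath) and the sample catalog are all assumptions made for the example.

```php
<?php
// Hypothetical sketch, NOT the Aoe_Import API: stream an XML document and
// call the callback subscribed to an element path with that element's XML.
function streamXml(string $xml, array $subscriptions): void
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $stack = []; // names of the currently open elements, root first

    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        // Drop elements that are deeper than or at the current depth,
        // then add the element we just entered.
        $stack = array_slice($stack, 0, $reader->depth);
        $stack[] = $reader->name;

        $path = '/' . implode('/', $stack);
        if (isset($subscriptions[$path])) {
            // Only this snippet is materialised, not the whole file.
            $subscriptions[$path]($reader->readOuterXml());
        }
    }
    $reader->close();
}

// Usage: collect every <product> snippet of a (made-up) catalog.
$snippets = [];
streamXml(
    '<catalog><product><sku>A1</sku></product><product><sku>B2</sku></product></catalog>',
    ['/catalog/product' => function (string $snippet) use (&$snippets): void {
        $snippets[] = $snippet;
    }]
);
```

For a large file the same loop would open the file with `XMLReader::open()` instead of reading from a string, so memory use depends on the size of one snippet rather than on the size of the file.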
7. Problem [chart: memory over time, per single product, against the memory limit]
8. Trivial Solution [charts: memory over time against the memory limit]
9. Beat the memory leak by forking [diagram labels: process import; waiting for other thread to terminate; threading overhead]
10. Forking? In PHP? $pid = pcntl_fork(); if ($pid) { /* parent process runs what is here */ echo "parent\n"; } else { /* child process runs what is here */ echo "child\n"; }
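The snippet on slide 10 leaves out two details that matter in practice: `pcntl_fork()` returns -1 when the fork fails, and the parent should wait for the child so that no zombie process is left behind. A slightly fuller sketch, assuming the pcntl extension is available (CLI on a Unix-like system) and using an arbitrary exit code of 7 purely for illustration:

```php
<?php
$pid = pcntl_fork();

if ($pid === -1) {
    // Fork failed: no child was created.
    fwrite(STDERR, "fork failed\n");
    exit(1);
}

if ($pid === 0) {
    // Child process: do the work, then exit with a status the parent can read.
    exit(7);
}

// Parent process: $pid is the child's process id. Block until it terminates.
pcntl_waitpid($pid, $status);
$exitCode = pcntl_wifexited($status) ? pcntl_wexitstatus($status) : null;
```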
11. Threadi github.com/AOEmedia/Threadi
12. Threadi: a clean OOP interface to forking and process management in PHP
13. Batch Processor Collect a bunch of imports … …fork… …and process them in a child process.
14. No imports are processed in the main thread, so there’s no memory leak happening here. [Diagram: memory over time against the memory limit. Main thread: create process collection, waiting for other thread to terminate, threading overhead. Forks: process imports in process collection.] Every fork starts with the low memory footprint of the main thread. Find the number of imports that can be processed at a time without hitting the memory limit.
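The batch idea from slides 13 and 14 can be sketched with plain pcntl calls. This is not the Threadi or Aoe_Import implementation; `processInForkedBatches`, the batch size of 4 and the dummy processor are hypothetical names and values chosen for the example. The parent only forks and waits, so whatever memory a batch leaks is released when its child exits.

```php
<?php
// Hypothetical helper, NOT the Threadi/Aoe_Import API.
// Processes $imports in child processes, one batch per fork, sequentially.
function processInForkedBatches(array $imports, int $batchSize, callable $processor): int
{
    $processed = 0;

    foreach (array_chunk($imports, $batchSize) as $batch) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('Could not fork');
        }

        if ($pid === 0) {
            // Child: starts with the parent's small footprint, does the
            // memory-hungry work, and frees everything by exiting.
            foreach ($batch as $import) {
                $processor($import);
            }
            exit(0);
        }

        // Parent: never processes an import itself, it only waits.
        pcntl_waitpid($pid, $status);
        if (pcntl_wifexited($status) && pcntl_wexitstatus($status) === 0) {
            $processed += count($batch);
        }
    }

    return $processed;
}

// Usage: ten dummy imports, four per batch (so three forks).
$done = processInForkedBatches(range(1, 10), 4, function (int $import): void {
    $buffer = str_repeat('x', 1024 * 1024); // stands in for leaked memory
});
```

The batch size is the tuning knob the slide mentions: it should be as large as possible (fewer forks, less overhead) while a single child still stays below the memory limit.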
15. Multi-threading? Sure! [diagram labels: number of threads processed in parallel; number of items in a batch]
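Once batches run in child processes, several of them can run at the same time. The following sketch shows one way to cap the number of concurrent children; `runParallel`, its parameters and the worker that returns the batch size as its exit code are assumptions for illustration, not the Threadi API.

```php
<?php
// Hypothetical pool, NOT the Threadi API: run one child per batch,
// with at most $maxThreads children alive at any time.
function runParallel(array $batches, int $maxThreads, callable $worker): array
{
    $running = [];   // pid => batch index
    $exitCodes = []; // batch index => child exit code

    // Wait for any one child to finish and record its exit code.
    $reap = function () use (&$running, &$exitCodes): void {
        $pid = pcntl_wait($status);
        if ($pid === -1) {
            throw new RuntimeException('Waiting for a child failed');
        }
        if (isset($running[$pid])) {
            $exitCodes[$running[$pid]] = pcntl_wifexited($status)
                ? pcntl_wexitstatus($status)
                : -1;
            unset($running[$pid]);
        }
    };

    foreach ($batches as $index => $batch) {
        while (count($running) >= $maxThreads) {
            $reap(); // pool is full: wait for a free slot
        }
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('Could not fork');
        }
        if ($pid === 0) {
            exit($worker($batch)); // child: the worker's result is the exit code
        }
        $running[$pid] = $index;
    }

    while ($running) {
        $reap(); // drain the remaining children
    }

    ksort($exitCodes);
    return $exitCodes;
}

// Usage: 12 items, 3 items per batch, at most 2 children in parallel.
$exitCodes = runParallel(array_chunk(range(1, 12), 3), 2, function (array $batch): int {
    return count($batch);
});
```

The two parameters correspond to the two labels on the slide: the number of items in a batch bounds the memory of each child, and the number of parallel children bounds the total load.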
16. Problems? Database Connection Database connection doesn’t like to be cloned! Mage::getSingleton('core/resource') ->getConnection('core_write') ->closeConnection();
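The point of slide 16 is ordering: close the connection in the parent before forking, so that each child opens its own connection instead of sharing an inherited one. The sketch below models only that ordering. `LazyConnection` is a hypothetical stand-in that records which process "opened" it and does not talk to a real database; in Magento the close call is the one shown on the slide.

```php
<?php
// Hypothetical stand-in for a database adapter. It connects lazily and
// remembers the pid of the process that opened it. No real database is used.
final class LazyConnection
{
    private $ownerPid = null;

    public function getConnection(): int
    {
        if ($this->ownerPid === null) {
            $this->ownerPid = getmypid(); // "connect" on first use
        }
        return $this->ownerPid;
    }

    public function closeConnection(): void
    {
        $this->ownerPid = null;
    }
}

$db = new LazyConnection();
$db->getConnection();   // the parent used the connection earlier
$db->closeConnection(); // close it BEFORE forking, as on the slide

$pid = pcntl_fork();
if ($pid === -1) {
    exit(1);
}
if ($pid === 0) {
    // Child: the first use reconnects, so the connection belongs to the child.
    exit($db->getConnection() === getmypid() ? 0 : 1);
}

pcntl_waitpid($pid, $status);
$childReconnected = pcntl_wifexited($status) && pcntl_wexitstatus($status) === 0;
```

Without the `closeConnection()` call, the child in this model would see the parent's pid as the owner, which is the analogue of parent and child sharing one cloned connection.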
17. Problems? Thread Safety
18. Problems? Thread Safety --- a/app/code/core/Enterprise/Catalog/Model/Index/Action/Catalog/Category/Product/Refresh.php +++ b/app/code/core/Enterprise/Catalog/Model/Index/Action/Catalog/Category/Product/Refresh.php @@ -326,7 +326,7 @@ class Enterprise_Catalog_Model_Index_Action_Catalog_Category_Product_Refresh ->setComment('Catalog Category Product Index Tmp'); $this->_connection->dropTable($this->_getMainTmpTable()); $this->_connection->createTable($table); $this->_connection->createTemporaryTable($table); + } /**
19. Other Use-Cases? Scheduler; queue processing; indexes; everything that’s batchable
20. Thank you! Any questions? My blog http://www.aoemedia.com http://www.fabrizio-branca.de @fbrnc Follow me on twitter!

Published in: Technology