Wordcamp Ottawa WP Big Mig

An overview of enterprise WordPress content migrations.
Migration tool: https://github.com/ivankruchkoff/wp-bigmig
WordCamp Ottawa 2017 presentation on importing JSON content from the command line with WP-CLI.

Published in: Technology

  1. Enterprise Migrations: How to move 1 MILLION+ posts from ANY CMS to WordPress
     ivan@kruchkoff.com
     https://github.com/ivankruchkoff/wp-bigmig
  2. About Me
     • Engineer at 10up, and yes, we're hiring
     • Work and travel
     • ivan@kruchkoff.com
     • @ivan on WP Slack
  3. 10up Engineering Best Practices
     • https://10up.github.io/Engineering-Best-Practices/migrations/
     • Includes things like:
       • wp_defer_term_counting( true )
       • define( 'WP_POST_REVISIONS', 0 )
       • Using PHP 7.1
       • define( 'WP_IMPORTING', true )
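
The settings listed on slide 3 can be applied in one place before an import run. This is a minimal sketch, not the talk's code: `bigmig_prepare_import()` is a hypothetical helper name, and the `function_exists()` guard is only there so the snippet also runs outside WordPress.

```php
<?php
/**
 * Hypothetical helper: apply the import-friendly settings from slide 3.
 * Returns the names of the settings it actually applied.
 */
function bigmig_prepare_import() {
    $applied = array();

    // Signal to core and plugins that an import is in progress.
    if ( ! defined( 'WP_IMPORTING' ) ) {
        define( 'WP_IMPORTING', true );
        $applied[] = 'WP_IMPORTING';
    }

    // Skip storing revisions for imported posts. WordPress defines this
    // constant itself during load if wp-config.php does not, so on a real
    // site it normally has to be set in wp-config.php for this to take effect.
    if ( ! defined( 'WP_POST_REVISIONS' ) ) {
        define( 'WP_POST_REVISIONS', 0 );
        $applied[] = 'WP_POST_REVISIONS';
    }

    // Postpone term count updates until the import is finished.
    if ( function_exists( 'wp_defer_term_counting' ) ) {
        wp_defer_term_counting( true );
        $applied[] = 'wp_defer_term_counting';
    }

    return $applied;
}
```

After the import, calling `wp_defer_term_counting( false )` lets WordPress recount the deferred terms.
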
  4. WordPress Single to Multisite
     • https://github.com/10up/MU-Migration/
     • Get as many single sites as you need into a multisite.
     • Handle deltas by remigrating.
     • Gotchas: theme mods and custom tables
  5. The Problem Statement
     • Acme Corp is frustrated using BoringCMS:
       • Editorial finds the interface horrible to use
       • The performance is incredibly slow
       • Countless hours are spent keeping the CMS running
     • They've evaluated other options and have decided on WordPress.
     • Your team is responsible for getting them there.
  6. Our Aim
     • Migrate a live site containing over a million posts
     • Migrate all assets (e.g. images)
     • Run two CMSs simultaneously for a period of time
     • Make delta migrations
  7. Our Approach
     while ( WordPress not live in prod() ) {
         export new or updated posts from BoringCMS()
         export new or updated assets to S3()
         update and improve posts parser()
         import posts()
     }
  8. The 3 CMS stages
     • BoringCMS live in prod
     • BoringCMS is still live, WordPress is ready in prod
     • WordPress live in prod
  9. BoringCMS in prod
     • No WordPress
     • WordPress development in progress.
     • Content snapshot in WordPress
     • WordPress development complete
     • Latest content in WordPress
  10. BoringCMS is live, WP is ready in prod
      • Editorial in BoringCMS
      • Latest content in WordPress
      • Editorial in WordPress
  11. WP Live in Prod
      • Profit
      • :partyparrot:
  12. Export new or updated posts from BoringCMS()
      • Fetch all posts
      • Fetch posts updated since
      • Fetch posts with min/max id
  13. Export new or updated assets to S3()
      • Assets is just a fancy way of saying images + more.
      • Fetch all assets
      • Fetch all assets created / modified since
      • Use s3-parallel-put for performance
  14. Update and Improve Posts Parser
      • Start with a few content exports, ideally as JSON
      • Find your unique identifier (ID / file path)
      • Parse the wp_posts fields first:
        • post_author, post_date/post_modified, post_content, post_title, post_name, post_parent, guid, post_type
      • Log all the things
      • Any time an error occurs: stop handling that file, log a message, move the file to investigate/, and continue with the next file.
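
The "parse the wp_posts fields first" step on slide 14 could be sketched as below. The source keys (`id`, `headline`, `body`, `published`) and the function name are invented for illustration; a real BoringCMS export will use its own names. The returned array uses argument names that `wp_insert_post()` accepts, and the exception gives the caller a point at which to log and move the file to investigate/.

```php
<?php
/**
 * Hypothetical mapper from one decoded export record to wp_insert_post() arguments.
 *
 * @param array $export Decoded export record (keys are assumptions for this sketch).
 * @return array Arguments suitable for wp_insert_post().
 * @throws InvalidArgumentException When a required field is missing or invalid.
 */
function bigmig_map_export_to_post( array $export ) {
    foreach ( array( 'id', 'headline', 'body', 'published' ) as $required ) {
        if ( ! isset( $export[ $required ] ) || '' === $export[ $required ] ) {
            throw new InvalidArgumentException( "Missing required field '{$required}'" );
        }
    }

    $published = strtotime( $export['published'] );
    if ( false === $published ) {
        throw new InvalidArgumentException( "Unparseable date '{$export['published']}'" );
    }

    return array(
        'post_type'     => 'post',
        'post_status'   => 'publish',
        'post_title'    => trim( $export['headline'] ),
        'post_content'  => $export['body'],
        'post_date_gmt' => gmdate( 'Y-m-d H:i:s', $published ),
        // Keep the source identifier so delta runs can find the existing post.
        'meta_input'    => array( 'legacy_id' => (string) $export['id'] ),
    );
}
```

Storing the legacy identifier as post meta is one way to support the delta migrations mentioned earlier: a later run can look up the existing post by that key and update it instead of inserting a duplicate.
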
  15. Import Posts
      • First few runs, ignore the data.
      • Create four folders:
        • pending - any file we haven't yet read / processed
        • processed - files that were successfully imported into WP.
        • ignored - files that aren't being handled by WP
        • investigate - files that should be handled by WP, but threw an error.
      • Log every step
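
The four-folder workflow on slide 15 comes down to moving each file out of pending/ once its outcome is known. A minimal sketch, assuming the folders sit side by side under one base directory and that the path below pending/ should be preserved; `bigmig_move_to_status()` is a hypothetical name, not the plugin's own FileMover class.

```php
<?php
/**
 * Hypothetical helper: move a file from pending/ into a status folder,
 * keeping its relative path (e.g. pending/123/a.json -> processed/123/a.json).
 *
 * @return string The new path of the file.
 */
function bigmig_move_to_status( $base_dir, $file, $status ) {
    $statuses = array( 'processed', 'ignored', 'investigate' );
    if ( ! in_array( $status, $statuses, true ) ) {
        throw new InvalidArgumentException( "Unknown status '{$status}'" );
    }

    $base    = rtrim( $base_dir, '/' );
    $pending = $base . '/pending/';
    if ( 0 !== strpos( $file, $pending ) ) {
        throw new InvalidArgumentException( "{$file} is not inside {$pending}" );
    }

    $relative   = substr( $file, strlen( $pending ) );
    $target     = "{$base}/{$status}/{$relative}";
    $target_dir = dirname( $target );

    // Create the destination folder on demand.
    if ( ! is_dir( $target_dir ) && ! mkdir( $target_dir, 0777, true ) && ! is_dir( $target_dir ) ) {
        throw new RuntimeException( "Could not create {$target_dir}" );
    }
    if ( ! rename( $file, $target ) ) {
        throw new RuntimeException( "Could not move {$file} to {$target}" );
    }

    return $target;
}
```

Because every file ends up in exactly one folder, an interrupted run can be resumed by re-running the import over whatever is still in pending/.
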
  16. Logging
      • https://github.com/jbroadway/analog
      • Processing $filename
      • Error $descriptive_error_message in $filename
      • Parsed $post_type in $filename
      • Processed $filename as $post_id
  17. Code Structure
      • WP-CLI command
      • File parser: takes a text file and turns it into a PHP data type, e.g. json_decode( file_get_contents( $fn ) )
      • Content parser: marshals data from the old format into something that can be used to create a post.
      • Post: contains the params for wp_insert_post(), the post_meta array and taxonomy terms
  18. Example Dataset
      • Thanks to the team at Web Hose
      • https://webhose.io/datasets
  19. Look at 1 post
      • news_0003607.json
      • How do I get the post title? ( thread -> title )
      • How do I get the post slug? ( thread -> uuid? )
      • How do I get the post content? ( text )
      • How do I get taxonomy? ( entities )
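
The four questions on slide 19 can be answered with a small extraction function. A sketch under assumptions: the record is already decoded with `json_decode( ..., true )`, `entities` is a map of groups (for example persons or locations) to lists of items that each carry a `name`, and the function name is hypothetical. Only the field paths named on the slide are taken from the talk.

```php
<?php
/**
 * Hypothetical extractor for one decoded Webhose-style article record.
 *
 * @param array $record Decoded JSON record.
 * @return array Title, slug, body and taxonomy terms grouped by entity type.
 */
function bigmig_extract_webhose_fields( array $record ) {
    $thread = ( isset( $record['thread'] ) && is_array( $record['thread'] ) )
        ? $record['thread']
        : array();

    // Collect entity names per group; each group could map to a taxonomy.
    $terms = array();
    if ( isset( $record['entities'] ) && is_array( $record['entities'] ) ) {
        foreach ( $record['entities'] as $group => $entities ) {
            if ( ! is_array( $entities ) ) {
                continue;
            }
            foreach ( $entities as $entity ) {
                if ( is_array( $entity ) && isset( $entity['name'] ) ) {
                    $terms[ $group ][] = $entity['name'];
                }
            }
        }
    }

    return array(
        'title' => isset( $thread['title'] ) ? $thread['title'] : '',
        'slug'  => isset( $thread['uuid'] ) ? $thread['uuid'] : '',
        'body'  => isset( $record['text'] ) ? $record['text'] : '',
        'terms' => $terms,
    );
}
```

Missing fields come back as empty values rather than notices, which leaves the decision about ignoring or investigating the file to the caller.
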
  20. Migrate Overview
      • Enable plugin
      • Add your parser* to ExportFilesIterator -> maybe_insert_content()
      • Set up your dir structure (e.g. your content is in ./exports)
        • mkdir -p migrate/pending/123 && mv exports migrate/pending/123
      • wp bigmig -f migrate
  21. Your JSON parser
      /**
       * Parse a JSON file and, if it is valid, pass it to maybe_insert_content().
       * Method excerpt: $this is the class that provides maybe_insert_content().
       *
       * @param string $filename Path to the JSON export file.
       *
       * @return bool False if the file could not be read or decoded, true otherwise.
       */
      function parse_json_file( $filename ) {
          $content_string = file_get_contents( $filename );
          $content        = json_decode( $content_string, true );
          // An unreadable file or invalid JSON both end up here.
          if ( ! is_array( $content ) ) {
              Logger::log( "Could not parse JSON file into array in {$filename}", Analog::ERROR );
              FileMover::move_to_investigate( $filename );
              return false;
          }
          $this->maybe_insert_content( $content, $filename );
          return true;
      }
  22. Create WebHoseArticleContentParser
      <?php
      namespace WP_CLI\BigMig\Parsers;

      class WebHoseArticleContentParser {

          protected $filename;
          protected $parsed_content = array();
          protected $content;

          public function __construct( $content, $filename ) {
              $this->content  = $content;
              $this->filename = $filename;
          }

          /**
           * Parse a news article JSON file; returns an array for creating a story WP_Post.
           */
          public function parse() {
              $this->set_title();
              $this->set_body();

              return $this->get_parsed_content();
          }

          public function get_parsed_content() {
              return $this->parsed_content;
          }

          public function set_title() {
              $this->parsed_content['title'] = $this->content['title'];
          }

          // Called by parse() but not shown on the slide; assumes the article
          // body is in the export's "text" field.
          public function set_body() {
              $this->parsed_content['body'] = $this->content['text'];
          }
      }
  23. Create a WebHoseArticlePost
      <?php
      namespace WP_CLI\BigMig\Posts;

      use WP_CLI\BigMig\Logger;

      class WebHoseArticlePost extends Post {

          public function prepare() {
              $this->type = 'WebHoseArticle';
              // Copy the parsed 'body' and 'title' into the post_content and post_title insert params.
              $this->set_insert_param_from_content( 'post_content', 'body' );
              $this->set_insert_param_from_content( 'post_title', 'title' );

              $this->insert_params['meta_input'] = $this->meta_input;
          }
      }

  24. Look at your logs
      • localhost - 2017-07-22 19:35:34 - 1 - Now processing /mnt/tmpfs/migrate/pending/articles/654/news_0000111.json, processed 1 files.
      • localhost - 2017-07-22 19:35:34 - 1 - Moved /mnt/tmpfs/migrate/pending/articles/654/news_0000111.json to /mnt/tmpfs/migrate/processed/654/news_0000111.json status: processed
      • localhost - 2017-07-22 19:35:34 - 1 - Finished processing content export /mnt/tmpfs/migrate/pending/articles/654/news_0000111.json as WP Post 30.
      • localhost - 2017-07-22 19:35:34 - 1 - Now processing /mnt/tmpfs/migrate/pending/articles/654/news_0000110.json, processed 2 files.
  25. Skip the boring bits
      • Content will be broken.
      • Tax mapping
      • Edge cases.
      • That custom format they only used in 1998 for 3 months.
      • That 1 article type that is XML
  26. Speeding Up the Process
      • What are your bottlenecks? Latency everywhere
      • Reading from a local disk / networked disk / RAM
      • Reading / writing from a local db / networked db.
      • Replication to read nodes.
      • Single process
      • Term counting etc. - look at 10up Best Practices :)
  27. Speedups
      • Provision your machine with lots of RAM
        • defaults['memory'] = 8192 for VVV
      • Make a ramdisk
        • sudo mount -o size=1G -t tmpfs none /mnt/tmpfs
      • Parallelise your processes
        • dirsplit -m -e1 -s 10M migrate
        • GNU Parallel, Bash, Gearman, pcntl_fork()
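
Parallelising starts with splitting the pending files into batches that separate processes can work through. The sketch below shows the idea in plain PHP, filling each batch in order up to a byte limit. It is an illustration only, not a reimplementation of dirsplit, and `bigmig_split_into_batches()` is a hypothetical name.

```php
<?php
/**
 * Hypothetical helper: group files into batches of at most $max_bytes each.
 * A single file larger than the limit gets a batch of its own.
 *
 * @param array $file_sizes Map of file path => size in bytes.
 * @param int   $max_bytes  Upper bound for the total size of one batch.
 * @return array List of batches, each a list of file paths.
 */
function bigmig_split_into_batches( array $file_sizes, $max_bytes ) {
    $batches       = array();
    $current       = array();
    $current_bytes = 0;

    foreach ( $file_sizes as $file => $bytes ) {
        // Close the current batch if this file would push it over the limit.
        if ( $current && $current_bytes + $bytes > $max_bytes ) {
            $batches[]     = $current;
            $current       = array();
            $current_bytes = 0;
        }
        $current[]      = $file;
        $current_bytes += $bytes;
    }

    if ( $current ) {
        $batches[] = $current;
    }

    return $batches;
}
```

Each batch can then be placed in its own directory under pending/ and handed to a separate import process, started for example by GNU Parallel or a shell loop.
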
  28. A Demo?
      • https://github.com/ivankruchkoff/wp-bigmig
  29. Questions?
      • ivan@kruchkoff.com
