Scaling Magento
Reid Parham, Aaron Edmonds, and Kyle Terry
Public distribution: sensitive information omitted.
www.Copiousinc.com
COPIOUS
● User-Centered Digital Experience Agency
● Strategy
● Experience
● Engineering
● http://copio.us/
Scale Your Code
A.K.A. Magento is hard
Code Management
● Magento is big!
o Our project has over 820,000 lines of PHP
● Multi-lingual, multi-currency, multi-store
● Classes can have complex names
o *cough*
Enterprise_Reward_Block_Adminhtml_Customer_Edit_Tab_Reward_History_Grid_Column_Renderer_Reason
*cough*
Code Management (cont.)
● Configuration is driven by XML
● The dreaded EAV
● Magento Indices
● Event-Observer
Code Management (Tools)
Good tools make the job easier!
● A good IDE
o Magicento
● Commerce Bug 2
● n98-magerun
Code Management
● NEVER modify core files
o Magento’s forum never helped
● NEVER* add files to app/code/local/Mage
o Magento was built to be modular**
● Test your code with flat catalog enabled
and disabled
● Before overwriting classes, check for events
Code Optimization (Quick Wins)
Caching Magento Blocks
● DIY! Event to add cache data:
core_block_abstract_to_html_before
● OR use a module
https://github.com/aligent/CacheObserver
Code Optimization (Quick Wins)
Mage::getModel('catalog/product')->load($_product->getId());
● This is bad in templates and when looping
over product collections
● Load with initial data select
o used_in_product_listing attribute option
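The cost of loading each product inside a loop can be illustrated with a small language-neutral sketch. This is Python, not Magento code: `FakeCatalog` and all of its methods are hypothetical stand-ins that only count simulated queries, assuming one query per `load()` call and one query for a listing that already selects the attributes it needs.

```python
class FakeCatalog:
    """Hypothetical stand-in for a product table that counts simulated queries."""

    def __init__(self, rows):
        self._rows = rows          # {product_id: {"name": ..., "sku": ...}}
        self.queries = 0

    def list_ids(self):
        self.queries += 1          # one query for the listing itself
        return sorted(self._rows)

    def load(self, product_id):
        self.queries += 1          # one extra query per product (the slow pattern)
        return self._rows[product_id]

    def list_with_attributes(self, attributes):
        self.queries += 1          # listing plus needed attributes in a single query
        return [{a: self._rows[i][a] for a in attributes} for i in sorted(self._rows)]


def names_with_per_item_load(catalog):
    # Mirrors calling load() for every product while looping over a collection.
    return [catalog.load(pid)["name"] for pid in catalog.list_ids()]


def names_with_initial_select(catalog):
    # Mirrors selecting the needed attribute in the initial data select.
    return [row["name"] for row in catalog.list_with_attributes(["name"])]


rows = {i: {"name": f"Product {i}", "sku": f"SKU-{i}"} for i in range(1, 51)}
slow = FakeCatalog(rows)
fast = FakeCatalog(rows)
slow_names = names_with_per_item_load(slow)
fast_names = names_with_initial_select(fast)
print(slow.queries, fast.queries)
```

Both functions return the same names; only the query count differs, and the gap grows linearly with the size of the listing.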
Code Optimization
Make efficient use of Magento indices
● Example: Catalog URL Rewrites
o Includes all products by default (including products
marked as “Not Visible Individually”)
o Do you need SEO friendly URLs for products that
will never be seen???
o Reduce your index size by up to 95%
o Mage_Catalog_Model_Resource_Url::_getProducts
Code Optimization (Quick Wins)
Mage_Catalog_Model_Resource_Product_Type_Configurable_Product_Collection::isEnabledFlat?
FALSE
Systems
● Hardware Profile
● Cluster Design
● Scaling
Hardware Profile (overview)
● 2 racks of hardware and dozens of servers
● Highest-quality available (and compatible)
chipsets and memory
● Buffered DDR3; 1 channel per CPU
● 126 kW of stable, reliable, redundant, and
backed up power
● Minor kernel tweaks
Hardware Profile (network)
● NetScaler for load balancing
○ Vserver pools
○ Balances web, database, admin and Endeca
○ Monitors will remove downed hosts
● Redundant Network Infrastructure
○ Backplane uses LACP (link aggregation) for
redundancy, load balancing and failover
○ HA pairing of configurations
Hardware Profile (network)
Dynamic port forwarding for browsing:
kyle@localhost $ ssh -L 2221:127.0.0.1:2221 whitelistedhost.example.com
kyle@whitelistedhost $ ssh -D 2221 cluster.example.com
Static port forwarding for Navicat SSH tunneling (tunneling through a tunnel):
kyle@localhost $ ssh -L 2222:127.0.0.1:2222 whitelistedhost.example.com
kyle@whitelistedhost $ ssh -L 2222:127.0.0.1:22 cluster.example.com
Hardware Profile (web)
● Dual Intel Xeon E3-1230 @ 3.30GHz
● 32 GB RAM
● Dozens of servers
● nginx and PHP5-FPM
● 6:1 ratio of PHP processes to CPU cores
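A rough sketch of what the 6:1 ratio implies for one web host. The core count of 8 is an assumption for illustration (the slide does not state it), and the per-worker figure is only an upper bound because it ignores memory used by nginx and the operating system.

```python
def fpm_worker_budget(cpu_cores, ram_gib, ratio=6):
    """Return (worker_count, MiB of RAM per worker) for a process-to-core ratio."""
    workers = cpu_cores * ratio
    mib_per_worker = ram_gib * 1024 / workers
    return workers, mib_per_worker


# cpu_cores=8 is a hypothetical value; ram_gib=32 comes from the slide.
workers, mib_each = fpm_worker_budget(cpu_cores=8, ram_gib=32)
print(workers, round(mib_each))
```

The worker count is the kind of number that would feed a PHP-FPM pool size; the memory figure is a sanity check that the pool fits in RAM.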
Hardware Profile (database)
● Redundant database hosts
● MySQL 5.6 chosen for scaling capability
● tcmalloc further improves throughput
● Master/slave replication
● Standby hosts for warm failover
● Failure point: > 4,000 checkouts/hour
Hardware Profile (database)
● Quad Intel Xeon E7-2860
○ 10 cores + hyperthreading each totalling 80 threads
● 128 GB of RAM
● RAID10 SSDs for data
○ writeback cache; noatime,noexec mount options
● RAID1 HDDs for OS
Oops!
Hardware Profile (cache)
● Powering discrete instances of Redis
○ Sessions
○ Full page cache
○ Magento back end cache
○ Background processing queues
● Discrete instances are for threading, differing
memory limits, differing backup rules, and
multi-db deprecation
Hardware Profile (cache)
● Content is compressed with LZF
○ LZF compresses and decompresses faster than
gzip, so it is a good fit for cache content
● Decreased utilization of network capacity
● Sentinel for failover (soon)
● RDB BGSAVE: prime number intervals
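The slide does not say why prime-number intervals were chosen. One plausible reading is that several Redis instances on the same host then rarely start a snapshot at the same moment. The sketch below tests that reading under a simplifying assumption: each instance is treated as a strictly periodic timer, which is not exactly how Redis schedules snapshots. The interval values are hypothetical.

```python
from math import gcd


def seconds_between_collisions(interval_a, interval_b):
    """Least common multiple: how often two periodic timers fire together."""
    return interval_a * interval_b // gcd(interval_a, interval_b)


# Hypothetical intervals in seconds: round numbers versus nearby primes.
round_numbers = seconds_between_collisions(300, 600)
primes = seconds_between_collisions(307, 601)
print(round_numbers, primes)
```

With round intervals every slower snapshot coincides with a faster one; with the prime pair the timers line up only about once every two days.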
Compression Outcomes
Hardware Profile (cache)
● Quad Intel Xeon E5-2620 @ 2.00GHz
● 128 GB of RAM
● 4 bonded network interfaces
○ Prevents saturation of private network
○ 4 Gb/s
○ Bonding mode 5 (balance-tlb)
■ No special switch support
■ Nice when the colo manages the switch
Hardware Profile (utility)
● Cron and systems jobs
● Scripts
● Deploys
● Chef Server 10 for deploys and configuration
● Tests
○ Database test suite in Perl (Test::DatabaseRow)
● Backups (and copies)
Cluster Overview
● Production
○ Most hardware serves production
● Staging
○ Some data promoted to production nightly
● Preview{1..n}
○ Instances for testing and previewing new features,
bug fixes and design changes.
● Aggregate hardware availability exceeds
six nines (99.9999%)
● Software availability is ~99.999%
● Software, including deployments: 99.98%
● Software, including maintenance: 99.9%
● Non-recoverable human errors: 98%
Production Uptime
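To make the availability percentages concrete, this sketch converts each one into expected downtime per year. It assumes a 365-day year and that each figure is an independent annual availability; the labels are shortened from the slide.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(availability_percent):
    """Expected unavailable seconds per year at the given availability."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


figures = {
    "Aggregate hardware": 99.9999,
    "Software": 99.999,
    "Software, including deployments": 99.98,
    "Software, including maintenance": 99.9,
    "Non-recoverable human errors": 98.0,
}

for label, percent in figures.items():
    hours = downtime_seconds_per_year(percent) / 3600
    print(f"{label}: {hours:.2f} hours/year")
```

Six nines is roughly half a minute of downtime a year, while 99.9% is most of a working day, which shows how much of the total budget goes to deployments and maintenance rather than hardware.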
Scale Your Team
Team Profile
● 16 committers; 8.25 FTE
● 4 Project Managers
● 5 departments
● 31 vendors
● 5 time zones
Team Values
● State your needs; respect others’
● Respect is given, then adjusted
● Process can always change and improve
● Work/life balance
● Mature and non-aggressive; mediate conflict
● Honesty and transparency
Team Mantras
● Trust (relevant) data; make things visible
● Measurable, repeatable, falsifiable
(scientific method)
● Redundancy reduces risks (if documented)
● Set expectations (timing, contents, formats)
and deliver on them
Team Mantras
● Automate what is repeated
● Use known patterns and
proven architectures
● Grow talent from within
● Compartmentalization of some data,
code, and knowledge
10 Integrated Vendors
Adobe, Akamai, tax calculation,
legacy software, eBay, gift cards,
ERP (fulfillment and inventory), Oracle,
Tierpoint (Dallas, Seattle, Spokane),
Endeca provider
advertising, application analytics, email,
hardware analysis and functionality, maps,
offsite storage, promotions, payment gateways,
remarketing, shipping estimates, SMS,
social networks, uptime
21 Accessory Vendors
● Group emails: avoid general questions,
assign actions to people, minimize
distribution lists
● Identify urgency of requests
● Use email filters
● Coach and mentor
Effective Communication
● Daily phone calls: only while needed
● Set an agenda; keep to a schedule
● Encourage people to skip calls
or to leave early
● End the call when completed
Effective Communication
Tools
● GitHub
● Google Docs
● Pivotal Tracker
● Conference calls, Skype, and IM
● BugHerd
Launch Day
Release Day
QA Preparation
Productive!
Off-hours chaos
Build Knowledge
● Document the “obvious”
● 1000-line README
● Capture failures and solutions
● What happens when?
● Which database and server?
Automation Schedule
“This is how we work.”
Example Git Workflow
Learn from previous failures.
Code Review
● Standardize pull request structures
● Constructive feedback; ask questions
● emoji-cheat-sheet.com
Code Review
Pull requests can also be workspaces
Releases and Git flow: rhythm, ownership, and pride.
Deployments
● Monday through Thursday only!
● Communication: tickets, cross references,
pull requests, QA status, and releases
● Set expectations: timings for outages,
maintenance, and degraded functionality
● Are we done, yet?
● Explain outcomes and options
Community Participation
● Patches submitted
o Redis
o Cm_RedisSession
o Cm_Cache_Backend_Redis
o https://github.com/magento/magento2
● Modules improved
o CacheObserver
o VF_CustomMenu
Community Participation
● http://magento.stackexchange.com/
● http://stackoverflow.com/
● phpredis bug(s)
● Spence, Muneera U. Collaborative
Processes lecture. 13 Apr. 2006.
● Marks, Andrea. "The Role of Writing in a
Design Curriculum." AIGA: Design Education
(2004).
● Katzenbach, Jon R., and Douglas K. Smith.
The Wisdom of Teams. HarperCollins, 2003.
Collaboration Texts
● Bennis, Warren, and Patricia W. Biederman.
Organizing Genius. Perseus, 1997.
● Marcum, James W. After the Information
Age. Peter Lang, 2006.
● https://en.wikipedia.org/wiki/Collaboration
(and collaborative method)
Collaboration Texts
See Also
GitHub (and Gist)
@parhamr
@kyleterry
@aedmonds
Questions?

Designing IA for AI - Information Architecture Conference 2024
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Scaling Magento: Optimize Code, Hardware, and Team

  • 1. Scaling Magento Reid Parham, Aaron Edmonds, and Kyle Terry Public distribution: sensitive information omitted. www.Copiousinc.com
  • 2. COPIOUS ● User-Centered Digital Experience Agency ● Strategy ● Experience ● Engineering ● http://copio.us/
  • 3. Scale Your Code A.K.A. Magento is hard
  • 4. Code Management ● Magento is big! o Our project has over 820,000 lines of PHP ● Multi-lingual, multi-currency, multi-store ● Classes can have complex names o *cough* Enterprise_Reward_Block_Adminhtml_Customer_Edit_Tab_Reward_History_Grid_Column_Renderer_Reason *cough*
  • 5. Code Management (cont.) ● Configuration is driven by XML ● The dreaded EAV ● Magento Indices ● Event-Observer
  • 6. Code Management (Tools) Good tools make the job easier! ● A good IDE o Magicento ● Commerce Bug 2 ● n98-magerun
  • 7. Code Management ● NEVER modify core files o Magento’s forum never helped ● NEVER* add files to app/code/local/Mage o Magento was built to be modular** ● Test your code with flat catalog enabled and disabled ● Before overwriting classes, check for events
  • 8. Code Optimization (Quick Wins) Caching Magento Blocks ● DIY! Event to add cache data: core_block_abstract_to_html_before ● OR use a module https://github.com/aligent/CacheObserver
  • 9. Code Optimization (Quick Wins) Mage::getModel('catalog/product')->load($_product->getId()); ● This is bad in templates and when looping over product collections ● Load with initial data select o used_in_product_listing attribute option
  • 10. Code Optimization Make efficient use of Magento indices ● Example: Catalog URL Rewrites o Includes all products by default (including products marked as “Not Visible Individually”) o Do you need SEO friendly URLs for products that will never be seen??? o Reduce your index size by up to 95% o Mage_Catalog_Model_Resource_Url::_getProducts
  • 11. Code Optimization (Quick Wins) Mage_Catalog_Model_Resource_Product_Type_Configurable_Product_Collection::isEnabledFlat? FALSE
  • 12. Systems ● Hardware Profile ● Cluster Design ● Scaling
  • 13.
  • 14. Hardware Profile (overview) ● 2 racks of hardware and dozens of servers ● Top quality of available (and compatible) chipsets and memory ● Buffered DDR3; 1 channel per CPU ● 126 kW of stable, reliable, redundant, and backed up power ● Minor kernel tweaks
  • 15. Hardware Profile (network) ● NetScaler for load balancing ○ Vserver pools ○ Balances web, database, admin and endeca ○ Monitors will remove downed hosts ● Redundant Network Infrastructure ○ Backplane uses LACP (link aggregation) for redundancy, load balancing and failover ○ HA pairing of configurations
  • 16. Hardware Profile (network) Dynamic port forwarding for browsing: kyle@localhost $ ssh -L 2221:127.0.0.1:2221 whitelistedhost.example.com kyle@whitelistedhost $ ssh -D 2221 cluster.example.com Static port forwarding for Navicat SSH tunneling (tunneling through a tunnel): kyle@localhost $ ssh -L 2222:127.0.0.1:2222 whitelistedhost.example.com kyle@whitelistedhost $ ssh -L 2222:127.0.0.1:22 cluster.example.com
  • 17. Hardware Profile (web) ● Dual Intel Xeon E3-1230 @ 3.30GHz ● 32 GB RAM ● Dozens of servers ● nginx and PHP5-FPM ● 6:1 ratio of PHP processes to CPU cores
  • 18. Hardware Profile (database) ● Redundant database hosts ● MySQL 5.6 chosen for scaling capability ● tcmalloc further improves throughput ● Master/slave replication ● Standby hosts for warm failover ● Failure point: > 4,000 checkouts/hour
  • 19. Hardware Profile (database) ● Quad Intel Xeon E7-2860 ○ 10 cores + hyperthreading each totalling 80 threads ● 128 GBs of RAM ● RAID10 SSDs for data ○ writeback cache; noatime,noexec mount options ● RAID1 HDDs for OS
  • 20. Oops!
  • 21. Hardware Profile (cache) ● Powering discrete instances of Redis ○ Sessions ○ Full page cache ○ Magento back end cache ○ Background processing queues ● Discrete instances are for threading, differing memory limits, differing backup rules, and multi-db deprecation
  • 22. Hardware Profile (cache) ● Content is compressed with LZF ○ Compression and decompression with LZF is faster than gzip so it’s an ideal solution ● Decreased utilization of network capacity ● Sentinel for failover (soon) ● RDB BGSAVE: prime number intervals
  • 24. Hardware Profile (cache) ● Quad Intel Xeon E5-2620 @ 2.00GHz ● 128 GBs of RAM ● 4 bonded network interfaces ○ Prevents saturation of private network ○ 4 Gb/s ○ Bonding mode 5 (balance-tlb) ■ No special switch support ■ Nice when the colo manages the switch
  • 25. Hardware Profile (utility) ● Cron and systems jobs ● Scripts ● Deploys ● Chef Server 10 for deploy and configuration ● Tests ○ Database test suite in Perl (Test::DatabaseRow) ● Backups (and copies)
  • 26. Cluster Overview ● Production ○ Most hardware serves production ● Staging ○ Some data promoted to production nightly ● Preview{1..n} ○ Instances for testing and previewing new features, bug fixes and design changes.
  • 27. Production Uptime ● Aggregate hardware availability exceeds six nines (99.9999%) ● Software availability is ~99.999% ● Software, including deployments: 99.98% ● Software, including maintenance: 99.9% ● Non-recoverable human errors: 98%
  • 29. Team Profile ● 16 committers; 8.25 FTE ● 4 Project Managers ● 5 departments ● 31 vendors ● 5 time zones
  • 30. Team Values ● State your needs; respect others’ ● Respect is given, then adjusted ● Process can always change and improve ● Work/life balance ● Mature and non-aggressive; mediate conflict ● Honesty and transparency
  • 31. Team Mantras ● Trust (relevant) data; make things visible ● Measurable, repeatable, falsifiable (scientific method) ● Redundancy reduces risks (if documented) ● Set expectations (timing, contents, formats) and deliver on them
  • 32. Team Mantras ● Automate what is repeated ● Use known patterns and proven architectures ● Grow talent from within ● Compartmentalization of some data, code, and knowledge
  • 33.
  • 34. 10 Integrated Vendors Adobe, Akamai, tax calculation, legacy software, Ebay, gift cards, ERP (fulfillment and inventory), Oracle, Tierpoint (Dallas, Seattle, Spokane), Endeca provider
  • 35. 21 Accessory Vendors advertising, application analytics, email, hardware analysis and functionality, maps, offsite storage, promotions, payment gateways, remarketing, shipping estimates, SMS, social networks, uptime
  • 36. Effective Communication ● Group emails: avoid general questions, assign actions to people, minimize distribution lists ● Identify urgency of requests ● Use email filters ● Coach and mentor
  • 37. Effective Communication ● Daily phone calls: only while needed ● Set an agenda; keep to a schedule ● Encourage people to skip calls or to leave early ● End the call when completed
  • 38. Tools ● GitHub ● Google Docs ● Pivotal Tracker ● Conference calls, Skype, and IM ● BugHerd
  • 39.
  • 40.
  • 46. Build Knowledge ● Document the “obvious” ● 1000-line README ● Capture failures and solutions ● What happens when? ● Which database and server?
  • 48. “This is how we work.”
  • 50. Learn from previous failures.
  • 51. Code Review ● Standardize pull request structures ● Constructive feedback; ask questions ● emoji-cheat-sheet.com
  • 52. Code Review Pull requests can also be workspaces
  • 53. Releases and Git flow: rhythm, ownership, and pride.
  • 54.
  • 55. Deployments ● Monday through Thursday only! ● Communication: tickets, cross references, pull requests, QA status, and releases ● Set expectations: timings for outages, maintenance, and degraded functionality ● Are we done, yet? ● Explain outcomes and options
  • 56. Community Participation ● Patches submitted o Redis o Cm_RedisSession o Cm_Cache_Backend_Redis o https://github.com/magento/magento2 ● Modules improved o CacheObserver o VF_CustomMenu
  • 57. Community Participation ● http://magento.stackexchange.com/ ● http://stackoverflow.com/ ● phpredis bug(s)
  • 58. Collaboration Texts ● Spence, Muneera U. Collaborative Processes lecture. 13 Apr. 2006. ● Marks, Andrea. "The Role of Writing in a Design Curriculum." AIGA: Design Education (2004). ● Katzenbach, Jon R., and Douglas K. Smith. The Wisdom of Teams. HarperCollins, 2003.
  • 59. Collaboration Texts ● Bennis, Warren, and Patricia W. Biederman. Organizing Genius. Perseus, 1997. ● Marcum, James W. After the Information Age. Peter Lang, 2006. ● https://en.wikipedia.org/wiki/Collaboration (and collaborative method)
  • 60. See Also GitHub (and Gist) @parhamr @kyleterry @aedmonds
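
The availability percentages on slide 27 are easier to compare once they are expressed as downtime. The following is a minimal sketch, assuming a 365-day year; the function and variable names are illustrative and do not come from the talk.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a non-leap year


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Yearly downtime, in seconds, allowed by an availability percentage."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


# Labels and percentages as listed on slide 27.
SLIDE_27 = [
    ("Aggregate hardware", 99.9999),
    ("Software", 99.999),
    ("Software, including deployments", 99.98),
    ("Software, including maintenance", 99.9),
    ("Non-recoverable human errors", 98.0),
]

for label, percent in SLIDE_27:
    seconds = downtime_seconds_per_year(percent)
    print(f"{label}: {percent}% -> {seconds:,.0f} s/year ({seconds / 3600:.2f} h)")
```

Under these assumptions, six nines leaves roughly 32 seconds of downtime per year, while 99.9% leaves close to nine hours.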

Editor's Notes

  1. How would you build the world’s largest, fastest, most complex Magento ecommerce store? Join three COPIOUS engineers as they share their approaches to this problem. This one-hour presentation will include the best practices, code samples, and system configurations necessary to scale Magento up to 100,000 daily orders with a catalog of 100,000 products. Client is publicly traded, so we’re constrained by federal regulations on some details. US retail sector; busiest periods, in order: 1. Cyber Monday 2. Pre-Christmas 3. Post-Christmas 4. Back to School 5. Spring Break Site-wide average response time: 282 ms
  2. Founded in early 2000s. Native iOS and Android Ecommerce clusters Product configurations Complex integrations Marketing and content strategy We are hiring Business Development Director Sr. Software Engineer / Engineering Manager Studio Manager DevOps Engineer Sr. Ruby on Rails Developer Sr. Strategist Mobile Engineer
  3. Keep things specific to Magento, not basic
  4. These are more like ground rules If you’re modifying core files, you’re doing it wrong! All too common to see Magento forum recommendations telling people to just modify app/code/core/… Events: 406 events fired for homepage, 663 for category page, 1038 for PDP, 836 for cart
  5. Blocks are where the rubber meets the road for Magento, the last piece in the chain of getting data to the end-user. Many blocks are not cached (some rightly so for customer session) For instance, Magento CMS blocks go through the rendering process for each page they are displayed on Many modules available for this. Open source options available. Production cache host has ~1 million keys for Back End cache. commands per second: > 3,000 expirations per second: ~100 hit rate: ~85%
  6. This is common to see this on product listing pages, cart page, checkout review Magento has accounted for this!
  7. Optimizing what is included in the indexes can be difficult but it can provide some big payoffs if you have a large catalog. Rewrite Mage_Catalog_Model_Resource_Url::_getProducts Current runtime for catalog_url index: ~30 minutes
  8. What does this method return? This method is called in product list blocks as well as PDPs and other small pages like the cart and each step of the checkout via collectTotals.
  9. We don’t actually use 126 kW of power :) Sandy Bridge: not the latest and greatest but still good Kernel tweaks include: socket limits, shared memory limits, open file limits, larger queues for networking, and IPv4 stability/security/capacity IPv6 ignored at nginx layer
  10. MySQL and HTTP monitors will remove hosts from the pool that go down. Maximum period between failure and pool removal is 7 seconds. Scripts try to recover downed instances by restarting services. Outcomes from outages are emailed to the group. See ARP table corruption? Is it every 4 hours? Do you have Cisco switches? This is the ARP cache lifetime :) Important NetScaler configs… * Services: -cip ENABLED X-Forwarded-For -cltTimeout 30 -svrTimeout 120 -CKA YES * Virtual server, port 80: -persistenceType NONE * Virtual server, port 443: -persistenceType SSLSESSION
  11. SKIP if time is an issue A locked down network with no VPN means you need to get creative when working from home.
  12. These CPUs are fast enough and a great value; not *extreme* power. $240 each Average daily Load average of 0.7–1.2: sustained normal Load average of 5: target maximum load (35% performance degradation) Load average of 7+: “failure” load NGINX and PHP5-FPM We are targeting a comfortable performance level. Ratio of PHP processes to CPU cores found through trial and error. This is the lowest process count we could deploy without socket resets under crush loads. This quantity of PHP processes is possible with 32 GB of RAM in each web host. Several boxes were shipped with extra/junk/mismatched RAM (1 GB sticks) and review was necessary
  13. https://github.com/blog/1422-tcmalloc-and-mysql
      MySQL 5.6 versus 5.5: https://dev.mysql.com/tech-resources/articles/mysql-5.6-rc.html
      TODO What makes 5.6 scale better?
      • Better linear performance and scale on systems supporting multi-processors and high CPU thread concurrency
      • InnoDB has been re-factored to minimize legacy mutex contentions and bottlenecks
      Better multi-processor support
      We are interested in Percona and MariaDB but do not have operational capacity to use either. (discover, tune, configure, automate, etc)
      Failure defined as connection timeouts and socket resets for ~3 percent of users. It took us nearly a month to produce enough load to cause minor failures. The real/hard failure point is higher than this, but that’s the best we’ve been able to do! :P
      Configs:
      thread_cache_size = 512 (possibly too low!)
      table_open_cache = 12288
      tmp_table_size = 512M
      query_cache_type = 1 (on)
      query_cache_limit = 4M (supports SOAP and REST API integrations)
      query_cache_size = 512M (larger than this is problematic; it’s typically ~60% full)
      innodb_buffer_pool_size = 32G
      innodb_log_buffer_size = 2G
      innodb_log_file_size = 512M
      innodb_file_per_table
      Statistics:
      42 TB of transmitted data
      23 TB of innodb writes in 90 days
      8.4 TB of innodb log churn
      100% thread cache hit rate
      99.996% table cache hit rate (large number of open tables possibly related to MySQL bugs #16244691 and #65384)
      99.9999993% of table locks are immediate (most are nightly processes)
      Innodb_buffer_pool_wait_free: 0
      Innodb_log_waits: 0
      85% query cache hit rate; this doesn’t really mean anything with such high churn rate of data
      94.5% of temp tables are in memory (only nightly processes require disk tables)
      99.97% of queries are faster than 200 ms; we’ve reached a plateau of optimization
      The average row lock is a bit slow as a consequence of Magento indexing architecture and our background processing queues
      Moderate rate of random and sequential reads (table scans!), but we can absorb that overhead with hardware and focus on improving PHP code
  14. YES Lots of memory for DB cache. For things like placing orders, SSDs provide fast write speeds. MAYBE We have just over 10,000 write IOPS capacity (and sustain ~125 IOPS). YES Remote management, configuration, and validation of hardware RAID can be difficult; push the colocation facility to assign knowledgeable technicians. Partition mount configurations assume power will never be lost; optimal throughput and security. (RAID controllers do not have battery backup units installed) These CPUs are about $2,000 each!
  15. We now know what a load average of 400 looks like! o.O (Ugly SQL that wanted a temp table of 2.4 quadrillion rows) An Endeca query wanted to create a temp table.
  16. SKIP this if needed YES 3,000 commands per second; 0.7 ms average response time MAYBE All Redis instances persist to disk with RDB. Only sessions’ RDB files are backed up off-server (Sessions point to carts! We see a lot of anonymous users.) YES Downside of multiple instances: quadrupling of file descriptors and socket connections required some kernel ulimit tweaks NO PHP Redis libraries not quite mature enough to support persistent connections with PHP5-FPM: https://github.com/nicolasff/phpredis/issues/70 Zend disables pconnect
  17. Compressing cache contents also increases storage capacity! We can afford the increased CPU overhead to improve RAM and network capability. Network throughput went from 500 Mbit/sec to 125 Mbit/sec Sustained disk IO went from 80% to 12% utilization Prime number intervals on the RDB BGSAVE reduce contention over disk IO, as write activity is less likely to overlap
  18. Several boxes were shipped with extra/junk RAM (2 GB sticks) and review was necessary. Balances transmitted data by changing the MAC address on the outgoing packets. No special switch support; 32 Gbps backplane :) Nice when the colo runs the switch. “The outgoing traffic is distributed according to the current load (computed relative to the speed) on each slave.” - Linux Foundation. CPUs are $450 each.
  19. SKIP this if needed Daily offsite backups; verified functional :) Bash, Ruby, Perl, Python, PHP Chef *really* wants to run every 30 minutes; prevent that with the `--once` argument. Test failures catch human errors; emails sent from failures are intentionally obnoxious
  20. Ubuntu, nginx, php-fpm and MySQL are reliable, predictable, and scriptable. Deployments take about 10 minutes because we’re cautious about database schema and large caches take time to clear. Magento’s architecture for indexing in 1.12 greatly constrains our uptime. We’ve written automation, adjusted sources of authority, and standardized communication workflows to prevent most human error. Human errors will quickly drag uptime down to 94% if response time is slow and mitigation/recovery plans are not documented.
  21. This is slightly above the size of most effective teams; managed through limiting scope of engagements for timing and components. We also cluster around sprints and product/feature development teams. We hire smart people who work like craftspersons. They enjoy building things that delight other people. Expertise is not required, but the ability to learn is. Client had some turnover; we have to be careful to not be perceived as a threat. Regions: England (vendor), East Coast US, Midwest, West Coast US, East Coast Australia.
  22. QUICK slide This is intended for peer groups with moderate homogeneity and similar cultural backgrounds. (Not a license for monoculture, though.) Give full respect up front—these are your peers. If person is disrespectful or burdensome, provide suggestions, and then gently reduce respect. Some people worked *some* long weeks; not always. Many people even took vacations! Admitting mistakes is better for all.
  23. QUICK slide Scientific method Measure everything up front and develop your questions later (Borrowed from Big Data™, NoSQL, etc)
  24. QUICK slide Humans make errors; machines are made for repetition. We do not test in production! :) Standard POSIX process signaling and Ubuntu init scripts. Generally: Systems engineers need to know a moderate amount about a lot. Software engineers need to know a lot about a little. Project managers need to know a little about a lot. Operations engineers need to know a little about a lot.
  25. Easy mantra/value: nourish people with free beverages and comfortable, low-distraction environments. Feed your team! We worked many lunch hours and a few late nights. Bosses bought food and accommodated dietary needs.
  26. Introduction to vendors Names, titles, emails, time zones (and business hours), escalation procedures Optimal scenario: “we speak for [client] and [vendor] deals directly with us” Approach with a gentle demeanor (not here to take over and rule how everything goes) Small talk provides something to relate to; people seem quite affable toward the PNW Share: Design goals, objectives, and values Push the vendors to deliver; this should be done by company superiors.
  27. SKIP if needed Trust, and verify. Some documentation was wrong or missing.
  28. Urgencies have varying levels and definitions; find what works! Skype and instant message are need-to-know basis Cell phone contact should be rare and with explicit boundaries
  29. SKIP if needed Phone calls are hard!
  30. Contact lists, issues triage, process documentation, collaborative editing, task delegation, history and context BOUNDARIES: I check in with others when I see their timestamps are outside of business hours. Documents with sensitive info are marked CONFIDENTIAL and shared with a minimal group. Some documents are internal only.
  31. As proof of our decent work/life balance, we see most commits are business hours in local time. (Committers are in 2 time zones and have flexible schedules) Some late nights and weekends were had, but typically for specific sprints, maintenance, deployments, and chores.
  32. Cowboy coding in production: help or go away
  33. Launch day! June 28/29 The state of the codebase was compatible with release because we were making limited and deliberate changes.
  34. Release day. Features passing UAT are accepted before release.
  35. Some releases have pretty complex preparations. A team familiar with Git is an effective team. (It shouldn’t get in the way!)
  36. SKIP if needed. The network graph can be quite elegant with such a large and effective team.
  37. SKIP if needed. Off hours maintenance is sometimes a frenzied, guess and check process. We mitigate these risks with supporting people available and our commit/deploy rigor for safeguards.
  38. Some of my flurries of commits are documentation updates. I’m a risk in that I understand nearly everything; others hold me accountable by requesting documentation. The preview environment has a Chef HTML template that lists tickets, branches, URLs, known issues, and general notes. I’ve provided links and listings of which files determine application states and integrations. (Single sources of authority preferred)
  39. Standardize what gets included in a pull request. “What changed” (list) and “How to test” (steps, outcomes, caveats) are my favorites. Opportunity to teach and learn; see how others do things, and provide DRYing or refactoring advice. It’s a vulnerable moment that deserves to be uplifting and positive.
  40. Pull requests can provide advance notice to the group at large.
  41. Friday and weekend deployments eat up budgets and human capital by requiring people be available. Educate people and standardize language regarding ticketing systems and GitHub flow. Quality assurance: always include steps to reproduce and expected outcome. Define what failure is; consider releasing incremental fixes. Do you coordinate releases by dates? Version numbers? Names? Standardize and schedule! Build routines. Explaining if issues have been solved or when they’re expected to be fixed. Build and grow trust by defining risks, mitigation options, rollback criteria, and recovery steps and timings.
  42. Reid’s 2007 undergraduate thesis was a survey of modern and contemporary literature with analysis for educational settings.
  43. Expand upon the “dropping people from email CC if they’re difficult” advice, please?
      Our most common habit is taking email threads internal to determine our preferred response. We explicitly state [internal thread] in the first line when this has been done. Some people are very knowledgeable but tend to provide advice or input past their job titles. It’s nice and well intentioned, but slows things down. Our habit has been to only ask questions of those people for subjects specifically under their job titles.
      What issues have you seen with splitting MySQL reads and writes?
      Checkout theoretically can experience problems with replication delay, but we haven’t seen it happen. Slaves will stop on any foreign key errors, and we’ve seen some. Features most frequently causing that problem have been reports. We’ve disabled reports because the client uses Analytics products for business intelligence. Database unit tests and validation tests have helped us catch human errors that would cause slaves to stop.
      Why use real hardware?
      “Walk before we can run.” Previous Magento partners could only optimize the site to a point where it required 64 web servers and very high IOPS. That wasn’t going to be easy on the cloud. The current application state would probably run pretty well on the cloud, but we’re risk averse and want full control. We’re planning to eventually get to the cloud, which should occur when this hardware is out of date. That will be an opportunity for full investigation of a custom ecommerce product (service oriented architecture; omnichannel integration).
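
Note 17 says that prime number intervals on the RDB BGSAVE make overlapping disk writes less likely. The sketch below illustrates why, under a simplifying assumption: each Redis instance is treated as a strictly periodic job that starts at time zero, which is not exactly how Redis save points behave. The interval values are hypothetical; the talk does not list the real ones.

```python
from math import gcd


def seconds_between_overlaps(interval_a: int, interval_b: int) -> int:
    """Two periodic jobs that start together next coincide after lcm(a, b) seconds."""
    return interval_a * interval_b // gcd(interval_a, interval_b)


def count_overlaps(intervals: list[int], horizon: int) -> int:
    """Count the seconds in (0, horizon] at which two or more jobs fire at once."""
    return sum(
        1
        for t in range(1, horizon + 1)
        if sum(t % interval == 0 for interval in intervals) >= 2
    )


DAY = 24 * 60 * 60

# Hypothetical save intervals, in seconds, for four Redis instances.
ROUND_INTERVALS = [300, 600, 900, 1200]
PRIME_INTERVALS = [307, 601, 907, 1201]  # nearby primes

print("round intervals, overlaps per day:", count_overlaps(ROUND_INTERVALS, DAY))
print("prime intervals, overlaps per day:", count_overlaps(PRIME_INTERVALS, DAY))
print("307 s and 601 s coincide every", seconds_between_overlaps(307, 601), "s")
```

Because distinct primes share no common factor, two such schedules only coincide at the product of their intervals, which for these example values is longer than a day.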