Infrastructure Considerations : Design : "webops"

Published in: Technology

  1. "Preparing for the future" By: ~/Piyush
  2. • 5+ years' experience designing, setting up, testing and running production web systems in varied deployment environments
     • Experience setting up colocation IDCs with Active-Active DR sites for India's No. 1 OTA
     • Experience working on public cloud platforms like AWS and setting up private cloud infrastructure
     • …Generation G: Gamification /engineer/
     • Tags: techie, open source enthusiast, engineer, geek, DevOps, web ops, security, Tripper (MMYT), Ex-Nextag-ian :)
  3. • Scalable
     • Robust and always available
     • Manageable
     • Resilient
     • Operationally visible (monitor everything)
     • Cost effective
  4. • Avoid unnecessary change by selecting a long-term supported distribution on which to base your platform.
       ◦ RHEL / CentOS
       ◦ Ubuntu LTS (Long Term Support)
       ◦ Debian Stable
     • My preference: RHEL / CentOS (Red Hat stability & yum wins)
  5. • Use your capacity model to drive the decision on how you build infrastructure. Check SLAs & cost constraints.
       ◦ 100% dedicated hardware (self-managed / outsourced)
       ◦ 100% cloud (may consider AWS or Rackspace)
       ◦ Hybrid
     • Cloud success relies on "automating" key service management processes to optimize the run-time operation of /dynamic workloads/ in a shared-resource environment.
  6. • Split each service (/layer) out across its own set of servers for easier scale-out and management.
       ◦ Traffic management (both global and local traffic management)
       ◦ Application servers
       ◦ Data store servers
       ◦ Email services
     • Minimize distribution of state: keep the number of services that require storage to a minimum, for ease of backups and management (for example, data services).
  7. • Use redundant pairs (on devices/appliances), /HA/ and clustering or failover to ensure availability of service(s).
       ◦ Minimum downtime
       ◦ Application & services redundancy + load-balanced cluster on one site, and on the DR site too
       ◦ DB HA + data store (MySQL) backup and recovery
       ◦ Choose and implement the best-suited failover strategy
       ◦ Redundant network on each node (on servers: Linux NIC bonding)
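One way to picture the "best-suited failover strategy" point is a priority-based pick among health-checked nodes. This is only a minimal sketch: the `Node` fields, the "lower priority number wins" convention and the node names are assumptions for illustration, not something from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Node:
    name: str
    priority: int   # hypothetical convention: lower number = preferred
    healthy: bool   # result of the most recent health check


def pick_active(nodes: Sequence[Node]) -> Optional[Node]:
    """Return the healthy node with the best priority, or None if all are down."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.priority)


# Hypothetical redundant pair: the primary has failed its health check.
pair = [
    Node("db-primary", priority=0, healthy=False),
    Node("db-standby", priority=1, healthy=True),
]
active = pick_active(pair)
print(active.name if active else "no healthy node")
```

In a real setup the `healthy` flag would come from whatever health check the cluster or load balancer already runs; the point here is only that the choice of active node is a rule you decide on in advance.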
  8. • Dev, QA and staging platforms (both application & N/W platform) to prove application and configuration changes before they go live in production.
     • Most live-site issues are due to the lack of a similarly configured environment / platform for Dev / QA / staging testing.
     • LAB environments:
       ◦ Performance / stress LAB
       ◦ Experimentation LAB (A/B or multivariate experiment) support with live traffic
  9. • Virtualization is key here :) ...actually, this is what is changing the world ...not the cloud!!
     • Selecting the right virtualization technology
     • Use network boot and installer tools, or templated provisioning, to build servers identically
       ◦ PXE boot + Kickstart
       ◦ VMware ESXi template / Citrix XenServer
       ◦ Amazon AMI (EC2)
       ◦ OpenNebula
  10. • Package management: YUM repositories (distribution + own)
      • Create your own repository servers for both packages and code
      • Use configuration management tools to deploy configuration automatically from a central location.
        ◦ Puppet / Facter
        ◦ Chef
        ◦ CFEngine (Nova)
        ◦ RANCID (N/W devices)
  11. • Use a central service for identity and password management
        ◦ OpenLDAP
        ◦ Active Directory
        ◦ TACACS+ (N/W devices)
      • Have proper accounting / audit logging
      • Inventory management:
        ◦ Use Facter facts + CMDB-based inventory management
  12. • Version control: SVN / Git
      • Use continuous integration and deployment tools to test and release software
        ◦ Jenkins (Hudson) / Go
        ◦ Capistrano / Fabric
      • ...Deploy more frequently, so as to build confidence in the whole system for change management
  13. • Start with site availability checks and external dependency checks, then move on to much more detailed data. Capture as much data as possible.
      • Store time-series data for trend analysis, and alert when thresholds are breached.
        ◦ CPU / RAM / IO / network usage per server
        ◦ Application metrics
        ◦ Disk space usage
        ◦ Network bandwidth
        ◦ MySQL numbers
        ◦ ...etc.
  14. • So, the source could be anything: DB, logs, SNMP, HTTP, etc.
      • Have real-time reporting over it (dashboards) + real-time data extraction
      • Tools to consider:
        ◦ Ganglia / Centreon / Nagios
        ◦ OpManager for URL monitoring
        ◦ Selenium RC based checks (functional tests), etc.
      • Alert on both minimum and maximum thresholds (OK, WARN, CRITICAL)!
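The min/max alerting idea on the slide above can be sketched as a small classifier. This is a rough illustration only: the function name, parameter names and the sample threshold values are assumptions, and real monitoring tools have their own threshold syntax.

```python
from typing import Optional


def classify(value: float,
             warn_max: Optional[float] = None,
             crit_max: Optional[float] = None,
             warn_min: Optional[float] = None,
             crit_min: Optional[float] = None) -> str:
    """Map a metric value to OK / WARN / CRITICAL using optional bounds."""
    # Check CRITICAL bounds first so they win over WARN bounds.
    if crit_max is not None and value >= crit_max:
        return "CRITICAL"
    if crit_min is not None and value <= crit_min:
        return "CRITICAL"
    if warn_max is not None and value >= warn_max:
        return "WARN"
    if warn_min is not None and value <= warn_min:
        return "WARN"
    return "OK"


# Maximum thresholds: disk usage in percent (sample values).
print(classify(85.0, warn_max=80.0, crit_max=90.0))
# Minimum thresholds: requests per second dropping to zero is also a problem.
print(classify(0.0, warn_min=5.0, crit_min=1.0))
```

Minimum thresholds matter because a metric that falls silent (no traffic, no queue activity) is often as serious as one that runs too high.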
  15. • Continue to plan your resource requirements based on growth expectations, new features and performance targets
      • Use data from:
        ◦ Your monitoring system!
        ◦ Business requirements
      • Continuously improve:
        ◦ Profile applications and reduce resource usage (DTrace)
        ◦ Review performance against the capacity model
        ◦ Feed a "Top 10" hitlist back to developers (maybe slow queries, etc.)
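As a worked example of planning against growth expectations, the sketch below estimates how many months of headroom remain under compound monthly growth. The compound-growth model, the function name and the sample figures are all assumptions for illustration; a real capacity model would be fed from the monitoring data mentioned above.

```python
import math


def months_until_full(current: float, capacity: float, monthly_growth: float) -> float:
    """Months until `current` reaches `capacity`, assuming compound monthly growth.

    monthly_growth is a fraction, e.g. 0.10 for 10% per month.
    """
    if current <= 0 or capacity <= 0:
        raise ValueError("current and capacity must be positive")
    if current >= capacity:
        return 0.0
    if monthly_growth <= 0:
        return math.inf  # no growth: capacity is never reached
    # Solve current * (1 + g)^n = capacity for n.
    return math.log(capacity / current) / math.log(1.0 + monthly_growth)


# Sample figures: 400 GB used of 1000 GB, growing 10% per month.
print(round(months_until_full(400.0, 1000.0, 0.10), 1))
```

The useful output is less the exact number than the lead time it gives for ordering hardware or resizing cloud resources.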
  16. • Varnish cache
        ◦ Reverse proxy, flexible configuration with inline C support
      • Nginx
        ◦ Event based / lightweight
        ◦ Runs more than 8% of the web
      • PHP-FPM
        ◦ Best FastCGI implementation available for PHP
      • MySQL server tuning / optimization
      • Caching: in-memory data store (Memcached / Redis)
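The in-memory caching point is commonly applied as a cache-aside pattern: look in the cache first, fall back to the slow source on a miss, then store the result. The sketch below uses a plain dict as a stand-in for Memcached or Redis so that it runs without a server; the class name, the `misses` counter and the `slow_lookup` function are hypothetical.

```python
from typing import Callable, Dict


class CacheAside:
    """Cache-aside wrapper; the dict stands in for Memcached / Redis."""

    def __init__(self, load: Callable[[str], str]) -> None:
        self._load = load                 # slow source, e.g. a MySQL query
        self._store: Dict[str, str] = {}  # stand-in for the in-memory data store
        self.misses = 0

    def get(self, key: str) -> str:
        if key in self._store:
            return self._store[key]       # cache hit: skip the slow source
        self.misses += 1
        value = self._load(key)           # cache miss: go to the source
        self._store[key] = value
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after the underlying data changes."""
        self._store.pop(key, None)


def slow_lookup(key: str) -> str:
    # Hypothetical placeholder for an expensive database read.
    return key.upper()


cache = CacheAside(slow_lookup)
cache.get("user:1")
cache.get("user:1")
print(cache.misses)  # prints 1: the second read was served from the cache
```

A real deployment would also set expiry times on cached entries, which this sketch leaves out.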
  17. • As a first exercise, have IT infrastructure & application threat modeling done along with a risk assessment, then consider having:
        ◦ HIDS (OSSEC) / iptables
        ◦ WAF (Web Application Firewall)
        ◦ IPS (Intrusion Prevention System)
        ◦ Linux hardening
        ◦ DLP (Data Leakage Prevention)
        ◦ Data encryption considerations with respect to data classification
      • Security monitoring & attack detection
      • The key thing is to "enable continuous compliance" ...maybe PCI-DSS for an e-commerce site.
  18. • Diagnosing / troubleshooting and fixing production issues
      • Change management and delivery
      • Automate as much as possible, with centralized management of scripts, etc.
      • Backup / restore: always run test drills for them
      • Don't reinvent the wheel; go with proven and solid technologies when you can
      • Last :) Keep re-architecting the infrastructure (maybe small things) to optimize efficiency (every 6 months) ...learn from mistakes (yours and others' too :))
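A restore test drill can end with an automated check that the restored data matches the original. The sketch below compares SHA-256 checksums of two files; the function names are hypothetical, and the file copy in the demo is only a stand-in for a real backup and restore cycle.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """True if the restored file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)


# Demo drill: the copy stands in for "back up, then restore".
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "data.sql"
    restored = Path(tmp) / "data.restored.sql"
    original.write_text("INSERT INTO t VALUES (1);\n")
    shutil.copyfile(original, restored)
    print(restore_matches(original, restored))
```

For a database, a checksum of the dump file only proves the file survived; a fuller drill would also load the dump and run a few sanity queries.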
  19. Questions, if any!! Ping me on:
      • IRC /freenode/: PiyushK, ##infra-talk
      • Gtalk: piykumar
      • Twitter: @piykumar
