Scalability

Published in: Technology

  1. SCALABILITY
     Hodicska Gergely – Web Engineering Manager at Ustream
     email: felho@ustream.tv
     twitter: @felhobacsi
     SCALABILITY – BME, May 7, 2012
  2. DEFINING SCALABILITY
     • It is not:
       – Performance (easier to scale)
       – HA
     • It is the ability to handle a growing amount of work in a capable manner
     • Not just technology but people and process
  3. SCALABILITY TYPES
     • Vertical
       – Bigger
       – Typically more expensive
       – Sometimes feasible (DB – SSD)
     • Horizontal
       – More
       – Typically we need this
  4. SCALABILITY RULES
     • KISS
     • Command and conquer
     • Approximate correctness
     • Shared nothing
  5. DATABASE
     • Toughest to scale
     • Read -> Replication
       – Lag, cascading
     • Write -> Sharding
       – App logic, vertical vs. horizontal, key server
     • HA: Multi Master Replication
       – DRBD, MMM, MySQL Cluster
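The "Write -> Sharding" bullet can be illustrated with a minimal sketch of horizontal sharding done in application logic. The shard names, the `shard_for` function and the `KeyServer` class are hypothetical; `KeyServer` reflects one possible reading of "key server" (an explicit key-to-shard directory), not necessarily what the talk meant.

```python
import hashlib

# Hypothetical shard names, used only for illustration.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(user_id: int, shards=SHARDS) -> str:
    """Pick a shard by hashing the key (horizontal sharding in app logic)."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


class KeyServer:
    """Directory alternative: an explicit key -> shard lookup table."""

    def __init__(self, default_router=shard_for):
        self._directory = {}
        self._default_router = default_router

    def assign(self, user_id: int, shard: str) -> None:
        """Pin a key to a specific shard (e.g. to move a heavy user)."""
        self._directory[user_id] = shard

    def lookup(self, user_id: int) -> str:
        # Fall back to hashing when the key has no explicit placement.
        return self._directory.get(user_id, self._default_router(user_id))
```

Plain modulo hashing is simple but remaps most keys when the shard count changes; a directory trades an extra lookup for the freedom to move individual keys.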
  6. DATABASE – MULTI MASTER MYSQL
     [Diagram labels: Web cluster, Virtual IP, Master DB 1, Master DB 2, Virtual IP, Load balancer, Slave cluster, 20-30k req/sec. Legend: Normal write, Failover write, Normal read, Normal replication, Failover replication]
  7. DATABASE – BACKUP STRATEGY
     • Maatkit
     • Percona XtraBackup
     [Diagram labels: Master DB 1, Master DB 2, Slave cluster, Delayed backup, Offsite backup]
  8. NOSQL
     • CAP theorem (availability, consistency, partition tolerance -> eventual consistency)
     • Diverse world
     • Automatic partitioning, sharding, elasticity
     • Transparent for the application
     • Extendable without downtime
     • Fault tolerant
     • Redis, Riak, Voldemort, Cassandra, Couchbase
  9. CACHING
     • Strategies
       – Write-through cache
       – Write-back cache
       – Implicit/explicit invalidation
     • Consistent hashing
     • Restart
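Consistent hashing, listed on the caching slide, can be sketched as a hash ring with virtual nodes. This is a minimal illustration under assumed names (`HashRing`, the `cache-*` node names, 100 virtual nodes per server), not the implementation used at Ustream.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), replicas=100):
        self._replicas = replicas  # virtual nodes per real node
        self._keys = []            # sorted ring positions
        self._owners = {}          # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self._replicas):
            pos = self._hash(f"{node}#{i}")
            if pos not in self._owners:
                bisect.insort(self._keys, pos)
            self._owners[pos] = node

    def remove(self, node: str) -> None:
        for i in range(self._replicas):
            pos = self._hash(f"{node}#{i}")
            if self._owners.get(pos) == node:
                del self._owners[pos]
                del self._keys[bisect.bisect_left(self._keys, pos)]

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's ring position."""
        if not self._keys:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._owners[self._keys[idx]]
```

When a cache server is added, only the keys that now fall on its ring segments move to it; all other keys keep their server, so most of the cache stays warm.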
  10. MEMCACHE
      • Local vs. remote
      • LRU
      • Storing lists
      • Versioning
      • Race condition (cas)
      • Object size
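The "Storing lists" and "Race condition (cas)" bullets fit together: appending to a cached list is a read-modify-write, so two writers can overwrite each other unless the write is a check-and-set. The sketch below uses `FakeCache`, a hypothetical in-memory stand-in with memcached-like `gets`/`cas` semantics. It is not a real memcached client, and unlike real memcached it lets `cas` create a missing key.

```python
import threading


class FakeCache:
    """In-memory stand-in for a cache with gets/cas (not a real client)."""

    def __init__(self):
        self._data = {}  # key -> (value, version)
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            _, version = self._data.get(key, (None, 0))
            self._data[key] = (value, version + 1)

    def gets(self, key):
        """Return (value, cas_token); (None, 0) if the key is missing."""
        with self._lock:
            return self._data.get(key, (None, 0))

    def cas(self, key, value, token):
        """Store value only if nobody wrote the key since `token` was read."""
        with self._lock:
            _, version = self._data.get(key, (None, 0))
            if version != token:
                return False
            self._data[key] = (value, version + 1)
            return True


def append_to_list(cache, key, item, max_retries=10):
    """Append to a cached list, retrying when another writer got in first."""
    for _ in range(max_retries):
        current, token = cache.gets(key)
        updated = (current or []) + [item]
        if cache.cas(key, updated, token):
            return updated
    raise RuntimeError("too much contention on " + key)
```

A plain get followed by set would silently drop the other writer's item; with cas the loser notices the stale token and retries against the fresh value.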
  11. APP LEVEL CACHE
      • Code
      • Shared memory
      • APC, Ehcache
      • (Opcode cache)
  12. HTTP LEVEL CACHING
      • Cache-Control header
        – Local vs. proxy
      • Static versioning
      • Huge expire time
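Static versioning and huge expire times work as a pair: if the URL changes whenever the file content changes, the response can be cached for a very long time. A minimal sketch, assuming a content hash embedded in the file name and a one-year `max-age`; the function names and the naming scheme are illustrative assumptions.

```python
import hashlib

ONE_YEAR = 365 * 24 * 60 * 60  # seconds; an assumed "huge" expire time


def versioned_url(path: str, content: bytes) -> str:
    """Embed a short content hash in the file name, e.g. app.<hash>.js."""
    digest = hashlib.sha1(content).hexdigest()[:10]
    stem, dot, ext = path.rpartition(".")
    if not dot:  # no extension: just append the hash
        return f"{path}.{digest}"
    return f"{stem}.{digest}.{ext}"


def static_headers(max_age: int = ONE_YEAR) -> dict:
    """Headers for versioned assets; `public` also allows proxy caching."""
    return {"Cache-Control": f"public, max-age={max_age}"}
```

A new deploy produces a new URL, so browsers and proxies fetch the new file without any explicit invalidation.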
  13. CACHING
      Highlights
      • DRSA (Distributed Remote Site Accelerator): globally distributed reverse proxy to serve the content to the users from a geographically close server
      • Cache servers: we try to cache as many requests on these servers as possible to offload our web cluster
      • Browser cache: all of our static assets are automatically versioned for optimal serving (using huge expire times)
      • Application level: caching whole pages to avoid running all the code, or pieces of data to offload the database
      • Our framework automatically packages and compresses the JS and CSS files to reduce the number of HTTP requests
      [Diagram labels: Internet users, Browser cache, Distributed Remote Site Accelerator, Varnish cache server, NginX + Varnish, Custom Perl, Web server, Lightweight kernel, Request, Response, Running application, Cached response?, Cached data?, Load data, Local cache, Remote, Other, Disk, Image server, Static server, MogileFS]
  14. STATIC CONTENT
      • NFS (mount problems)
      • DB
      • MogileFS
      • Authentication, access control
        – Perlbal
      • FS limitations
      • Low hit ratio
  15. LOAD BALANCING
      • Dedicated hardware vs. software based
        – Price
      • HAProxy
      • Nginx (HTTPS termination)
      • LVS (direct routing)
      • DNS load balancing, Anycast (multi site)
      • Layer 4 vs. 7
  16. SESSION
      • Sticky (shared)
      • Centralized
        – DB / NoSQL
        – Memcached
      • Cookie
        – Sysop will like you
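The cookie option keeps session state on the client, so any web server can handle any request without sticky routing or a central store. A minimal sketch of a tamper-evident session cookie, assuming an HMAC-SHA256 signature; `SECRET_KEY` and both function names are hypothetical. The payload is signed, not encrypted, so it must not contain secrets.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"change-me"  # hypothetical; a real key would come from config


def encode_session(data: dict, key: bytes = SECRET_KEY) -> str:
    """Serialize the session and append an HMAC so tampering is detectable."""
    raw = json.dumps(data, sort_keys=True).encode("utf-8")
    payload = base64.urlsafe_b64encode(raw)
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + signature


def decode_session(cookie: str, key: bytes = SECRET_KEY):
    """Return the session dict, or None if the cookie fails verification."""
    payload, sep, signature = cookie.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```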
  17. ASYNCHRONOUS OPERATIONS
      • Decoupling
      • Capacity handling
      • Node.js
      • Jobs
        – Gearman
      • Message queue
        – Q4M, RabbitMQ, ZeroMQ
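Decoupling and capacity handling can be shown in miniature with an in-process queue: the request path only enqueues a job, and a worker drains the queue at its own pace. This sketch uses Python's standard `queue` and `threading` modules as a stand-in for a real job server or message queue such as Gearman or RabbitMQ; `start_worker` and `make_thumbnail` are hypothetical names.

```python
import queue
import threading


def start_worker(jobs: queue.Queue, results: list) -> threading.Thread:
    """Start a worker thread that runs queued (func, args) jobs in order."""

    def run():
        while True:
            job = jobs.get()
            try:
                if job is None:  # sentinel: stop the worker
                    return
                func, args = job
                results.append(func(*args))
            finally:
                jobs.task_done()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def make_thumbnail(name: str) -> str:
    """Hypothetical slow task that the web request should not wait for."""
    return f"thumb-{name}"
```

A usage sketch: the bounded queue is the capacity-handling part, because `put` blocks when the worker falls behind instead of letting work pile up without limit.

```python
jobs = queue.Queue(maxsize=10)
results = []
worker = start_worker(jobs, results)
for name in ["a", "b", "c"]:
    jobs.put((make_thumbnail, (name,)))
jobs.put(None)  # ask the worker to stop
jobs.join()     # wait until every queued item has been processed
```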
  18. CHALLENGES OF THE WEB STACK
      • Web requests:
        – 200 requests / sec / server (peak)
      • Cache server requests:
        – 10,000 requests / sec / server (peak)
      • Social stream requests:
        – 15,000 requests / sec / server (peak)
      • Database requests:
        – 25,000 requests / sec / server (peak)
  19. WEB ARCHITECTURE
      [Diagram labels: Internet users, Distributed Remote Site Accelerator, Load balancer, DMZ, Web cluster, Varnish, NginX, Memcache, Static cluster, MogileFS, Storage, SOLR, Site Search cluster, API GW, Site, API, Main, MQ, Job manager, IRC cluster, Social stream, GW, Facebook, 3rd party, SCM/Monitor, Logs, Stats cluster, MMM Monitor, Memcache, Slave DB cluster, Database cluster]
  20. SOCIAL STREAM: UNDER THE HOOD
      Highlights
      • Generated 2.5M visits in the last 30 days (20% of these are new visitors)
      • 0.8-1.1M messages per day
      • Justin Bieber has ~230k messages per day
      • Jonas Brothers concert: 10k messages per minute at peak
      • Peak:
        – 5k new connections / sec
        – 15k requests / sec
        – 600 Mbit / sec
      [Diagram labels: Social Stream UI, Web cluster, Message storage, Q4M message queue, Varnish/Nginx/Redis slave, Redis master, Approval buffer, Twitter, Facebook, AIM, MySpace, Mixi]
  21. DEVELOPMENT BEST PRACTICES
      • Continuous integration
        – Automated builds
        – Unit tests
        – Acceptance test
      • TDD, (ADD, FDD ;))
      • Abstract branching (feature switch)
      • Code review, pair programming, topic experts
      • DevOps culture
  22. DEVOPS TOOLING
      • Provisioning
      • Configuration management (cfengine, chef, puppet)
      • Application deployment (capistrano, fabric)
      • Orchestrator (mcollective)
      • Monitoring (system/application level)
        – Nagios, Munin, Cacti, Graphite etc.
      • Supervisors (monit, god)
      • Log management/analysis
  23. DEVELOPMENT BEST PRACTICES
      • Visualizing
        – Graphite (StatsD, logster)
        – Custom dashboards with KPIs, alerting
      • Runtime vs. build time
      • Automated code deployment
      • Load testing (automated better)
      • MVC
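Feeding Graphite through StatsD takes very little application code, because a StatsD metric is a single plain-text UDP datagram of the form `<bucket>:<value>|<type>`. A minimal sketch: the bucket names and both function names are hypothetical, and the host and port assume a StatsD daemon listening locally on its default UDP port 8125.

```python
import socket


def statsd_line(bucket: str, value, metric_type: str) -> str:
    """Format one StatsD metric: 'c' counter, 'ms' timer, 'g' gauge."""
    return f"{bucket}:{value}|{metric_type}"


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; the app does not wait for a reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

For example, `send_metric(statsd_line("web.requests", 1, "c"))` counts one request, and `statsd_line("web.render", 320, "ms")` formats a 320 ms timing.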
  24. CONFIGURATION MANAGEMENT
      • Automation
      • Versioning
      • Accountability
      • Chef, Puppet
        – Same local dev environment
  25. CLOUD ENVIRONMENTS
      • Amazon, Google App Engine etc.
      • Early stage
      • Cost savings
      • Backup for peaks
  26. ORGANIZATIONAL BEST PRACTICES
      • Architect board
      • Scrum of scrums
      • Product board
      • WESK (who else should know)
      • Internal demos
  27. SOURCE CODE MANAGEMENT
  28. MONITORING
      Highlights
      • Proprietary dashboard to oversee key system performance charts in one location
      • Real-time information about streaming related servers, web/cache servers, database servers
      • Summary of the Nagios checks
      • Ability to roll back for historic charts
      • Provides shortcut to system tools
      Tools
      • Munin: several custom plugins
      • Cacti: mainly for network devices
      • Nagios: more than 1200 checks, active checks
      • Monit: ensures that a given process runs and doesn't consume too many resources, active checks
      • Query watchdog: automatically stops and reports excessive read queries
  29. WHERE TO IMPROVE
      • English
      • Enjoy programming
      • Soft skills (communication, teamwork, presentation, cooperation, management, leadership, time management etc.)
      • Agile development methods (Scrum, Kanban)
      • Continuous learning (blogs, books, conferences, code)
      • Craftsmanship
        – Clean Coder, Agile Software Development, Martin Fowler books / signature series, The Mythical Man-Month, Code Complete, The Pragmatic Programmer, Peopleware, The Passionate Programmer
  30. WHERE TO IMPROVE
      • Network programming, protocols
      • Algorithms
      • DHT
      • Database
      • Big Data (Hadoop, HBase, Hive/Pig, BI tools etc.)
      • API design (REST, SOAP, OAuth etc.)
      • Playing with open source tools of big companies (Twitter, FB, LinkedIn)
      • Blogging, taking part in open source projects
      • Learning different types of programming (e.g. functional)
  31. THANK YOU
      Questions?
