Your SlideShare is downloading. ×
Scalability
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

Scalability

1,057
views

Published on

Published in: Technology

0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,057
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
23
Comments
0
Likes
3
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. SCALABILITY  Hodicska Gergely – Web Engineering Manager as Ustream  email: felho@ustream.tv  twitter: @felhobacsiNOW PLAYINGSCALABILITY – BME 1 May 7, 2012
  • 2. DEFINING SCALABILITY  It is not: – Performance  Easier to scale – HA  It is the ability to handle growing amount of work in a capable manner  Not just technology but people and processNOW PLAYINGSCALABILITY – BME 2 May 7, 2012
  • 3. SCALABILITY TYPES  Vertical – Bigger – Typically more expensive – Sometimes feasible (DB – SSD)  Horizontal – More – Typically we need thisNOW PLAYINGSCALABILITY – BME 3 May 7, 2012
  • 4. SCALABILITY RULES  KISS  Command and conquer  Approximate correctness  Shared nothingNOW PLAYINGSCALABILITY – BME 4 May 7, 2012
  • 5. DATABASE  Most tough to scale  Read -> Replication – Lag, cascading  Write -> Sharding – App logic, vertical vs. horizontal, key server  HA: Multi Master Replication – DRDB, MMM, MySQL clusterNOW PLAYINGSCALABILITY – BME 5 May 7, 2012
  • 6. DATABASE – MULTI MASTER MYSQL Virtual IP Master DB 1 20-30k req/sec Web cluster Virtual IP Load balancer Slave cluster cluster Master DB 2 Normal write Failover write Normal read Normal replication Failover replicationNOW PLAYINGSCALABILITY – BME 6 May 7, 2012
  • 7. DATABASE – BACKUP STRATEGY • MAATKIT Master DB 1 Delayed backup • Slave cluster Master DB 2 • PERCONA XTRABACKUP Offsite backupNOW PLAYINGSCALABILITY – BME 7 May 7, 2012
  • 8. NOSQL  CAP theorem (availability, consistency, partition tolerance -> eventual consistency)  Diverse world  Automatic partitioning, sharding, elasticity  Transparent for the application  Extendable without downtime  Fault tolerant  Redis, Riak, Voldemort, Cassandra, CouchBaseNOW PLAYINGSCALABILITY – BME 8 May 7, 2012
  • 9. CACHING  Strategies – Write-through cache – Write-back cache – Implicit/explicit invalidation  Consistent hashing  RestartNOW PLAYINGSCALABILITY – BME 9 May 7, 2012
  • 10. MEMCACHE  Local vs. remote  LRU  Storing lists  Versioning  Race condition (cas)  Object sizeNOW PLAYINGSCALABILITY – BME 10 May 7, 2012
  • 11. APP LEVEL CACHE  Code  Shared memory  APC, Ehcache  (Opcode cache)NOW PLAYINGSCALABILITY – BME 11 May 7, 2012
  • 12. HTTP LEVEL CACHING  Cache-Control header – Local vs. proxy  Static versioning  Huge expire timeNOW PLAYINGSCALABILITY – BME 12 May 7, 2012
  • 13. CACHING Web server Highlights Lightwght Request kernel Response  DRSA: globally distributed reverse proxy to serve the content to the users Running Cached from a geographically close server Internet application response? users  Cache servers: we try to cache as much Browser requests on these servers as possible cache Cached to offload our web cluster data? Load data  Browser cache: all of our static assets Varnish cache are automatically versioned for optimal server serving (using huge expire times) Local Image NginX + Varnish Custom  Application level: caching whole pages cache PERL to avoid running all the code or the Distributed pieces of data to offload the database Remote Site Accelerator  Our framework automatically package and compress the JS and CSS files to Other Disk Mogile FS reduce the number of HTTP requests Static serverNOW PLAYINGSCALABILITY – BME 13 May 7, 2012
  • 14. STATIC CONTENT  NFS (mount problems)  DB  MogileFS  Authentication, access control – Perlbal  FS limitations  Low hit ratioNOW PLAYINGSCALABILITY – BME 14 May 7, 2012
  • 15. LOAD BALANCING  Dedicated hardware vs. software based – Price  HA proxy  Nginx (HTTPS termination)  LVS (direct routing)  DNS loadbalance, Anycast (multi site)  Layer 4 vs. 7NOW PLAYINGSCALABILITY – BME 15 May 7, 2012
  • 16. SESSION  Sticky (shared)  Centralized – DB / NoSQL – Memcached  Cookie – Sysop will like youNOW PLAYINGSCALABILITY – BME 16 May 7, 2012
  • 17. ASYNCHRONOUS OPERATIONS  Decoupling  Capacity handling  Node.js  Jobs – Gearman  Message queue – Q4M, RabbitMQ, ZeroMQNOW PLAYINGSCALABILITY – BME 17 May 7, 2012
  • 18. CHALLENGES OF THE WEB STACK  Web requests: – 200 requests / sec / server (peak)  Cache server requests: – 10,000 requests / sec / server (peak)  Social stream requests: – 15,000 requests / sec / server (peak)  Database requests: – 25,000 requests / sec / server (peak)NOW PLAYINGSCALABILITY – BME 18 May 7, 2012
  • 19. WEB ARCHITECTURE DMZ SOLR SOLR Site Search cluster API GW SCM/Monitor Distributed 3rd party IRC cluster Remote Site Job manager Accelerator Web cluster Varnish Memcache NginX Load balancer Internet MogileFS users Storage Static cluster Site MQ API Main Social stream GW Logs, Stats cluster Facebook MMM Monitor Memcache Slave DB cluster Database clusterNOW PLAYINGSCALABILITY – BME 19 May 7, 2012
  • 20. SOCIAL STREAM: UNDER THE HOOD Highlights  Generated 2.5M visits in the last 30 Social Stream UI Message days (the 20% of this is new visitor) storage  0.8-1.1M messages per day Web cluster  Justin Bieber has ~230k messages per day  Jonas Brothers concert: 10k messages per minute in peak  Peak: Q4M message queue Varnish/Nginx/ – 5k new connection / sec Redis slave – 15k requests / sec – 600 Mbit / sec Redis master Twitter Facebook AIM MySpace Mixi Approval bufferNOW PLAYINGSCALABILITY – BME 20 May 7, 2012
  • 21. DEVELOPMENT BEST PRACTICES  Continuous integration – Automated builds – Unit tests – Acceptance test  TDD, (ADD, FDD ;))  Abstract branching (feature switch)  Code review, pair programming, topic experts  DevOps cultureNOW PLAYINGSCALABILITY – BME 21 May 7, 2012
  • 22. DEVOPS TOOLING  Provisioning  Configuration management (cfengine, chef, puppet)  Application deployment (capistrano, fabric)  Orchestrator (mcollective)  Monitoring (system/application level) – Nagios, Munin, Cacti, Graphite etc.  Supervisors (monit, god)  Log management/analysisNOW PLAYINGSCALABILITY – BME 22 May 7, 2012
  • 23. DEVELOPMENT BEST PRACTICES  Visualizing – Graphite (StatsD, logster) – Custom dashboards with KPIs, alerting  Runtime vs. build time  Automated code deployment  Load testing (automated better)  MVCNOW PLAYINGSCALABILITY – BME 23 May 7, 2012
  • 24. CONFIGURATION MANAGEMENT  Automation  Versioning  Accountability  Chef, Puppet – Same local dev environmentNOW PLAYINGSCALABILITY – BME 24 May 7, 2012
  • 25. CLOUD ENVIRONMENTS  Amazon, Google App Engine etc.  Early stage  Cost savings  Backup for peaksNOW PLAYINGSCALABILITY – BME 25 May 7, 2012
  • 26. ORGANIZATIONAL BEST PRACTICES  Architect board  Scrum of scrum  Product board  WESK (who else should know)  Internal demosNOW PLAYINGSCALABILITY – BME 26 May 7, 2012
  • 27. SOURCE CODE MANAGEMENT • • • • • •NOW PLAYINGSCALABILITY – BME 27 May 7, 2012
  • 28. MONITORING Highlights Tools  Proprietary dashboard to • Munin: Several custom plugins oversee key system • Cacti: Mainly for network devices performance charts in one • Nagios: More than 1200 location checks, Active checks  Real-time information about • Monit: Ensuring that a given process streaming related runs and it doesn’t consume too much servers, web/cache resources, Active checks servers, database servers • Query watchdog: Automatically stops  Summary of the Nagios checks and reports excessive read queries  Ability to roll back for historic charts  Provides shortcut to system toolsNOW PLAYINGSCALABILITY – BME 28 May 7, 2012
  • 29. WHERE TO IMPROVE  English  Enjoy programming  Soft skills (communication, team working, presentation, cooperation, management, leadership, time management etc.)  Agile development methods (Scrum, Kanban)  Continuous learning (blogs, books, conferences, code)  Craftsmanship – Clean Coder, Agile Software Development, Martin Fowler books / signature series, The Mythical Man-Month, Code Complete, The Pragmatic Programmer, Peopleware, The Passionate ProgrammerNOW PLAYINGSCALABILITY – BME 29 May 7, 2012
  • 30. WHERE TO IMPROVE  Network programming, protocols  Algorithms  DHT  Database  Big Data (Hadoop, HBase, Hive/Pig, BI tools etc.)  API design (REST, SOAP, oAuth etc.)  Playing with open source tools of big companies (Twitter, FB, Linkedin)  Blogging, taking part in open source projects  Learning different type of programming (e.g. functional)NOW PLAYINGSCALABILITY – BME 30 May 7, 2012
  • 31. THANK YOU Questions?NOW PLAYINGSCALABILITY – BME 31 May 7, 2012
  • 32. NOW PLAYINGSCALABILITY – BME 32 May 7, 2012