Odnoklassniki.ru Architecture

  1. Architecture: Surviving the High Load (Friday, 6 May 2011)
  2. Who we are
     • Alexander Chinaryov, Lead Platform Developer since 2007
     • Alexander Hristoforov, Lead Platform Developer since 2009
     • Oleg Anastasyev, Lead Platform Developer since 2007
  3. Load: some facts
     • 2.8M users online
     • 150k pages/s, 50 ms average
     • 32 Gbit/s outbound
     • 4000 messages/s (more than Twitter's tweets/s)
     • 160k photo downloads/s
     • 500 comments/s
     • 90,000 notifications/s
     • 1500 feed posts/s, 30k feed gets/s
  4. Load: handled by
     • 3 datacenters
     • 2400 servers & storage units (and counting)
     • 1.5M SLOC (99.9% Java, 0.1% C)
     • 60 modules
     • 40 developers + 8 testers
     • 20 admins
  5. Arch: layers
     • 150 + 50 web servers
     • 120 app servers
     • 25 kinds of business services
     • 6 SSO servers
     • >100 caches
     • 230 SQL servers
     • >400 NoSQL servers
  6. Load: balancing
     • LVS
     • one-cluster
       – weighted round-robin
       – pluggable failure detectors
       – integrated with one-remote-service
       – locality groups
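
The slide names weighted round-robin with pluggable failure detectors but does not show one-cluster's code. Below is a minimal sketch of the "smooth" weighted round-robin variant (the one popularized by nginx), assuming a hypothetical `FailureDetector` interface that lets the balancer skip servers currently marked as down. It is an illustration of the technique, not one-cluster's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pluggable failure detector: decides whether a server may receive traffic. */
interface FailureDetector {
    boolean isAvailable(String server);
}

/** Smooth weighted round-robin (nginx-style); an illustrative sketch, not one-cluster's code. */
class WeightedRoundRobin {
    private static final class Node {
        final String server;
        final int weight;
        int current; // running score, adjusted on every pick
        Node(String server, int weight) { this.server = server; this.weight = weight; }
    }

    private final List<Node> nodes = new ArrayList<>();
    private final FailureDetector detector;

    WeightedRoundRobin(FailureDetector detector) { this.detector = detector; }

    void add(String server, int weight) { nodes.add(new Node(server, weight)); }

    /** Returns the next server, or null if every server is marked unavailable. */
    synchronized String next() {
        Node best = null;
        int total = 0;
        for (Node n : nodes) {
            if (!detector.isAvailable(n.server)) continue; // failed servers get no traffic
            n.current += n.weight;
            total += n.weight;
            if (best == null || n.current > best.current) best = n;
        }
        if (best == null) return null;
        best.current -= total; // the chosen server "pays" for its pick
        return best.server;
    }
}
```

With weights a=5, b=1, c=1 the picks come out as a a b a c a a: the same 5:1:1 ratio as naive weighted round-robin, but without sending five requests in a row to the heaviest server.
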
  7. Arch: Presentation
     • Apache Tomcat 6
     • RDK framework:
       – GUI components
       – independent portlets
       – AJAX updates, no full page reload
       – no JavaScript required
     • Google Web Toolkit for dynamic parts
       – toolbar, photo pins, gifts
     • Flash (apps, players, ads)
  8. Arch: Business Logic
     • Odnoklassniki-ejb
       – JBoss 4.2
       – JTA, stateless session beans, entity beans (BMP)
       – business operation handling & orchestration
       – event/handler pattern
       – component logic
       – data partitioning
       – Spring (DI)
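
The slide only names the event/handler pattern used to orchestrate business operations. A minimal sketch of the idea, assuming a hypothetical synchronous `EventBus` and a hypothetical `PhotoUploaded` event: an operation publishes an event, and independently registered handlers react to it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical synchronous event bus: handlers are registered per event class. */
class EventBus {
    private final Map<Class<?>, List<Consumer<Object>>> handlers = new HashMap<>();

    <E> void register(Class<E> type, Consumer<? super E> handler) {
        handlers.computeIfAbsent(type, t -> new ArrayList<>())
                .add(event -> handler.accept(type.cast(event)));
    }

    /** Calls every handler registered for the event's exact class, in registration order. */
    void publish(Object event) {
        for (Consumer<Object> h : handlers.getOrDefault(event.getClass(), List.of())) {
            h.accept(event);
        }
    }
}

/** Hypothetical event emitted by a business operation. */
record PhotoUploaded(long userId, long photoId) {}
```

Because handlers depend only on the event type, a new reaction (a feed post, a notification) can be added without touching the operation that publishes the event.
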
  9. Arch: Business Services
     • IM, discussions, feeds
       – JBoss Remoting 2.2
       – one-remote-service
       – 100k+ requests/s on a recent 8-core CPU

     ```java
     /**
      * Example of a remote server
      */
     public interface Server extends RemoteService {
         // Presumably routed to the single partition chosen by the @PartitionSource argument.
         @RemoteMethod
         IListChunk<Friend> getFreshMyFriends(@PartitionSource long userId, IChunkProperties cp);

         // Presumably invoked on all partitions, with results merged by ListReduceStrategy.
         @RemoteMethod(invokeAll = true, split = true, reduceStrategy = ListReduceStrategy.class)
         List<?> mapReduceMethod(@PartitionSource long userId, ... );

         // Presumably sent asynchronously in batches (up to 100 calls or 1000 ms delay).
         @RemoteMethod(invokeAll = true, asyncMaxDelay = 1000L, asyncMaxBatch = 100)
         void asyncNotify(@PartitionSource long userId, ... );
     }
     ```
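
The `@RemoteMethod` attributes suggest two routing modes: a call keyed by `@PartitionSource` goes to the one partition that owns the user, while `invokeAll=true` fans out to every partition and folds the replies with a reduce strategy. The slides do not show one-remote-service internals, so the following is only an in-process sketch of those two modes under that reading. `PartitionedClient` is a hypothetical name, and the modulo partition function is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.BinaryOperator;
import java.util.function.Function;

/** Hypothetical client that routes calls over N partitions ("shards") of a service S. */
class PartitionedClient<S> {
    private final List<S> shards;

    PartitionedClient(List<S> shards) { this.shards = List.copyOf(shards); }

    /** Assumption: userId -> shard by modulo; the real mapping is not shown on the slides. */
    S shardFor(long userId) {
        return shards.get((int) Math.floorMod(userId, (long) shards.size()));
    }

    /** Single-partition call, analogous to a method keyed by @PartitionSource. */
    <R> R invoke(long userId, Function<S, R> call) {
        return call.apply(shardFor(userId));
    }

    /** Fans out to all shards in parallel, then folds the replies in shard order. */
    <R> R invokeAll(Function<S, R> call, BinaryOperator<R> reduce) {
        List<CompletableFuture<R>> futures = new ArrayList<>();
        for (S shard : shards) {
            futures.add(CompletableFuture.supplyAsync(() -> call.apply(shard)));
        }
        R result = null;
        for (CompletableFuture<R> f : futures) {
            R part = f.join();
            result = (result == null) ? part : reduce.apply(result, part);
        }
        return result;
    }
}
```

In a real deployment each shard would be a remote stub and the fan-out would go over the network; the routing and merge logic stays the same.
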
  10. Arch: Caches
      • one-graph
        – social graph storage
        – 30 GB, 17K ops per server at 7% CPU …
      • Odnoklassniki-cache
        – users, groups, photos, sessions...
        – smart
        – off-heap (Unsafe) → no full GC
      • Near cache
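
Keeping cache data outside the Java heap means the garbage collector never scans or copies it, which is why off-heap storage avoids full-GC pauses. The slides use `sun.misc.Unsafe`; this sketch swaps in a direct `ByteBuffer`, the supported JDK API for off-heap memory, to build a tiny fixed-capacity `long → long` hash table with linear probing. The class name and slot layout are illustrative assumptions, not Odnoklassniki-cache's design.

```java
import java.nio.ByteBuffer;

/**
 * Fixed-capacity long -> long open-addressing map stored off-heap.
 * Each slot is 16 bytes: key (8) + value (8). Key 0 marks an empty slot,
 * so 0 cannot be used as a key in this sketch.
 */
class OffHeapLongMap {
    private static final int SLOT = 16;
    private final ByteBuffer buf; // direct buffer: contents live outside the GC-managed heap
    private final int capacity;
    private int size;

    OffHeapLongMap(int capacity) {
        this.capacity = capacity;
        this.buf = ByteBuffer.allocateDirect(capacity * SLOT); // zero-filled on allocation
    }

    private int slotFor(long key) {
        long h = key * 0x9E3779B97F4A7C15L; // multiplicative hash to spread keys
        return (int) Math.floorMod(h ^ (h >>> 32), (long) capacity);
    }

    /** Inserts or overwrites; throws if the table is full. */
    void put(long key, long value) {
        if (key == 0) throw new IllegalArgumentException("key 0 is reserved");
        int i = slotFor(key);
        for (int probes = 0; probes < capacity; probes++, i = (i + 1) % capacity) {
            long k = buf.getLong(i * SLOT);
            if (k == 0 || k == key) {
                if (k == 0) size++;
                buf.putLong(i * SLOT, key);
                buf.putLong(i * SLOT + 8, value);
                return;
            }
        }
        throw new IllegalStateException("map is full");
    }

    /** Returns the value for key, or defaultValue if absent. */
    long get(long key, long defaultValue) {
        if (key == 0) return defaultValue;
        int i = slotFor(key);
        for (int probes = 0; probes < capacity; probes++, i = (i + 1) % capacity) {
            long k = buf.getLong(i * SLOT);
            if (k == key) return buf.getLong(i * SLOT + 8);
            if (k == 0) return defaultValue;
        }
        return defaultValue;
    }

    int size() { return size; }
}
```

Only the small `OffHeapLongMap` object itself lives on the heap; however many entries are stored, the collector sees one buffer reference.
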
  11. Arch: Persistence
      • MS SQL 2005
        – high consistency
        – flexible queries
      • NoSQL: one-db = Berkeley DB 4.5 (C edition) + JBoss Remoting-based server + simple querying
      • … and others are being researched
  12. Concept: DB Partitioning
      • DB scaling is hard & expensive
      • Vertical
      • Horizontal
      • IDs:
        – long ID = (uid << 8) + domain
        – domain = 0..255
        – domain → servers map
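
The slide packs a partition domain into the low 8 bits of every ID, so any ID can be routed to its server without a lookup service. A sketch of that encoding follows. The class names are hypothetical, and the domain-to-server table is a plain in-memory array spread evenly over a server pool, which is an assumption for illustration.

```java
/** Sketch of the ID scheme from the slide: id = (uid << 8) | domain, domain in 0..255. */
final class PartitionedIds {
    static final int DOMAINS = 256;

    private PartitionedIds() {}

    /** Parentheses matter: in Java "+" binds tighter than "<<", so uid << 8 + d means uid << (8 + d). */
    static long compose(long uid, int domain) {
        if (domain < 0 || domain >= DOMAINS) throw new IllegalArgumentException("domain out of range");
        return (uid << 8) | domain;
    }

    static int domain(long id) { return (int) (id & 0xFF); }

    static long uid(long id) { return id >>> 8; } // assumes non-negative uids
}

/** Hypothetical domain -> server map; moving a domain means editing one entry. */
final class DomainMap {
    private final String[] servers = new String[PartitionedIds.DOMAINS];

    DomainMap(String... pool) {
        // Assumption: domains are spread round-robin over the pool at startup.
        for (int d = 0; d < servers.length; d++) servers[d] = pool[d % pool.length];
    }

    String serverFor(long id) { return servers[PartitionedIds.domain(id)]; }

    void move(int domain, String server) { servers[domain] = server; }
}
```

Because routing depends only on 256 domains rather than on individual IDs, rebalancing means reassigning whole domains to new servers.
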
  13. Perf: SQL DB
      • XA → local transactions only
      • Dirty reads
      • DB JOINs → joins in app server memory
      • FKs, stored procedures, triggers
      • DELETE:
        – no delete/insert workflow → update instead
        – async batch processing, retries
      • Indexes, clustered indexes
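
Moving a `JOIN` out of the database typically means fetching each side with a simple indexed query (possibly from different partitions) and joining in the app server. A minimal hash-join sketch; the `User`, `Photo` and `PhotoWithOwner` records and their fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

record User(long id, String name) {}
record Photo(long id, long ownerId) {}
record PhotoWithOwner(long photoId, String ownerName) {}

final class AppJoin {
    private AppJoin() {}

    /** Hash join: index the owners by id, then probe with each photo (inner-join semantics). */
    static List<PhotoWithOwner> join(List<Photo> photos, List<User> owners) {
        Map<Long, User> byId = new HashMap<>();
        for (User u : owners) byId.put(u.id(), u);
        List<PhotoWithOwner> out = new ArrayList<>();
        for (Photo p : photos) {
            User u = byId.get(p.ownerId());
            if (u != null) out.add(new PhotoWithOwner(p.id(), u.name())); // photos without an owner row are dropped
        }
        return out;
    }
}
```

The database then only serves cheap key lookups, and the CPU cost of the join moves to app servers, which are easier to add than SQL nodes.
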
  14. Perf: general
      • Sequential access speed:
        – RAM is 10x faster than SSD, SSD 1.5x faster than 1 Gbit Ethernet, Ethernet 2x faster than disk
      • Random access speed:
        – RAM (~50 ns) is 20000x faster than SSD, SSD is 5-10x faster than disk (~5 ms)
        – network roundtrip ~0.5 ms
      • So:
        – near data/cache is the fastest solution (but raises the cache coherence problem)
        – partitioned network cache
        – database access is the slowest thing
      • Still, you have to sacrifice consistency
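
The random-access figures on this slide are consistent with each other: chaining the ratios reproduces the quoted disk latency, puts SSD random access at roughly 1 ms, and shows how many RAM accesses one network roundtrip costs.

$$
\frac{t_\text{disk}}{t_\text{RAM}} = \frac{5\ \text{ms}}{50\ \text{ns}} = 10^{5} = 20000 \times 5,
\qquad
t_\text{SSD} \approx 20000 \times 50\ \text{ns} = 1\ \text{ms},
\qquad
\frac{t_\text{net}}{t_\text{RAM}} = \frac{0.5\ \text{ms}}{50\ \text{ns}} = 10^{4}
$$
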
  15. Surviving: GC
      • Young GC → high CPU load
        – too much garbage (autoboxing, overlooked log.debug, ...)
        – FIX: find and fix the code → can take weeks
      • Old GC → pauses → carousel
        – 2-4 GB is the limit for ParallelGC (1-4 s pauses)
        – 8-10 GB is the limit for CMS, and it can still stop the world!
        – FIX: use Unsafe (off-heap memory) or partition
      • Perm GC → pauses → carousel again
        – too many classes
        – FIX: -XX:+CMSClassUnloadingEnabled
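
Two sources of young-generation garbage named on the slide are autoboxing and debug logging whose message is built even when the level is disabled. A sketch using `java.util.logging` (the slides do not say which logging library was used); the class and method names are hypothetical.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

final class GarbageExamples {
    private static final Logger LOG = Logger.getLogger(GarbageExamples.class.getName());

    private GarbageExamples() {}

    /** Boxed accumulator: each "sum += v" unboxes, adds and re-boxes, which can allocate a new Long. */
    static Long sumBoxed(long[] values) {
        Long sum = 0L;
        for (long v : values) sum += v;
        return sum;
    }

    /** Primitive accumulator: same result, no boxing. */
    static long sumPrimitive(long[] values) {
        long sum = 0;
        for (long v : values) sum += v;
        return sum;
    }

    /** The message string (including items.toString()) is built even when FINE is disabled. */
    static void processNaive(List<String> items) {
        LOG.fine("processing " + items);
    }

    /** The message string is only built when FINE is actually enabled. */
    static void processGuarded(List<String> items) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("processing " + items);
        }
    }
}
```

Since Java 8, `Logger.fine(Supplier<String>)` gives the same deferral without an explicit `if`. Neither fix changes behaviour, which is why such garbage is easy to overlook until a profiler points at it.
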
  16. Surviving: failures
      • SQL partition failure
        – FIX: fault tolerance: reads return incomplete data, writes fail
      • one-db
        – unstable replication → no fix :-(
        – data corruption → separate ID storage
        – random disk access → SSD, tmpfs
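
The SQL failure policy (reads return whatever the healthy partitions have, writes to a failed partition fail) can be sketched for the read side as follows. `FaultTolerantReader`, `PartialResult`, and signalling a dead partition with an exception are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Rows gathered from the healthy partitions, plus whether every partition answered. */
record PartialResult<T>(List<T> rows, boolean complete) {}

final class FaultTolerantReader {
    private FaultTolerantReader() {}

    /**
     * Reads every partition; a partition that throws is skipped instead of failing the whole read.
     * readPartition is a hypothetical per-partition query that throws when its SQL server is down.
     */
    static <T> PartialResult<T> readAll(int partitions, IntFunction<List<T>> readPartition) {
        List<T> rows = new ArrayList<>();
        boolean complete = true;
        for (int p = 0; p < partitions; p++) {
            try {
                rows.addAll(readPartition.apply(p));
            } catch (RuntimeException e) {
                complete = false; // degrade: return what the other partitions had
            }
        }
        return new PartialResult<>(rows, complete);
    }
}
```

Writes get no such treatment: the caller lets the exception propagate, so a failed partition produces an error rather than a silently lost update.
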
  17. Surviving: carousel
      • Reasons:
        – network problems
        – unusual activity, spammers
        – full GCs
        – cold caches
        – unexpected slowdowns, bugs
        – activity growth
      • Fixes:
        – timeout = 3 s
        – client-side automatic failure detectors, server cutout
        – gatekeepers
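
A client-side failure detector with server cutout behaves much like a simple circuit breaker: after several consecutive failures or timeouts (3 s on the slide), the client stops sending requests to that server for a while, so a struggling server is not buried under more load. The failure threshold, cutout period, and class name below are hypothetical; the clock is injectable so the behaviour can be tested.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Per-server failure detector: cuts a server out after N consecutive failures. */
class ServerCutout {
    private final int maxFailures;
    private final long cutoutMillis;
    private final LongSupplier clock;
    private int consecutiveFailures;
    private long cutUntil; // server is skipped while clock < cutUntil

    ServerCutout(int maxFailures, long cutoutMillis, LongSupplier clock) {
        this.maxFailures = maxFailures;
        this.cutoutMillis = cutoutMillis;
        this.clock = clock;
    }

    synchronized boolean isAvailable() { return clock.getAsLong() >= cutUntil; }

    synchronized void onSuccess() { consecutiveFailures = 0; }

    synchronized void onFailure() {
        if (++consecutiveFailures >= maxFailures) {
            cutUntil = clock.getAsLong() + cutoutMillis;
            consecutiveFailures = 0; // fresh start once the cutout expires
        }
    }

    /** Runs a call with a hard timeout (3 s on the slide) and records the outcome. */
    <T> T call(ExecutorService pool, Callable<T> task, long timeoutMillis) throws Exception {
        if (!isAvailable()) throw new IllegalStateException("server cut out");
        Future<T> f = pool.submit(task);
        try {
            T result = f.get(timeoutMillis, TimeUnit.MILLISECONDS);
            onSuccess();
            return result;
        } catch (Exception e) { // TimeoutException, ExecutionException or InterruptedException
            f.cancel(true);
            onFailure();
            throw e;
        }
    }
}
```

A balancer can plug `isAvailable()` in as its failure detector, so cut-out servers simply drop out of rotation until the period expires.
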
  18. Surviving: gatekeepers
      • Fine-grained feature switches
      • Used for:
        – fighting the carousel
        – smooth launch of new features
        – experiments
      • Can:
        – turn specific features or individual 3rd-party games on/off
        – on a per-server basis
        – per user domain
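
A gatekeeper can be modelled as a set of rules checked before a feature runs: a feature may be switched off globally, on particular servers, or for particular user domains. The rule layout and names are hypothetical, and the user's domain is assumed to be the low 8 bits of the ID, following the ID scheme on the partitioning slide.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical fine-grained feature switches ("gatekeepers"). */
class Gatekeeper {
    private static final class Rule {
        boolean globallyOff;
        final Set<String> offServers = new HashSet<>();
        final BitSet offDomains = new BitSet(256); // user domains 0..255
    }

    private final Map<String, Rule> rules = new HashMap<>();
    private final String localServer;

    Gatekeeper(String localServer) { this.localServer = localServer; }

    private Rule rule(String feature) { return rules.computeIfAbsent(feature, f -> new Rule()); }

    synchronized void turnOff(String feature) { rule(feature).globallyOff = true; }

    /** Clears every rule for the feature. */
    synchronized void turnOn(String feature) { rules.remove(feature); }

    synchronized void turnOffOnServer(String feature, String server) { rule(feature).offServers.add(server); }

    synchronized void turnOffForDomain(String feature, int domain) { rule(feature).offDomains.set(domain); }

    /** Features are on unless some rule switches them off for this server or user. */
    synchronized boolean isOn(String feature, long userId) {
        Rule r = rules.get(feature);
        if (r == null) return true;
        int domain = (int) (userId & 0xFF); // assumption: domain = low 8 bits of the id
        return !r.globallyOff && !r.offServers.contains(localServer) && !r.offDomains.get(domain);
    }
}
```

A gradual launch then amounts to turning a feature off for most domains and re-enabling them step by step, and fighting the carousel amounts to flipping off an expensive feature until load settles.
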
  19. Surviving: measure!
      • one-log statistics
  20. Thank you
      Questions?
      We are hiring: jobs@forticom.com
  21. Test yourself ;-)
      • PhotoMarks table: PhotoId: long, UserId: long, Mark: byte, timestamp
        – 32 partitions × (500M rows, 42 GB data + 25 GB index)
        – load (photoId, userId): 14k ops; create: 1500k ops
        – most load calls are checks for row absence
      • Rejected a priori:
        – add more SQL nodes: too expensive
        – place all marks in cache: 2600 GB of RAM is not cheap either
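
The slide leaves the answer open. Since most lookups only confirm that a row does not exist, one widely used technique (our illustration, not stated on the slide) is an in-memory Bloom filter: it answers "definitely absent" without touching SQL and forwards only "maybe present" keys to the database. Sizing uses the standard formulas m = -n ln p / (ln 2)² bits and k = (m/n) ln 2 hash functions. At a 1% false-positive rate that is about 9.6 bits per key, roughly 19 GB for 32 × 500M rows, against the 2600 GB quoted for caching all marks. The class name and hash mixing below are assumptions.

```java
import java.util.BitSet;

/** Bloom filter over (photoId, userId) pairs: false positives are possible, false negatives are not. */
class PairBloomFilter {
    private final BitSet bits;
    private final int m; // number of bits
    private final int k; // number of hash functions

    PairBloomFilter(int expectedItems, double falsePositiveRate) {
        double ln2 = Math.log(2);
        this.m = (int) Math.ceil(-expectedItems * Math.log(falsePositiveRate) / (ln2 * ln2));
        this.k = Math.max(1, (int) Math.round((double) m / expectedItems * ln2));
        this.bits = new BitSet(m);
    }

    /** SplitMix64 finalizer: spreads input bits across the 64-bit output. */
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** i-th bit index by double hashing: h1 + i * h2. */
    private int index(long photoId, long userId, int i) {
        long h1 = mix(mix(photoId) ^ userId);
        long h2 = mix(h1) | 1; // odd step
        return (int) Math.floorMod(h1 + i * h2, (long) m);
    }

    void add(long photoId, long userId) {
        for (int i = 0; i < k; i++) bits.set(index(photoId, userId, i));
    }

    /** false: the row is certainly absent; true: ask the database. */
    boolean mightContain(long photoId, long userId) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(photoId, userId, i))) return false;
        }
        return true;
    }
}
```

The filter must be populated from existing rows at startup and updated on every create; deletes are not supported by a plain Bloom filter.
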