Distributed Caching for Beginners


       charsyam@naver.com
       http://charsyam.wordpress.com
Server developer at NHN
Interests!!!

Cloud, Big Data


Specialty!!!
Winging presentations!!!
What is Cache?
Cache is a component
that transparently stores data so

that future requests for that

data can be served faster.


                   From Wikipedia
A cache stores results in advance
so that later requests can be
served faster.
[Diagram: lots of data flows through a Cache, which serves it back faster]
Cache
CPU Cache
Browser Cache
Why Cache?
Use Case: Login

  Common Case
         Read from DB
SELECT * FROM Users WHERE id='charsyam';
Typical DB Setup
     Master

         REPLICATION / FailOver

     Slave

The Master handles all traffic; the Slave is standby for failover.
So the load keeps growing!!!
Decision time!!!
Scale UP
   Vs
Scale OUT
Scale
 UP
1,000 TPS
3,000 TPS




Bring in one server that can handle 3x the load
Scale
OUT
1,000 TPS
2,000 TPS
3,000 TPS
We chose Scale Out!!!
   (If you have the money, Scale Up is fine too.)
Request Analysis
70% reads, 30% writes
Conclusion
70% reads, so:
distribute the reads!!!
One Write
    Master
       +
Multi Read Slave
[Diagram: the Client WRITEs only to the Master and READs only from the
Slaves; the Master feeds the Slaves via REPLICATION]
Eventual
Consistency
Ex) Replication
        Master

   REPLICATION

         Slave

   Replication can lag when a Slave is under load or for
   other reasons, but eventually the copies converge.
What about login data that must
never differ, even for a moment?

=> Back to square one!!!
Early on, we were happy.

[Diagram: Client → Master, with REPLICATION out to five Slaves]
Then the load grew even more!!!

[Diagram: Client → Master, with REPLICATION fanning out to twelve Slaves]
Performance barely improved!!!
    WHY?
A machine's I/O is a zero-sum game
Partitioning
Scalable Partitioning

[Diagram: the Client is split across PART 1 and PART 2, each with its own
Web Server and DBMS]
Partitioning
Performance
Management overhead
Cost
A Simple Fix
Put the DB on SSDs
and give it more memory than data

=> Money, money, money!!!
Why Cache?
Use Case: Login

      Read from Cache

Get charsyam

Apply Cache for Reads
General Cache Layer

[Diagram: in the Storage Layer, the Application Server READs from the
Cache; WRITEs go to the DBMS, which then UPDATEs the Cache]
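The read and write paths of a cache layer like this can be sketched in a few lines of Python. `FakeCache` and `FakeDB` are in-memory stand-ins of my own (not from the slides) for memcached and MySQL:

```python
# Minimal sketch of the cache layer above: reads try the cache first,
# writes go to the DB and then refresh the cache (write-through).

class FakeCache:
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def set(self, key, value):
        self.store[key] = value

class FakeDB:
    def __init__(self):
        self.rows = {"charsyam": {"name": "charsyam", "last_login": "2012-01-01"}}
    def select(self, user_id):
        return self.rows.get(user_id)
    def update(self, user_id, row):
        self.rows[user_id] = row

cache, db = FakeCache(), FakeDB()

def read_user(user_id):
    """READ path: a cache hit avoids the DB entirely."""
    profile = cache.get("Profile:" + user_id)
    if profile is None:                      # cache miss
        profile = db.select(user_id)         # fall back to the DBMS
        cache.set("Profile:" + user_id, profile)
    return profile

def write_user(user_id, row):
    """WRITE path: update the DB first, then refresh the cache."""
    db.update(user_id, row)
    cache.set("Profile:" + user_id, row)
```

The first read populates the cache; every later read is served from memory until a write refreshes the entry.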
Type 1: 1 Key – N Items
        KEY              Value
Profile:User_ID          Profile
                         - LastLoginTime
                         - UserName
                         - Host
                         - Name
Type 2: 1 Key – 1 Item
         KEY                  Value
name:User_ID                  UserName
LastLoginTime:User_ID         LastLoginTime
Pros and Cons
Type 1: 1 Key – N Items
         Pros: just one get.
         Cons: if even one item changes,
         the whole cached value must be
         updated, which invites a
         race condition.
Pros and Cons
Type 2: 1 Key – 1 Item
        Pros: if one item changes,
        update just that item.
        Cons: individual items can be
        evicted independently.
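A minimal sketch of the two layouts, using a plain dict as the cache. The key prefixes follow the slides; the profile values are made up for illustration:

```python
# Sketch of the two key layouts above, with a plain dict standing in
# for the cache.
cache = {}

# Type 1: one key holds the whole profile (one get, but any change
# means rewriting the whole value).
cache["Profile:charsyam"] = {
    "LastLoginTime": "2012-01-01 10:00",   # illustrative values only
    "UserName": "charsyam",
    "Host": "example-host",
    "Name": "Example Name",
}

# Type 2: one key per item (cheap single-item updates, but items can
# be evicted independently of each other).
cache["name:charsyam"] = "charsyam"
cache["LastLoginTime:charsyam"] = "2012-01-01 10:00"

# Type 1 read: a single lookup returns everything.
profile = cache["Profile:charsyam"]

# Type 2 update: touch only the item that changed.
cache["LastLoginTime:charsyam"] = "2012-01-02 09:00"
```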
Data that changes
Data that doesn't change

  Divide them
Don't Update the Cache Without Re-reading the DB

  Value            EVENT: update only LastLoginTime

 Profile           1. Read Cache: the user's Profile
 - LastLoginTime
 - UserName        2. Update the data & the DB
 - Host
 - Name            3. Just save the whole Profile back to the Cache


           This makes a Race Condition
Need a
   Global Lock
=> Skipping that today!!!
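Short of a global lock, one common way to defuse this race is memcached-style check-and-set (CAS): re-read, modify, and write back only if nobody wrote in between. A toy sketch; `ToyCache` is a simplified stand-in, not the real memcached protocol:

```python
# Sketch of a CAS (check-and-set) loop over a Type 1 value: each write
# must be based on the latest version, or it is rejected and retried.

class ToyCache:
    def __init__(self):
        self.value, self.version = None, 0
    def gets(self):
        return self.value, self.version          # value plus a CAS token
    def cas(self, value, version):
        if version != self.version:              # someone wrote in between
            return False
        self.value, self.version = value, self.version + 1
        return True

cache = ToyCache()
cache.cas({"LastLoginTime": "old", "UserName": "charsyam"}, 0)

def update_field(field, new_value):
    """Retry until our write is based on the latest value."""
    while True:
        value, version = cache.gets()
        updated = dict(value)
        updated[field] = new_value
        if cache.cas(updated, version):
            return
```

A stale write (one based on an old version token) simply fails and is retried, so concurrent updates to different fields can no longer overwrite each other.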
How to Test
Using real data:
100,000 requests
   across 25 million User IDs,
replayed from logs
Test Result
Memcache vs MySQL
    136 vs 1,613
     seconds
Result of
Applying Cache

[Graphs: SELECT query latency and CPU utilization]

Uniform response times!!!
      Reduced load!!!
What is Hard?
Keeping the Key-Value store and the DB in SYNC

Easy case:

  Fail!!!           1. The update to the DB fails
                    2. So the whole transaction fails
                       (cache and DB stay consistent)

Hard case:

                    1. The update to the DB succeeds

  Fail!!!           2. The update to the Cache fails
                       (cache and DB now disagree)

How to handle it:

 RETRY                  BATCH          DELETE
Data!
The most important thing:
If the data is not important,
        updating the cache is not important.

              Login count
            Last login time
                 etc.
BUT!
If the data is important,
        updating the cache is important.

             Server address
               Data path
                  etc.
HOW?
RETRY!
Retry, retry, retry:
solves over 9x% of cases.
Otherwise, delete the cache entry!!!
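The retry-then-delete policy above can be sketched as follows; `cache_set` and `cache_delete` are assumed callables wrapping whatever cache client is in use:

```python
# Sketch of the "RETRY, then DELETE" policy: try the cache update a few
# times; if it keeps failing, delete the key so readers fall through to
# the DB instead of seeing stale data.

def update_cache(cache_set, cache_delete, key, value, retries=3):
    for _ in range(retries):
        try:
            cache_set(key, value)
            return True                  # update succeeded
        except ConnectionError:
            continue                     # transient failure: retry
    cache_delete(key)                    # give up: stale data is worse
    return False
```

Deleting on repeated failure trades a few extra DB reads for never serving a stale entry.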
Batch!
Queuing Service

[Diagram: failed updates are written as Error Logs into a Queue; a Batch
Processor drains the queue and UPDATEs the Cache Server]
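A minimal sketch of that batch path, with an in-process deque standing in for the real queuing service:

```python
# Failed cache updates are appended to a queue as "error logs"; a batch
# processor drains the queue and re-applies each update to the cache.

from collections import deque

error_queue = deque()
cache = {}

def log_failed_update(key, value):
    """Producer side: record an update that could not reach the cache."""
    error_queue.append((key, value))

def run_batch_processor():
    """Drain the queue, applying each pending update to the cache."""
    applied = 0
    while error_queue:
        key, value = error_queue.popleft()
        cache[key] = value               # the UPDATE to the cache server
        applied += 1
    return applied
```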
Caches
Memcache
           Redis
Memcache
Atomic Operations
          Key:Value
  Single Thread
Memcache
    Processing
 Over 100,000 TPS
Memcache
 Max size of item
       1 MB
Memcache
      LRU,
   ExpireTime
Redis
Key:Value
ExpireTime
Replication
Snapshot

Collections:
 Hash   List   Sorted Set
Caveats
Item Size
1~XXX KB:
the smaller an item,
the better.
Cache
      In
Global Service
Memcached In Facebook
• Facebook, Google, and many other companies
• Facebook
  – 500M logins per day (800M per month) (current; the figures below are from April 2010)
  – 70M active users
  – 1M new users every 4 days
  – 10,000 web servers, 20M web requests per second
  – 805 memcached servers -> 15 TB, hit rate: 95%
  – 1,800 MySQL servers, Master/Slave (900 each)
     • Mem: 25 TB, 500K SQL queries per second
Cache In Twitter (Old)

[Diagram: API and WEB requests hit a Page Cache in front of the DBs]

Cache In Twitter (New)

[Diagram: API and WEB requests pass through a Page Cache, then a Fragment
Cache, a Row Cache, and a Vector Cache before reaching the DBs]
Redis In Wonga

Using Redis as the DataStore
             for Writes and Reads
Redis In Wonga

        Flash client
        Ruby backend
NoCache In Wonga
1 million daily users
200 million daily HTTP requests
100,000 DB operations per second
40,000 DB updates per second
NoCache In Wonga
First Scale Out
The first setup lasted 3 months
More traffic
MySQL hiccups: the DB problems began
Analysis of the DBMS
ActiveRecord's status checks caused
20% extra DB load

60% of all updates were done
on the 'tiles' table
Applying Sharding 2xD
Result of Applying Sharding 2xD
Doubling MySQL
Result of Doubling MySQL
50,000 TPS on EC2
Wonga chose Redis!
But there were some failures




 => Skipping those today!!!
Result!!!
Consistent Hashing
Origin
     K = 10,000 users              N = 5 servers

[Diagram: User Requests go through a Proxy that spreads them across 5 Servers]

FAIL: a server dies (N = 4) -> about 2,000 users must be redistributed

RECOVER: the server returns (N = 5) -> about 2,500 users redistributed again
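The figures above can be sanity-checked with simple modulo placement. CRC32 here is just a convenient deterministic hash, not necessarily what the proxy uses; note that naive `hash % N` also reshuffles most of the *other* users when N changes, which is exactly what consistent hashing fixes:

```python
# K = 10,000 users over N = 5 servers via "hash % N" placement.
# Each server owns about K/N = 2,000 users: the share that must be
# redistributed when one server fails.

import zlib

K = 10000
users = ["user:%d" % i for i in range(K)]

def owner(key, n):
    """Simple modulo placement, as done by the proxy in the diagram."""
    return zlib.crc32(key.encode()) % n

# Share owned by one server at N = 5: roughly 2,000 users.
owned_by_0 = sum(1 for u in users if owner(u, 5) == 0)

# Users whose owner changes when N drops from 5 to 4 under plain modulo:
# far more than just the failed server's ~2,000.
changed = sum(1 for u in users if owner(u, 5) != owner(u, 4))
```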
Add Servers A, B, C

[Diagram: servers A, B, and C are hashed onto three points of the ring]
Add Items 1 and 2

[Diagram: each item is hashed onto the ring and stored on the next server
clockwise; item 1 falls to B, item 2 to C]
Add Items 3, 4, 5

[Diagram: items 3 and 4 hash onto A's arc, item 5 onto C's; the five items
are now spread across A, B, and C]
Fail!! B Server

[Diagram: B drops off the ring while items 1-5 stay where they hash]

Add Item 1 Again -> Allocated to C Server

[Diagram: with B gone, item 1's next server clockwise is now C]

Recover B Server -> Add Item 1

[Diagram: B rejoins the ring and item 1 is stored on B again, while C
still holds the copy it took over]
Thank You!
Real Implementation

[Diagram: each physical server is hashed to multiple virtual nodes spread
around the ring: A, A+1...A+4, B, B+1...B+3, C, C+1...C+3]
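A sketch of a ring with virtual nodes in the spirit of that diagram. MD5 as the ring hash and 4 virtual nodes per server are my choices, not the slides':

```python
# Consistent hash ring with virtual nodes: each physical server gets
# several points (A+0, A+1, ...) on the ring; an item is owned by the
# first virtual node clockwise from the item's hash.

import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers, vnodes=4):
        self.ring = []                       # sorted list of (point, server)
        for server in servers:
            for i in range(vnodes):
                point = self._hash("%s+%d" % (server, i))
                self.ring.append((point, server))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def lookup(self, item):
        """Find the first virtual node clockwise from the item's hash."""
        point = self._hash(item)
        idx = bisect.bisect(self.ring, (point, ""))
        if idx == len(self.ring):            # wrap around the ring
            idx = 0
        return self.ring[idx][1]

ring = ConsistentHashRing(["A", "B", "C"])

# If B fails, only the items B owned move (to the next server clockwise);
# items owned by A and C keep the same owner.
ring_after = ConsistentHashRing(["A", "C"])
```

This is the property the plain-modulo proxy lacked: removing or adding one server only relocates that server's share of the items.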
