Redis trouble shooting_eng

Redis trouble shooting

Published in: Technology

  1. 1. Redis Trouble Shooting Clark.kang charsyam@naver.com
  2. 2. Contents • Redis – Single Threaded • Redis Trouble Shooting • Redis Security Issue
  3. 3. Single Threaded #1 • Diagram: Client #1, Client #2, ……, Client #N send packets to the Redis event loop (I/O multiplexing, then process command), which handles Packet #1, Packet #2 one at a time
  4. 4. Single Threaded #2 • Redis processes only one command at a time • If you run a long-running command, all other commands wait. – KEYS, FLUSHALL, FLUSHDB, Lua scripts, MULTI/EXEC • Redis uses a few other threads, but only to avoid blocking on fsync calls.
  5. 5. How slow? • Command: flushall • Item count: 1,000,000 • Time: 1000 ms (1 second)
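To see why one long command stalls every other client, here is a minimal sketch of a single-threaded command queue. The command names and the 0.5 ms durations are assumptions for illustration; only the roughly 1000 ms flushall figure comes from the slide. It models the queueing behaviour, not Redis itself.

```python
# Toy model of a single-threaded server: commands run strictly one at a time,
# so each command waits for everything queued ahead of it.

def completion_times(queue):
    """Return (name, finish_ms) pairs for commands processed in arrival order."""
    clock_ms = 0.0
    finished = []
    for name, duration_ms in queue:
        clock_ms += duration_ms  # nothing else runs while this command executes
        finished.append((name, clock_ms))
    return finished


# Durations are assumed, except flushall (1,000,000 items, about 1000 ms).
queue = [("get a", 0.5), ("flushall", 1000.0), ("get b", 0.5), ("get c", 0.5)]

for name, finish_ms in completion_times(queue):
    print(f"{name:10s} done at {finish_ms:7.1f} ms")
```

The two small reads queued behind flushall finish only after the full second has passed, even though each needs well under a millisecond of work.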
  6. 6. Recommended Redis Version #1 • Latest stable version – 3.0.x is also good. – Use 2.8.13 or later • Behavior differs between Redis versions – config set client-output-buffer-limit is accepted with redis-cli in 2.6.x – config set client-output-buffer-limit does not accept unit expressions such as 1GB or 1MB in 2.8.20
  7. 7. Memory Fragmentation #1 • Earlier Redis versions, before jemalloc 3.6.0 – In 2.8.6, Redis uses just 2.4G but RSS is 12G
  8. 8. Memory Fragmentation #2 • Redis versions using jemalloc 3.6.0 or later – 2.8.20 shows a smaller gap between memory usage and RSS – But you should still check RSS.
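Checking RSS against the memory Redis reports can be sketched as follows. The snippet parses text in the `key:value` format printed by `INFO` (for example via `redis-cli info memory`) and divides `used_memory_rss` by `used_memory`. The sample numbers are made up to mirror the 2.4G used versus 12G RSS case above; they are not measured output.

```python
def parse_info(text):
    """Parse INFO output ("key:value" lines, "#" section headers) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def rss_ratio(info):
    """RSS divided by the memory Redis itself accounts for."""
    return int(info["used_memory_rss"]) / int(info["used_memory"])


# Illustrative sample only: 2.4G used, 12G resident.
sample = """# Memory
used_memory:2400000000
used_memory_rss:12000000000
"""

ratio = rss_ratio(parse_info(sample))
print(f"RSS / used_memory = {ratio:.1f}")
```

A ratio far above 1 means the process holds much more physical memory than the dataset needs, which is the situation the 2.8.6 example describes.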
  9. 9. Recommended Client (for Management) • Use redis-cli – It is the best choice. • You can also use telnet – Redis supports inline commands – Twemproxy doesn’t support inline commands – Some commands are unavailable in old versions.
  10. 10. If you support a service team that uses Redis • Check whether they want a cache or a store – For a cache, turn off the SAVE option – Even for a store, choose proper SAVE values • Run multiple Redis instances on one physical server. • Set the maxmemory option.
  11. 11. Using Multiple Redis Instances • 4-core CPU, 32G memory – Three Redis instances of 8G each are better than one 26G instance • Diagram: one instance with Mem: 26G versus three instances with Mem: 8G each
  12. 12. Replication
  13. 13. Redis Replication • Redis is single-threaded – It forks for replication • Chained replication is supported – Multi-master is not supported – Check RSS when replicating
  14. 14. Replication • Chained replication is supported: Master → 1st Slave → 2nd Slave • The 1st slave is the master of the 2nd slave
  15. 15. Replication • Diagram: Master, Slave, replicationCron • Health check
  16. 16. Replication • Diagram: Master, Slave, replicationCron • Health check
  17. 17. Replication • Diagram: Master, Slave, replicationCron • When the master restarts, the slave resyncs with the master
  18. 18. Mistake: Replication • Diagram: Master, Slave, replicationCron • If the master has no data, the slave will also have no data after resyncing.
  19. 19. Persistence
  20. 20. RDB/AOF • RDB and AOF are independent of each other. • RDB – Snapshot of the current memory state – Forks and dumps its memory to disk – In a write-heavy system it can use double the memory • AOF – Saves update (create, update, delete) commands to disk after the event loop, in Redis protocol format. – The disk sync option affects performance (default: everysec) – Less disk IO than RDB (except during AOF rewriting)
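Since the AOF stores commands in the Redis protocol, it may help to see what one command looks like in that format. The sketch below encodes a command as a RESP array of bulk strings; the key `user:1` and value `clark` are hypothetical, and the function is an illustration of the wire format rather than code taken from Redis.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]          # array header: number of arguments
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n" % len(data))  # bulk string header: byte length
        parts.append(data + b"\r\n")
    return b"".join(parts)


encoded = encode_command("SET", "user:1", "clark")
print(encoded)
```

Each write that reaches the AOF is appended in this textual form, which is why the file can be replayed command by command on restart.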
  21. 21. Even if you turn off persistence, you can’t avoid fork (migration)
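The memory cost of a fork can be estimated with simple arithmetic. The sketch below assumes copy-on-write behaviour, where only pages modified while the child is dumping get duplicated, and takes the worst case in which every page is rewritten. Applying it to the 32G machine from the multiple-instances slide is one reading of why smaller instances are safer; the slides do not spell this reasoning out, and the "one instance forks at a time" condition is an assumption.

```python
def peak_memory_gb(dataset_gb, rewritten_fraction):
    """Parent's dataset plus the pages copied because they changed during the dump."""
    return dataset_gb * (1.0 + rewritten_fraction)


MACHINE_GB = 32  # the 4-core / 32G example from the multiple-instances slide

# Worst case (every page rewritten while the child is running):
one_big = peak_memory_gb(26, 1.0)               # a single 26G instance
three_small = 2 * 8 + peak_memory_gb(8, 1.0)    # three 8G instances, one forking at a time

print(f"one 26G instance : up to {one_big:.0f}G of {MACHINE_GB}G")
print(f"three 8G instances: up to {three_small:.0f}G of {MACHINE_GB}G")
```

Under these assumptions the single large instance can exceed physical memory during a fork, while the three smaller ones stay within it as long as their forks do not overlap.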
  22. 22. Migration
  23. 23. Migration Order 1. Prepare a new instance (B) for the old instance (A) 2. Send “slaveof A-ip A-port” to B 3. Wait for replication to finish (this forks and uses more memory) 4. Turn on the writable option for B: config set slave-read-only no 5. Connect clients to B 6. Send “slaveof no one” to B
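The migration order above can be sketched as a small helper that lists the commands sent to the new instance and decides when step 3 is complete. The host `10.0.0.1` and port `6379` are hypothetical. The completion check reads the `master_link_status` and `master_sync_in_progress` fields from `INFO replication` on B; treating those two fields as the finish signal is an assumption of this sketch, and the sample texts are invented rather than captured output.

```python
def migration_commands(old_host, old_port):
    """Commands sent to the NEW instance (B), in the order given by the slide."""
    return [
        f"SLAVEOF {old_host} {old_port}",   # step 2: B starts replicating from A
        "CONFIG SET slave-read-only no",    # step 4: let B accept writes
        "SLAVEOF NO ONE",                   # step 6: promote B to a master
    ]


def replication_finished(info_replication_text):
    """True once INFO replication on B reports a healthy, completed sync."""
    fields = dict(
        line.strip().split(":", 1)
        for line in info_replication_text.splitlines()
        if ":" in line and not line.startswith("#")
    )
    return (fields.get("master_link_status") == "up"
            and fields.get("master_sync_in_progress") == "0")


# Invented samples of what B might report during and after step 3.
syncing = "role:slave\nmaster_link_status:down\nmaster_sync_in_progress:1\n"
done = "role:slave\nmaster_link_status:up\nmaster_sync_in_progress:0\n"

print(migration_commands("10.0.0.1", 6379))
print(replication_finished(syncing), replication_finished(done))
```

Steps 4 to 6 should only run after the check returns true, since promoting B too early would leave it with a partial copy of A's data.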
  24. 24. Partial Sync
  25. 25. Redis Replication Mechanism 1. The slave sends the sync command to the master 2. The master forks and creates an RDB 3. After the RDB is created, the master sends the RDB data to the slave 4. While sending the RDB data, the master saves new commands in a memory buffer 5. After the RDB is sent, the slave starts loading it 6. After the slave loads the RDB, the master sends the buffered commands to the slave
  26. 26. Problem of Redis Replication • When the connection between master and slave is broken – The slave tries to start a FULL sync. – But a full sync is very expensive.
  27. 27. Partial Sync • If the difference is small enough to be recovered from the memory buffer – The slave can request “PSYNC” – The master sends only the small memory buffer to the slave. – And syncing finishes. • But if the master has been replaced by another server – Only a FULL sync is possible.
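The partial-sync decision can be sketched as a simplified model: the master answers a `PSYNC` request with a partial resync only if the slave names the same master run ID and asks for an offset still held in the master's memory buffer. The run IDs and offsets below are invented for illustration, and the function is a reading of the rules on the slide, not Redis source code.

```python
def sync_reply(master_run_id, backlog_start, backlog_end, slave_run_id, slave_offset):
    """Decide how a master would answer PSYNC in this simplified model."""
    if slave_run_id != master_run_id:
        return "FULLRESYNC"   # the master was replaced: history is not comparable
    if backlog_start <= slave_offset <= backlog_end:
        return "CONTINUE"     # missing bytes are still buffered: send only those
    return "FULLRESYNC"       # gap is larger than the buffer can cover


# Hypothetical master state: run ID "abc", buffer covering offsets 5000..9000.
print(sync_reply("abc", 5000, 9000, "abc", 8200))   # short disconnect
print(sync_reply("abc", 5000, 9000, "abc", 1200))   # slave fell too far behind
print(sync_reply("abc", 5000, 9000, "xyz", 8200))   # slave knew a different master
```

The size of the buffer therefore bounds how long a disconnect can last before the expensive full sync becomes unavoidable.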
  28. 28. Trouble Shooting
  29. 29. T Service • Condition – Used only as a cache • Redis configuration – stop-writes-on-bgsave-error yes • Failure – Writes were rejected after RDB creation failed – Reads were OK. • Solution – config set stop-writes-on-bgsave-error no
  30. 30. G Service • Condition – Used only as a cache – Default options • Redis configuration – SAVE 900 1 – SAVE 300 10 – SAVE 60 10000 • Failure – Performance degraded because of heavy disk IO in a short time • Solution – config set save "" – Remove the SAVE option
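Each `SAVE seconds changes` line is a save point: a snapshot starts once at least `changes` writes have accumulated and more than `seconds` have passed since the last save. The sketch below models that rule to show why a busy cache with the default save points dumps to disk so often; the write counts in the examples are assumptions.

```python
DEFAULT_SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]


def should_snapshot(seconds_since_last_save, changes_since_last_save, save_points):
    """True if any save point is satisfied; an empty list disables snapshots."""
    return any(
        seconds_since_last_save > seconds and changes_since_last_save >= changes
        for seconds, changes in save_points
    )


# A write-heavy cache easily passes 10,000 changes a minute (assumed workload).
print(should_snapshot(61, 20000, DEFAULT_SAVE_POINTS))   # dump about every minute
print(should_snapshot(61, 500, DEFAULT_SAVE_POINTS))     # not yet
print(should_snapshot(61, 20000, []))                    # save "" : never
```

With an empty list of save points no rule can match, which is the effect of removing the SAVE option for a cache-only instance.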
  31. 31. S Service • Condition – Part cache, part stored data – One instance holding 28GB of data on a 32GB machine – The machine also had a disk failure. • Failure – Latency was high because swap memory was in use – RDB creation took a very long time • Normally it takes 5~6 minutes per 10G of memory • Dumping 28G took over 8 hours • Solution – Dropped the server – Sometimes it is better to drop the data than to drag out a failure.
  32. 32. P Service #1 • Condition – AOF used for storage – 8 instances on a 256GB machine, each using 26GB • Failure – All 8 Redis instances tried to start an AOF rewrite together – Heavy disk IO and heavy memory use – They all started serving at the same time, so their AOF rewrites were triggered at similar times • Solution – Stop automatic AOF rewrite and manage it with a batch job
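Managing rewrites from a batch job can be sketched as a staggered schedule, so that no two instances on the machine start rewriting together. The ports and the 30-minute gap are hypothetical. The sketch assumes automatic rewriting has been turned off (for example with `auto-aof-rewrite-percentage 0`) and that the batch job triggers each rewrite with `BGREWRITEAOF`; it only prints the plan and does not contact any server.

```python
def rewrite_schedule(ports, start_minute, gap_minutes):
    """Give each instance its own start time so rewrites never begin together."""
    return [(port, start_minute + i * gap_minutes) for i, port in enumerate(ports)]


ports = [6379 + i for i in range(8)]   # hypothetical ports for the 8 instances
schedule = rewrite_schedule(ports, 0, 30)

for port, minute in schedule:
    print(f"t+{minute:3d} min: redis-cli -p {port} BGREWRITEAOF")
```

The gap should be longer than one rewrite takes on the slowest instance, otherwise consecutive rewrites overlap and the disk IO spike returns.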
  33. 33. P Service (#2) – Not an actual failure • Condition – All Redis master/slave connections were broken by a network issue. • Possible failure – When the network recovers, all Redis slaves will try to sync with their masters – All Redis masters will fork and use much more memory. – This can trigger a major service failure • Solution – Disconnect all master/slave links with “slaveof no one” – Then recover the network. – Then re-establish replication one connection at a time on each physical server.
  34. 34. Replication Failure • Condition – 20GB of data – Some write operations • Failure – Master/slave replication fails. • Solution – Check client-output-buffer-limit – Default: “client-output-buffer-limit slave 256mb 64mb 60” • Hard limit: 256mb • Soft limit: 64mb for 60 seconds. – The default is fine for 10G of data, but with 20G of data • Raise it to 512mb or 1024mb
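The hard and soft limits can be modelled as a small check: the master drops a slave connection as soon as its pending output reaches the hard limit, or when it stays at or above the soft limit for the configured number of seconds. The buffer sizes in the examples are assumptions. The `to_bytes` helper is included because, as noted on the version slide, some releases do not accept unit expressions in `config set`, so a plain byte count is the safer value to send.

```python
UNITS = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def to_bytes(size):
    """Convert sizes such as '256mb' to a plain byte count."""
    text = size.strip().lower()
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)


def should_disconnect(buffer_bytes, seconds_over_soft, hard, soft, soft_seconds):
    """Apply the hard limit first, then the soft limit with its time window."""
    if hard and buffer_bytes >= hard:
        return True
    return bool(soft and buffer_bytes >= soft and seconds_over_soft >= soft_seconds)


hard, soft, soft_seconds = to_bytes("256mb"), to_bytes("64mb"), 60   # the default

print(should_disconnect(to_bytes("300mb"), 0, hard, soft, soft_seconds))   # hard limit hit
print(should_disconnect(to_bytes("100mb"), 10, hard, soft, soft_seconds))  # over soft, too briefly
print(should_disconnect(to_bytes("100mb"), 60, hard, soft, soft_seconds))  # over soft for 60 s
print(to_bytes("512mb"), to_bytes("1024mb"))   # byte values for the raised limits
```

During a full sync of a large dataset the master buffers every write that arrives while the RDB is transferred and loaded, so a longer transfer needs a larger limit to avoid a disconnect and restart loop.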
  35. 35. Redis Monitoring
  36. 36. Redis Monitoring (metric: source, Host or Redis INFO) • CPU usage, load: Host • Network inbound, outbound: Host • Current number of clients, max client setting: Redis • Number of keys, number of commands processed: Redis • Memory usage, RSS: Redis • Disk usage, IO: Host • Expired keys, evicted keys: Redis
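Several of the Redis-side metrics are cumulative counters in `INFO`, so a monitor usually compares two samples and reports a rate. The sketch below does that for `expired_keys`, `evicted_keys`, and `total_commands_processed`; the two sample dictionaries and the 60-second interval are invented values, standing in for parsed `INFO` output taken a minute apart.

```python
COUNTERS = ("expired_keys", "evicted_keys", "total_commands_processed")


def counter_rates(previous, current, interval_seconds):
    """Per-second rate of each cumulative counter between two INFO samples."""
    return {
        name: (int(current[name]) - int(previous[name])) / interval_seconds
        for name in COUNTERS
    }


# Invented samples, as they would look after parsing INFO into a dict of strings.
previous = {"expired_keys": "100", "evicted_keys": "0", "total_commands_processed": "50000"}
current = {"expired_keys": "160", "evicted_keys": "30", "total_commands_processed": "110000"}

rates = counter_rates(previous, current, 60)
for name, per_second in rates.items():
    print(f"{name}: {per_second:.1f}/s")
```

A non-zero eviction rate on an instance meant to be a store is worth an alert, since it means maxmemory is forcing Redis to discard data.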
  37. 37. Redis Security Issue
  38. 38. Redis security is very weak. • ACL – Not supported – Never open the Redis port to the public • Use Redis only in a private network – Don’t run Redis as root.
  39. 39. Redis hacking • If the Redis port is open to the public, an attacker can run: – config set dir “/root/.ssh” – config set dbfilename “authorized_keys” – save • The attacker can then use the server as root.
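A quick defensive check for the "never open the port to the public" rule is to look at the `bind` directive in the configuration file. The sketch below flags a config that has no `bind` line or binds to `0.0.0.0`. It assumes the Redis versions discussed in this deck, where a missing `bind` means listening on all interfaces; the sample config strings and the `10.0.0.5` address are hypothetical, and the check says nothing about firewalls.

```python
def exposed_to_public(conf_text):
    """True if the config has no bind directive or binds to all interfaces."""
    addresses = []
    for line in conf_text.splitlines():
        tokens = line.split("#", 1)[0].split()   # ignore comments
        if tokens and tokens[0].lower() == "bind":
            addresses.extend(tokens[1:])
    return not addresses or "0.0.0.0" in addresses


print(exposed_to_public("port 6379\n"))                   # no bind line
print(exposed_to_public("bind 127.0.0.1 10.0.0.5\n"))     # private addresses only
print(exposed_to_public("bind 0.0.0.0\n"))                # all interfaces
```

Running the server under a dedicated non-root account limits the damage further, because the attack above depends on Redis being able to write into root's home directory.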
  40. 40. Thank you.