# Supporting Android-based Platform Development in Samsung
Kanggil Lee
Senior Software Engineer
Samsung Electronics
# 
Kanggil Lee is a senior software engineer at Samsung Electronics.
He administers ALM systems, especially the Perforce servers in the
Mobile Communications Business, and is in charge of deploying
Perforce to Samsung's globally distributed R&D centers as well as HQ.
He configured the world's largest transactional, always-on (24/7)
Perforce server. Before joining the Perforce team, he worked as an
administrator of IBM Rational products at Samsung.
# 
• Perforce at Samsung 
• Optimizing Perforce Replication 
• Lockless Reads 
• Monitoring and Maintenance 
• Future
# Perforce at Samsung
# 
• All mobile projects are in Perforce, and most are Android
platform projects.
• We have 15 master servers providing 19 Perforce services.
• ~30 overseas R&D centers.
• Most of our users (> 80%) use P4V.
# 
• Primary server (Android 4.4.x ~)
• 4 CPUs (32 physical cores), 1.5TB memory, Linux (RHEL)
• Metadata, journal, and logs on flash arrays
• Depot on spinning disk (15k RPM, RAID 1+0, DAS)
• > 7TB of metadata (we reclaim up to 500GB of space per week)
• > 10,000 users
• ~2.6 million submitted changelists since Nov. 2013
• ~8 million commands/day
• ~6.5k commits/day
# 
• Other Perforce servers (~Android 4.3.x, etc.)
• ~14 smaller servers, with 10GB-3TB of metadata each
• Metadata, journal, and logs on flash arrays or spinning
disk (15k RPM, RAID 1+0, DAS)
• Depot on spinning disk (15k RPM, RAID 1+0, DAS)
• 5 Read-Only Replicas 
• > 30 Build-Farm Replicas worldwide 
• 90 proxies worldwide
# Optimizing Perforce Replication
# 
• ~5 hours of replication lag on overseas build replicas
– ~25GB of journal per hour on our primary server
– ~500ms latency (HQ <-> Brazil)
• What we have done
– Filter db.have using the -T flag (see the sketch below)
• db.have is over 90% of each journal
– Add QoS rules for Perforce traffic
– Expand network bandwidth
– Set the net.keepalive.* configurables
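A minimal sketch of the server-side pieces above, assuming a hypothetical build replica with ServerID buildrep1; the names, intervals, and values are illustrative, not our production settings:

```sh
# Exclude db.have from the metadata the replica pulls; db.have is over
# 90% of each journal, and a build server keeps its own local have-list.
p4 configure set buildrep1#startup.1="pull -i 1 -T db.have"
p4 configure set buildrep1#startup.2="pull -u -i 1"

# TCP keepalive configurables help long-haul links such as HQ <-> Brazil
# detect stalls early (values here are examples only).
p4 configure set net.keepalive.idle=60
p4 configure set net.keepalive.interval=30
p4 configure set net.keepalive.count=5
```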
# 
• ~35 minutes of replication lag on all build replicas
– ~4GB of journal per p4 populate command
– Broke our Proof/Release builds
• What we have done
– Set rpl=2 in order to profile database locking activity
– Filter db.integed using the -T flag (sketch below)
• Most lock time was spent holding the write lock on db.integed
• Lag is now < 30s (70x faster)
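A hedged sketch of both steps, reusing the hypothetical buildrep1 ServerID from above:

```sh
# rpl=2 raises the replication trace level so lock wait/held times are
# written to the server log; that is what pointed at db.integed.
p4 configure set rpl=2

# Then filter db.integed, alongside db.have, out of the journal records
# the build replicas pull.
p4 configure set buildrep1#startup.1="pull -i 1 -T db.have,db.integed"
```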
# Lockless Reads
# 
• A hardware upgrade is essential.
– Flash arrays, fast CPUs, huge memory, 10G NICs, etc.
• However, it cannot guarantee the best performance all the
time, due to lock contention (see the sketch after this list).
– Sync commands block major write commands:
• dm-CommitSubmit, shelve, unshelve, submit, edit, populate
– Integ commands block even more write commands:
• dm-SubmitChange, revert, shelve, dm-CommitSubmit,
change, submit, edit, resolve, reopen, add, sync, delete
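One way to watch this contention on a live Linux server, sketched under the assumption that the command monitor and the monitor.lsof configurable are available in your release:

```sh
# Enable command monitoring and tell p4d where lsof lives (the flag
# string is the documented example value).
p4 configure set monitor=1
p4 configure set monitor.lsof="/usr/bin/lsof -F pln"

# -a shows arguments, -l long output, -L adds the db.* file locks each
# command holds, making a long sync that blocks writers easy to spot.
p4 monitor show -a -l -L
```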
# 
• Upgraded to 2013.3 and set db.peeking=3 (configuration sketch below)
– Upgraded all replica servers in April
• Reduced replication lag
– Upgraded our primary server on June 29th
• Eliminated the lock contention caused by sync and integ
commands
• Shows significant performance improvements
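The configuration change itself is small; a sketch (db.peeking needs a server restart to take effect):

```sh
# Lockless reads: readers "peek" at db.* tables instead of taking read
# locks that block writers. Requires server release 2013.3 or later.
p4 configure set db.peeking=3
p4 admin restart
```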
# 
[Chart: number of commands affected and commands delayed, bucketed by delay (1-512 seconds), comparing Feb., April, and July]
# 
• Top 20 commands that blocked other commands on the
primary server over the last 30 days.
# 
[Charts: P4 DB write-lock activity (write-wait and write-held, in ms) and the ratio of submit commands with lapse < 1s (%), for Feb., June, and July]
# 
[Chart: average lapse time in seconds for submit, shelve, unshelve, edit, and revert, for Feb., June, and July]
# Monitoring and Maintenance
# 
• Run P4HealthCheck every 60 seconds
– Checks connectivity, p4d status, and p4broker status
– Sends email and SMS if the status is not OK
• Run the replica gap checker every 10 minutes
– Checks replication lag
– Sends email if the gap exceeds 10 minutes (a sketch of both checks follows)
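A minimal sketch of both checks as cron-driven shell; host names, thresholds, and the alerting command are illustrative stand-ins for our actual P4HealthCheck tooling:

```sh
#!/bin/sh
MASTER=p4master.example.com:1666
REPLICA=p4replica.example.com:1666
ADMINS=p4admins@example.com

# Health check: "p4 info" exits non-zero when p4d (or the broker in
# front of it) does not answer.
p4 -p "$MASTER" info >/dev/null 2>&1 ||
    echo "p4d on $MASTER is not responding" |
    mail -s "P4HealthCheck alert" "$ADMINS"

# Gap check: "p4 pull -lj" on the replica prints the replica's and the
# master's journal positions. The parse below is approximate and assumes
# both sides are on the same journal number.
STATE=$(p4 -p "$REPLICA" pull -lj)
REP=$(echo "$STATE" | sed -n 's/.*replica journal state is.*Sequence \([0-9]*\).*/\1/p')
MST=$(echo "$STATE" | sed -n 's/.*master journal state is.*Sequence \([0-9]*\).*/\1/p')
if [ -n "$REP" ] && [ -n "$MST" ] && [ $((MST - REP)) -gt 0 ]; then
    echo "replica is $((MST - REP)) journal bytes behind the master" |
        mail -s "P4 replica gap alert" "$ADMINS"
fi
```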
# 
• Monitoring Perforce Servers
# 
• Monitoring Perforce DB Lock
# 
• Profiling locking activities
– Run a query and select whichever fields you are interested in (example below)
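For example, assuming the server's performance-tracking log has been imported into a SQLite database by a log2sql-style script (the p4log.db file name and the tableUse schema here are hypothetical), ranking tables by write-lock cost looks like:

```sh
sqlite3 p4log.db <<'EOF'
-- Which db.* tables cost the most write-lock time?
SELECT tableName,
       SUM(writeHeld) AS total_write_held_ms,
       SUM(writeWait) AS total_write_wait_ms
FROM   tableUse
GROUP  BY tableName
ORDER  BY total_write_held_ms DESC
LIMIT  20;
EOF
```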
# 
• Monitoring Perforce Commands
# 
• Monitoring Perforce Sync command
# 
• Monitoring Errors (max-values)
# 
• Maintain Perforce db.* files, journals, and logs
– Create and restore checkpoint files (weekly)
– Rebuild the DB (weekly)
– Unload clients and labels (daily)
– Rotate and replay journals/logs (hourly)
– Recover the database
• Set up replica servers
• Verify depots daily (cron sketch for these jobs below)
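As a rough sketch, the recurring jobs map onto crontab entries like the following; paths, schedules, and the cutoff date are illustrative, and the real scripts add locking and error handling:

```sh
# m h dom mon dow  command
0  *  * * *  p4d -r /p4/root -jj /p4/journals/jnl   # rotate the live journal hourly
0  3  * * 0  p4d -r /p4/root -jc -z                 # weekly compressed checkpoint
30 2  * * *  p4 unload -f -L -z -a -d 2014/01/01    # unload clients/labels idle since the cutoff
15 4  * * *  p4 verify -q //...                     # daily depot verify, errors only
```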
# Future
# 
• Upgrade our primary server in October
– 4 CPUs (40 physical cores), 3.0TB memory
• p4d/p4broker upgrades (r13.3 -> r14.x)
• Shared archive or depot on flash arrays (dedup)
• Server consolidation
– 14 other servers -> 7 other servers
– Set up read-only replicas for all Perforce services
# 
Kanggil Lee 
kanggil.lee@samsung.com


Editor's Notes

  • #17 (write-lock activity chart): Write-wait time improved 1532% (247.01ms -> 15.13ms), and write-held time improved 3318% (586.01ms -> 17.14ms). 97% of submit commands now complete within 1 second, and the share of other commands completing within 1 second rose by 5-20%.
  • #18 (average-lapse chart): Submit time improved 1136% (3.09s -> 0.25s), shelve 87% (3.77s -> 2.02s), unshelve 84% (5.51s -> 2.99s), edit 279% (1.44s -> 0.38s), and revert 68% (5.16s -> 3.07s).