[NetherRealm Studios] Game Studio Perforce Architecture
MERGE 2013 THE PERFORCE CONFERENCE
SAN FRANCISCO • APRIL 24–26

Perforce White Paper

To provide a solid foundation for software development excellence in today's demanding economy, it's critical to have a software version management solution that can meet your demands.

Game Studio Perforce Architecture
Creating Services That Power Large Development

Ryan Mensching, NetherRealm Studios (WB Games)

Introduction

NetherRealm Studios has been creating video games on multiple platforms for more than 20 years. As a AAA game developer, we have seen the amount of data produced and the production quality increase many times over. Along with this growth comes increased demand for Perforce space, reliability, and flexibility. These three pillars are the foundation of our infrastructure decisions.

Background

Perforce Software Version Management has been an integral part of studio collaboration since 2006. Prior to migrating to Perforce, we used several products, including Visual SourceSafe. Perforce has been able to scale to the needs of studio development while allowing management of large binary assets and large integrations. Enabling this growth, however, has required some engineering to ensure the best possible structure for Perforce to layer on.

Past projects encountered such stability or resource constraints that individual team members took it upon themselves to run or host services on individual machines. Projects were at risk of losing data and compromising timelines, which could cost hundreds of thousands of dollars.

For game studios, deadlines and "crunch" are generally mandatory events during the multi-year development cycle. Staff may be on site working around the clock during these periods, and having Perforce down halts work and collaboration.

This is true not only for code development but also for art asset development. Binary art assets, such as textures, models, and many cinematic movies, consume the most space for any given game project. The current project head revision is around 20 GB (sync in 8 minutes) and the art assets head revision is around 545 GB (sync in 3 hours). Furthermore, we can have 460+ submitted changelists per day with upwards of 20 GB of iterations (see Figure 1).
This adds up to a significant load on Perforce during critical business periods.

We also use a complex system of file types to manage data: +S flags are used on binaries where appropriate, and file locks manage user access and submits. Internally, we also publish software at a consumer level for our artists and designers. This software package undergoes a basic software release cycle, and its release depends on game content progression and vice versa. This can make for a very difficult release window. The studio currently works on more than one game at a time, so we generally have a split audience that may be in different places within their projects.

From an IT perspective, we had aging hardware and processes, plus general support issues keeping up with business demand. We recommissioned hardware to virtual machines (VMs) and restructured existing storage as best as possible to accommodate a better layout and space usage. Ultimately, these experiences and experiments led to further engineering for the systems that Perforce and other critical systems would use in our environment.
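
To put the sync figures quoted in this section in perspective, the average throughput they imply can be worked out directly. The following is only a rough sketch: it assumes decimal units (1 GB = 1000 MB), which the paper does not specify, and the function name is chosen here for illustration.

```python
# Rough sync-throughput arithmetic from the figures quoted in the text.
# Assumption: 1 GB = 1000 MB (decimal units); the paper does not say which it uses.

def sync_throughput_mb_per_s(size_gb: float, duration_s: float) -> float:
    """Average throughput in MB/s for syncing size_gb gigabytes in duration_s seconds."""
    return size_gb * 1000 / duration_s

# Code head revision: about 20 GB synced in 8 minutes.
code_rate = sync_throughput_mb_per_s(20, 8 * 60)

# Art assets head revision: about 545 GB synced in 3 hours.
art_rate = sync_throughput_mb_per_s(545, 3 * 60 * 60)

print(f"code sync: {code_rate:.1f} MB/s")
print(f"art sync:  {art_rate:.1f} MB/s")
```

Under these assumptions, a full sync averages roughly 42 MB/s for code and roughly 50 MB/s for art assets per client, which gives a sense of the sustained load when many users sync at once.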

Figure 1: Disk growth for 3 months

IT Strategic Goals

As IT, we focus our goals around what the business needs to stay viable and to produce its product. For games this is very similar to many other industries, with the caveat that almost everything must be ready or available immediately, if not faster. This can be a very painful experience if your infrastructure is not prepared to handle or adapt to the business. To begin designing and building our infrastructure, we established several pillars that were dictated by the needs of the business and our experiences supporting it:

• Space
This may seem straightforward, but how space is approached from a logical and management perspective makes a big difference when in the middle of solving issues. The space needs to be not only available but also abstracted to suitable layers so that applications can be applied in real time without major reconfiguration.

• Speed
This is something that all SAN/storage users will have concerns about. When you are running all your VMs and major storage from a single platform, speed is a critical component of making the system work without causing slowness to users.

• Flexibility
Although tied to several of the other pillars, flexibility needs to stand alone because it is a major gate to how you use and manage your resources, whether it be a choice of protocols or how quickly you spin up new VMs or hardware.

• Reliability
Redundancy should be inherent in any solid infrastructure design, and the components you choose should make this a seamless effort instead of requiring a dedicated action to provide robustness. If a separate architecture plan is necessary to provide reliability, holes may emerge over time, or this activity may eat up too much time.

• Management
The glue that brings all the pieces together and allows insight into how the systems are running is critical. Without proper management, issues will arise in the other pillars that can trickle down. Management of the platform as a whole should not be a complex endeavor and should be self-documenting or easily documented.

Solutions

As an organization, how do we accomplish all of our goals and realize our vision? We had already established a solid layer 2 and 3 network with 10 Gb distribution, a core component. On top of this, we started migrating and testing all services to storage via IP protocols to eliminate cabling and complexity (see Figure 2).

With this base IP and storage layer, we started to test VMware as a platform to support our OS and applications. In these early renditions, we used Internet Small Computer System Interface (iSCSI) for our VM storage layer as well as any mass storage for applications and Perforce. We had some success with this platform and were able to keep our performance benchmark for Perforce, via the p4bench tool, in the top 20 platforms on the Perforce bench database at the time. Ultimately, this platform had some inherent management flaws and required hardware upgrades and processes to be engineered further.

Figure 2: Service stack

Our storage platform is a key component of our infrastructure design. We had issues with iSCSI being agile enough to keep up; logical unit number (LUN) management becomes a major burden for a small team. iSCSI also has inherent configuration complexity at the network layer to provide redundancy and performance.
Through our testing we found that Network File System (NFS) provided equal or better performance while offering several advantages.
First, there is the notion of true thin-provisioned volumes and volumes that can be expanded on the fly without downtime or LUN changes. Second, VMware also has very mature support for NFS, which enables clustering and redundancy by default. The engine that powers this storage platform is NetApp. With its mature feature set and collapsed protocol support, we were able to leverage NFS while also having support for FC and iSCSI if required.

We are able to leverage NFS at our OS layer as well to provide dynamic support for volume growth or shrinking, a key component of our growth plan for Perforce revision file storage. With the NetApp platform, we also have the advantage of snapshot technologies that allow for granular file-level snapshots and restores that do not chew up space. For example, we are able to retrieve a single file from a deleted shelved changelist if required, without downtime or any overhead (granted, we try to avoid this).

Through VM evaluation and performance testing over the years, we have found that we need servers that are wide and fast. The more RAM we can give, the better for disk caching, and to accommodate the fastest execution of database jobs, we want the fastest cores. We have only two cores on each of our Perforce servers, but these run at 3.4 GHz.

We have been using the Cisco UCS platform to provide our compute resources for VMware. With this platform, we have inherent abstraction at the hardware layer, which allows a blade or other hardware to be swapped or upgraded and its profile to be reassigned to automatically configure its network interface controllers (NICs), connections, and other characteristics. This allows for quick additions or replacements. The virtualization at the NIC layer is an important factor for our installation.
We use several virtualized NICs to connect directly to the 10 Gb backbone at the VM host layer, as well as direct pass-through technology at the OS guest level to connect to Perforce NFS data.

At the OS level, we are a RHEL shop, which has served us well as a layer on virtualization. Linux utilizes virtual resources better than most OSes and provides the best way to operate a thin OS with very little service overhead. We have allocated 32 GB of memory to our production Perforce servers.

With the disk cache system in Linux and large amounts of RAM in our cluster, we are able to get excellent throughput from both VMDKs and mounted NFS, achieving 440+ MB to the desktop Perforce client. Our Perforce instances are run in the standard fashion with some custom scripts for management and startup of services.

We use proxies for remote work as well as for in-house high-demand applications such as build machines. We are moving forward with full broker front-ended servers with replicas to offload read-only data and checkpoints. With this model we are able to distribute loads seamlessly. Because our revision files are stored on NFS, we can have a single source mounted read/write or read-only for production and read-only for replicas, without needing to sync or replicate revision files.

Reliability is a critical factor for our system design. At the storage level, we have redundancy at the NetApp array level with automated SnapMirrors. At the volume level, we use Snapshot technology to provide a month of on-array backups. This is the functionality that enables us to restore single files directly back to depots/servers on demand with no outage time. At layers 2 and 3, we have redundancy through multi-homed connections to our UCS chassis. At the VMware level, we have redundancy via VMware HA and clustering. At the Perforce level, we are using replicas, proxies, and checkpoints for integrity, load balancing, and backups.
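
The broker-fronted model described earlier in this section can be pictured as a per-command routing decision. The following is a minimal sketch under stated assumptions, not the studio's configuration and not P4Broker configuration syntax: the server addresses, the `route` function, and the particular set of commands treated as read-only are all hypothetical and chosen only for illustration.

```python
# Hypothetical illustration of sending read-only commands to a replica.
# Assumptions: the server addresses and the command set below are invented
# for this sketch; a real deployment would define them in its broker setup.

READ_ONLY_COMMANDS = {"files", "filelog", "fstat", "changes", "describe", "print"}

def route(command: str,
          master: str = "master:1666",
          replica: str = "replica:1666") -> str:
    """Return the server address that should handle the given command name."""
    # Read-only commands are offloaded to the replica; everything else
    # (for example submit or edit) goes to the master.
    return replica if command in READ_ONLY_COMMANDS else master

for cmd in ("fstat", "submit"):
    print(cmd, "->", route(cmd))
```

The design point this illustrates is that only the master ever needs write access; with revision files on a shared NFS mount, a replica in this scheme would serve reads from the same files without copying them.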

Finally, to glue our systems together, we use a suite of management tools. Each layer (NetApp, UCS, and VMware) has its own management and application monitoring tool. With each of these tools, we can manage and allocate resources quickly as well as get advance notifications for system usage or failures.

Figure 3: Logical Perforce

Conclusion

We have found Perforce capable of scaling to the needs of high-speed and data-intensive development processes. Accomplishing this, however, has required many rounds of engineering and investment in infrastructure technologies. We have chosen these platforms based on our experience, which means other situations may require different solutions. However, the general performance and design philosophies can be applied. Ultimately, one of the largest gains for modern infrastructure is virtualization, and we have made design choices to abstract at each layer of the platform. This allows easy upgrades, migration, and management of growth without changing how we use the platform or the services it provides.