Workspace Acceleration and Storage Reduction: A Comparison of Methods & Introduction to IC Manage Views

Roger March and Shiv Sikand, IC Manage, Inc.

Digital Assets Growing at Rapid Rate

File systems are being stressed by rapidly expanding digital assets. These growing datasets are being driven by the high capacity needs of companies designing multi-function consumer electronics and biomedical devices and software companies developing video games and enterprise systems.

It is a world markedly different from traditional software development. In terms of scale, it is not uncommon to see Perforce depots many terabytes in size, composed of hundreds of millions of files. A single client spec may define a workspace of many gigabytes and tens of thousands of files. An organization may have thousands of users spread across the globe.

Content is stored and modified on local drives by individual users working on workstations or laptops; however, regressions, analysis, and build farms that utilize this content will typically run on some form of networked file storage to allow large sets of machines to operate on the data seamlessly.

A 2012 independent study of 524 respondents identified the top challenges associated with network file systems and showed the clear impact of these expanding digital assets. The top problems were slow remote and/or local workspace syncs and storage issues such as storage capacity not keeping up with expanding data volumes and the high cost of adding network attached storage (NAS) devices. Additionally, the respondents cited application slowdown, with network storage bottlenecks increasing tool iteration time by 30 percent.

File System Adaptation for Expanding Digital Assets

An optimally managed file system utilizing Perforce SCM will exhibit several key characteristics, even in the face of continuously expanding digital assets.

Client workspace syncs should be rapid to encourage continuous integration and check-ins. Network disk space should be used efficiently so that network storage capacity doesn’t interfere with daily execution.
Directly related to this is the need to minimize network bandwidth use, to avoid creating bottlenecks that slow file access and must constantly be managed. The system must be designed to scale with a growing number of users and expanding digital assets.

Further, any such infrastructure enhancements should be compatible with existing storage technologies and be storage vendor agnostic to allow organizations to adapt to rapidly changing infrastructures. The enhanced system must be reliable in the event of failures or errors. Individual users must be able to maintain workspace file control and stability, without cumbersome, error-prone manual management of network cache storage and different versions. And finally, development teams should be able to build workspaces anywhere and on demand, avoiding problems and costs associated with disk space allocation.

High Demand on Network Storage

Beyond sheer size, the character of the IT environment has shifted over the years. It used to be that nearly all Perforce workspaces resided on a user’s workstation. Today’s workspaces tend to reside on NAS devices where the workspaces are network mounted to the machines accessing the data. There are a number of advantages to this arrangement. One is that it allows the user to be easily relocated to another machine without having to physically move the workspace. For environments where virtualization is the norm, local workspace storage is not utilized, and all files are stored on some form of networked storage. Some organizations also feel that restricting the data to the storage appliance gives them better control for security. The NAS device is usually hardware-optimized to its tasks; it provides much greater performance and reliability than is available from a commodity server running a network file system (NFS) such as CIFS.

Unfortunately, the cost of storage on NAS is dramatically higher than commodity storage. Even with specialized hardware, use tends to expand to saturate the NAS device. The solution of adding NAS units often makes cost a barrier to scaling.

Figure 1: Current baseline NFSes over-rely on network storage

As shown in Figure 1, many current baseline NFSes over-rely on network storage. They duplicate file storage for every user workspace, utilizing precious Tier 1 storage space. The deduplication optimization performed to address this issue is very inefficient due to continually and rapidly changing datasets. Further, because local caching is underutilized due to the high cost of solid-state storage, user file access often requires full network bandwidth, creating bottlenecks that degrade file access time.

Network Bandwidth Bottlenecks

Many users may be working on a particular project in different or changing roles.
This makes it impractical or undesirable to continually tune the workspaces for the current role.

With traditional approaches, this is addressed by configuring the workspace client to cover the entire project, so the user must download a complete set of project files for every workspace. The clear drawback is that workspace syncs for large datasets can be extremely slow due to network bandwidth bottlenecks, lock contention on the Perforce server due to long-running sync operations, and limited I/O bandwidth on the target storage device.

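The bandwidth and latency limits described above can be illustrated with a rough estimate. The sketch below is not a measurement and not a model of Perforce itself: it assumes a sync costs one serialized round trip per file plus the raw transfer time, and the function name, link speeds, and latencies are all hypothetical. Only the workspace size (1 GB and 10K files) is taken from the benchmark example cited later in this paper.

```python
def estimate_sync_seconds(total_bytes, file_count, bandwidth_bits_per_s, per_file_latency_s):
    """Rough estimate: one serialized round trip per file plus raw transfer time.

    This is a simplifying assumption for illustration; real syncs may pipeline
    requests, compress data, or contend with other users.
    """
    transfer_s = total_bytes * 8 / bandwidth_bits_per_s
    latency_s = file_count * per_file_latency_s
    return transfer_s + latency_s


# Workspace size from the paper's benchmark example: 1 GB and 10K files.
WORKSPACE_BYTES = 1_000_000_000
WORKSPACE_FILES = 10_000

# Hypothetical links (assumed values, not measurements):
# a 1 Gbit/s local network with 0.5 ms per-file latency,
# and a 50 Mbit/s long haul link with 80 ms per-file latency.
local_s = estimate_sync_seconds(WORKSPACE_BYTES, WORKSPACE_FILES, 1_000_000_000, 0.0005)
remote_s = estimate_sync_seconds(WORKSPACE_BYTES, WORKSPACE_FILES, 50_000_000, 0.08)

print(f"local sync estimate:  {local_s:.0f} s")
print(f"remote sync estimate: {remote_s:.0f} s")
```

Under these assumed figures the per-file latency term dominates at the remote site, which is one way to see why wide and deep workspaces suffer most on long haul networks.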
The performance impact is most severe for remote sites because the bandwidth available to the Perforce server at the remote site is typically a fraction of that available on the local network. If a large set of changes is submitted on the local network, doing simple syncs at the remote site can take a long time; the entire change set must make its way down the wire, even if parts of the change set have nothing to do with the area the user is currently working in. The Perforce proxy architecture can mitigate this for multiple accesses to the same file by multiple users. However, for wide and deep workspaces, the proxy consistency model results in large numbers of queries to the master site, which tends to be latency limited on long haul networks.

Dynamic Virtual Workspaces: A Novel Approach to Workspace Acceleration

With a dynamic virtual workspaces approach, the user is presented with a fully populated workspace in near real time, through advanced virtualization techniques. The individual workspaces are displayed rapidly irrespective of client complexity, the size of the view mappings, or the total number of workspaces, which can be a limitation in some clone technologies.

As the files are accessed by the operating system due to an application or user request, they are delivered on demand to fast local storage caches through a server, replica, or proxy. After the preliminary cold cache startup cost, most applications request files sequentially, and the on-demand nature of the access results in a major acceleration of workspace syncs compared with the traditional approach of populating the entire workspace before task execution.

One element of the dynamic virtual workspace system design is that the local cache can be any block device. The simplest form would be the unused disk space on a bare metal host. Typically, only a small fraction of this disk space is used by the operating system and the rest is unused.
Another choice is volatile memory, such as tmpfs, instead of persistent storage. For environments requiring even more performance, local solid-state storage drives can be utilized very cheaply for the cache.

The caches themselves are designed with built-in space reclamation capabilities, allowing the cache to be set up and tuned for each individual workload. The caches are kept under quota by automatically removing files via a least recently used (LRU) algorithm, and quota sizes can be controlled individually on a per-workspace basis.

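The quota-driven LRU reclamation described above can be sketched in a few lines. This is a minimal illustration of the general technique, not the IC Manage implementation: the class name `LruQuotaCache`, the `touch` method, and the byte counts are hypothetical, and the sketch only tracks file sizes rather than storing file contents.

```python
from collections import OrderedDict


class LruQuotaCache:
    """Tracks cached file sizes and evicts least recently used files over quota.

    Hypothetical sketch: one instance would correspond to one workspace,
    so each workspace can be given its own quota.
    """

    def __init__(self, quota_bytes):
        self.quota_bytes = quota_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # path -> size, oldest access first

    def touch(self, path, size):
        """Record an access to `path`; return the paths evicted to honor the quota."""
        if path in self._entries:
            # Re-inserting below moves the file to the most recently used end.
            self.used_bytes -= self._entries.pop(path)
        self._entries[path] = size
        self.used_bytes += size

        evicted = []
        # Never evict the file that was just accessed, even if it alone exceeds quota.
        while self.used_bytes > self.quota_bytes and len(self._entries) > 1:
            old_path, old_size = self._entries.popitem(last=False)
            self.used_bytes -= old_size
            evicted.append(old_path)
        return evicted

    def cached_paths(self):
        return list(self._entries)


cache = LruQuotaCache(quota_bytes=100)
cache.touch("rtl/core.v", 40)
cache.touch("rtl/alu.v", 40)
cache.touch("rtl/core.v", 40)            # core.v becomes the most recently used file
evicted = cache.touch("rtl/fpu.v", 40)   # 120 bytes would exceed the 100-byte quota
print(evicted, cache.cached_paths())
```

Because evicted files can be fetched again on demand, eviction in this scheme costs only a later cache miss rather than lost data.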
Figure 2: Workspace acceleration is a major advantage of the dynamic virtual workspace method

As Figure 2 illustrates, a major advantage of the dynamic virtual workspace method is the workspace acceleration: It achieves near-instant workspace syncs for both local and remote sites. Remote sites should be configured to use a local Perforce replica. Further, individual workspaces can be modified and updated dynamically with independent view mappings to align with advanced branching strategies.

Dynamic Virtual Workspaces Contrasted with Snap and Clone

One alternative to dynamic virtual workspaces is a “snap and clone” approach, where IT sets up a template, takes a snapshot, and creates clones. The clones can then be made available much faster than syncing a multitude of individual workspaces. The snap and clone method does achieve some workspace sync acceleration over traditional NAS volumes; however, the drawback is the way it restricts the mixing and matching of constantly changing file sets, particularly when those changes involve many thousands of files. Further, it requires ongoing scripting and maintenance for client workspace customization and under-utilizes local caching, so remote sites can still face network bottlenecks because of long latencies between the clone workspace and the master content source.

Local Caching Approach

Many companies are looking to improve NFS performance, particularly in virtualized environments where boot storms, logout storms, and Monday morning sync storms result in reduced productivity. The main solution used is to add solid-state storage caches, either inline between the client and the filer for a vendor agnostic model or onboard the actual NAS device itself.

However, SCM environments such as Perforce have a different use model compared to unstructured data for which NAS environments are currently optimized. In the SCM model, the repository contains the workspace files, which are then replicated to users as partial projections of the depot.
These files are then modified or used to generate a derived state from the source files.

A local caching approach makes optimal use of both local caches and network storage. Network storage is used as transient storage for edited files only, with the edited files removed once they are checked into Perforce. Modified files checked into Perforce are automatically moved back to the local cache.

Figure 3: A local caching optimization approach

An advantage of this local caching optimization approach, as Figure 3 shows, is that expensive network disk space is freed up instantly, such that network disk space utilization can typically be reduced by a significant amount.

Intelligent File Redirection

The intelligent file redirection approach separates reads from writes, storing the reads in local cache (see Figure 4). The modified files (writes) are automatically written to NAS for safekeeping.

Figure 4: The intelligent file redirection approach

This approach takes advantage of on-board speeds for reads instead of slower network speeds, typically achieving twice the performance of network reads. Intelligent file redirection also ensures that modified files on the NAS device are removed after they are checked into the Perforce server, to instantly free up network disk space. The redirected reads, writes, and removals are all done automatically without the need for manual handling.

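The read/write separation described above can be sketched as follows. This is a simplified, in-memory illustration under stated assumptions, not the IC Manage Views implementation: the class `RedirectingWorkspace` and its methods are hypothetical, plain dictionaries stand in for the Perforce depot, the local cache, and the NAS write area, and no real Perforce API is called.

```python
class RedirectingWorkspace:
    """Hypothetical sketch: reads come from a local cache, edits go to network storage."""

    def __init__(self, depot):
        self.depot = depot      # path -> bytes; stands in for the Perforce server
        self.local_cache = {}   # unmodified files on fast local storage
        self.nas_writes = {}    # edited files held on network storage until check-in

    def read(self, path):
        if path in self.nas_writes:
            # The user's own pending edit takes precedence over the cached revision.
            return self.nas_writes[path]
        if path not in self.local_cache:
            # Cold cache: fetch on demand from the depot stand-in.
            self.local_cache[path] = self.depot[path]
        return self.local_cache[path]

    def write(self, path, data):
        # Writes are redirected to network storage for safekeeping.
        self.nas_writes[path] = data

    def submit(self, path):
        # Check-in: the depot now holds the content, so the network copy is
        # removed immediately and the new revision is served from local cache.
        data = self.nas_writes.pop(path)
        self.depot[path] = data
        self.local_cache[path] = data


ws = RedirectingWorkspace(depot={"src/top.v": b"module top; endmodule"})
original = ws.read("src/top.v")                    # populates the local cache
ws.write("src/top.v", b"module top2; endmodule")   # edit lands on network storage
pending = ws.read("src/top.v")
ws.submit("src/top.v")                             # network copy is reclaimed
print(len(ws.nas_writes), ws.read("src/top.v"))
```

In this sketch network storage holds only files with pending edits, which mirrors the claim that network space is reclaimed as soon as a file is checked in.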
Intelligent file redirection has other significant advantages when widely deployed in an enterprise. By eliminating read traffic through redirection to local cache, the filers can be optimized for sequential write performance, increasing throughput for write-intensive tasks and prolonging the useful life of the filers by reducing both network and space utilization.

Real-Time Deduplication Reclaims Space for Check-Ins

A supplemental enhancement available with an intelligent file redirection approach is to automatically purge modified files from network write storage once they are checked into Perforce. This instantaneous deduplication frees up network disk space.

Advanced Content Delivery to Minimize Network Bandwidth, Reduce Errors, and Increase I/O Availability

Many organizations have large tool or third-party content libraries and geographically distributed sites. In many cases, this content must be synchronized to all sites to ensure that centrally developed methodologies will work seamlessly at any location. This content tends to have the characteristics of a very large canonical NFS mount point. Normally these directories live on the NAS device, consuming large amounts of space and file traffic because every machine needs to access them for their tools and data. A variety of methods are used to synchronize them between sites. These methods include block-based replication provided by most NAS vendors, or file-based replication provided by tools such as rsync.

In many cases, a large amount of precious bandwidth between sites is consumed by this replication, even though much of the content in a big push to remote sites may not be needed there, because the granularity of the push is too coarse. Additionally, replication creates synchronization boundaries that can be time consuming to resolve for fast-changing datasets.
Server farms can also generate excessive I/O load on the filers as a series of jobs is queued and run on a large number of hosts.

A solution to this problem is to use a single read-only workspace instead of the canonical NFS mount point with replication. With this configuration, a single Perforce workspace is constructed as a read-only object. Multiple dynamic virtual workspace (DVW) instances from any number of locations can connect to the workspace sync state (have table). A single sync of the workspace will result in near-instantaneous global synchronization of the metadata. The DVW instances can either reside as a single NFS mount point for all hosts in the farm or be configured as individual DVWs on each host with local caching. As a result, the I/O is highly localized with on-demand granularity, and similar workloads will benefit from warm caches on the execution hosts.

IC Manage Views: Accelerates Workspace Syncs and Slashes Network Storage

IC Manage Views works with existing storage technologies to:
• Reduce network storage by a factor of four through local caching techniques and real-time deduplication.
• Achieve near-instant syncs of fully populated workspaces through dynamic virtual workspace technology.
• Deliver two times faster file access and speed up applications through automated intelligent file redirection (see Figure 5).

Figure 5: IC Manage Views: accelerates workspace syncs and slashes network storage

In addition, IC Manage Views features:
• 100 percent compatibility with existing storage technologies; NAS agnostic.
• Scalability: savings increase with number of users and the size of databases.
• Flexibility in building workspaces on demand; development teams can build workspaces anywhere, avoiding problems and costs associated with disk space allocation.
• Reliability: handles cache recovery in the event of failures or errors.
• Stability: designers maintain workspace file control. No manual management of network cache storage and different versions.

Figure 6 presents some representative IC Manage Views benchmark results. In this example, the workspace was 1 GB and 10K files.

Figure 6: IC Manage Views benchmark results for 1 GB workspace and 10K files

IC Manage Views can dramatically lower costs associated with storage and increase productivity through accelerated delivery of workspace content. It achieves these advantages through dynamic virtual workspace, local caching, instant deduplication, and intelligent file redirection technologies.

About IC Manage

IC Manage provides IC Manage Views, which accelerates Perforce client workspace syncs and drastically reduces the amount of storage needed to keep up with expanding software data. IC Manage Views gives software teams the flexibility to build workspaces anywhere, avoiding problems and costs associated with disk space allocation. IC Manage is headquartered at 2105 South Bascom Ave., Suite 120, Campbell, CA. For more information visit us at www.icmanage.com.

Shiv Sikand, Vice President of Engineering, IC Manage, Inc.

Shiv founded IC Manage in 2003 and has been instrumental in the company achieving technology leadership in high-performance design and IP management solutions. Prior to IC Manage, Shiv was at Matrix Semiconductor, where he worked on the world’s first 3D memory chips.

Shiv also worked at MIPS, where he led the development of the MIPS Circuit Checker (MCC). While working on the MIPS processor families at SGI, Shiv created and deployed cdsp4, the Cadence-Perforce integration, which he later open sourced. Cdsp4 provided the inspiration and architectural testing ground for IC Manage. Shiv received his BSc and MSc degrees in physics and electrical engineering from the University of Manchester Institute of Science and Technology.

Roger March, Chief Technology Officer, IC Manage, Inc.

Prior to IC Manage, Roger worked at Matrix Semiconductor. He designed and helped build most of its CAD infrastructure. He worked mainly on the physical and process side and also on optimizing layout for manufacturability.

Roger began his career as a circuit and logic designer in the early days of microprocessors and dynamic RAMs.
While working as a designer at Data General and Zilog on long-forgotten products like the microNova, microEclipse, and Z80K, he found himself drawn to the CAD side to provide design tools that the marketplace was not yet offering. He wrote circuit, logic, and fault simulators that were used to build and verify microprocessors of the day.

Roger then joined MIPS as its first CAD engineer. Here he built the infrastructure with a combination of vendor and internal tools. This system was used to build all the MIPS microprocessors as well as most of their system designs. He wrote module generators for chip physical design, placement and allocation tools for board designs, test pattern generators and coverage tools for verification, and yet another logic simulator, which was found to be 30 times faster than Verilog-XL in benchmarks. After MIPS was acquired by Silicon Graphics, Roger became a Principal Engineer at the company. Working in the microprocessor division, he worked on problems in logic verification and timing closure. This dragged him more deeply into the realm of design databases. He wrote several tools to help analyze and manipulate physical, logical, and parasitic extraction datasets. This included work in fault simulation, static timing verification, formal verification, physical floor planning, and physical timing optimization.