Recent work by market research organisations IDC (Villars, 2008) and Gartner (Bell, 2008) demonstrates massive growth in unstructured data, which typically accounts for 80% of data in industry and potentially more within universities owing to the nature of their business (Beedie et al., 2009). Furthermore, the Storage Networking Industry Association (2008) estimated that 80% of files were no longer modified 90 days after creation. More recently, industry has suggested that 80% of stored data is inactive after 60 days (Quattromini, 2010). All the universities consulted during dissemination activities keep unstructured data on tier 1 storage regardless of its value (e.g. intellectual property, research results, interim results) or currency (e.g. frequently accessed information, mature archive documents) to the university's business.
Tier 1 storage is high-performance, reliable, highly available, mirrored and well protected. However, tier 1 storage has a relatively high environmental impact: the fastest, most reliable disk storage uses more power, space and cooling, and this effect is doubled by high availability techniques like multi-site mirroring. In addition, protecting the storage with a very frequent backup regime involves considerable backup infrastructure (e.g. tape libraries, disk arrays and servers). All this equipment also has a high level of embodied energy. Lower tier storage could offer energy and cost savings. It is available in much higher density formats, which require less space and cooling, and has a lower embodied energy. Mirroring is often considered unnecessary for data with low availability requirements.
Given the dynamic nature of universities, efforts to place data within taxonomies or use other methods to maintain accurate metadata have had limited success. However, the access, creation and modification dates are all important pieces of automatically maintained information that may be used to establish data value.
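As an illustration of how automatically maintained dates might be used to estimate data value, the following minimal Python sketch walks a directory tree and totals the bytes that a simple age-based policy would leave on tier 1 or move to tier 2. The function names (suggest_tier, survey), the 90-day default threshold (borrowed from the SNIA figure above) and the rule of taking the later of the access and modification times are all assumptions made for this example; they are not the policy used by the projects described here.

```python
import os
import time

SECONDS_PER_DAY = 86_400


def suggest_tier(last_used_epoch, now, inactive_after_days=90):
    """Return 'tier2' if the file has been inactive longer than the threshold.

    The 90-day default is an illustrative assumption, not a recommended policy.
    """
    age_days = (now - last_used_epoch) / SECONDS_PER_DAY
    return "tier2" if age_days > inactive_after_days else "tier1"


def survey(root, now=None, inactive_after_days=90):
    """Total the bytes under `root` that the policy would assign to each tier."""
    now = time.time() if now is None else now
    totals = {"tier1": 0, "tier2": 0}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Assumption: treat the later of access and modification time as
            # "last used". Access times may be unreliable on file systems
            # mounted with options that suppress atime updates.
            last_used = max(st.st_atime, st.st_mtime)
            totals[suggest_tier(last_used, now, inactive_after_days)] += st.st_size
    return totals


if __name__ == "__main__":
    result = survey(".")
    total = sum(result.values()) or 1  # avoid division by zero on empty trees
    for tier, size in result.items():
        print(f"{tier}: {size} bytes ({100 * size / total:.1f}%)")
```

Run against a file share, a survey of this kind would give a rough indication of whether the 80% inactivity figures quoted above hold locally before any migration policy is chosen.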
Tests with the disk array showed an 8% increase in energy consumption between idle (disks spinning, no I/O) and maximum utilisation (disks spinning and high levels of I/O).
A typical 20TB storage system configured with RAID 10 might require 180 tier 1 450GB 15,000rpm disk drives. This configuration would use 2.7kW, at a cost of around £2,600 per annum, and be responsible for the production of 64 tonnes of CO2 over the five-year lifetime. By comparison, tiered storage might require only 36 of the same tier 1 disk drives (migration policy 20/80 tier 1/tier 2) and 18 tier 2 7,200rpm 1TB (SATA) disks configured with RAID 5. The total collection of disks would use only 0.7kW, for a total cost of around £720 per annum, a saving of £1,880 per annum, and would reduce associated CO2 emissions by 46 tonnes over the five-year lifetime. Using fewer disks (54 compared to 180) also reduces the total storage system footprint, freeing up physical space for future growth in the datacentre.
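The arithmetic behind the comparison above can be sketched as follows. The text does not state the electricity tariff or the carbon emission factor, so this example backs both out of the quoted tier 1 figures (2.7kW, £2,600 per annum, 64 tonnes over five years) and then applies them to the quoted 0.7kW tiered configuration. The implied values and all names in the code are assumptions made for illustration. With these inputs the tiered estimate comes out somewhat below the quoted £720, which may simply reflect the 0.7kW figure being rounded.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years
LIFETIME_YEARS = 5         # lifetime quoted in the text

# Figures quoted in the text for the all-tier-1 configuration.
TIER1_ONLY_KW = 2.7
TIER1_ONLY_COST_GBP = 2600.0
TIER1_ONLY_CO2_TONNES = 64.0


def annual_energy_kwh(power_kw):
    """Energy drawn in one year by a load that runs continuously."""
    return power_kw * HOURS_PER_YEAR


# Assumption: the tariff (GBP per kWh) and emission factor (kg CO2 per kWh)
# are not given in the text, so they are inferred from the quoted figures.
implied_tariff = TIER1_ONLY_COST_GBP / annual_energy_kwh(TIER1_ONLY_KW)
implied_co2_factor = (TIER1_ONLY_CO2_TONNES * 1000) / (
    annual_energy_kwh(TIER1_ONLY_KW) * LIFETIME_YEARS
)


def estimate(power_kw):
    """Estimate annual cost and lifetime CO2 for a given continuous load."""
    energy = annual_energy_kwh(power_kw)
    return {
        "annual_cost_gbp": energy * implied_tariff,
        "lifetime_co2_tonnes": energy * LIFETIME_YEARS * implied_co2_factor / 1000,
    }


tiered = estimate(0.7)  # power figure quoted for the tiered configuration

if __name__ == "__main__":
    print(f"Implied tariff: {implied_tariff:.3f} GBP/kWh")
    print(f"Implied emission factor: {implied_co2_factor:.3f} kg CO2/kWh")
    print(f"Tiered annual cost: {tiered['annual_cost_gbp']:.0f} GBP")
    print(f"Tiered lifetime CO2: {tiered['lifetime_co2_tonnes']:.1f} tonnes")
```

Because cost and emissions both scale linearly with power draw in this model, the saving depends only on the ratio of the two power figures; a site with a different tariff would see the same proportional reduction.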
Planet Filestore and StorC Projects Cardiff University