Introduction to Object Storage Solutions White Paper


Published in: Technology

Contents

Executive Summary
Introduction
Main Concepts and Features
  Object-Based Storage
  Object Structure
  Distributed Design
  Open Architecture
  Multitenancy
  Object Versioning
  Spin-Down and Storage Tiering
  Search
  Replication
Common Use Cases
  Fixed-Content Archiving
  Backup-Free Data Protection and Content Preservation
  Cloud-Enabled Storage
  E-Discovery, Compliance and Metadata Analysis
System Fundamentals
  Hardware Overview
  Software Overview
  System Organization
Namespaces and Tenants
  Main Concepts
  User and Group Accounts
  System and Tenant Management
  Policies
  Content Management Services
Conclusion

Introduction to Object Storage and Hitachi Content Platform

Executive Summary

One of IT’s greatest challenges today is an explosive, uncontrolled growth of unstructured data. Continual growth of email and documents, video, Web pages, presentations, medical images, and so forth, increases both complexity and risk. This effect is seen particularly in distributed IT environments, such as cloud service providers and organizations with branch or remote office sites. The vast quantity of data being created, difficulties in management and proper handling of unstructured content, and complexity of supporting more users and applications pose challenges to IT departments. Organizations often end up with sprawling storage silos for a multitude of applications and workloads, with few resources available to manage, govern, protect, and search the data.

Hitachi Data Systems provides an alternative solution to these challenges through a single object storage platform that can be divided into virtual storage systems, each configured for the desired level of service. The great scale and rich features of this solution help IT organizations in both private enterprises and cloud service providers manage distributed IT environments. It helps them to control the flood of storage requirements for unstructured content and addresses a variety of workloads.

Introduction

Hitachi Content Platform (HCP) is a multipurpose distributed object-based storage system designed to support large-scale repositories of unstructured data. HCP enables IT organizations and cloud service providers to store, protect, preserve and retrieve unstructured content with a single storage platform. It supports multiple levels of service and readily evolves with technology and scale changes. With a vast array of data protection and content preservation technologies, the system can significantly reduce or even eliminate tape-based self backups or backups of edge devices connected to the platform. HCP obviates the need for a siloed approach to storing unstructured content. Massive scale, multiple storage tiers, Hitachi reliability, nondisruptive hardware and software updates, multitenancy and configurable attributes for each tenant allow the platform to support a wide range of applications on a single physical HCP instance. By dividing the physical system into multiple, uniquely configured tenants, administrators create “virtual content platforms” that can be further subdivided into namespaces for further organization of content, policies and access. With support for thousands of tenants, tens of thousands of namespaces, and petabytes of capacity in one system, HCP is truly cloud-ready.

Main Concepts and Features

Object-Based Storage

Hitachi Content Platform, as a general-purpose object store, allows unstructured data files to be stored as objects. An object is essentially a container that includes both file data and associated metadata that describes the data. The objects are stored in a repository. The metadata is used to define the structure and administration of the data. HCP can also leverage object metadata to apply specific management functions, such as storage tiering, to each object.
The objects have intelligence that enables them to automatically take advantage of advanced storage and data management features to ensure proper placement and distribution of content.

HCP architecture isolates stored data from the hardware layer. Internally, ingested files are represented as objects that encapsulate both the data and metadata required to support applications. Externally, HCP presents each object either as a set of files in a standard directory structure or as a uniform resource locator (URL) accessible by users and applications via HTTP/HTTPS.

HCP stores objects in a repository. Data that is ingested and stored in the repository is permanently associated with the information about that data, called metadata. Each data object encapsulates both object data and metadata, and is treated within HCP as a single unit for all intents and purposes.

Object Structure

An HCP repository object is composed of file data and the associated metadata, which in turn consists of system metadata and, optionally, custom metadata and an access control list (ACL). The structure of the object is shown in Figure 1.

File data is an exact digital copy of the actual file contents at the time of its ingestion. If the object is under retention, it cannot be deleted before the expiration of its retention period, except when using a special privileged operation. If versioning is enabled, multiple versions of a file can be retained. If appendable objects are enabled, data can be appended to an object (with the CIFS or NFS protocols) without modifying the original fixed-content data.

Figure 1. HCP Object

Metadata is system- or user-generated data that describes the fixed-content data of an object and defines the object’s properties. System metadata, the system-managed properties of the object, includes HCP-specific metadata and POSIX metadata.

HCP-specific metadata includes the date and time the object was added to the namespace (ingest time), the date and time the object was last changed (change time), the cryptographic hash value of the object along with the namespace hash algorithm used to generate that value, and the protocol through which the object was ingested. It also includes the object’s policy settings, such as data protection level (DPL), retention, shredding, indexing, and, for HCP namespaces only, versioning.

POSIX metadata includes a user ID and group ID, a POSIX permissions value, and POSIX time attributes.

Custom metadata is optional, user-supplied descriptive information about a data object that is usually provided as well-formed XML. It is typically intended for more detailed description of the object. This metadata can also be used by future users and applications to understand and repurpose the object content. HCP supports multiple custom metadata fields for each object.

ACL is optional, user-provided metadata containing a set of permissions granted to users or user groups to perform operations on an object. The ACLs are supported only in HCP namespaces.

The complete metadata structure, as supported in HCP namespaces, is shown in Figure 2. It includes all metafiles supported by HCP for objects, which were generated for the sample data structure (assuming that custom metadata and ACLs were added for each object).
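
To make the object layout described above more concrete, the following Python sketch models an object as file data plus system metadata, optional custom metadata and an optional ACL. It is only an illustration: the class names, field names, default values and the `ingest` and `verify` helpers are assumptions made for this example and are not HCP's actual data model or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
import hashlib


@dataclass
class SystemMetadata:
    """System-managed properties; all field names here are illustrative."""
    ingest_time: datetime        # when the object was added to the namespace
    change_time: datetime        # when the object was last changed
    hash_algorithm: str          # namespace hash algorithm, e.g. "sha256"
    hash_value: str              # cryptographic hash of the file data
    ingest_protocol: str         # protocol through which the object arrived
    dpl: int = 2                 # data protection level (assumed default)
    shred: bool = False          # shredding policy setting
    index: bool = True           # indexing policy setting
    uid: int = 0                 # POSIX user ID
    gid: int = 0                 # POSIX group ID
    permissions: int = 0o644     # POSIX permissions value


@dataclass
class StoredObject:
    """File data and its metadata, handled as a single unit."""
    data: bytes
    system: SystemMetadata
    custom_metadata: Optional[str] = None        # usually well-formed XML
    acl: dict = field(default_factory=dict)      # user or group -> permissions


def ingest(data: bytes, protocol: str, algorithm: str = "sha256") -> StoredObject:
    """Build an object and record the hash of its data at ingest time."""
    now = datetime.now(timezone.utc)
    digest = hashlib.new(algorithm, data).hexdigest()
    return StoredObject(data, SystemMetadata(now, now, algorithm, digest, protocol))


def verify(obj: StoredObject) -> bool:
    """Recompute the hash and compare it with the stored hash value."""
    recomputed = hashlib.new(obj.system.hash_algorithm, obj.data).hexdigest()
    return recomputed == obj.system.hash_value
```

For example, `verify(ingest(b"report", "HTTP"))` returns `True`, while changing the `data` field afterwards makes `verify` return `False`. Storing the hash algorithm next to the hash value is what lets a check like this run without any outside configuration.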

Figure 2. HCP Namespace: Complete Metadata Structure

Distributed Design

An HCP system consists of both hardware and software and comprises many different components that are connected together to form a robust, scalable architecture for object-based storage. HCP runs on an array of servers, or nodes, that are networked together to form a single physical instance. Each node is a storage node. Storage nodes store data objects. All runtime operations and physical storage, including data and metadata, are distributed among the storage nodes. All objects in the repository are distributed across all available storage space but still presented as files in a standard directory structure. Objects that are physically stored on any particular node are available from all other nodes.

Open Architecture

HCP has an open architecture that insulates stored data from technology changes, as well as from changes in HCP itself due to product enhancements. This open architecture ensures that users will have access to the data long after it has been added to the repository. HCP acts as both a repository that can store customer data and an online portal that enables access to that data by means of several industry-standard interfaces, as well as through an integrated search facility, Hitachi Data Discovery Suite (HDDS). The HTTP or HTTPS, WebDAV, CIFS and NFS protocols support various operations. These operations include storing data, creating and viewing directories, viewing and retrieving objects and their metadata, modifying object metadata, and deleting objects. Objects that were added using any protocol are immediately accessible through any other supported protocol. These protocols can be used to access the data with a Web browser, the HCP client tools, 3rd-party applications, Microsoft® Windows® Explorer, or native Windows or UNIX tools.

HCP allows special-purpose access to the repository through the SMTP protocol, which is used only for storing email.
For data backup and restore, HCP supports the NDMP protocol.

Multitenancy

Multitenancy support allows the repository in a single physical HCP instance to be partitioned into multiple namespaces. A namespace is a logical partition that contains a collection of objects particular to one or more applications. Each namespace is a private object store that is represented by a separate directory structure and has a set of independently configured attributes. Namespaces provide segregation of data, while tenants, or groupings of namespaces, provide segregation of management. An HCP system can have up to 1,000 tenants. Each tenant and its set of namespaces constitute a virtual HCP system that can be accessed and managed independently by users and applications. This HCP feature is essential in enterprise, cloud and service-provider environments.

Data access to HCP namespaces can be either authenticated or nonauthenticated, depending on the type and configuration of the access protocol. Authentication can be performed using HCP local accounts or Microsoft Active Directory® groups.

Object Versioning

HCP supports object versioning, which is the capability of a namespace to create, store and manage multiple versions of objects in the HCP repository. This ability provides a history of how the data has changed over time. Versioning facilitates storage and replication of evolving content, thereby creating new opportunities for HCP in markets such as content depots and workflow applications.

Versioning is available in HCP namespaces and is configured at the namespace level. Versioning is only supported with HTTP or REST. Other protocols cannot be enabled if versioning is enabled for the namespace. Versioning applies only to objects, not to directories or symbolic links. A new version of an object is created when an object with the same name and location as an existing object is added to the namespace. A special type of version, called a deleted version, is created when an object is deleted.
Updates to the object metadata affect only the current version of an object and do not create new versions.

Previous versions of objects that are older than a specified amount of time can be automatically deleted, or pruned. It is not possible to delete specific historical versions of an object; however, a user or application with appropriate permissions can purge the object to delete all its versions, including the current one.

Spin-Down and Storage Tiering

HCP implements spin-down disk support as an early step towards the long-term goal of supporting information life-cycle management (ILM) and intelligent objects. In the near term, the goal of the HCP spin-down feature is to take advantage of the energy savings potential of the spin-down technology.

HCP spindown-capable storage is based on the power savings feature of Hitachi midrange storage systems and is a core element of the new storage tiering functionality, which is implemented as an HCP service. According to the storage tiering strategy specified by the customer, the storage tiering service identifies objects that are eligible to reside on spin-down storage and moves them to and from the spin-down storage as needed.

Tiering selected content to spindown-enabled storage lowers overall cost by reducing energy consumption for large-scale unstructured data storage, such as deep archives and disaster recovery sites. Storage tiering can very effectively be used with customer-identified “dark data” (rarely accessed data) or data replicated for disaster recovery by moving that data to spin-down storage some time after ingestion or replication. Customer sites where data protection is critical can use storage tiering to move all redundant data copies to spin-down storage, which makes the cost of keeping data protection copies competitive with a tape solution.

Storage tiering also enables service providers to use a turnkey framework to offer differentiated object data management plans.
This capability further enhances HCP as an attractive target for fixed content, especially for archive-oriented use cases where tape may be considered an alternative.
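
Returning to the rules given under Object Versioning above, the toy Python model below shows how they fit together: adding an object under an existing name creates a new version, a delete records a deleted version, metadata updates touch only the current version, pruning drops old previous versions, and a purge removes every version. The class and method names are hypothetical and the model lives in memory only; it is a sketch of the described behavior, not of HCP's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    data: Optional[bytes]          # None marks a "deleted version"
    created: float                 # timestamp supplied by the caller
    metadata: dict = field(default_factory=dict)


class VersionedNamespace:
    """Hypothetical in-memory model of a namespace with versioning enabled."""

    def __init__(self):
        self._objects = {}         # object path -> list of Version, oldest first

    def put(self, path, data, now):
        # Same name and location as an existing object: append a new version.
        self._objects.setdefault(path, []).append(Version(data, now))

    def delete(self, path, now):
        # Deleting records a special deleted version instead of erasing history.
        self._objects[path].append(Version(None, now))

    def set_metadata(self, path, key, value):
        # Metadata updates affect only the current version; no new version.
        self._objects[path][-1].metadata[key] = value

    def prune(self, older_than, now):
        # Drop previous versions older than the limit; keep the current one.
        for path, versions in list(self._objects.items()):
            kept = [v for v in versions[:-1] if now - v.created <= older_than]
            self._objects[path] = kept + [versions[-1]]

    def purge(self, path):
        # Purge removes all versions, including the current one.
        del self._objects[path]

    def versions(self, path):
        return list(self._objects.get(path, []))
```

Note that the model has no operation for deleting a single historical version, which mirrors the restriction stated in the text.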

Search

HCP provides the only integrated metadata query engine on the market. HCP includes comprehensive search capabilities that enable users to search for objects in namespaces, analyze namespace contents, and manipulate groups of objects. To satisfy government requirements, HCP supports e-discovery for audits and litigation.

The metadata query engine is always available in any HCP system, but the content search facility requires installation of a separate HDS product, Hitachi Data Discovery Suite.

Replication

Replication, an add-on feature to HCP, is the process that keeps selected tenants and namespaces in 2 or more HCP systems in sync with each other. The replication service copies one or more tenants or namespaces from one HCP system to another, propagating object creations, object deletions, and metadata changes. HCP also replicates tenant and namespace configuration, tenant-level user accounts, compliance and tenant log messages, and retention classes.

The HCP system in which the objects are initially created is called the primary system. The 2nd system is called the replica. Typically, the primary system and the replica are in separate geographic locations and connected by a high-speed wide area network. HCP supports different replication topologies including many-to-one and chain configurations.

Common Use Cases

Fixed-Content Archiving

Hitachi Content Platform is optimized for fixed-content data archiving. Fixed-content data is information that does not change but must be kept available for future reference and be easily accessible when needed. A fixed-content storage system is one in which the data cannot be modified. HCP uses “write-once, read-many” (WORM) storage technology, and a variety of policies and services (such as retention, content verification and protection) to ensure the integrity of data in the repository.
The WORM storage means that data, once ingested into the repository, cannot be updated or modified; that is, the data is guaranteed to remain unchanged from when it was originally stored. If the versioning feature is enabled within the HCP system, different versions of the data can be stored and retrieved, in which case each version is WORM.

Backup-Free Data Protection and Content Preservation

HCP is a true backup-free platform. HCP protects content without the need for backup. It uses sophisticated data preservation technologies, such as configurable data and metadata protection levels (MDPL), object versioning and change tracking, multisite replication with seamless application failover, and many others. HCP includes a variety of features designed to protect integrity, provide privacy, and ensure availability and security of stored data. Below is a summary of the key HCP data protection features:

■ Content immutability. This intrinsic feature of HCP WORM storage design protects the integrity of the data in the repository.
■ Content verification. The content verification service maintains data integrity and protects against data corruption or tampering by ensuring that the data of each object matches its cryptographic hash value. Any violation is repaired in a self-healing fashion.
■ Scavenging. The scavenging service ensures that all objects in the repository have valid metadata. In case metadata is lost or corrupted, the service tries to reconstruct it by using the secondary, or scavenging, metadata (a copy of the metadata stored with each copy of the object data).
■ Data encryption. HCP supports an encryption-at-rest capability that allows seamless encryption of data on the physical volumes of the repository. This ensures data privacy by preventing unauthorized access to the stored data. The encryption and decryption are handled automatically and transparently to users and applications.
■ Versioning. HCP uses versioning to protect against accidental deletes and storing wrong copies of objects.
■ Data availability.
  ■ RAID protection. RAID storage technology provides efficient protection from simple disk failures. SAN-based HCP systems typically use RAID-6 erasure coding protection to guard against dual drive failures.
  ■ Multipathing and zero-copy failover. These features provide data availability in SAN-attached array of independent nodes (SAIN) systems.
  ■ Data protection level and protection service. In addition to using RAID and SAN technologies to provide data integrity and availability, HCP can use software mirroring to store the data for each object in multiple locations on different nodes. HCP groups storage nodes into protection sets with the same number of nodes in each set, and tries to store all the copies of the data for an object in a single protection set where each copy is stored on a different node. The protection service enforces the required level of data redundancy by checking and repairing protection sets. In case of violation, it creates additional copies or deletes extra copies of an object to bring the object into compliance. If replication is enabled, the protection service can use an object copy from a replica system if the copy on the primary system is unavailable.
  ■ Metadata redundancy. In addition to the data redundancy as specified by DPL, HCP creates multiple copies of the metadata for an object on different nodes. Metadata protection level or MDPL is a system-wide setting that specifies the number of copies of the metadata that the HCP system must maintain (normally 2 copies, MDPL2).
    Management of MDPL redundancy is independent of the management of data copies for DPL.
  ■ Nondisruptive software and hardware upgrades. HCP employs a number of techniques that minimize or eliminate any disruption of normal system functions during software and hardware upgrades. Nondisruptive software upgrade (NDSU) is one of these techniques that includes greatly enhanced online upgrade support, nondisruptive patch management, and online upgrade performance improvements. HCP supports media-free and remote upgrades, HTTP or REST drain mode, and parallel operating system (OS) installation. It also supports automatic online upgrade commit, offline upgrade duration estimate, enhanced monitoring and email alerts, and other features.
    Storage nodes can be added to an HCP system without causing any downtime. HCP also supports nondisruptive storage upgrades that allow online storage addition to SAIN systems without any data outage.
  ■ Seamless application failover. This feature is supported by HCP systems in a replicated topology. This capability includes a seamless failover routing feature that enables direct integration with customer-owned load balancers by allowing HTTP requests to be serviced by any HCP system in a replication topology. Seamless domain name system (DNS) failover is an HCP built-in multisite load-balancing and high-availability technology that is ideal for cost-efficient, best-effort customer environments.
  ■ Replication. If enabled, this feature provides a multitude of mechanisms that ensure data availability. The replica system can be used both as a source for disaster recovery and to maintain data availability by providing good object copies for protection and content verification services. If an object cannot be read from the primary system, HCP can try to read the object from the replica if the read-from-replica feature is enabled.
■ Data security.
  ■ Authentication of management and data access.
  ■ Granular, multilayer data access permission scheme.
  ■ IP filtering technology and protocol-specific access or deny lists.
  ■ Secure Sockets Layer (SSL) for HTTP or WebDAV data access, management access, and replication.
  ■ Node login prevention.
  ■ Shredding policy and service.
■ Autonomic technology refresh feature, implemented as HCP migration service, enables organizations to maintain continuously operating content stores that allow them to preserve their digital content assets for the long term.

Cloud-Enabled Storage

The powerful, industry-leading capabilities of HCP make it well suited to the cloud storage space. An HCP-based infrastructure solution is sufficiently flexible to accommodate any cloud deployment model (public, private or hybrid) and simplify the migration to the cloud for both service providers and subscribers. HCP provides edge-to-core, secure multitenancy and robust management capabilities, and a host of features to optimize cloud storage operations.

HCP, in its role as an online data repository, is truly ready for a cloud-enabled market. While numerous HCP features were already discussed earlier in this paper, the purpose of this section is to summarize those that contribute the most to HCP cloud capabilities. They include:

■ Large-scale multitenancy.
  ■ Management segregation. HCP supports up to 1,000 tenants, each of which can be uniquely configured for use by a separate cloud service subscriber.
  ■ Data segregation. HCP supports up to 10,000 namespaces, each of which can be uniquely configured for a particular application or workload.
■ Massive scale.
  ■ Petabyte repository offers 40PB of storage, 80 nodes, 32 billion user objects, and 15 million files per directory, all on a single physical system.
  ■ Best node density in the object storage industry supports 500TB per node and 400+M objects per node.
    With fewer nodes, HCP requires less power, less cooling, and less floor space.
■ Unparalleled expandability that allows organizations to “start small” and expand according to demand.
  ■ Nodes and/or storage can be added to expand an HCP system’s storage and throughput capacity, without disruptions. Multiple storage systems are supported by a single HCP system.
■ Easy tenant and storage provisioning.
■ Geographical dispersal and global accessibility.
  ■ WAN-friendly REST interface for namespace data access and replication.
  ■ Replication of content across multiple sites using advanced, flexible replication topologies.
  ■ WAN-optimized, high-throughput data transfer.
■ High availability.
  ■ Fully redundant hardware.
  ■ Automatic routing of client requests around hardware failures.
  ■ Load balancing across all available hardware.
■ Multiple REST interfaces. These interfaces include the REST API for namespace data access, management API, and metadata query API. REST API is a technology of choice for cloud enablers and consumers. Some of the reasons for its popularity include high efficiency and low overhead, caching at both the client and the server, and API uniformity. In addition, this technology offers a stateless nature that allows accommodation of the latencies of Internet access and potentially complex firewall configurations.
■ Secure, granular access to tenants, namespaces and objects, which is crucial in any cloud environment. This access is facilitated by the HCP multilayer, flexible permission mechanism, including object-level ACLs.
■ Usage metering. HCP has built-in chargeback capabilities, indispensable for cloud use, to facilitate provider and subscriber transactions. HCP also provides tools for 3rd-party vendors and customers to write to the API for easy integration with the HDS solution for billing and reporting.
■ Low-touch system that is self-monitoring, self-managing and self-healing. HCP features advanced monitoring, audit and reporting capabilities. HCP services can automatically repair issues if they arise.
■ Support for multiple levels of service. This support, provided through HCP policies, service plans and quotas that can be configured for each tenant, helps enforce service-level agreements (SLAs). It allows the platform to accommodate a wide range of subscriber use cases and business models on a single physical system.
■ Edge-to-core solution. HCP, working in tandem with Hitachi Data Ingestor (HDI), provides an integrated edge-to-core solution for cloud storage deployments. HCP serves as the “engine” at the core of the HDS cloud architecture.
  HDI resides at the edge of the storage cloud (for instance, at a remote office or subscriber site) and serves as the “on-ramp” for application data to enter the cloud infrastructure. HDI acts as a local storage cache while migrating data into HCP and maintaining links to stored content for later retrieval. Users and applications interact with HDI at the edge of the cloud but perceive bottomless, backup-free storage provided by HCP at the core.

E-Discovery, Compliance and Metadata Analysis

Custom metadata enables building massive unstructured data stores by providing means for faster and more accurate access of content and giving storage managers the meaningful information they need to efficiently and intelligently process data and apply the right object policies to meet all business, compliance and protection requirements. Regulatory compliance features include namespace retention mode (compliance and enterprise), retention classes, retention hold, automated content disposition, and privileged delete and purge. HCP search capabilities include support for e-discovery for litigation or audit purposes. On HCP, open APIs allow direct 3rd-party integration.

HCP supports search facilities that provide an interactive interface. The search console offers a structured environment for creating and executing queries (sets of criteria that each object in the search results must satisfy). Users can apply various selection criteria, such as objects stored before a certain date or larger than a specified size. Queries return metadata for objects included in the search result. This metadata can be used to retrieve the object. From the search console, users can open objects, perform bulk operations on objects (hold, release, delete, purge, privileged delete and purge, change owner, set ACL), and export search results in standard file formats for use as input to other applications.
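
The idea of a query as a set of criteria that every result must satisfy can be sketched in a few lines of Python. The record layout (`path`, `ingest_time`, `size`) and the helper names below are assumptions chosen for illustration; they do not reproduce the search console or the metadata query API.

```python
from datetime import datetime


def stored_before(cutoff):
    """Criterion: the object was ingested before the given date."""
    return lambda meta: meta["ingest_time"] < cutoff


def larger_than(size_bytes):
    """Criterion: the object is larger than the given size in bytes."""
    return lambda meta: meta["size"] > size_bytes


def run_query(index, criteria):
    """Return the metadata records that satisfy every criterion."""
    return [meta for meta in index if all(check(meta) for check in criteria)]


# A made-up metadata index with three records.
index = [
    {"path": "/finance/q1.pdf", "ingest_time": datetime(2011, 3, 1), "size": 4_000_000},
    {"path": "/finance/q2.pdf", "ingest_time": datetime(2012, 6, 1), "size": 9_000_000},
    {"path": "/scans/mri-17.dcm", "ingest_time": datetime(2010, 9, 9), "size": 80_000_000},
]

results = run_query(index, [stored_before(datetime(2012, 1, 1)), larger_than(5_000_000)])
paths = [meta["path"] for meta in results]
```

With this sample data only the third record meets both criteria. As in the text, the query returns metadata rather than object content; the returned `path` is what a caller would then use to retrieve the object itself.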

The metadata query engine (MQE) is integrated with HCP and is always available in the HCP system. It is also used by the metadata query API, a programmatic interface for querying namespaces. The MQE index resides on designated logical volumes on the HCP storage nodes, sharing or not sharing the space on these volumes with the object data, depending on the type of system and volume configuration.

Search is enabled at both the tenant and namespace levels. Indexing is enabled on a per-namespace basis. Settings at the system and namespace levels determine whether custom metadata is indexed in addition to system metadata and ACLs. If indexing of custom metadata is disabled, the MQE indexes do not include custom metadata. If a namespace is not indexed at all, searches do not return any results for objects in this namespace.

Each object has an index setting that affects what content the metadata query engine indexes for that object. If indexing is enabled for a namespace, MQE always indexes system metadata and ACLs regardless of the index setting for an object. If the index setting is set to true, MQE also indexes custom metadata for this object.

System Fundamentals

Hardware Overview

An individual physical HCP instance, or HCP system, is not a single device; it is a collection of devices that, combined with HCP software, can provide all the features of an online object repository while tolerating node, disk and other component failures.

From a hardware perspective, each HCP system consists of the following categories of components:

■ Nodes (servers).
■ Internal or SAN-attached storage.
■ Networking components (switches and cabling).
■ Infrastructure components (racks and power distribution units).

Storage nodes are the vital part of HCP. They store and manage the objects that reside in the physical system storage. The nodes are conventional off-the-shelf servers.
Each node can have multiple internal physical drives and/or connect to external Fibre Channel storage (SAN). In addition to using RAID and SAN technologies and a host of other features to protect the data, HCP uses software mirroring to store the data and metadata for each object in multiple locations on different nodes. For data, this feature is managed by the namespace DPL setting, which specifies the number of copies of each object HCP must maintain in the repository to ensure the required level of data protection. For metadata, this feature is managed by the MDPL, which is a system-wide setting.

A storage node runs the complete HCP software and serves as both a repository for objects and a gateway to the data and metadata they contain. All runtime operations are distributed among the storage nodes, ensuring reliability and performance.

HCP runs on a redundant array of independent nodes (RAIN) or a SAN-attached array of independent nodes (SAIN). RAIN systems use the internal storage in each node. SAIN systems use the external SAN storage. HCP is offered as 2 products: HCP 300 (based on RAIN configuration) and HCP 500 (based on SAIN configuration).

HCP RAIN (HCP 300)

The nodes in an HCP 300 system are Hitachi Compute Rack 220 (CR 220) servers. RAIN nodes contain internal storage: RAID controller and disks. All nodes use hardware RAID-5 data protection. In an HCP RAIN system, the physical disks in each node form a single RAID group, normally RAID-5 (5D+1P) (see Figure 3). This helps ensure the integrity of the data stored on each node.
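
The DPL rule above, together with the protection service described earlier (which creates additional copies or deletes extra ones to bring an object into compliance), can be illustrated with a small planning function. This is a simplified sketch: the function name, its arguments and the returned dictionary are hypothetical, and it ignores protection sets, replica systems and everything else a real service would have to consider.

```python
def repair_plan(holders, healthy_nodes, dpl):
    """Decide where to add or remove copies so that exactly `dpl` copies
    exist, each on a different healthy node.

    holders:       nodes that currently hold a copy of the object
    healthy_nodes: nodes that are currently available, in preference order
    dpl:           required number of copies (data protection level)
    """
    # Copies on unavailable nodes do not count toward the protection level.
    live = [node for node in holders if node in healthy_nodes]

    if len(live) < dpl:
        # Too few copies: pick healthy nodes that do not hold one yet.
        candidates = [node for node in healthy_nodes if node not in live]
        return {"create_on": candidates[: dpl - len(live)], "delete_from": []}

    # Enough or too many copies: anything beyond the first `dpl` is extra.
    return {"create_on": [], "delete_from": live[dpl:]}
```

For instance, with `dpl=2`, copies on `n1` and `n2`, and `n2` unavailable, `repair_plan(["n1", "n2"], ["n1", "n3", "n4"], 2)` proposes one new copy on `n3`. If fewer healthy nodes are available than the DPL requires, the plan simply lists as many targets as exist.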

An HCP 300 (RAIN) system must have a minimum of 4 storage nodes. Additional storage nodes are added in 4-node increments. An HCP 300 system can have a maximum of 20 nodes.

HCP 300 systems are normally configured with a DPL setting of 2 (DPL2), which, coupled with hardware RAID-5, yields an effective RAID-5+1 total protection level.

Figure 3. HCP 300 Hardware Architecture

HCP SAIN (HCP 500/500XL)

The nodes in an HCP 500 system are either Hitachi Compute Rack 220 (CR 220) servers or blades in Hitachi Compute Blade 320 (CB 320) servers. The HCP 500 nodes contain Fibre Channel host bus adapters (HBAs) and use external Fibre Channel SAN storage; they are diskless servers that boot from the SAN-attached storage.

The nodes in a SAIN system can have internal storage in addition to being connected to external storage. These nodes are called HCP 500XL nodes. They are an alternative to the standard HCP 500 nodes and have the same hardware configuration, except for the addition of the RAID controller and internal hard disk drives. In HCP 500XL nodes, the system metadata database resides on the local disks, which leads to more efficient and faster database operations. As a result, the system has the ability to better support larger capacity and higher object counts per node and address higher performance requirements.

A typical 500XL node internal storage configuration includes six 500GB 7200RPM SATA II drives in a single RAID-5 (5D+1P) RAID group, with 2 LUNs: 31GB (operating system) and 2.24TB (database). The HCP 500XL nodes are usually considered when the system configuration exceeds 4 standard nodes.
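
The capacity figures above can be checked with some simple arithmetic. The Python sketch below assumes that drive capacities are quoted in decimal units (1GB = 10^9 bytes) and that the LUN sizes are reported in binary units; that unit convention is an assumption made here, not something the paper states. Under it, the usable space of a 5D+1P group of 500GB drives comes to roughly 2.27 binary terabytes, in line with the 31GB and 2.24TB LUNs. The second helper estimates how much raw disk capacity ends up holding unique data once DPL copies are layered over RAID-5.

```python
GB = 10**9        # decimal gigabyte, as drive capacities are usually quoted
GIB = 2**30       # binary gigabyte (assumed unit for the 31GB LUN)
TIB = 2**40       # binary terabyte (assumed unit for the 2.24TB LUN)


def raid5_usable_bytes(drives, drive_bytes):
    """Usable capacity of one RAID-5 group: one drive's worth holds parity."""
    return (drives - 1) * drive_bytes


def effective_utilization(data_drives, parity_drives, dpl):
    """Fraction of raw capacity that holds unique object data when `dpl`
    full copies are kept on top of a RAID group of the given shape."""
    raid_efficiency = data_drives / (data_drives + parity_drives)
    return raid_efficiency / dpl


# 500XL internal storage: six 500GB drives in RAID-5 (5D+1P).
usable_tib = raid5_usable_bytes(6, 500 * GB) / TIB

# The two LUNs quoted in the text, converted to binary terabytes.
luns_tib = 31 * GIB / TIB + 2.24

# HCP 300 default: RAID-5 (5D+1P) combined with DPL2.
dpl2_utilization = effective_utilization(5, 1, 2)

# For comparison, the same RAID layout with the maximum DPL of 4.
dpl4_utilization = effective_utilization(5, 1, 4)
```

Under these assumptions about 42 percent of raw capacity stores unique data at DPL2 and about 21 percent at DPL4, which gives a sense of the storage utilization cost that comes with higher protection levels.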
HCP 500 and 500XL (SAIN) systems are supported with a minimum of 4 storage nodes. With a SAIN system, additional storage nodes are added in pairs, so the system always has an even number of storage nodes. A SAIN system can have a maximum of 80 nodes.

Both RAIN and SAIN systems can have a DPL as high as 4, which affords maximum data availability but greatly sacrifices storage utilization. Typically, the external SAN-attached storage uses RAID-6. The best protection and highest availability for an HCP 500 system are achieved by giving each node its own RAID group or Hitachi Dynamic Provisioning (HDP) pool containing 1 RAID group.

Software Overview

HCP system software consists of an operating system (the appliance operating system) and core software. The core software includes components that:

■ Enable access to the object repository through the industry-standard HTTP or HTTPS, WebDAV, CIFS, NFS, SMTP and NDMP protocols.
■ Ingest fixed-content data, convert it into HCP objects, and manage the object data and metadata over time.
■ Maintain the integrity, stability, availability and security of stored data by enforcing repository policies and executing system services.
■ Enable configuration, monitoring and management of the HCP system through a human-readable interface.
■ Support searching the repository through an interactive Web interface (the search console) and a programmatic interface (the metadata query API).

System Organization

HCP is a fully symmetric, distributed application that stores and manages objects (see Figure 4). An HCP object encapsulates the raw fixed-content data that is written by a client application, and its associated system and custom metadata. Each node in an HCP system is a Linux-based server that runs a complete HCP instance. The HCP system can withstand multiple simultaneous node failures, and acts automatically to ensure that all object and namespace policies are valid.
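The node-count rules given in the Hardware Overview (RAIN: minimum of 4, growth in 4-node increments, maximum of 20; SAIN: minimum of 4, growth in pairs, maximum of 80) can be expressed as a small validation check. This is a minimal sketch for planning purposes; the table and function names are hypothetical and are not part of any HCP interface.

```python
# Node-count rules as stated in this paper; the structure is illustrative.
NODE_RULES = {
    "RAIN": {"minimum": 4, "step": 4, "maximum": 20},
    "SAIN": {"minimum": 4, "step": 2, "maximum": 80},
}


def is_valid_node_count(system_type: str, nodes: int) -> bool:
    """Return True if `nodes` is a supported size for the given system type."""
    rule = NODE_RULES[system_type]
    in_range = rule["minimum"] <= nodes <= rule["maximum"]
    # Nodes are added in fixed increments on top of the minimum.
    on_step = (nodes - rule["minimum"]) % rule["step"] == 0
    return in_range and on_step


if __name__ == "__main__":
    for system_type, nodes in [("RAIN", 8), ("RAIN", 6), ("SAIN", 6), ("SAIN", 7)]:
        print(system_type, nodes, is_valid_node_count(system_type, nodes))
```

For example, a 6-node configuration is accepted for a SAIN system but rejected for a RAIN system, because RAIN systems grow 4 nodes at a time.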
Figure 4. The High-Level Structure of an HCP System

External system communication is managed by the DNS manager, a distributed network component that balances client requests across all nodes to ensure maximum system throughput and availability. The DNS manager works in conjunction with a corporate DNS server to allow clients to access the system as a single entity, even though the system is made up of multiple independent nodes.

The HCP system is configured as a subdomain of an existing corporate domain. Clients access the system using predefined protocol-specific or namespace-specific names.

While not required, using DNS is important in ensuring balanced and problem-free client access to an HCP system, especially for the HTTP or REST clients.

Namespaces and Tenants

Main Concepts

An HCP repository is partitioned into namespaces. A namespace is a logical repository as viewed by an application. Each namespace consists of a distinct logical grouping of objects with its own directory structure, such that the objects in one namespace are not visible in any other namespace. Access to one namespace does not grant a user access to any other namespace. To the user of a namespace, the namespace is the repository. Namespaces are not associated with any preallocated storage; they share the same underlying physical storage. Namespaces provide a mechanism for separating the data stored for different applications, business units or customers. For example, there may be one namespace for accounts receivable and another for accounts payable. While a single namespace can
host one or more applications, it typically hosts only one application. Namespaces also enable operations to work against selected subsets of repository objects. For example, a search could target the accounts receivable and accounts payable namespaces but not the employees namespace.

Figure 5 shows the logical structure of an HCP system with respect to its multitenancy features.

Figure 5. HCP System Logical Layout: Namespaces and Tenants

Namespaces are owned and managed by tenants. Tenants are administrative entities that provide segregation of management, while namespaces offer segregation of data. A tenant typically represents an actual organization
such as a company or a department within a company that uses a portion of a repository. A tenant can also correspond to an individual person. Namespace administration is done at the owning tenant level.

Clients can access HCP namespaces through HTTP or HTTPS, WebDAV, CIFS, NFS and SMTP protocols. These protocols can support authenticated and/or anonymous types of access (types of access and their combinations are discussed in more detail later in this document). HCP namespaces are owned by HCP tenants. An HCP system can have multiple HCP tenants, each of which can own multiple namespaces. The number of namespaces each HCP tenant can own can be limited by an administrator.

User and Group Accounts

User and group accounts control access to various HCP interfaces and give users permission to perform administrative tasks and access namespace content.

An HCP user account is defined in HCP; it has a set of credentials (username and password) that is stored locally in the system. The HCP system uses these credentials to authenticate a user, performing local authentication.

An HCP group account is a representation of an Active Directory (AD) group. To create group accounts, HCP must be configured to support Active Directory. The group account enables AD users in the AD group to access one or more of the HCP interfaces.

Like HCP user accounts, HCP group accounts are defined separately at the system and tenant levels. Different tenants have different user and group accounts. These accounts cannot be shared across tenants. Group membership is different at the system and tenant levels.

HCP administrative roles can be associated with both system-level and tenant-level user and group accounts. Data access permissions can be associated with only tenant-level user and group accounts. Consequently, system-level local and AD users can only be administrative users, while tenant-level local and AD users can both be administrative users and have data access permissions.
Tenant-level users can have only administrative roles without namespace data permissions, or only namespace data permissions without administrative roles, or any combination of administrative roles and namespace data permissions.

System and Tenant Management

The implementation of segregation of management in the HCP system is illustrated in Figure 6.

An HCP system has both system-level and tenant-level administrators:

■ System-level administrative accounts are used for configuring system-wide features, monitoring system hardware and software and overall repository usage, and managing system-level users. The system administrator user interface, the system management console, provides the functionality needed by the maintainer of the physical HCP system. For example, it allows the maintainer to shut down the system, see information about nodes, manage policies and services, and create HCP tenants. System administrators have a view of the system as a whole, including all HCP software and hardware that make up the system, and can perform all administrative actions that have system scope.
■ Tenant-level administrative accounts are used for creating HCP namespaces. They can configure individual tenants and namespaces, monitor namespace usage at the tenant and namespace level, manage tenant-level users, and control access to namespaces. The required functionality is provided by the tenant administrator user interface, the tenant management console. This interface is intended for use by the maintainer of the virtual HCP system (an individual tenant with a set of namespaces it owns). The tenant-level administration feature facilitates segregation of management, which is essential in cloud environments.
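The account rules described above (administrative roles at either level, data access permissions at the tenant level only) can be modeled in a few lines. The following Python sketch is an illustration, not HCP code: the `Account` class is hypothetical, the role names are the four roles this paper mentions, and the permission names such as "read" are assumed placeholders.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Roles named in this paper; treated here as the full set for illustration.
ADMIN_ROLES = frozenset({"monitor", "administrator", "security", "compliance"})


@dataclass(frozen=True)
class Account:
    """A user or group account defined at the system or tenant level."""
    name: str
    level: str  # "system" or "tenant"
    roles: FrozenSet[str] = frozenset()
    data_permissions: FrozenSet[str] = frozenset()  # hypothetical names

    def __post_init__(self) -> None:
        if self.level not in ("system", "tenant"):
            raise ValueError(f"unknown account level: {self.level}")
        unknown = set(self.roles) - ADMIN_ROLES
        if unknown:
            raise ValueError(f"unknown roles: {sorted(unknown)}")
        # Data access permissions exist only on tenant-level accounts.
        if self.level == "system" and self.data_permissions:
            raise ValueError("system-level accounts cannot hold data access permissions")


if __name__ == "__main__":
    ok = Account("app-owner", "tenant", frozenset({"administrator"}), frozenset({"read"}))
    print(ok)
    try:
        Account("ops", "system", frozenset({"monitor"}), frozenset({"read"}))
    except ValueError as err:
        print("rejected:", err)
```

A tenant-level account may carry roles, data permissions, or both, so the sketch enforces only the one restriction the text states: a system-level account with data access permissions is rejected.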
An HCP tenant can optionally grant system-level users administrative access to itself. In this case, system-level users with the monitor, administrator, security or compliance role can log into the tenant management console or use the HCP management API for that tenant. System-level users with the monitor or administrator role can also access the tenant management console directly from the system management console. This effectively enables a system administrator to function as a tenant administrator, as shown in Figure 6. System-level users can perform all the activities allowed by the tenant-level roles that correspond to their system-level roles. An AD user may belong to AD groups for which the corresponding HCP group accounts exist at both the system and tenant levels. This user has the roles associated with both the applicable system-level group accounts and the applicable tenant-level group accounts.

Policies

Objects in a namespace have a variety of properties, such as the retention setting or index setting. These properties are defined for each object by the object system metadata. Objects can also be affected by some namespace properties, such as the default metadata settings that are inherited by new objects stored in the namespace, or the versioning setting. Both the namespace-level settings and the properties that are part of the object metadata serve as parameters for the HCP system’s transactions and services, and determine the object’s behavior during its life cycle within the repository. These settings are called policies.

An HCP policy is one or more settings that influence how transactions and internal processes (services) affect objects in a namespace. Policies ensure that objects behave in expected ways.

The HCP policies are described in Table 1.

Table 1. HITACHI CONTENT PLATFORM Policies

Policy Name | Policy Description and Components | Transactions and Services Influenced
DPL | System DPL setting, namespace DPL setting. | Object creation. Protection service.
Retention | Default retention setting, object retention setting, hold setting, system metadata and custom metadata options for objects under retention. | Object creation, object deletion, system and custom metadata handling. Disposition, Garbage collection services.
Shredding | Default shred setting, object shred setting. | Object deletion. Shredding service.
Indexing | Default index setting, object index setting. | MQE.
Versioning | Versioning setting, pruning setting. | Object creation and deletion. Garbage collection service.
Custom Metadata Validation | XML syntax validation. | Add/replace custom metadata operations.

Each policy may consist of one or more settings that may have different scopes of application and methods of configuration. Policy settings are defined at the object and the namespace level. Note that the same policy setting may be set at different levels depending on the namespace. The default retention, shred and index settings are set at the namespace level in HCP namespaces.
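The relationship between namespace-level defaults and object-level settings can be sketched as follows. This Python example is a simplified model, not the HCP implementation: the class and function names are hypothetical, the retention value is an arbitrary placeholder string, and the ability to supply an explicit value when the object is stored is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NamespaceDefaults:
    """Default settings configured at the namespace level."""
    retention: str  # placeholder format, not actual HCP syntax
    shred: bool
    index: bool


@dataclass(frozen=True)
class ObjectSettings:
    """Settings recorded in the system metadata of one object."""
    retention: str
    shred: bool
    index: bool


def settings_for_new_object(
    defaults: NamespaceDefaults,
    retention: Optional[str] = None,
    shred: Optional[bool] = None,
    index: Optional[bool] = None,
) -> ObjectSettings:
    """Inherit each namespace default unless an explicit value is given."""
    return ObjectSettings(
        retention=defaults.retention if retention is None else retention,
        shred=defaults.shred if shred is None else shred,
        index=defaults.index if index is None else index,
    )


if __name__ == "__main__":
    defaults = NamespaceDefaults(retention="offset:+7y", shred=True, index=True)
    print(settings_for_new_object(defaults))
    print(settings_for_new_object(defaults, index=False))
```

The first call yields an object that carries all three namespace defaults. The second overrides only the index setting and inherits the rest.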
Table 2 lists all policy settings sorted according to their scope and method of configuration.

Table 2. HITACHI CONTENT PLATFORM Policy Settings: Scope and Configuration

Policy | Policy Setting | HCP Namespaces: Scope/Level | HCP Namespaces: Configured Via
Data Protection Level | System DPL: 1-4 | System | System UI
Data Protection Level | Namespace DPL: 1-4, dynamic | Namespace | Tenant UI, MAPI
Retention | Default retention setting: fixed date, offset, special value, retention class | Namespace | Tenant UI, MAPI
Retention | Retention setting: fixed date, offset, special value, retention class | Object | REST API, retention.txt
Retention | Hold setting: true or false | Object | REST API
Retention | Ownership and POSIX permission changes under retention: true or false | Namespace | Tenant UI, MAPI
Retention | Custom metadata operations allowed under retention | Namespace | Tenant UI, MAPI
Indexing | Index setting: true or false (1/0) | Object | REST API, index.txt
Indexing | Default index setting: true or false | Namespace | Tenant UI, MAPI
Shredding | Shred setting: true or false (1/0) | Object | REST API, shred.txt
Shredding | Default shred setting: true or false | Namespace | Tenant UI, MAPI
Custom Metadata Validation | XML validation: true or false | Namespace | Tenant UI, MAPI
Versioning | Versioning setting: true or false | Namespace | Tenant UI, MAPI
Versioning | Pruning setting: true/false and number of days for primary or replica | Namespace | Tenant UI, MAPI

Content Management Services

A Hitachi Content Platform service is a background process that performs a specific function that is targeted at preserving and improving the overall health of the HCP system. In particular, services are responsible for optimizing the use of system resources and maintaining the integrity and availability of the data stored in the HCP repository.

HCP implements 12 services: protection, content verification, scavenging, garbage collection, duplicate elimination, shredding, disposition, compression, capacity balancing, storage tiering, migration and replication.

HCP services are briefly described in Table 3.
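Two of the services listed above, content verification and duplicate elimination, both rest on content hashes. The following Python sketch illustrates the general technique only: a stored digest is compared with a freshly computed one, and duplicate candidates selected by hash are confirmed byte for byte. The function names are hypothetical, the in-memory dictionary stands in for the repository, and the use of SHA-256 for duplicate detection is an assumption (Table 3 names SHA-256 as the default for content verification).

```python
import hashlib
from typing import Dict, List, Tuple


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def verify_content(data: bytes, stored_digest: str) -> bool:
    """Check that object content still matches the digest recorded for it."""
    return sha256_hex(data) == stored_digest


def find_duplicates(objects: Dict[str, bytes]) -> List[Tuple[str, str]]:
    """Return (kept, duplicate) name pairs for objects with identical content.

    Candidates are grouped by digest first, then confirmed with a
    byte-for-byte comparison so that a hash collision cannot merge
    two different objects.
    """
    by_digest: Dict[str, List[str]] = {}
    duplicates: List[Tuple[str, str]] = []
    for name, data in objects.items():
        candidates = by_digest.setdefault(sha256_hex(data), [])
        for kept in candidates:
            if objects[kept] == data:  # byte-for-byte confirmation
                duplicates.append((kept, name))
                break
        else:
            # No identical object seen yet; keep this one as the reference copy.
            candidates.append(name)
    return duplicates


if __name__ == "__main__":
    store = {"invoice-1": b"same bytes", "memo": b"other", "invoice-2": b"same bytes"}
    print(find_duplicates(store))
    print(verify_content(store["memo"], sha256_hex(b"other")))
```

In this sketch the second invoice is reported as a duplicate of the first, while the memo is left alone. How HCP repairs an object that fails verification is not modeled here.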
Table 3. HITACHI CONTENT PLATFORM Services

Service | Description
Protection | Enforces DPL policy compliance by ensuring that the proper number of copies of each object exists in the system, and that damaged or lost objects can be recovered. Any policy violation invokes a repair process. Offers both scheduled and event-driven service. Events trigger a full service run, even if the service is disabled, after a configurable amount of time: 90 minutes after node shutdown; 1 minute after logical volume failure; 10 minutes after node removal.
Content Verification | Guarantees data integrity of repository objects by ensuring that the content of a file matches its digital signature. Repairs the object if the hash does not match. Detects and repairs discrepancies between primary and secondary metadata. SHA-256 hash algorithm is used by default. Checksums are computed on external and internal files. Computationally intensive and time-consuming service. Runs according to the active service schedule.
Scavenging | Ensures that all objects in the repository have valid metadata, and reconstructs metadata in case the metadata is lost or corrupted, but data files exist. The service verifies that both the primary metadata for each data object and the copies of the metadata stored with the object data (secondary metadata) are complete, valid and in sync with each other. Computationally intensive and time-consuming service. Scheduled service.
Garbage Collection | Reclaims storage space by purging hidden data and metadata for objects marked for deletion, or left behind by incomplete transactions. It also deletes old versions of objects that are eligible for pruning. When applicable, the deletion triggers the shredding service. Scheduled service, not event driven.
Duplicate Elimination | Identifies and eliminates redundant objects in the repository, and merges duplicate data to free space. The hash signature of external file representations is used to select objects as input to the service. These objects are then checked in a byte-for-byte manner to ensure that the data contents are indeed identical. Scheduled service.
Shredding | Overwrites storage locations where copies of the deleted object were stored in such a way that none of its data or metadata can be reconstructed, for security reasons. Also called secure deletion. The default HCP shredding algorithm uses 3 passes to overwrite an object and is DoD 5220.22-M standard compliant. The algorithm is selected at install time. Event-driven only service, not scheduled. It is triggered by the deletion of an object marked for shredding.
Disposition | Automatic cleanup of expired objects. All HCP namespaces can be configured to automatically delete objects after their retention period expires. Can be enabled or disabled both at the system and namespace level; enabling disposition for a namespace has no effect if the service is disabled at the system level. Disposition service deletes only current versions of versioned objects. Scheduled service.
Compression | Compresses object data to make more efficient use of system storage space. The space reclaimed by compression can be used for additional storage. A number of configurable parameters are provided via System Management Console. Scheduled service.
Capacity Balancing | Attempts to keep the usable storage capacity balanced (roughly equivalent) across all storage nodes in the system. If storage utilization for the nodes differs by a wide margin, the service moves objects around to bring the nodes closer to a balanced state. Runs only when started manually. Additions and deletions of objects do not trigger the service. Typically, an authorized HCP service provider starts this service after adding new storage nodes to the system. In addition, while not part of the service, during normal system operation new objects tend to naturally spread among all storage nodes in the system in fairly even proportion. This is due to the nature of the storage manager selection algorithm and resource monitoring of the administrative engine.
Storage Tiering | Determines which storage tiering strategy applies to an object, evaluates where the copies of the object should reside based on the rules in the applied service plan, and moves objects between running and spin-down storage as needed. Active only in spindown-capable HCP SAIN systems. Scheduled service.

Conclusion

Hitachi Data Systems object storage solutions avoid the limitations of traditional file systems by intelligently storing content in far larger quantities and in a much more efficient manner. These solutions provide for the new demands imposed by the explosion of unstructured data and its growing importance to organizations, their partners, their customers, their governments and their shareholders.
The Hitachi Data Systems object storage solutions treat file data, file metadata and custom metadata as a single object that is tracked and stored among a variety of storage tiers. With secure multitenancy and configurable attributes for each logical partition, the object store can be divided into a number of smaller virtual object stores that present configurable attributes to support different service levels. This allows the object store to support a wide range of workloads, such as content preservation, data protection, content distribution and even cloud from a single physical infrastructure. One infrastructure is far easier to manage than disparate silos of technology for each application or set of users. By integrating many key technologies in a single storage platform, Hitachi Data Systems object storage solutions provide a path to short-term return on investment and significant long-term efficiency improvements. They help IT evolve to meet new challenges, stay agile over the long term and address future change and growth.
© Hitachi Data Systems Corporation 2013. All rights reserved. HITACHI is a trademark or registered trademark of Hitachi, Ltd. Microsoft, Windows and Active Directory are trademarks or registered trademarks of Microsoft Corporation. All other trademarks, service marks, and company names are properties of their respective owners.

Notice: This document is for informational purposes only, and does not set forth any warranty, expressed or implied, concerning any equipment or service offered or to be offered by Hitachi Data Systems Corporation.

WP-425-B DG May 2013

Corporate Headquarters
2845 Lafayette Street
Santa Clara, CA 96050-2639 USA
www.HDS.com

Regional Contact Information
Americas: +1 408 970 1000 or info@hds.com
Europe, Middle East and Africa: +44 (0) 1753 618000 or info.emea@hds.com
Asia Pacific: +852 3189 7900 or