Hitachi content platform custom object metadata enhancement tool


Hitachi Data Systems offers advanced metadata management capabilities for Hitachi Content Platform (HCP) with the HCP custom object metadata enhancement tool.

Published in: Technology

WHITE PAPER

Hitachi Content Platform Custom Object Metadata Enhancement Tool

Advanced Metadata Management Capabilities for Hitachi Content Platform

By Christian Heiter, Michael Malaret and David Haberland of Hitachi Data Systems Federal Region and Clifford Grimm of Hitachi Content Platform Engineering at Hitachi Data Systems

October 2011
Table of Contents

Executive Summary
Introduction
Customer Challenges
Hitachi Content Platform Custom Object Metadata Enhancement Tool: Standards, Performance and Custom Settings
Based on Open Standards
System Operation, Environment and Performance
User Settings and Customization
Hitachi Content Platform Custom Object Metadata Enhancement Tool: Architecture and Operation
Ingest Function Process Flow
Augment Function Process Flow
HCP Namespace Usage by HCP Custom Object Metadata Enhancement Tool
Source or Destination Locations
Reference Architecture and Host Implementation Guidelines
Example Proof of Concept Implementation
Parameters and Configuration Settings
Hitachi Content Platform Primer
About Hitachi Content Platform
Object-based Storage
Namespaces and Tenants
Namespace Access
REST Interface
Transmitting Data in Compressed Format
Data Access Permissions
Replication
Namespace Operations
REST Interface Primer
Service Offerings
Appendix A: References
Appendix B: Feedback
Executive Summary

Many organizations must manage multiple data stores, some of which contain raw data objects with a small amount of metadata while others contain related extended metadata. The metadata is usually custom metadata, which evolves over the life of the data object but cannot be stored with the object itself. Managing multiple disparate data stores adds considerable complexity and increases the total cost of ownership.

System implementation complexity can be reduced by integrating the raw objects with their corresponding metadata, while providing the ability to add custom metadata at any point in the future. If properly implemented, the new system will provide the capability for advanced searches, including a search across the metadata itself.

The Hitachi Content Platform (HCP) "custom object metadata enhancement tool" was developed to add custom metadata information to objects in an HCP repository. HCP provides an intelligent data store capability with retention and security policies, data protection and content search. The combination of HCP and this tool will reduce complexity and greatly expand the richness of a repository search, thus increasing the value of the data and providing more advanced decision making and inference capabilities. More powerful actionable intelligence will result from this broader search.

While this custom tool was originally developed to enhance HCP objects with geospatial metadata, it was intentionally implemented to be metadata-type agnostic. Using this tool, any custom metadata can be added to objects destined for an HCP repository, either during the ingest phase or, through an augmentation operation, after they are already in the repository. Any open source or proprietary tool that can extract the metadata from an input file can be used.
The HCP custom object metadata enhancement tool is one of the initial components in a broader program to create a Hitachi Data Systems file and content services "ecosystem." This ecosystem will enhance the file and content solution product offerings from HDS with a set of tools that add capabilities and simplify usage in order to increase the value of the stored content. This document is intended for the technical reader. It provides a technical summary of the custom object metadata enhancement tool as well as a high-level introduction to HCP. No prior knowledge of HCP is expected from the reader. The anticipated result is a better understanding of HCP plus the custom tool solution and how it can add value to the data while reducing the total cost of ownership for the customer.
Introduction

Hitachi Data Systems has created a new tool called the Hitachi Content Platform (HCP) custom object metadata enhancement tool, which expands the capability of Hitachi Content Platform [1]. This tool allows file objects stored in HCP to be augmented with additional custom metadata information to significantly increase data correlation using HCP's index and search capability. Metadata enhancements will reduce the need for multiple data repositories containing duplicated objects, potentially simplifying the data architecture by integrating multiple disparate data stores. The resulting expanded content store will greatly increase the data's value and provide advanced search and correlation capabilities. The tool thereby makes content searches more effective and actionable information more timely.

Specific missions and applications can be supported, with HCP storing file objects such as images or other rich media plus their related custom metadata. The custom metadata could be proprietary, classified or based on open standards or formats. The metadata augmentation can be performed either during the initial object ingestion or by post-processing existing large data stores. The latter case allows a large repository to be updated with new information without having to re-ingest or create a new copy on another system. As needs change and new object information becomes available, additional metadata can be added.

The custom object metadata enhancement tool will allow HCP product features to be utilized across new application spaces. HCP provides scalability to 40PB of storage, with high data integrity and data replication. Multiple virtual content platforms can be created from a single physical implementation, with all resulting tenants securely managed with individualized options for data retention policies, encryption, versioning and detailed audit logging.
Other existing features in HCP will allow for distributed implementations to increase system resiliency. HCP also supports advanced Hitachi storage virtualization capabilities for even greater efficiency, scalability and flexibility.

Customer Challenges

Hitachi Content Platform with the custom object metadata enhancement tool may present a viable solution for organizations with one or more of the following challenges:

■ Very large data sets that have already been ingested into HCP, but which require enhancements of the stored information with custom metadata
■ Inability to add custom metadata while ingesting content into HCP
■ A need to cost-effectively enhance the search capabilities for large data stores across a larger information space for the same file objects
■ Data located in distributed locations but which would benefit from a distributed search capability
■ Policy management of the data sets
■ Enforced access rights and namespaces to security-protected data partitions
■ Disparate data stores with multiple data and accompanying metadata sets
Hitachi Content Platform Custom Object Metadata Enhancement Tool: Standards, Performance and Custom Settings

The HCP custom object metadata enhancement tool is a standalone application that runs in conjunction with Hitachi Data Migrator software, powered by CommVault® (see Figure 1). It discovers objects in a local user directory, extracts metadata information from each object and creates an XML file containing the metadata. The tool then either ingests the XML file with the object into HCP or adds the information to the corresponding object previously ingested.

Figure 1. Hitachi Content Platform Custom Metadata Enhancement Tool: Solution Architecture

Key features of the HCP custom object metadata enhancement tool include:

■ Allows HCP file objects to be augmented with custom metadata
■ Creates custom metadata to be stored in XML format in HCP
■ Provides capability to either add custom metadata during the ingestion phase or to post-process and augment existing HCP file objects with custom metadata
■ Performs custom metadata operations either on local files or on mounted remote directories containing the files
■ Runs periodically as a user-space application on any server
■ Provides tool parameter settings and customization capabilities, including:
  ■ New file check and process interval
  ■ Update or replace existing custom metadata if the object already exists in the HCP data store
  ■ HCP source and destination location namespace
■ Enhances the value of the information in the HCP data store by allowing for more advanced searches
■ Provides an end-user pluggable custom metadata generation architecture
■ Provides whole object ingestion with HCP v4.1, allowing for a more efficient single write operation
■ Interfaces through the HCP Representational State Transfer (REST) interface
■ Supported as a virtual machine

The HCP custom object metadata enhancement tool periodically starts the metadata extraction process. At that time it either ingests the new files with the new metadata or adds the new metadata to existing objects already in the HCP data store. The tool provides an extensible custom metadata generation architecture, which allows the user to configure the tool to call the appropriate external application. The callable metadata extraction application can be any open source or proprietary software that extracts key information from the file object.

Based on Open Standards

The HCP custom object metadata enhancement tool invokes user-pluggable applications to extract the metadata from the objects, and then reformats the data in an XML file to be ingested into HCP (see Figure 2). The XML open standard was selected because it extends the useful life of the data and reduces long-term operational costs: no proprietary tools are needed to read it, as they would be for a proprietary format. As new data becomes available, the existing XML-based information in the data store can be further enhanced by any new application that creates new metadata.
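As a rough illustration of the reformatting step described above, the following Python sketch wraps a set of extracted key/value pairs in a small XML document. The element names, the `custom-metadata` root tag and the sample values are all hypothetical; the paper does not specify the XML schema the tool produces.

```python
import xml.etree.ElementTree as ET


def build_custom_metadata(fields, root_tag="custom-metadata"):
    """Wrap extracted key/value pairs in a small XML document.

    The root tag and the one-element-per-field layout are assumptions
    made for illustration, not the tool's actual schema.
    """
    root = ET.Element(root_tag)
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical values such as a geospatial extractor might report.
sample_fields = {"driver": "NITF", "width": 1024, "height": 768}
xml_text = build_custom_metadata(sample_fields)
```

Because the result is plain XML, any later application can parse it and add further elements without needing the program that wrote it.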
Figure 2. XML-formatted Custom Metadata Sample Resulting from the FWtools Application: Includes Geospatial Information to Augment an Existing Hitachi Content Platform Object
The HCP custom object metadata enhancement tool is built to use the open, standards-based REST [2] interface of HCP. This style of interface is widely used for distributed hypermedia systems such as the World Wide Web and typically involves an HTTP context. REST removes the need for proprietary interfaces, which can speed integration and reduce long-term maintenance costs. It also makes customization simpler as mission needs change, even throughout the life of a long-term mission.

System Operation, Environment and Performance

Custom metadata can be added to the HCP data store in two different ways. The first performs the enhancement during the ingest operation. In this case, the tool finds new objects in the user directory. It first calls the pluggable application to extract the metadata and create an XML representation of the resulting metadata. Then it ingests the object and the corresponding metadata into the HCP data store.

The second method allows existing HCP file objects to be enhanced (augmented) with new metadata. The tool sees the new file in the input directory. If the exact object already exists in the data store, the tool calls the external program to extract the metadata, converts it to an XML representation and ingests the newly formed metadata for the corresponding object.

The tool can be configured to search local directories on the machine where it is running, or it can search for objects in a mounted remote directory. The tool has also been tested running in a virtual machine, pulling data from a local directory inside the virtual machine.
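The step in which the tool calls a pluggable external application on each file can be pictured with the sketch below. It is only an illustration in Python (the tool itself is described as Java based): `run_extractor` is a hypothetical helper, and the stand-in extractor is a one-line Python program that reports the file size, used here so that the example runs without any GIS software installed.

```python
import os
import subprocess
import sys
import tempfile


def run_extractor(command, file_path, timeout=60):
    """Run an external extractor on one file and return what it prints.

    `command` is the extractor's command line as a list; the file path is
    appended as the last argument. This calling convention is an assumption.
    """
    completed = subprocess.run(
        command + [file_path],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise if the extractor exits with an error
    )
    return completed.stdout


# Stand-in extractor: prints the file size wrapped in hypothetical XML tags.
STAND_IN_EXTRACTOR = [
    sys.executable,
    "-c",
    "import os, sys; "
    "print('<metadata><size>%d</size></metadata>' % os.path.getsize(sys.argv[1]))",
]

with tempfile.TemporaryDirectory() as workdir:
    sample_path = os.path.join(workdir, "sample.bin")
    with open(sample_path, "wb") as handle:
        handle.write(b"12345")
    metadata_xml = run_extractor(STAND_IN_EXTRACTOR, sample_path).strip()
```

Swapping in a different extractor only means changing the command list, which is the practical benefit of keeping extraction outside the tool.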
Performance can be enhanced with HCP v4.1 since it provides the capability for a whole object ingestion operation. This allows a single write operation to be performed with both the file object and the corresponding metadata, thus saving network bandwidth and system resources.

User Settings and Customization

The HCP custom object metadata enhancement tool provides a number of user-configurable settings, including:

■ Metadata extraction application. This is the application that will be run on each file to extract the relevant metadata. This is implemented as a pluggable interface.
■ Process run interval period. The user can select the interval at which the input user directory is checked for new files and processed. This setting allows for adaptation to situations ranging from new data arriving at an extremely high rate to new data arriving infrequently.
■ Update or replace selection. If the tool discovers that the object already exists in the HCP data store, the user has the option to either update or replace the existing custom metadata.
■ Input directory. The user can specify either a local directory on the machine where the tool is running, or a remote directory that has been previously mounted and is accessible.
■ HCP destination namespace. The user can select the destination HCP namespace.
■ HCP namespace authorization. The user can specify the HCP access authorization information for the destination HCP namespace.
■ File process count. The user can specify the number of files that will be processed in each interval. Adjusting this will require some tuning since implementations vary. Sources of variability include:
  ■ Plug-in applications will process files at varying speeds.
  ■ File sizes will vary.
  ■ File addition rates will vary.
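To make the interaction between the run interval and the file process count concrete, here is a small Python sketch of how one interval's batch might be selected. The function name, the oldest-first ordering and the in-memory `already_seen` set are all assumptions for illustration; the paper does not describe how the tool chooses or tracks files.

```python
import os
import tempfile


def next_batch(directory, max_batch_size, already_seen):
    """Return up to max_batch_size files not yet processed, oldest first."""
    candidates = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.path not in already_seen:
            candidates.append((entry.stat().st_mtime, entry.path))
    candidates.sort()  # oldest modification time first, ties broken by path
    batch = [path for _, path in candidates[:max_batch_size]]
    already_seen.update(batch)
    return batch


# Three placeholder files and a batch size of two: the first interval
# takes two files, the second interval picks up the remaining one.
with tempfile.TemporaryDirectory() as workdir:
    for name in ("a.ntf", "b.ntf", "c.ntf"):
        with open(os.path.join(workdir, name), "w") as handle:
            handle.write("placeholder")
    seen = set()
    first = next_batch(workdir, 2, seen)
    second = next_batch(workdir, 2, seen)
    batch_sizes = (len(first), len(second))
```

A small batch size with a short interval keeps each run brief, while a large batch size suits slow extractors that benefit from fewer, longer runs.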
The HCP custom object metadata enhancement tool provides a custom metadata generation architecture that is end-user pluggable. Therefore, any open-source, customer-proprietary or vendor-proprietary metadata extraction application can be used and changed as needed. Individuals with Java experience can readily customize the application itself. C-language-based interfaces to the lower-level system operations are provided with the tool to allow for further customization, as required.

Hitachi Content Platform Custom Object Metadata Enhancement Tool: Architecture and Operation

The HCP custom object metadata enhancement tool encapsulates a number of functions into an extensible and customizable tool suite (see Figure 3). It watches for new files in a local user directory and runs each file through an external metadata extraction program to see if there is any metadata. Then, it ingests the resulting metadata to augment the corresponding HCP file object. If the file object does not already exist in the data store, the tool ingests the object itself as well. The tool's application process wakes up periodically to perform these functions.

Figure 3. Components and Interfaces between Hitachi Content Platform and Hitachi Content Platform Custom Object Metadata Enhancement Tool

The REST interface is used for all communication between the tool and HCP. This is done for several reasons, including portability, supportability and performance. REST is used by many database and distributed web applications and is implemented using a behavioral model. REST is open, and its stateless model is simple, which makes it much easier to integrate distributed components.

There are two operational modes for the tool: an in-band file object and metadata ingest mode, and an out-of-band metadata augmentation mode. Both are described below.

Ingest Function Process Flow

The ingest function is an in-band mode whereby new file objects are pre-processed to extract the custom metadata before ingestion into the HCP data store. This function is useful when new data is being ingested, so that the accompanying metadata is added at the same time as the file object.

Detailed operation of the ingest function is shown in Figure 4. At a user-defined periodic interval, the tool process wakes up and begins searching for new files in the user directory. The tool processes the list of files in order; each file in the list is provided as input to the external metadata extraction program. The extraction program reads the specified file and sends the resulting XML-formatted metadata information to the tool. The tool reads the specified file and sends the information pair (object + metadata) to HCP. HCP then writes each component to its respective location in the data store, completing the ingest operation.

Figure 4. Detailed HCP Custom Object Metadata Enhancement Tool Ingest Process Flow: Post-processor Extracts Custom Metadata, Augments Existing HCP Objects.

Augment Function Process Flow

The augment function is an out-of-band mode whereby existing file objects are post-processed in order to augment the stored information with new object metadata. This is useful when a large amount of data already exists in HCP. Otherwise, all of the objects would have to be re-ingested into another data repository, which could take considerable time and network resources.

Detailed operation of the augment function is shown in Figure 5. The customer has previously ingested a large number of files into the HCP data store. The tool periodically wakes up and queries the existing HCP data store, searching for files without metadata that have not been modified since the previous query. Files matching the criteria are supplied to the metadata extraction application, which reads the file object from the local directory and provides any custom metadata from the files in XML format. The tool then ingests the custom metadata to augment the corresponding HCP objects.
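The choice between the two modes described above can be summarized in a few lines of Python. This is a sketch only: the dictionary stands in for an HCP namespace, `process_file` is a hypothetical name, and the `update_metadata` flag is modeled on the execution.updateMetadata parameter the paper lists among the tool's settings.

```python
def process_file(store, name, data, metadata_xml, update_metadata=True):
    """Ingest a new object, or augment an existing one, in a stand-in store.

    Returns "ingested", "augmented" or "skipped" so the caller can log
    what happened to each file.
    """
    if name not in store:
        # In-band ingest: object and metadata are written together.
        store[name] = {"data": data, "custom_metadata": metadata_xml}
        return "ingested"
    if update_metadata:
        # Out-of-band augment: only the custom metadata is written.
        store[name]["custom_metadata"] = metadata_xml
        return "augmented"
    # Object exists and updates are disabled: leave it untouched.
    return "skipped"


# One object is already in the stand-in store without custom metadata.
store = {"old.ntf": {"data": b"pixels", "custom_metadata": None}}
results = [
    process_file(store, "new.ntf", b"more pixels", "<metadata/>"),
    process_file(store, "old.ntf", b"pixels", "<metadata/>"),
    process_file(store, "old.ntf", b"pixels", "<other/>", update_metadata=False),
]
```

In the augment case the object data never crosses the network again, which is where the savings over re-ingestion come from.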
Figure 5. Detailed HCP Custom Metadata Enhancement Tool Process Flow During an Augment Function: Post-processor Extracts Custom Metadata, Augments Existing HCP Objects

HCP Namespace Usage by HCP Custom Object Metadata Enhancement Tool

HCP provides access to the repository as partitioned namespaces. A namespace is a logical grouping of objects such that the objects in one namespace are not visible in any other namespace. To the user of a namespace, the namespace is the repository, and it may appear as a network-accessible mount point. This brief introduction is enough to discuss source and destination locations; more detail on HCP and namespaces is provided below.

Source or Destination Locations

The HCP custom object metadata enhancement tool provides flexibility in the input source location as well as the output destination. In the case of an ingest operation, the input source could be a file system on the machine where the tool is running, or a file system on a network-mounted remote directory. For an augmentation operation, the objects would be sourced from the root folder in either an HCP default namespace or an authenticated namespace.

The destination of any file operation by the tool is always an HCP repository, but either type of namespace is allowed. The destination namespace can be the same as the source namespace, or it can be a different namespace. The path should contain the root folder within the appropriate namespace. A summary of the allowable locations is shown in Table 1.
TABLE 1. HCP CUSTOM METADATA ENHANCEMENT TOOL: ALLOWABLE SOURCE AND DESTINATION LOCATIONS

Object Location      File System   HCP Default Namespace   HCP Authenticated Namespace
Source Input         Yes           Yes                     Yes
Destination Output   No            Yes                     Yes

Reference Architecture and Host Implementation Guidelines

In a typical implementation, the HCP custom object metadata enhancement tool runs on a host machine that is not part of HCP. The tool requires minimal resources, and the host machine can be either a physical machine or a virtual machine. The processor, memory and storage requirements are driven more by the plug-in metadata extraction application, the size of the objects and the required object process rate. If possible, administrators should provide enough memory for the operating system to keep both the object and the metadata extraction application resident in memory, since the application will be called repeatedly (i.e., for every new object to be processed).

Since the tool requires only a single machine (physical or virtual), its reference architecture depends more on the HCP implementation than on the tool node. Figure 6 depicts an example implementation with the tool's physical node connected to a 4-node HCP 500 system. This HCP was configured with failover and uses modular storage with LUNs provisioned from individual RAID groups. The tool node in this diagram shows the new content being sourced from either a local directory on the node, or from a remote directory (but not both).
Figure 6. HCP Custom Object Metadata Enhancement Tool Reference Architecture: HCP Implementation as a 4-node HCP 500 Supporting Failover Using Modular Storage with LUNs Provisioned from Individual RAID Groups

Example Proof of Concept Implementation

As a proof of concept demonstration, both functions of the HCP custom object metadata enhancement tool were exercised. The tool was first used to enhance objects that had previously been ingested but required augmentation with newly provided geospatial metadata. The demonstration also ingested new objects augmented with the corresponding geospatial metadata.

The pluggable metadata application used was an open-source geographic information system (GIS) program called FWtools [3]. FWtools can display geospatial information from a variety of format types and can extract the metadata for the supported file types, including the National Imagery Transmission Format (NITF) [4]. NITF files are used by federal agencies and system integrators focused on correlating information in the objects with geospatial information from multiple events and data sources.

Parameters and Configuration Settings

The HCP custom object metadata enhancement tool has a number of tunable parameters and configuration settings that must be properly set before starting normal operation. All of these settings can be found in the "" file. The settings are listed in Table 2 along with their descriptions.
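The paper does not show the settings file itself, so the following Python sketch assumes a plain `key=value` text format with `#` comments, which is common for Java applications. The parameter names come from Table 2; the values, the `parse_settings` helper and the file format are illustrative assumptions.

```python
SAMPLE_SETTINGS = """\
# Hypothetical values; parameter names are taken from Table 2.
source.path=/data/incoming
source.maxBatchSize=100
execution.updateMetadata=true
execution.pauseSleepInSeconds=30
"""


def parse_settings(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def as_bool(value):
    """Interpret 'true' (any letter case) as True, everything else as False."""
    return value.strip().lower() == "true"


settings = parse_settings(SAMPLE_SETTINGS)
max_batch = int(settings["source.maxBatchSize"])
update_metadata = as_bool(settings["execution.updateMetadata"])
```

Converting values to their intended types at load time, as in the last two lines, surfaces a mistyped setting before the first batch runs rather than partway through it.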
TABLE 2. TUNABLE HCP CUSTOM OBJECT METADATA ENHANCEMENT TOOL PARAMETERS AND CONFIGURATION SETTINGS

Parameter                          Description
source.path                        Local path to the directory that contains the data to ingest
source.maxBatchSize                Maximum number of file handles to "batch" per loop iteration
destination.user                   HCP data access: user to use for ingest
destination.password               HCP data access password for the destination.user account
destination.passwordEncoded        Indicates whether the destination.password value is encoded in md5 format
destination.rootpath               Root path REST URL in HCP where content is placed
metadata.classes                   Comma-separated, ordered list of class(es) to load to extract metadata from files
execution.loopcount                Number of times to load up the batch with files to process
execution.stopRequestFile          Name of a file, in the process's local directory, whose presence signals the tool to stop processing
execution.pauseRequestFile         Name of a file on the local machine whose presence signals the tool to pause processing. For as long as the file exists, the program is paused. Delete the file to resume. If this value is changed while the program is paused, the new value is not used until the program resumes.
execution.deleteSourceFiles        Indicates whether the source files should be deleted after they are written to HCP. If the file does not have correct permissions, attempt to change them and try again.
execution.forceDeleteSourceFiles   Indicates whether deletion of source files should be forced by changing the source file permissions
execution.deleteSourceEmptyDirs    Indicates whether empty directories in the source location should be periodically cleaned up
execution.updateMetadata           Indicates whether existing metadata on objects in HCP should be updated. If set to false, source files will be ignored (but deleted, if indicated).
execution.pauseSleepInSeconds      Number of seconds to sleep between checks for resume while paused
execution.batchSleepInSecond       Number of seconds to sleep at the end of a batch run before attempting another batch
execution.debugging.httpheaders    Indicates whether HTTP headers should be written to the console (stdout)

Hitachi Content Platform Primer

The functionality described here is based on Hitachi Content Platform version 4.1, but some content might be applicable to prior HCP versions.

About Hitachi Content Platform

Hitachi Content Platform is a distributed storage system designed to support large, growing repositories of fixed-content data. HCP stores objects that include both data and the corresponding metadata. It distributes these objects across the storage space but still presents them as files in a standard directory structure.
HCP provides access to stored objects through the HTTP protocol, as well as through user interfaces such as the namespace browser and search console.

HCP is a combination of hardware and software that provides an object-based data storage environment. An HCP repository stores all types of data, including simple text files as well as multigigabyte satellite, medical or database images. HCP provides easy access to the repository for adding, retrieving and deleting the stored data. HCP uses write once, read many (WORM) storage technology and a variety of policies and internal processes to ensure the integrity of the stored data and the efficient use of storage capacity.

Key features of HCP include:

■ Scalability up to 40PB of storage in a single cluster
■ Capability to provision a single cluster into multiple virtual content platforms ("tenants"), each with its own unique configuration and access control to manage data placement and content distribution to appropriate audiences
■ Connection capabilities to a wide range of applications and protocols via HTTP, REST, NFS, CIFS and more
■ High data integrity, with data integrity checking, RAID-6, replication, encryption, WORM, multiple versions of objects and audit logging
■ Automation of data migration from old storage to new storage
■ Management and enforcement policies for retention, disposal, shredding and other compliance and lifecycle management operations
■ Increased value of unstructured data using metadata and custom metadata for automation and search
■ Capability to create a single, multipurpose, unstructured data platform for archive, cloud and backup capabilities
■ Capability to monitor and report on storage and bandwidth use of different tenants for chargeback
■ Enhanced management capabilities with comprehensive interfaces for cloud and distributed environments
■ Scalability to branch and remote offices via Hitachi Data Ingestor

The following section introduces basic HCP concepts and includes information regarding HCP namespaces.

Object-based Storage

HCP stores objects in the repository. Each object permanently associates data HCP receives (for example, a file, an image or a database) with information about that data, called metadata. An object encapsulates:

■ Fixed-content data, which is an exact digital reproduction of data as it existed before it was stored. Once it is in the repository, this fixed-content data cannot be modified.
■ System metadata, which consists of system-managed properties that describe the fixed-content data (for example, its size and creation date). System metadata includes settings, such as retention and data protection level, that influence how transactions and internal processes affect the object.
■ Custom metadata, which is metadata that a user or application provides to further describe an object. It is typically specified as XML and can be used to create self-describing objects. Future users and applications can use this metadata to understand and repurpose the object content.

Namespaces and Tenants

An HCP repository is partitioned into namespaces. A namespace is a logical grouping of objects such that the objects in one namespace are not visible in any other namespace. To the user of a namespace, the namespace is the repository.

Namespaces provide a mechanism for separating the data stored for different applications, business units or customers. For example, a deployment could have one namespace for accounts receivable and another for accounts payable. Namespaces also enable operations to work against selected subsets of repository objects. For example, a query could be performed that targets the accounts receivable and accounts payable namespaces but not the employee namespace.

Namespaces are owned and managed by administrative entities called tenants. A tenant typically corresponds to an actual organization such as a company or a division or department within a company. A tenant can also correspond to an individual person.

Namespace Access

HCP provides several techniques for accessing and managing data in the namespace. These include:

■ REST interface
■ Metadata query API
■ Namespace browser
■ Search console
■ Hitachi Data Migrator
■ HCP client tools

REST Interface

Clients use an HTTP-based REST interface to access the namespace. Using this interface, clients can add objects to the namespace, view and retrieve objects, change object metadata and delete objects. The namespace can be accessed programmatically with applications, interactively with a command-line tool or through a graphical user interface (GUI).
Figure 7 shows the relationship between original data, objects in a namespace and the HTTP access protocol.
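The four REST actions named above (add, retrieve, change metadata, delete) map naturally onto HTTP methods. The Python sketch below only builds request objects and sends nothing. Everything specific in it is an assumption to be checked against the HCP REST documentation: the host name is hypothetical, the method mapping follows common REST practice, and the `type=custom-metadata` query parameter is recalled from HCP documentation rather than taken from this paper. Authentication is omitted entirely.

```python
import urllib.request

# Hypothetical root path; a real value would come from destination.rootpath.
ROOT_PATH = "https://namespace.tenant.hcp.example.com/rest"

# Assumed mapping of the four namespace actions to HTTP methods.
METHODS = {
    "add": "PUT",
    "retrieve": "GET",
    "set_metadata": "PUT",
    "delete": "DELETE",
}


def build_request(action, object_path, body=None):
    """Build (but do not send) a request for one namespace action."""
    url = ROOT_PATH + "/" + object_path.lstrip("/")
    if action == "set_metadata":
        # Assumed selector telling HCP the body is custom metadata.
        url += "?type=custom-metadata"
    return urllib.request.Request(url, data=body, method=METHODS[action])


add_request = build_request("add", "images/scene.ntf", b"object bytes")
metadata_request = build_request("set_metadata", "images/scene.ntf", b"<metadata/>")
delete_request = build_request("delete", "images/scene.ntf")
```

Because each request carries everything the server needs, the client holds no session state between calls, which is the stateless property the paper credits for easier integration.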
Figure 7. Client-HCP Namespace: Relationship between Original Data, Objects in a Namespace and HTTP Access to the HCP Data Store

Metadata Query API

HCP allows clients to use HTTP requests to find objects that meet specific criteria, including object change time, index setting, operations on the object and the object location. If the client has the appropriate permissions, a single request can query multiple HCP namespaces as well as the default namespace.

A metadata query to HCP returns a set of records containing metadata that describes the matching objects. If the query matches a large number of objects, multiple requests can be used to page sequentially through the records, retrieving only a specific number of records in response to each request.

Namespace Browser

The HCP namespace browser provides management of the namespace content and the ability to view information about namespaces. The browser functions include:

■ List, view and retrieve objects and versions of objects
■ Create empty directories
■ Store and delete objects
■ Display namespace information, including:
  ■ The namespaces that can be accessed
  ■ Retention classes for use within a namespace
  ■ Permissions for namespace access
  ■ Statistics about a namespace

Search Console

The HCP search console is an easy-to-use web application that provides the capability to search for and manage objects based on specified criteria. For example, the objects returned by a search for objects stored before a certain date or larger than a specified size could then be deleted, or marked to prevent them from being deleted.
The search console works with either of 2 implementations, which must be enabled at the HCP system level:
■ The Hitachi Data Discovery Suite (HDDS) search facility interacts with HDDS, which performs searches and returns results to the HCP search console. HDDS is a separate product from HCP.
■ The HCP search facility is integrated with HCP and works internally to perform searches and return results to the search console.

Only one of the search facilities can be enabled in the HCP GUI at any given time. If neither is enabled, HCP does not support using the search console to search namespaces. The system associated with the enabled search facility is called the active search system.

The active search system (that is, HDDS or HCP) maintains an index of data objects in each search-enabled namespace. The index is based on object content and metadata. The active search system uses the index for fast retrieval of search results. When objects are added to or removed from the namespace or when object metadata changes, the active search system automatically updates the index to keep it current.

For information on using the search console, please reference [5].

Note: Not all namespaces support search. A namespace can be searched only if the namespace administrator has enabled search for it.

Hitachi Data Migrator

Hitachi Data Migrator is a high-performance, multithreaded client-side utility for viewing, copying, and deleting data. Data Migrator functions include:
■ Copy objects, files and directories between local file systems, HCP namespaces and earlier HCP archives
■ Delete objects, files and directories, including performing bulk delete operations
■ View the content of objects and files, including the content of old versions of objects
■ Rename files and directories on the local file system
■ View object, file and directory properties
■ Create empty directories
■ Add, replace or delete custom metadata for objects

Data Migrator has both a GUI and a command-line interface (CLI).
For information on using Data Migrator, please reference [6].

HCP Client Tools

HCP comes with a set of command-line tools that allows data to be copied or moved between a client and an HCP system. The tools also provide a search capability using specified criteria. Additionally, empty directories can be created in a local or remote file system or on an HCP system.

The client tools support multiple namespace access protocols and multiple client platforms. The command syntax is the same for all supported configurations.
For information on installing and using the client tools, please reference [7].

Note: For most purposes, the HCP client tools have been superseded by Hitachi Data Migrator. However, they have some features, such as finding files, that are not available in Data Migrator.

Transmitting Data in Compressed Format

Object data or custom metadata can be compressed in gzip format before it is sent to HCP, which saves bandwidth. The PUT request must tell HCP that the data is compressed so that HCP knows to decompress the data before storing it. Similarly, in a GET request, HCP can be told to return object data or custom metadata in compressed format. In this case, the returned data must be decompressed before use.

HCP supports only the gzip algorithm for compressed data transmission.

HCP can be told that the request body is compressed by including a Content-Encoding header with the value gzip. In this case, HCP uses the gzip algorithm to decompress the received data.

HCP can be told to send a compressed response by specifying an Accept-Encoding header. If the header specifies gzip, a list of compression algorithms that includes gzip, or *, HCP uses the gzip algorithm to compress the data before sending it.

For examples of sending and receiving objects in compressed format, please reference Chapter 4, "Working with objects and versions," in [8].

Notes:
■ HCP can also compress and decompress metadata query API requests and responses. For more information on this, please reference the HCP product document titled "Using a Namespace," in the section titled "Request HTTP elements."
■ Since HCP normally compresses stored object data and custom metadata, it is unnecessary to explicitly compress objects for storage. However, if objects or custom metadata that are already gzip-compressed are to be stored in that compressed form, do not use a Content-Encoding header. To retrieve stored gzip-compressed data as it was stored, do not use an Accept-Encoding header.
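The header rules for compressed transmission described above can be sketched in Python as follows. This is an illustration under stated assumptions, not HCP client code: the URL and the function names are hypothetical, the request is built but never sent, and the Accept-Encoding check ignores quality values, which the text above does not discuss.

```python
import gzip
from typing import Optional
from urllib.request import Request


def gzip_put_request(url: str, payload: bytes) -> Request:
    """Build a PUT whose body is gzip-compressed and labeled as such."""
    compressed = gzip.compress(payload)
    # Content-Encoding: gzip tells the server to decompress before storing.
    return Request(url, data=compressed, method="PUT",
                   headers={"Content-Encoding": "gzip"})


def response_will_be_gzipped(accept_encoding: Optional[str]) -> bool:
    """Apply the rule from the text: gzip, a list that includes gzip, or *."""
    if not accept_encoding:
        return False
    # Drop any ";q=..." parameters; this sketch does not interpret them.
    algorithms = [item.split(";")[0].strip().lower()
                  for item in accept_encoding.split(",")]
    return "gzip" in algorithms or "*" in algorithms


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    req = gzip_put_request("https://ns.tenant.hcp.example.com/rest/a.xml",
                           b"<a/>" * 100)
    print(len(req.data), "compressed bytes;",
          response_will_be_gzipped("deflate, gzip"))
```

A client that receives a compressed response would call gzip.decompress on the body before using it, mirroring the decompression HCP performs on a compressed PUT.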
Data Access Permissions

All namespace access clients must have permission to access and perform actions on data. Table 3 describes the permissions and the operations allowed.
TABLE 3. HCP PERMISSIONS AND ALLOWABLE OPERATIONS

Permission    Operations
Read          ■ Retrieve objects and system metadata.
              ■ Check for object existence.
              ■ Check for and retrieve custom metadata.
Write         ■ Add objects.
              ■ Create directories.
              ■ Set and change system and custom metadata.
Delete        Delete objects and empty directories, and remove custom metadata.
Purge         Delete objects and their historical versions.
Privileged    ■ Delete or purge objects regardless of retention.
              ■ Place objects on hold.
Search        Search for objects. For information on this, please reference Chapter 8, "Using the HCP metadata query API," in [8].

Some operations require multiple permissions. For example, to place an object on hold, the user must have both write and privileged permissions. Similarly, performing a privileged purge requires delete, privileged and purge permissions.

Permissions are set at 2 levels:
■ Namespace-level permissions. This permission mask specifies the maximum permissions for any user that accesses the namespace.
■ Data access account. This specifies permissions for an individual user. Accessing a namespace requires a data access account with a username and password. The account specifies available namespaces and associated permissions.

The required permissions for a particular operation must be enabled in both the namespace-level permission mask and the corresponding data access account permissions.

Replication

Replication is the process of keeping selected tenants and namespaces in 2 HCP systems in sync with each other. Basically, this entails copying object creations, deletions and metadata changes from one system to the other. HCP also replicates the tenant and namespace configuration, data access accounts and retention classes.

The HCP system in which the objects are initially created is called the primary system. The 2nd system is called the replica.
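The two-level permission check described under Data Access Permissions above can be sketched in Python as follows. The sketch models the six permissions from Table 3 as flags and treats the effective permissions as the intersection of the namespace-level mask and the data access account. The operation names and the function name are hypothetical, and only the two multi-permission examples named in the text are taken from this paper.

```python
from enum import Flag, auto


class Permission(Flag):
    """The six permissions listed in Table 3."""
    READ = auto()
    WRITE = auto()
    DELETE = auto()
    PURGE = auto()
    PRIVILEGED = auto()
    SEARCH = auto()


# Hypothetical operation names; the combined requirements follow the text.
REQUIRED = {
    "retrieve_object": Permission.READ,
    "add_object": Permission.WRITE,
    "delete_object": Permission.DELETE,
    "place_on_hold": Permission.WRITE | Permission.PRIVILEGED,
    "privileged_purge": (Permission.DELETE | Permission.PRIVILEGED
                         | Permission.PURGE),
}


def is_allowed(operation: str, namespace_mask: Permission,
               account: Permission) -> bool:
    """A permission counts only if both the mask and the account enable it."""
    effective = namespace_mask & account
    required = REQUIRED[operation]
    return (effective & required) == required


if __name__ == "__main__":
    mask = Permission.READ | Permission.WRITE | Permission.DELETE
    account = Permission.READ | Permission.WRITE | Permission.PRIVILEGED
    for operation in REQUIRED:
        print(operation, is_allowed(operation, mask, account))
```

In this example the account holds the privileged permission, but placing an object on hold is still refused because the namespace-level mask does not enable it.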
Replication has several purposes, including:
■ If the primary system becomes unavailable (for example, due to network issues), the replica can provide continued data availability.
■ If the primary system suffers irreparable damage, the replica can serve as a source for disaster recovery.
■ If an object cannot be read from the primary system (for example, because a server is unavailable), HCP can try to read it from the replica.

Note: Replication is an add-on feature to HCP. Not all systems include it.

Namespace Operations

Familiar commands and tools are used to perform operations on a namespace. Some operations relate to specific types of metadata. For more information on this metadata, please reference Chapter 2, "Understanding objects," in [8].

Operations that store or retrieve data can optionally transmit the data in gzip-compressed format. For more information on this, see the individual commands used for those operations.

Operation Restrictions

The operations that can be performed are subject to the following restrictions:
■ The HTTP request headers must include valid user information.
■ The namespace must be configured to allow HTTP or HTTPS access from the client IP address.
■ The namespace configuration and user permissions must allow the operation.

For information on user permissions, please reference Chapter 10, "Using the Namespace Browser," in [8].

Supported Operations

The following operations can be performed on a namespace:
■ Write data to the namespace.
■ If versioning is enabled, store new versions of existing objects.
■ Override default metadata when storing an object.
■ Create an empty directory in the namespace.
■ Check for object existence.
■ View the content of an object.
■ View object metadata.
■ Delete an object.
■ Delete an empty directory.
■ Set retention for an object that has none.
■ Extend the retention period for an object.
■ Set or change a retention class for an object.
■ Hold or release an object.
■ Enable shredding of an object.
■ Change the index setting for an object.
■ Add, replace or delete custom metadata for an object.
■ Add or retrieve object data and custom metadata in a single operation.
■ Check for and read custom metadata.
■ List retention classes available in the namespace.
■ List namespace permissions for the user.
■ List the namespace statistics.
■ List the accessible namespaces.
■ Use the HCP metadata query API to get information about objects that meet specified criteria in one or more namespaces.

Prohibited Operations

HCP never allows users to:
■ Rename an object or directory.
■ Overwrite a successfully stored object. However, if versioning is enabled, new versions of an object can be written.
■ Modify the fixed-content portion of an object.
■ Delete an object that is under retention if the privileged permission is not granted or if the namespace is configured to prevent this operation.
■ Delete a directory that contains one or more objects.
■ Shorten an explicitly set retention period.

REST Interface Primer

The Representational State Transfer (REST) interface is a behavioral model used by many database and distributed web applications. Its beauty lies in its simplicity. From the Wikipedia definition:

REST-style architectures consist of clients and servers. Clients initiate requests to servers; servers process requests and return appropriate responses. Requests and responses are built around the transfer of representations of resources. A resource can be essentially any coherent and meaningful concept that may be addressed. A representation of a resource is typically a document that captures the current or intended state of a resource.

At any particular time, a client can either be in transition between application states or "at rest." A client in a rest state is able to interact with its user, but creates no load and consumes no per-client storage on the servers or on the network. The client begins sending requests when it is ready to make the transition to a new state. While one or more requests are outstanding, the client is considered to be in transition.
The representation of each application state contains links that may be used next time the client chooses to initiate a new state transition.

REST was initially described in the context of HTTP, but is not limited to that protocol. RESTful architectures can be based on other Application Layer protocols if they already provide a rich and uniform vocabulary for applications based on the transfer of meaningful representational state. RESTful applications maximize the use of the pre-existing, well-defined interface and other built-in capabilities provided by the chosen network protocol, and minimize the addition of new application-specific features on top of it.

Service Offerings

Customization and support services are available. Please contact your HDS Account Manager for additional information.
Appendix A: References

[1] Hitachi Content Platform (HCP): platform.pdf
[2] REST interface:
[3] FWTools for GIS imaging:
[4] National Imagery Transmission Format (NITF) files: Transmission_Format
[5] HCP "Searching Namespaces" manual, part of the HCP Product Documentation Set
[6] HCP "Using HCP Data Migrator" manual, part of the HCP Product Documentation Set
[7] HCP "Using the HCP Client Tools" manual, part of the HCP Product Documentation Set
[8] HCP "Using a Namespace" manual, part of the HCP Product Documentation Set
Appendix B: Feedback

Hitachi Data Systems welcomes your feedback. Please share your thoughts by sending an email message to,, or Please be sure to include the title of this white paper in your email message.
Corporate Headquarters
750 Central Expressway
Santa Clara, California 95050-2627 USA

Regional Contact Information
Americas: +1 408 970 1000 or info@hds.com
Europe, Middle East and Africa: +44 (0) 1753 618000 or
Asia Pacific: +852 3189 7900 or

Hitachi is a registered trademark of Hitachi, Ltd., in the United States and other countries. Hitachi Data Systems is a registered trademark and service mark of Hitachi, Ltd., in the United States and other countries. All other trademarks, service marks and company names in this document or website are properties of their respective owners.

Notice: This document is for informational purposes only, and does not set forth any warranty, expressed or implied, concerning any equipment or service offered or to be offered by Hitachi Data Systems Corporation.

© Hitachi Data Systems Corporation 2011. All Rights Reserved. WP-410-A DG October 2011