WHITE PAPER
Hitachi Content Platform Custom Object Metadata Enhancement Tool
Advanced Metadata Management Capabilities for
Hitachi Content Platform
By Christian Heiter, Michael Malaret and David Haberland of Hitachi Data
Systems Federal Region and Clifford Grimm of Hitachi Content Platform
Engineering at Hitachi Data Systems
October 2011
Table of Contents
Executive Summary
Introduction
Customer Challenges
Hitachi Content Platform Custom Object Metadata Enhancement Tool: Standards, Performance and Custom Settings
Based on Open Standards
System Operation, Environment and Performance
User Settings and Customization
Hitachi Content Platform Custom Object Metadata Enhancement Tool: Architecture and Operation
Ingest Function Process Flow
Augment Function Process Flow
HCP Namespace Usage by HCP Custom Object Metadata Enhancement Tool
Source or Destination Locations
Reference Architecture and Host Implementation Guidelines
Example Proof of Concept Implementation
Parameters and Configuration Settings
Hitachi Content Platform Primer
About Hitachi Content Platform
Object-based Storage
Namespaces and Tenants
Namespace Access
REST Interface
Transmitting Data in Compressed Format
Data Access Permissions
Replication
Namespace Operations
REST Interface Primer
Service Offerings
Appendix A: References
Appendix B: Feedback
Executive Summary
Many organizations must manage multiple data stores, some of which contain raw data
objects with a small amount of metadata while others contain related extended metadata. The
metadata is usually custom metadata, which evolves over the life of the data object but cannot be
stored with the object itself. Managing multiple disparate data stores adds considerable complexity
and increases the total cost of ownership.
System implementation complexity can be reduced by integrating the raw objects with their cor-
responding metadata, while providing the ability to add custom metadata at any point in the future.
If properly implemented, the new system will provide the capability for advanced searches, including searches across the metadata itself.
The Hitachi Content Platform (HCP) "custom object metadata enhancement tool" was developed to
add custom metadata information to objects in an HCP repository. HCP provides an intelligent data
store capability with retention and security policies, data protection and content search. The com-
bination of HCP and this tool will reduce complexity and greatly expand the richness of a repository
search, thus increasing the value of the data and providing more advanced decision making and
inference capabilities. More powerful actionable intelligence will result from this broader search.
While this custom tool was originally developed to enhance HCP objects with geospatial metadata,
it was intentionally implemented to be metadata-type agnostic. Using this tool, any custom meta-
data can be easily added to objects destined for an HCP repository during the ingest phase or after
they are already in the repository using an augmentation operation. Any open source or proprietary
tool that can extract the metadata from an input file can be used.
The HCP custom object metadata enhancement tool is one of the initial components in a broader
program to create a Hitachi Data Systems file and content services "ecosystem." This ecosystem
will enhance the file and content solution product offerings from HDS with a set of tools that add
capabilities and simplify usage in order to increase the value of the stored content.
This document is intended for the technical reader. It provides a technical summary of the custom
object metadata enhancement tool as well as a high-level introduction to HCP. No prior knowledge
of HCP is expected from the reader. The anticipated result is a better understanding of HCP plus the
custom tool solution and how it can add value to the data while reducing the total cost of ownership
for the customer.
Introduction
Hitachi Data Systems has created a new tool called the Hitachi Content Platform (HCP) custom
object metadata enhancement tool, which expands the capability of Hitachi Content Platform [1].
This tool allows file objects stored in HCP to be augmented with additional custom metadata infor-
mation to significantly increase data correlation using HCP's index and search capability. Metadata
enhancements will reduce the need for multiple data repositories containing duplicated objects,
potentially simplifying the data architecture by integrating multiple disparate data stores.
The resulting expanded content store will greatly increase the data's value and provide advanced search and correlation capabilities, yielding more effective content searches and more timely actionable information.
Specific missions and applications can be supported, with HCP storing file objects such as images
or other rich media plus their related custom metadata. The custom metadata could be proprietary,
classified or based on open standards or formats. The metadata augmentation can be performed
either during the initial object ingestion or by post-processing existing large data stores. The latter
case allows a large repository to be updated with new information without having to re-ingest or
create a new copy on another system. As needs change and new object information is available,
additional metadata can be added.
The custom object metadata enhancement tool will allow HCP product features to be utilized across
new application spaces. HCP provides scalability to 40PB of storage, with high data integrity and
data replication. Multiple virtual content platforms can be created from a single physical imple-
mentation with all resulting tenants securely managed with individualized options for data retention
policies, encryption, versioning and detailed audit logging. Other existing features in HCP will allow
for distributed implementations to increase system resiliency. HCP also supports advanced Hitachi
storage virtualization capabilities for even greater efficiency, scalability and flexibility.
Customer Challenges
Hitachi Content Platform with the custom object metadata enhancement tool may present a viable
solution for organizations with one or more of the following challenges:
■■Very large data sets that have already been ingested into HCP, but which require enhancements
of the stored information with custom metadata
■■Inability to add custom metadata while ingesting content into HCP
■■A need to cost-effectively enhance the search capabilities for large data stores across a larger
information space for the same file objects
■■Data located in distributed locations but which would benefit from a distributed search capability
■■Policy management of the data sets
■■A need to enforce access rights and namespaces for security-protected data partitions
■■Disparate data stores with multiple data and accompanying metadata sets
Hitachi Content Platform Custom Object
Metadata Enhancement Tool: Standards,
Performance and Custom Settings
HCP custom object metadata enhancement tool is a standalone application that runs in conjunction
with Hitachi Data Migrator software, powered by CommVault® (see Figure 1). It discovers objects
in a local user directory, extracts metadata information from each object and creates an XML file
containing the metadata; then the tool either ingests the file with the object into HCP or adds the
information to the corresponding object previously ingested.
Figure 1. Hitachi Content Platform Custom Metadata Enhancement Tool: Solution
Architecture
Key features of the HCP custom object metadata enhancement tool include:
■■Allows HCP file objects to be augmented with custom metadata
■■Creates custom metadata to be stored in XML format in HCP
■■Provides capability to either add custom metadata during the ingestion phase or to post-process
and augment existing HCP file objects with custom metadata
■■Performs custom metadata operations either on local files or mounted remote directories con-
taining the files
■■Runs periodically as a user-space application on any server
■■Provides tool parameter settings and customization capabilities, including:
■■ New file check and process interval
■■ Update or replace existing custom metadata if the object already exists in the HCP data store
■■ HCP source and destination location namespace
■■Enhances the value of the information in the HCP data store by allowing for more advanced
searches
■■Provides an end-user pluggable custom metadata generation architecture
■■Provides whole object ingestion with HCP v4.1, allowing for a more efficient single write opera-
tion
■■Interfaces through the HCP Representational State Transfer (REST) interface
■■Supported as a virtual machine
HCP custom object metadata enhancement tool will periodically start the metadata extraction process. At that time it will either ingest the new files with the new metadata or add the new metadata to existing objects already in the HCP data store. The tool provides an extensible custom metadata generation architecture that allows the user to configure the tool to call the appropriate external application. The callable metadata extraction application can be any open source or proprietary software that extracts key information from the file object.
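To make the pluggable architecture concrete, the sketch below shows one way such a plug-in contract could look in Java, the language in which the tool is written. The interface name and method signature are illustrative assumptions, not the tool's published API; the ordered list of extractor class names corresponds to the metadata.classes setting described later in Table 2.

import java.io.File;

// Hypothetical plug-in contract; the actual interface shipped with the
// tool may differ. An extractor returns XML-formatted custom metadata
// for a file, or null if the file yields none.
interface MetadataExtractor {
    String extractMetadata(File input) throws Exception;
}

// Sketch of loading an extractor class by name via reflection, as implied
// by a comma-separated, ordered list of class names (compare metadata.classes).
final class ExtractorLoader {
    static MetadataExtractor load(String className) throws Exception {
        return (MetadataExtractor) Class.forName(className).newInstance();
    }
}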
Based on Open Standards
HCP custom object metadata enhancement tool will invoke user-pluggable applications to extract
the metadata from the objects, and then reformat the data in an XML file to be ingested into HCP
(see Figure 2). The XML open standard was selected because it will extend the useful life of the data
and reduce the long-term operational costs, since it does not require proprietary tools to support
proprietary formats. As new data becomes available, the existing XML-based information in the data
store can be further enhanced by any new application that creates new metadata.
Figure 2. XML-formatted Custom Metadata Sample Resulting from the FWtools Application: Includes
Geospatial Information to Augment an Existing Hitachi Content Platform Object
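Since the figure itself is not reproduced in this text version, the fragment below suggests the general shape of such an XML file. It is purely illustrative: the element names are hypothetical and do not represent the exact schema produced by the FWTools application.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative only; element names are hypothetical. -->
<custom-metadata>
  <source-file>image001.ntf</source-file>
  <format>NITF</format>
  <geospatial>
    <upper-left-corner>36.1833,-115.1333</upper-left-corner>
    <lower-right-corner>36.0833,-115.0333</lower-right-corner>
  </geospatial>
</custom-metadata>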
HCP custom object metadata enhancement tool is constructed to use the HCP open-standard REST [2] interface. This industry-standard interface is used for distributed hypermedia systems such as the World Wide Web and typically operates over HTTP. REST removes the need for proprietary interfaces, accelerating integration and reducing long-term maintenance costs. It also provides the capability for simpler customization as mission needs require, even throughout the
life of a long-term mission.
System Operation, Environment and Performance
Custom metadata can be added to the HCP data store in 2 ways. The 1st allows the
enhancement to be performed during the ingest operation. In this case, new objects are found in
the user directory by HCP custom object metadata enhancement tool. The tool will first call the
pluggable application to extract the metadata and create an XML representation of the resulting
metadata. Then it will ingest the object and the corresponding metadata into the HCP data store.
The 2nd functional method allows existing HCP file objects to be enhanced (augmented) with new
metadata. HCP custom object metadata enhancement tool will see the new file in the input direc-
tory. If the exact object already exists in the data store, then the tool will call the external program to
extract the metadata, convert it to an XML representation and ingest the newly formed metadata for
the corresponding object.
HCP custom object metadata enhancement tool can be configured to search local directories on
the same machine where it is running, or it can search for files located in a mounted
remote directory. The tool also has been tested to run in a virtual machine, pulling data from the
local directory inside the virtual machine.
Performance can be enhanced with HCP v4.1 since it provides the capability for a whole object in-
gestion operation. This allows a single write operation to be performed with both the file object and
the corresponding metadata, thus saving network bandwidth and system resources.
User Settings and Customization
HCP custom object metadata enhancement tool provides a number of user-configurable settings,
including:
■■Metadata extraction application. This is the application that will be run on each file to extract the
relevant metadata. This is implemented as a pluggable interface.
■■Process run interval period. The user can select the interval when the input user directory is
checked for new files and processed. This setting allows for adaptation to situations ranging from
new data provided at an extremely high rate to when the new data is infrequently provided.
■■Update or replace selection. If HCP custom object metadata enhancement tool discovers that
the object already exists in the HCP data store, then the user has the option to either update or
replace the existing custom metadata.
■■Input directory. The user can specify either a local directory on the same machine where HCP
custom object metadata enhancement tool is running, or a remote directory that has been previ-
ously mounted and is accessible.
■■HCP destination namespace. The user can select the destination HCP namespace.
■■HCP namespace authorization. The user can specify the HCP access authorization information
for the destination HCP namespace.
■■File process count. The user can specify the number of files that will be processed in each
interval. Adjusting this will require some tuning since there will be variability in the implementation.
Examples include:
■■ Plug-in applications will process files at varying speeds.
■■ File sizes will vary.
■■ File addition rates will vary.
HCP custom object metadata enhancement tool provides a custom metadata generation
architecture that is end user pluggable. Therefore, any open-source, customer-proprietary or
vendor-proprietary metadata extraction application can be used and changed as needed.
Customization of the application itself can be performed easily by individuals with Java experience.
C-language-based interfaces to the lower level system operations are provided with HCP custom
object metadata enhancement tool to allow for further customization, as required.
Hitachi Content Platform Custom Object
Metadata Enhancement Tool: Architecture
and Operation
HCP custom object metadata enhancement tool encapsulates a number of functions into an ex-
tensible and customizable tool suite (see Figure 3). It watches for new files in a local user directory
and runs each file through an external metadata extraction program to see if there is any metadata.
Then, it ingests the resulting metadata to augment the corresponding HCP file object. If the file
object does not already exist in the data store, then the tool will ingest the object itself as well. The
tool's running application program will wake up on a periodic basis to perform these functions.
Figure 3. Components and Interfaces between Hitachi Content Platform and Hitachi
Content Platform Custom Object Metadata Enhancement Tool
The REST interface is used for all communication between HCP custom object metadata enhance-
ment tool and HCP. This is done for several reasons, including portability, supportability and perfor-
mance. REST is a behavioral model used by many database and distributed web applications. As an open standard with a simple, stateless model, REST makes it much easier to integrate distributed components.
There are 2 operational modes for HCP custom object metadata enhancement tool: an in-band file
object and metadata ingest mode, and an out-of-band metadata augmentation mode. Both are
described below.
Ingest Function Process Flow
The ingest function is an in-band mode whereby new file objects are pre-processed to extract the
custom metadata before ingestion into the HCP data store. This function is useful when new data is
being ingested so that the accompanying metadata is added at the same time as the file object.
Detailed operation of the ingest function is shown in Figure 4. At a user-defined periodic interval, the
HCP custom object metadata enhancement tool process will wake up and begin searching for
new files in the user directory. The list of files is processed in order by the tool; each file in the
resulting list is provided as input to the external metadata extraction program. The extraction
program will read the specified file and send the resulting XML-formatted metadata information to the tool. The tool will then read the file and send the information pair (object + metadata) to HCP. HCP will then write each component to the respective location in the data store, completing the ingest operation.
Figure 4. Detailed HCP Custom Object Metadata Enhancement Tool Ingest Process Flow
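To ground the flow above, the following minimal Java sketch issues the two REST requests the ingest operation implies: a PUT of the object data, followed by a PUT of the extracted custom metadata. The host name, paths and credential value are illustrative assumptions, as are the hcp-ns-auth cookie scheme and the type=custom-metadata query parameter; [8] is the authoritative reference for the request formats. With HCP v4.1 whole object ingestion, the two writes can be collapsed into a single operation, as noted earlier.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Minimal two-request ingest sketch against the HCP REST interface.
public class IngestSketch {
    // Assumption: HCP 4.x data access uses an "hcp-ns-auth" cookie of the
    // form base64(user):md5hex(password), precomputed out of band.
    static final String AUTH_COOKIE = "hcp-ns-auth=aW5nZXN0dXNlcg==:5f4dcc3b5aa765d61d8327deb882cf99";

    static void put(String url, byte[] body) throws Exception {
        HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
        c.setRequestMethod("PUT");
        c.setDoOutput(true);
        c.setRequestProperty("Cookie", AUTH_COOKIE);
        try (OutputStream out = c.getOutputStream()) {
            out.write(body);
        }
        if (c.getResponseCode() != 201) { // HCP is expected to return 201 Created on success
            throw new IllegalStateException("PUT failed: " + c.getResponseCode());
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Paths.get("newdata/image001.ntf");
        String base = "https://ns0.tenant1.hcp.example.com/rest/images/image001.ntf";
        put(base, Files.readAllBytes(file));                        // 1. object data
        String xml = "<custom-metadata>...</custom-metadata>";      // from the extractor
        put(base + "?type=custom-metadata", xml.getBytes("UTF-8")); // 2. custom metadata
    }
}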
Augment Function Process Flow
The augment function is an out-of-band mode whereby existing file objects are post-processed
in order to augment the stored information with new object metadata. This is useful when a large
amount of data already exists in HCP. Otherwise, all of the objects would have to be re-ingested into
another data repository, which could take considerable time and network resources.
Detailed operation of the augment function is shown in Figure 5. As shown in the figure, the customer has previously ingested a large number of files into the HCP data store. HCP custom object metadata enhancement tool will periodically wake up and query the existing HCP data store, searching for objects without custom metadata that have not been modified since the previous query. Files matching the criteria are supplied to the metadata extraction application, which will read each file object from the local directory and provide any custom metadata from the files in XML format. HCP custom object metadata enhancement tool will then ingest the custom metadata to augment the corresponding HCP objects.
Figure 5. Detailed HCP Custom Metadata Enhancement Tool Process Flow During an
Augment Function: Post-processor Extracts Custom Metadata, Augments Existing HCP
Objects
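A corresponding sketch of the augment decision is shown below: it probes whether an object already carries custom metadata before adding any. The probe-by-status-code approach (HEAD against type=custom-metadata) is an assumption made for illustration; the tool itself discovers candidate objects through an HCP query, as described above.

import java.net.HttpURLConnection;
import java.net.URL;

// Augment-mode sketch: add custom metadata only to objects that do not
// already have it. Assumption: a HEAD request on ?type=custom-metadata
// returns 200 when custom metadata exists and 404 when it does not.
public class AugmentSketch {
    static boolean hasCustomMetadata(String objectUrl, String authCookie)
            throws Exception {
        HttpURLConnection c = (HttpURLConnection)
                new URL(objectUrl + "?type=custom-metadata").openConnection();
        c.setRequestMethod("HEAD");
        c.setRequestProperty("Cookie", authCookie);
        return c.getResponseCode() == 200;
    }
}

When the probe reports no custom metadata, the tool would run the extractor and issue the same type=custom-metadata PUT shown in the ingest sketch.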
HCP Namespace Usage by HCP Custom Object Metadata
Enhancement Tool
HCP provides access to the repository as partitioned namespaces. A namespace is a logical
grouping of objects such that the objects in one namespace are not visible in any other namespace.
To the user of a namespace, the namespace is the repository, and it may appear as a network-accessible mount point. This brief introduction allows for the discussion of source and destination locations; more detail on HCP and namespaces is provided below.
Source or Destination Locations
HCP custom object metadata enhancement tool provides flexibility in the input source location as
well as the output destination. In the case of an ingest operation, the input source could be from a
file system on the machine where HCP custom object metadata enhancement tool is running, or
from a file system on a network-mounted remote directory. For an augmentation operation, the ob-
jects would be sourced from the root folder in either an HCP default namespace or an authenticated
namespace.
The destination of any HCP custom object metadata enhancement tool file operation is always to an
HCP repository, in either a default or an authenticated namespace. The destination namespace can be the same as
the source namespace, or it can be a different namespace. The path should contain the root folder
within the appropriate namespace.
A summary of the allowable locations is shown in Table 1.
TABLE 1. HCP CUSTOM METADATA ENHANCEMENT TOOL:
ALLOWABLE SOURCE AND DESTINATION LOCATIONS
Object Location         File System    HCP Default Namespace    HCP Authenticated Namespace
Source (Input)          Yes            Yes                      Yes
Destination (Output)    No             Yes                      Yes
Reference Architecture and Host Implementation Guidelines
In a typical implementation, HCP custom object metadata enhancement tool runs on a host ma-
chine, which is not part of HCP. The tool requires minimal resources, and the host machine could
either be a physical machine or a virtual machine. The processor, memory and storage requirements
are driven more by the plug-in metadata extraction application as well as the size of the objects and
the required object process rate. If possible, administrators should provide adequate memory to allow the operating system to keep the object, as well as the metadata extraction application, resident in memory, since the application will be called repeatedly (that is, for every new object to be processed).
Since HCP custom object metadata enhancement tool requires only a single machine (physical or
virtual), its reference architecture is more dependent on the HCP implementation than the tool node.
Figure 6 depicts an example implementation with the tool's physical node connected to a 4-node
HCP 500 system. This HCP was configured with failover and uses modular storage with LUNs pro-
visioned from individual RAID groups. The tool node in this diagram shows the new content being
sourced from either a local directory on the node, or from a remote directory (but not both).
Figure 6. HCP Custom Object Metadata Enhancement Tool Reference Architecture: HCP
Implementation as a 4-node HCP 500 Supporting Failover Using Modular Storage with
LUNs Provisioned from Individual RAID Groups
Example Proof of Concept Implementation
As a proof of concept demonstration, both HCP custom object metadata enhancement tool func-
tions were utilized. The tool was first used to enhance existing objects previously ingested, but
which required augmentation with newly provided geospatial metadata information. The demonstra-
tion also ingested new objects augmented with the corresponding geospatial-based metadata.
The pluggable metadata application used was an open-source geographic information system
(GIS) program called FWtools [3]. FWtools provides the ability to view geospatial information from a
variety of format types, while also providing the ability to extract the metadata for the supported file
types including the National Imagery Transmission Format (NITF) [4]. NITF files are used by federal
agencies and system integrators focused on correlating information in the objects with geospatial
information, all from multiple events and data sources.
Parameters and Configuration Settings
HCP custom object metadata enhancement tool has a number of tunable parameters and
configuration settings that must be properly set before starting normal operation. All of these
settings can be found in the "ingestor.properties" file. All of the settings are listed in Table 2 along
with the corresponding description.
TABLE 2. TUNABLE HCP CUSTOM OBJECT METADATA ENHANCEMENT TOOL PARAMETERS AND CONFIGURATION SETTINGS
Parameter                          Description
source.path                        Local path to the directory that contains the data to ingest
source.maxBatchSize                Maximum number of file handles to "batch" per loop iteration
destination.user                   HCP data access user to use for ingest
destination.password               HCP data access password for the destination.user account
destination.passwordEncoded        Indicates whether the destination.password value is encoded in MD5 format
destination.rootpath               Root-path REST URL on HCP under which content is placed
metadata.classes                   Comma-separated, ordered list of classes to load to extract metadata from files
execution.loopcount                Number of times to load up the batch with files to process
execution.stopRequestFile          Name of a file in the local processing directory to watch for to indicate that processing should stop
execution.pauseRequestFile         Name of a file on the local machine to watch for to indicate that processing should pause: For as long as the file exists, the program remains paused; delete the file to resume. Changing this value while the program is paused will not take effect until processing resumes.
execution.deleteSourceFiles        Indicates whether the source files should be deleted after they are written to HCP: If a file does not have the correct permissions, the tool attempts to change them and tries again.
execution.forceDeleteSourceFiles   Indicates whether deletion of source files should be forced by changing the source file permissions
execution.deleteSourceEmptyDirs    Indicates whether empty directories in the source tree should be periodically cleaned up
execution.updateMetadata           Indicates whether metadata should be updated for objects in HCP that already have custom metadata: If set to false, such source files are ignored (but deleted, if so indicated).
execution.pauseSleepInSeconds      Number of seconds to sleep, while paused, between checks for resume
execution.batchSleepInSecond       Number of seconds to sleep at the end of a batch run before attempting another batch
execution.debugging.httpheaders    Indicates whether HTTP headers should be written to the console (stdout)
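For illustration, a hypothetical ingestor.properties file using the parameters from Table 2 might look as follows. All values are examples only, not defaults shipped with the tool.

# Illustrative ingestor.properties; parameter names follow Table 2,
# and all values are examples only.
source.path=/data/incoming
source.maxBatchSize=100
destination.user=ingestuser
# Example value: MD5 hex of the password, since passwordEncoded=true
destination.password=5f4dcc3b5aa765d61d8327deb882cf99
destination.passwordEncoded=true
destination.rootpath=https://ns0.tenant1.hcp.example.com/rest/ingested
metadata.classes=com.example.NitfMetadataExtractor
execution.loopcount=10
execution.stopRequestFile=stop.req
execution.pauseRequestFile=pause.req
execution.deleteSourceFiles=false
execution.forceDeleteSourceFiles=false
execution.deleteSourceEmptyDirs=true
execution.updateMetadata=true
execution.pauseSleepInSeconds=30
execution.batchSleepInSecond=60
execution.debugging.httpheaders=false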
Hitachi Content Platform Primer
The functionality described here is based on Hitachi Content Platform version 4.1, but some content
might be applicable to prior HCP versions.
About Hitachi Content Platform
Hitachi Content Platform is a distributed storage system designed to support large, growing re-
positories of fixed-content data. HCP stores objects that include both data and the corresponding
metadata. It distributes these objects across the storage space but still presents them as files in a
standard directory structure.
HCP provides access to stored objects through the HTTP protocol, as well as through user
interfaces such as the namespace browser and search console.
HCP is a combination of hardware and software that provides an object-based data storage envi-
ronment. An HCP repository stores all types of data, including simple text files as well as multigiga-
byte satellite, medical or database images. HCP provides easy access to the repository for adding,
retrieving and deleting the stored data. HCP uses write once, read many (WORM) storage technol-
ogy and a variety of policies and internal processes to ensure the integrity of the stored data and the
efficient use of storage capacity.
Key features of HCP include:
■■Scalability up to 40PB of storage in a single cluster
■■Capability to provision a single cluster into multiple virtual content platforms ("tenants"), each with
its own unique configuration and access control to manage data placement and content distribu-
tion to appropriate audiences
■■Connection capabilities to a wide range of applications and protocols via HTTP, REST, NFS, CIFS
and more
■■High data integrity, with data integrity checking, RAID-6, replication, encryption, WORM, multiple
versions of objects and audit logging
■■Automation of data migration from old storage to new storage
■■Management and enforcement policies for retention, disposal, shredding and other compliance
and lifecycle management operations
■■Increased value of unstructured data using metadata and custom metadata for automation and
search
■■Capability to create a single, multipurpose, unstructured data platform for archive, cloud and
backup capabilities
■■Capability to monitor and report on storage and bandwidth use of different tenants for charge-
back
■■Enhanced management capabilities with comprehensive interfaces for cloud and distributed
environments
■■Scalability to branch and remote offices via Hitachi Data Ingestor
The following section introduces basic HCP concepts and includes information regarding HCP
namespaces.
Object-based Storage
HCP stores objects in the repository. Each object permanently associates data HCP receives (for
example, a file, an image or a database) with information about that data, called metadata.
An object encapsulates:
■■Fixed-content data, which is an exact digital reproduction of data as it existed before it was
stored. Once it is in the repository, this fixed-content data cannot be modified.
■■System metadata offers system-managed properties that describe the fixed-content data (for
example, its size and creation date). System metadata includes settings, such as retention and
data protection level, that influence how transactions and internal processes affect the object.
■■Custom metadata is metadata that a user or application provides to further describe an object. It
is typically specified as XML and can be used to create self-describing objects. Future users and
applications can use this metadata to understand and repurpose the object content.
Namespaces and Tenants
An HCP repository is partitioned into namespaces. A namespace is a logical grouping of objects
such that the objects in one namespace are not visible in any other namespace. To the user of a
namespace, the namespace is the repository.
Namespaces provide a mechanism for separating the data stored for different applications, business
units, or customers. For example, a deployment could have one namespace for accounts receivable
and another for accounts payable.
Namespaces also enable operations to work against selected subsets of repository objects. For
example, a query could be performed that targets the accounts receivable and accounts payable
namespaces but not the employee namespace.
Namespaces are owned and managed by administrative entities called tenants. A tenant typically
corresponds to an actual organization such as a company or a division or department within a com-
pany. A tenant can also correspond to an individual person.
Namespace Access
HCP provides several techniques for accessing and managing data in the namespace. These
include:
■■REST interface
■■Metadata query API
■■Namespace browser
■■Search console
■■Hitachi Data Migrator
■■HCP client tools
REST Interface
Clients use an HTTP-based REST interface to access the namespace. Using this interface, actions
can be performed such as adding objects to the namespace, viewing and retrieving objects, chang-
ing object metadata and deleting objects. The namespace can be accessed programmatically with
applications, interactively with a command-line tool or through a graphical user interface (GUI).
Figure 7 shows the relationship between original data, objects in a namespace and the HTTP
access protocol.
Figure 7. Client-HCP Namespace: Relationship between Original Data, Objects in a
Namespace and HTTP Access to the HCP Data Store
Metadata Query API
HCP allows clients to use HTTP requests to find objects that meet specific criteria, including object
change time, index setting, operations on the object and the object location. If the client has the appropriate permissions, a single request can query multiple HCP namespaces as well as the default namespace.
A metadata query to HCP will return a set of records containing metadata that describes the
matching objects. If the query matches a large number of objects, multiple requests can be used to
page sequentially through the records and retrieve only a specific number of records in response to
each request.
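The paging pattern described above can be sketched as follows. The HTTP request itself is abstracted behind a hypothetical fetchPage() helper, because the query request schema is defined in [8] and is not reproduced here.

import java.util.List;

// Paging sketch for the metadata query API. Only the control flow is
// shown; fetchPage() stands in for the actual HTTP query request.
public abstract class QueryPager {
    static final int PAGE_SIZE = 1000; // records requested per page (example)

    // Hypothetical helper: issues one query request that asks for up to
    // 'count' records starting after 'lastSeen'; see [8] for the schema.
    abstract List<String> fetchPage(String lastSeen, int count) throws Exception;

    void forEachRecord() throws Exception {
        String lastSeen = null;
        List<String> page;
        do {
            page = fetchPage(lastSeen, PAGE_SIZE);
            for (String record : page) {
                lastSeen = record; // process the record here
            }
        } while (page.size() == PAGE_SIZE); // a short page means we're done
    }
}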
Namespace Browser
The HCP namespace browser provides management of the namespace content and the ability to
view information about namespaces. The browser functions include:
■■List, view, and retrieve objects and versions of objects
■■Create empty directories
■■Store and delete objects
■■Display namespace information, including:
■■ The namespaces that can be accessed
■■ Retention classes for use within a namespace
■■ Permissions for namespace access
■■ Statistics about a namespace
Search Console
The HCP search console is an easy-to-use web application that provides the capability to search for and manage objects based on specified criteria. For example, objects stored before a certain date or larger than a specified size could be found and then deleted, or marked to prevent them from being deleted.
The search console works with either of 2 implementations, which must be enabled at the HCP
system level:
■■The Hitachi Data Discovery Suite (HDDS) search facility interacts with HDDS, which performs
searches and returns results to the HCP search console. HDDS is a separate product from HCP.
■■The HCP search facility is integrated with HCP and works internally to perform searches and
return results to the search console.
Only one of the search facilities can be enabled in the HCP GUI at any given time. If neither is
enabled, HCP does not support using the search console to search namespaces. The system
associated with the enabled search facility is called the active search system.
The active search system (that is, HDDS or HCP) maintains an index of data objects in each search-
enabled namespace. The index is based on object content and metadata. The active search system
uses the index for fast retrieval of search results. When objects are added to or removed from the
namespace or when object metadata changes, the active search system automatically updates the
index to keep it current.
For information on using the search console, please reference [5].
Note: A namespace supports search only if the namespace administrator has enabled search for it.
Hitachi Data Migrator
Hitachi Data Migrator is a high-performance, multithreaded client-side utility for viewing, copying,
and deleting data. Data Migrator functions include:
■■Copy objects, files and directories between local file systems, HCP namespaces and earlier HCP
archives
■■Delete objects, files and directories, including performing bulk delete operations
■■View the content of objects and files, including the content of old versions of objects
■■Rename files and directories on the local file system
■■View object, file and directory properties
■■Create empty directories
■■Add, replace or delete custom metadata for objects
Data Migrator has both a GUI and a command-line interface (CLI).
For information on using Data Migrator, please reference [6].
HCP Client Tools
HCP comes with a set of command-line tools that allows data to be copied or moved between
a client and an HCP system. The tools also provide a search capability using specified criteria.
Additionally, empty directories can be created in a local or remote file system or on an HCP system.
The client tools support multiple namespace access protocols and multiple client platforms. The
command syntax is the same for all supported configurations.
For information on installing and using the client tools, please reference [7].
Note: For most purposes, the HCP client tools have been superseded by Hitachi Data Migrator. However, they have some features that are not available in Data Migrator, such as finding files.
Transmitting Data in Compressed Format
Object data or custom metadata can be compressed in gzip format to save bandwidth before sending it to HCP. The PUT request must indicate that the data is compressed; HCP will then know to decompress the data before storing it.
Similarly, in a GET request, HCP can be told to return object data or custom metadata in compressed
format. In this case, the returned data must first be decompressed before use.
HCP supports only the gzip algorithm for compressed data transmission.
HCP can be told that the request body is compressed by including a Content-Encoding header with
the value gzip. In this case, HCP uses the gzip algorithm to decompress the received data.
HCP can be told to send a compressed response by specifying an Accept-Encoding header. If the
header specifies gzip, a list of compression algorithms that includes gzip, or *, HCP uses the gzip
algorithm to compress the data before sending it.
For examples of sending and receiving objects in compressed format, please reference Chapter 4,
"Working with objects and versions" in [8].
Notes:
■■HCP can also compress and decompress metadata query API requests and responses.
For more information on this, please reference the HCP product document titled "Using a
Namespace," in the section titled "Request HTTP elements."
■■Since HCP normally compresses stored object data and custom metadata, it is unnecessary
to explicitly compress objects for storage. However, if gzip-compressed objects or custom
metadata are to be stored, do not use a Content-Encoding header. To retrieve stored gzip-com-
pressed data, do not use an Accept-Encoding header.
Data Access Permissions
All namespace access clients must have permission to access and perform actions on data. Table 3
describes the permissions and the operations allowed.
TABLE 3. HCP PERMISSIONS AND ALLOWABLE OPERATIONS

Permission   Operations
Read         ■■Retrieve objects and system metadata.
             ■■Check for object existence.
             ■■Check for and retrieve custom metadata.
Write        ■■Add objects.
             ■■Create directories.
             ■■Set and change system and custom metadata.
Delete       Delete objects, delete empty directories and remove custom metadata.
Purge        Delete objects and their historical versions.
Privileged   ■■Delete or purge objects regardless of retention.
             ■■Place objects on hold.
Search       Search for objects. For information on this, please reference Chapter 8, "Using the HCP metadata query API," in [8].
Some operations require multiple permissions. For example, to place an object on hold, the user
must have both write and privileged permissions. Similarly, performing a privileged purge will require
delete, privileged and purge permissions.
Permissions are set at 2 levels:
■■Namespace-level permissions. This permission mask specifies the maximum permissions for
any user that accesses the namespace.
■■Data access account. This specifies permissions for an individual user. Accessing a
namespace will require a data access account with a username and password. The account
specifies available namespaces and associated permissions.
The required permissions for a particular operation must be enabled in both the namespace-level
permission mask and the corresponding data access account permissions.
Replication
Replication is the process of keeping selected tenants and namespaces in 2 HCP systems in sync
with each other. Basically, this entails copying object creations, deletions and metadata changes
from one system to the other. HCP also replicates the tenant and namespace configuration, data
access accounts and retention classes.
The HCP system in which the objects are initially created is called the primary system. The 2nd
system is called the replica.
Replication has several purposes, including:
■■If the primary system becomes unavailable (for example, due to network issues), the replica can
provide continued data availability.
■■If the primary system suffers irreparable damage, the replica can serve as a source for disaster
recovery.
■■If an object cannot be read from the primary system (for example, because a server is unavail-
able), HCP can try to read it from the replica.
Note: Replication is an add-on feature to HCP. Not all systems include it.
Namespace Operations
Familiar commands and tools are used to perform operations on a namespace. Some operations
relate to specific types of metadata. For more information on this metadata, please reference the "Understanding objects" section in Chapter 2 of [8].
Operations that store or retrieve data can optionally transmit the data in gzip-compressed format.
For more information on this, see the individual commands used for those operations.
Operation Restrictions
The operations that can be performed are subject to the following restrictions:
■■The HTTP request headers must include valid user information.
■■The namespace must be configured to allow HTTP or HTTPS access from the client IP address.
■■The namespace configuration and user permissions must allow the operation.
For information on user permissions, please reference Chapter 10, "Using the Namespace Browser"
in [8].
Supported Operations
The following operations can be performed on a namespace:
■■Write data to the namespace.
■■If versioning is enabled, store new versions of existing objects.
■■Override default metadata when storing an object.
■■Create an empty directory in the namespace.
■■Check for object existence.
■■View the content of an object.
■■View object metadata.
■■Delete an object.
■■Delete an empty directory.
■■Set retention for an object that has none.
■■Extend the retention period for an object.
■■Set or change a retention class for an object.
■■Hold or release an object.
■■Enable shredding of an object.
■■Change the index setting for an object.
■■Add, replace or delete custom metadata for an object.
■■Add or retrieve object data and custom metadata in a single operation.
■■Check for and read custom metadata.
■■List retention classes available in the namespace.
■■List namespace permissions for the user.
■■List the namespace statistics.
■■List the accessible namespaces.
■■Use the HCP metadata query API to get information about objects that meet specified criteria in
one or more namespaces.
Prohibited Operations
HCP never allows users to:
■■Rename an object or directory.
■■Overwrite a successfully stored object. However, if versioning is enabled, new versions of an
object can be written.
■■Modify the fixed-content portion of an object.
■■Delete an object that is under retention if the privileged permission is not granted or if the
namespace is configured to prevent this operation.
■■Delete a directory that contains one or more objects.
■■Shorten an explicitly set retention period.
REST Interface Primer
The Representational State Transfer (REST) interface is a behavioral model used by many database and distributed web applications. Its beauty lies in its simplicity. From the Wikipedia definition:
REST-style architectures consist of clients and servers. Clients initiate requests to
servers; servers process requests and return appropriate responses. Requests and
responses are built around the transfer of representations of resources. A resource
can be essentially any coherent and meaningful concept that may be addressed.
A representation of a resource is typically a document that captures the current or
intended state of a resource.
At any particular time, a client can either be in transition between application states or
"at rest." A client in a rest state is able to interact with its user, but creates no load and
consumes no per-client storage on the servers or on the network.
The client begins sending requests when it is ready to make the transition to a new
state. While one or more requests are outstanding, the client is considered to be in
transition. The representation of each application state contains links that may be used the next time the client chooses to initiate a new state transition.
REST was initially described in the context of HTTP, but is not limited to that protocol.
RESTful architectures can be based on other Application Layer protocols if they
already provide a rich and uniform vocabulary for applications based on the transfer of
meaningful representational state. RESTful applications maximize the use of the pre-
existing, well-defined interface and other built-in capabilities provided by the chosen
network protocol, and minimize the addition of new application-specific features on top
of it.
Service Offerings
Customization and support services are available. Please contact your HDS Account Manager for
additional information.
Appendix A: References
[1] Hitachi Content Platform (HCP): http://www.hds.com/assets/pdf/hitachi-datasheet-content-platform.pdf
[2] REST interface: http://en.wikipedia.org/wiki/Representational_State_Transfer
[3] FWTools for GIS imaging: http://fwtools.maptools.org
[4] National Imagery Transmission Format (NITF) files: http://en.wikipedia.org/wiki/National_Imagery_Transmission_Format
[5] HCP "Searching Namespaces" manual, part of the HCP Product Documentation Set
[6] HCP "Using HCP Data Migrator" manual, part of the HCP Product Documentation Set
[7] HCP "Using the HCP Client Tools" manual, part of the HCP Product Documentation Set
[8] HCP "Using a Namespace" manual, part of the HCP Product Documentation Set
Appendix B: Feedback
Hitachi Data Systems welcomes your feedback. Please share your thoughts by sending an email
message to Christian.Heiter@hds.com, Clifford.Grimm@hds.com, Michael.Malaret@hds.com or
David.Haberland@hds.com. Please be sure to include the title of this white paper in your email
message.