Trustworthy Records Retention

What does it mean?
An organization should secure the entire life cycle of its records,
so that records are created, kept accessible for an appropriate
period of time, and deleted, without tampering by organizational
insiders or outsiders.

Why do we need this concept?
Most traditional security techniques are of little help in
ensuring trustworthy retention of records, because these
techniques focus on outsiders as the source of threats.
With organizational fraud, the threats come from inside the
organization, often from highly placed employees.

What’s the goal of Trustworthy Records Retention?
The goal is to provide long-term retention and eventual
disposal of organizational records in such a manner that no user
can delete, hide, or tamper with a record during its retention
period.
* Nor can any user recreate a record’s content once it has been deleted.

Regulatory Legislation
Trustworthy records retention has become mandatory with the
passing of regulatory legislation all around the world.
Although each regulation is designed for a particular application
area, a number of assurance criteria are common to many of the
directives.

Common assurance criteria:
1- Guaranteed retention.
2- Long-term retention.
3- Efficient access to data.
4- Data confidentiality.
5- Data integrity.
6- Litigation holds.
7- Guaranteed deletion.
8- Auditing.
9- High penalties for non-compliance
10- Insider adversaries.

Some examples of laws & regulations (to ensure trustworthy retention):
* The Sarbanes-Oxley Act of 2002 requires public companies to
provide disclosure and accountability of their financial reporting,
subject to independent audits.
* The Food and Drug Administration places controls over records of
trials of potential medicines.

* The Health Insurance Portability and Accountability Act
(HIPAA) requires trustworthy storage of medical records.
* The Federal Information Security Management Act requires
yearly audits, risk assessments, certifications, and
continuous monitoring of such systems.

* The Family Educational Rights and Privacy Act requires
long-term trustworthy storage of student records from
elementary school through the university level.
* The Markets in Financial Instruments Directive (MiFID)
regulates financial markets across Europe and introduces
strict requirements on electronic record keeping.

Threat model:
* The main focus in trustworthy records retention is on preventing
malicious insiders from tampering with or destroying records.
* The second factor in the threat model for trustworthy retention is that
the visible alteration or destruction of records is tantamount to an
admission of guilt in the context of litigation.

Some implications of the threat model:
* Trustworthy retention
When an adversary attempts to modify, delete, or hide a record,
we must make sure that the regulatory authority can detect such
attacks and prevent them.

* Trustworthy access and migration
When the organization needs to migrate its records to a new storage
server, the regulatory authority must be able to detect whether any
modifications or omissions occurred during migration.

* Trustworthy deletion
When mandatory retention of a record is over, the organization
removes the record. The regulatory authority needs to prevent
adversaries from gaining any information about that deleted record.

Storage architecture
Because the key requirement for trustworthy retention of records
is to prevent deletion and modification of the records, and
because the existing signature-based approaches, outsourcing
techniques, and traditional access control are powerless to
guarantee these requirements, we need a new kind of storage
architecture to thwart these attacks.

The new storage architecture should have the following
properties:
1- The component for enforcing the storage security properties
should be as small as possible.
2- The cost of any effective attack against the component must be
high, and its results must be conspicuous.
3- The resulting system must provide end-to-end security
guarantees.
4- The price per byte of storage must be modest.

The storage industry has developed a variety of compliance
storage products, which are often referred to as
WORM (write once, read many) devices.
Three types of storage will be discussed:
1- Tape-based products.
2- Optical-disk products.
3- Hard disk products.

1- Tape-based products
The Quantum DLTSage predictive, preventative and diagnostic
tools for tape storage provide compliance storage.
Disadvantages:
* The WORM assurances are provided under the assumption
that only Quantum tape readers are deployed, which is
impractical.
* Given the nature of magnetic tape, an attacker can easily
dismantle the plastic tape enclosure and access the underlying
data on different readers, thus compromising its integrity.
* In addition, secure deletion is not possible.

2- Optical-disk products
Optical WORM-disk solutions rely on irreversible physical write
effects to ensure that existing content cannot be altered.
Disadvantages:
* It’s challenging to deploy a scalable optical-only solution that
keeps up with increasing amounts of information at consistently
low latency.
* WORM and secure-deletion granularity cannot be fine-tuned.
* They perform poorly in price-performance measurements.

3- Hard disk products
Magnetic disk recording offers better overall cost and
performance than optical or tape storage.
There are many soft-WORM products that offer immutability for
records on hard disk storage devices.

Examples of soft-WORM products that can be applied to hard
disks:
1- EMC Centera
* Each data record has two components: the content and its
associated content descriptor file (CDF), which contains metadata
attributes (creation date, time, format) and the object’s content
address.

* The CDF is used for access to and management of the records.
* Centera permits deletion of a pointer to a record upon
expiration of the retention period.
* Given its software-only nature, these mechanisms are
vulnerable to simple software-based attacks and physical
attacks.

2- Hitachi Message Archive for Compliance
* The system allows customers to lock down archived data,
making it non-erasable and non-rewritable for a prescribed period.
* Like Centera, given its software-only nature, it is
vulnerable to simple software-based attacks and physical
attacks.

3- IBM System Storage Archive Manager
* The system makes the deletion of data before its scheduled
expiration extremely difficult.

4- Sun StorageTek Compliance Archiving Software
* The system offers WORM assurances through StorageTek, and the
software provides compliance-enabling features for
authenticity, integrity, ready access and security.

Strong WORM
Today’s compliance storage products do not really satisfy the
criteria for trustworthy record retention.

For a sound design, strong WORM requires the following
properties:
* To prevent physical attacks, strong tamper-resistant and reactive
hardware is required to ensure data integrity.
* Efficient access to large volumes of records requires searching
via indexes. These indexes cannot be kept on traditional storage,
as a superuser could hide a record.
* Current products don’t ensure that a record is trustworthy
throughout its entire life cycle, from creation, through migration
to newer storage servers, to eventual deletion.
* Current compliance storage products aim to address the
problem of (document) retention; no product supports
structured data.

Resistance to physical attack!
PROM circuitry can be placed in the arm electronics of a hard disk
drive, or in processor-accessible memory, to prevent further writing
on the disk surface or writing to a section of logical block
addresses (LBAs). This does not provide strong WORM guarantees:
an insider can open the storage medium enclosure to gain physical
access to the underlying data.

By adding a trusted SCPU (Secure CPU) inside the storage
server, we can guarantee the trustworthiness of records.

To achieve high throughput rates, the SCPU is involved in
document insertions and deletions but NOT in reads, thus
minimizing the overhead if the workload is dominated by read
queries.

Clients who perform reads get an SCPU-certified guarantee that:
1- The block was not tampered with, if the read is successful; or,
if the read fails, either:
2- The block was deleted according to its retention policy; or
3- The block never existed on this storage server.

Another trick to increase throughput during periods of high load
is to temporarily replace expensive SCPU signature operations
with less expensive short-term secure variants.
The system can strengthen these weaker constructs when the load
slackens, as long as it does so within their security lifetime.

To authenticate the contents of the records on the storage server,
one option is to keep a Merkle tree whose entries are signed by
the SCPU.
However, the resulting O(log n) cost to insert or delete a record,
where n is the number of documents, will reduce the throughput
of the system.
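The O(log n) cost mentioned above can be seen in a minimal Merkle-tree sketch. It assumes SHA-256 as the hash and pairs an odd node at the end of a level with itself; both choices are illustrative, and the SCPU signature that would cover the root is omitted. Updating any record touches one hash per level along its authentication path (ceil(log2 n) of them) and then requires a fresh signature on the new root.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Hash the leaves, then build parent levels bottom-up.
    An odd node at the end of a level is paired with itself."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        level = [h(padded[i] + padded[i + 1]) for i in range(0, len(padded), 2)]
        levels.append(level)
    return levels


def auth_path(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Sibling hashes from leaf to root: one per level, i.e. ceil(log2 n)."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append(level[sibling] if sibling < len(level) else level[index])
        index //= 2
    return path


def verify(leaf: bytes, index: int, path: list[bytes], root: bytes) -> bool:
    """Recompute the root from a leaf and its path; compare with the signed root."""
    node = h(leaf)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root
```

With five records, every authentication path has three sibling hashes, so each insertion or deletion costs three hash computations plus one root signature.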

To address this problem, one can instead label data blocks with
monotonically increasing consecutive serial numbers and then
introduce a concept of sliding "windows" that are authenticated
at O(1) cost by only signing the window boundaries.
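A minimal sketch of the serial-number window idea, assuming records expire in serial order (the general scheme must also handle out-of-order expiry). The names `WindowedWormStore`, `scpu_sign` and `check` are hypothetical, and HMAC-SHA256 stands in for the SCPU's public-key signature, so here the verifier shares the key; in the real design clients would verify with the SCPU's public key. Each read maps onto one of the three guarantees listed above: a valid block, a serial below the window (deleted per policy), or a serial above it (never existed).

```python
import hashlib
import hmac

# HYPOTHETICAL stand-in for the SCPU's signing key. A real SCPU keeps a private
# key in tamper-resistant hardware and clients verify with its public key;
# HMAC-SHA256 is used here only to keep the sketch stdlib-only.
_SCPU_KEY = b"hypothetical-scpu-key"


def scpu_sign(message: bytes) -> bytes:
    return hmac.new(_SCPU_KEY, message, hashlib.sha256).digest()


def check(message: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(scpu_sign(message), tag)


class WindowedWormStore:
    """Blocks carry consecutive serial numbers. The live window [low, high)
    is authenticated by signing only its two boundaries, so each update
    costs O(1) signatures regardless of how many blocks are stored."""

    def __init__(self) -> None:
        self.blocks: dict[int, tuple[bytes, bytes]] = {}  # serial -> (data, tag)
        self.low = 0   # serials below this have expired and been deleted
        self.high = 0  # next serial to assign; nothing at or above it exists
        self.window_tag = scpu_sign(self._window_msg())

    def _window_msg(self) -> bytes:
        return f"window:{self.low}:{self.high}".encode()

    @staticmethod
    def _block_msg(serial: int, data: bytes) -> bytes:
        return serial.to_bytes(8, "big") + data

    def insert(self, data: bytes) -> int:
        """SCPU signs the new block and re-signs the window boundaries."""
        serial = self.high
        self.blocks[serial] = (data, scpu_sign(self._block_msg(serial, data)))
        self.high += 1
        self.window_tag = scpu_sign(self._window_msg())
        return serial

    def expire_below(self, new_low: int) -> None:
        """ASSUMPTION: records expire in serial order, so deletion only slides
        the lower boundary forward and drops the expired blocks."""
        new_low = min(max(new_low, self.low), self.high)
        for serial in range(self.low, new_low):
            self.blocks.pop(serial, None)
        self.low = new_low
        self.window_tag = scpu_sign(self._window_msg())

    def read(self, serial: int) -> str:
        """Reads only check existing tags; the SCPU is not involved."""
        if not check(self._window_msg(), self.window_tag):
            return "tampered: window boundaries"
        if serial >= self.high:
            return "never existed"
        if serial < self.low:
            return "deleted per retention policy"
        entry = self.blocks.get(serial)
        if entry is None or not check(self._block_msg(serial, entry[0]), entry[1]):
            return "tampered"
        return "ok"
```

Moving a boundary without a fresh SCPU signature, or altering a block's content, is detected on the next read.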

Trustworthy Indexing:
Indexing ensures that a target record can be quickly extracted from
terabytes of data.

An indexing approach for trustworthy records retention must have
the following properties:
1- The search path to an index entry must be immutable for the
lifetime of the record that it indexes.
2- The indexing code should reside outside the storage server to
keep the trusted computing base small.
3- The insertion and indexing of a record must be performed
atomically.
4- All traces of a record must be removed from the index when the
record is deleted.

The first step in ensuring trustworthy indexing is to store the index
on WORM.
However, using WORM alone is insufficient for trustworthy indexing
because of the following problems in B-tree and hash-based structures.

Use B-tree to store the index:
The problem arises when a node overflows and is split into two nodes.
[Figure: Write-Once B-Tree]

Two pointers are added to the end of its parent node, superseding the
earlier pointer to the old node.
[Figure: Write-Once B-Tree, Insert 45]

So an adversary can effectively modify any record he wishes by
creating a new version of the appropriate nodes during the copy
operation.
[Figure: Tampered Write-Once B-Tree (omits 51), Insert 45]

Use hash-based structure to store the index:
The problem arises when the number of records in a hash table
exceeds a high-water mark.
A new, larger hash table is allocated, and all the records
are rehashed and moved into the new table.
The ability to relocate records, however, provides an opportunity
for an adversary to alter records during the copying step.

Hash-based structure and B-tree
All these approaches are vulnerable because the search path to a
particular record is not term-immutable.
So researchers have proposed trustworthy versions of hashing and
inverted indexes, both of which guarantee term-immutable search paths.

Generalized hash tree
GHT is a balanced tree-based data structure that does not require
periodic rebalancing.
In a GHT, predefined hashes of the record key determine all
possible lookup or insertion locations.

The locations where a record can be inserted or looked up are
therefore immutable.
[Figure: GHT]

To insert or look up a record in a GHT, the record key is hashed to
obtain a position within the root node. If that position in the root
node is empty, the record is inserted there.

If there is a collision, the key is rehashed (using a different hash
function) and an attempt is made to insert the key into the
appropriate subtree of the root node.

This process is repeated until an empty node position is found.
If the record cannot be inserted, a new leaf node is added.
[Figure: GHT After Insertion of a key k with h0(k) = 1, h1(k) = 0, h2(k) = 7, h3(k) = 2]
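The GHT insertion and lookup procedure described above can be sketched as follows. This is a simplified sketch under stated assumptions: every node has a fixed number of slots (`FANOUT = 4`, hypothetical), the subtree chosen after a collision is the child attached to the colliding slot, and the per-level hash functions are SHA-256 salted with the level number. These are illustrative choices, not the published GHT parameters (the figure's h2(k) = 7 suggests node sizes may grow with depth). What the sketch does preserve is the key property: a key's possible positions are fixed by its hashes, occupied slots are never rewritten, and nothing is ever relocated.

```python
import hashlib
from typing import List, Optional

FANOUT = 4  # HYPOTHETICAL: number of slots in every node


def level_hash(level: int, key: str) -> int:
    """A different hash function per tree level (SHA-256 salted with the level)."""
    digest = hashlib.sha256(f"{level}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % FANOUT


class GHTNode:
    def __init__(self) -> None:
        self.slots: List[Optional[str]] = [None] * FANOUT  # write-once key slots
        self.children: List[Optional["GHTNode"]] = [None] * FANOUT  # made on demand


class GHT:
    def __init__(self) -> None:
        self.root = GHTNode()

    def insert(self, key: str) -> int:
        """Place key on its predetermined path; return the depth where it lands."""
        node, level = self.root, 0
        while True:
            pos = level_hash(level, key)
            if node.slots[pos] == key:
                return level  # already present; nothing is rewritten
            if node.slots[pos] is None:
                node.slots[pos] = key  # the only write a slot ever receives
                return level
            if node.children[pos] is None:
                node.children[pos] = GHTNode()  # grow by adding a new leaf node
            node, level = node.children[pos], level + 1

    def lookup(self, key: str) -> Optional[int]:
        """Follow the same hash-determined path; return the depth or None."""
        node, level = self.root, 0
        while node is not None:
            pos = level_hash(level, key)
            if node.slots[pos] == key:
                return level
            if node.slots[pos] is None:
                return None  # insertion would have used this empty slot
            node, level = node.children[pos], level + 1
        return None
```

Because later insertions only fill empty slots or add leaves below the path, the depth at which a key was stored is the depth at which every later lookup finds it.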

Inverted indexes
Keyword search is the most convenient way to query unstructured
records such as email bodies and reports.
Search engines typically use inverted indexes for this purpose.

An inverted index comprises a dictionary of terms plus a posting
list for each term, containing the identifiers of all records that
contain that term, along with additional metadata.

Queries are answered by scanning the posting lists of the terms in
the query.
[Figure: Ordinary Inverted Index]
Query: 1 3 9 17 36
Data: 3 9 31
Base: 3 19
Worm: 7 36
Index: 3
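A minimal sketch of an ordinary inverted index and of answering a query by scanning the posting list of each query term. The function names are hypothetical, terms are split on whitespace, and the test uses made-up documents chosen so that the posting lists match those in the figure (for example, Data: 3 9 31).

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each term to its posting list: the sorted IDs of records containing it."""
    index: dict[str, list[int]] = defaultdict(list)
    for doc_id in sorted(docs):  # records arrive in increasing ID order
        for term in set(docs[doc_id].lower().split()):
            index[term].append(doc_id)  # append-only, so lists stay sorted
    return dict(index)


def conjunctive_query(index: dict[str, list[int]], terms: list[str]) -> list[int]:
    """Answer a query by scanning the posting list of every query term."""
    if not terms:
        return []
    result = set(index.get(terms[0].lower(), []))
    for term in terms[1:]:
        result &= set(index.get(term.lower(), []))
    return sorted(result)
```

Posting lists only ever grow at their tails, which is what makes them a natural fit for append-only WORM files.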

For a trustworthy version of inverted indexes, each posting list can
be stored in a separate append-only file on WORM storage, but this
approach is too slow to support real-time insertion of typical
business documents.

The performance can be improved vastly by merging the posting
lists for different terms until the tails of all posting lists fit into the
storage server cache.

However, as long as "popular" terms are not merged together,
query performance is little affected by merging.
[Figure: Inverted Index After Merging]
Query: 1 3 9 17 36
Data + Base (merged): 3#Data 3#Base 9#Data 19#Base 31#Data
Worm: 7 36
Index: 3
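A minimal sketch of merged posting lists, where each posting element carries a tag for the term it belongs to (the "3#Data" entries in the figure). The grouping in `MERGE_GROUPS` is hypothetical and mirrors the figure by merging "data" and "base"; storing the term string as the tag is a simplification, since a real index would store a compact term encoding. A lookup scans the shared list and keeps only the elements tagged with the requested term.

```python
from collections import defaultdict

# HYPOTHETICAL grouping chosen to mirror the figure, where "Data" and "Base"
# share one merged posting list; every other term keeps its own list.
MERGE_GROUPS = {"data": "data+base", "base": "data+base"}


class MergedInvertedIndex:
    def __init__(self) -> None:
        # group name -> append-only list of (doc_id, term) posting elements
        self.lists: dict[str, list[tuple[int, str]]] = defaultdict(list)

    @staticmethod
    def group_of(term: str) -> str:
        return MERGE_GROUPS.get(term, term)

    def add(self, doc_id: int, text: str) -> None:
        """Append one tagged element per distinct term; IDs must arrive in order."""
        for term in sorted(set(text.lower().split())):
            self.lists[self.group_of(term)].append((doc_id, term))

    def postings(self, term: str) -> list[int]:
        """Scan the merged list and keep only the elements tagged with term."""
        return [d for d, t in self.lists[self.group_of(term)] if t == term]
```

The trade-off is visible in `postings`: fewer list tails need to stay cached, but a lookup for "data" also reads the "base" elements, which is why popular terms should not share a list.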

B+ tree
Multi-keyword conjunctive queries can be answered by intersecting
the posting lists of the query terms.
To make the intersection fast, an additional index such as a B+ tree
is usually kept for each posting list, and a zigzag join is used to
perform the intersection.
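A minimal sketch of a zigzag join over sorted posting lists. Here `bisect_left` stands in for the per-list index (a B+ tree, or later a jump index) that answers "find the first entry greater than or equal to a target"; the function names are hypothetical. The join repeatedly seeks every list to the current candidate ID, jumping to the larger ID whenever a list does not contain it, so long runs of non-matching entries are skipped rather than scanned.

```python
from bisect import bisect_left


def seek(posting: list[int], start: int, target: int) -> int:
    """Index of the first entry >= target at or after start (stand-in for an index lookup)."""
    return bisect_left(posting, target, lo=start)


def zigzag_join(postings: list[list[int]]) -> list[int]:
    """Intersect strictly increasing posting lists by zigzagging between them."""
    if not postings or any(not p for p in postings):
        return []
    pos = [0] * len(postings)
    result = []
    candidate = max(p[0] for p in postings)
    while True:
        matched = True
        for i, p in enumerate(postings):
            pos[i] = seek(p, pos[i], candidate)
            if pos[i] == len(p):
                return result  # one list is exhausted: no further matches
            if p[pos[i]] != candidate:
                candidate = p[pos[i]]  # zig to the larger ID and restart the round
                matched = False
                break
        if matched:
            result.append(candidate)
            pos[0] += 1
            if pos[0] == len(postings[0]):
                return result
            candidate = postings[0][pos[0]]
```

For the figure's lists, intersecting Query (1 3 9 17 36) with Data (3 9 31) yields 3 and 9.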

A B+ tree can be created for an increasing sequence of document IDs
without any node splits or merges, by building the tree from the
bottom up.
[Figure: B+ tree in WORM, leaf entries 2 4 13 19 23 29 31 33]

Such an index structure is also not trustworthy, even when kept on
WORM storage, because the path to each entry is not immutable.

The adversary can hide some entries by creating a separate subtree
that does not contain a specific entry and adding an entry at the
root that leads to the new subtree.
[Figure: B+ tree manipulated to hide 31 by adding 25 to the root]

To address the problem of B+ trees, researchers proposed the jump
index technique.

Jump Indexes
A jump index can be used to index monotonic sequences, such as
document IDs in a posting list, as a replacement for non-trustworthy
B+ trees.
Jump index lookup performance is within a factor of 1.4 of the
performance of an equivalent B+ tree.

In a jump index, to reach a particular number $k < N$, we can jump
from 0 to k in powers of two.
For example, let $b_1, b_2, \ldots, b_p$ be the binary representation of k.
We can reach k in p steps by starting at zero, then jumping forward
by $b_1 \cdot 2^{p-1}$ integers, then jumping forward by $b_2 \cdot 2^{p-2}$
integers, and so on, until finally a $b_p \cdot 2^0$ jump brings us to the
number k.
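A small worked example of the jumps described above, with k = 13 and p = 4 chosen purely for illustration:

```latex
k = 13 = (1101)_2,\quad p = 4:\qquad
0 \xrightarrow{+1\cdot 2^{3}} 8
  \xrightarrow{+1\cdot 2^{2}} 12
  \xrightarrow{+0\cdot 2^{1}} 12
  \xrightarrow{+1\cdot 2^{0}} 13
```

Each step either jumps by the next lower power of two or stays put, so at most p steps are needed.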

The ith jump pointer stored with jump index entry (node) L points to
the smallest jump index entry (node) L’ such that
$L + 2^i \le L’ < L + 2^{i+1}$.
Lookups can be done in $O(\log_2 N)$ time, where $N = 2^p$.
[Figure: Binary Jump Index, each entry stored with jump pointers 0 to 4]
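A minimal sketch of a binary jump index, using the pointer rule above: pointer i of an entry with value L holds the position of the smallest entry in [L + 2^i, L + 2^(i+1)). The class name, the 32-bit pointer budget, the use of list positions in place of disk addresses, and starting lookups at the first entry rather than at 0 are all assumptions. Insertion here scans every earlier entry for clarity (O(n)); because values only grow, the first value to land in a pointer's range is the smallest one, so each pointer is written at most once, which is what makes the structure WORM-friendly.

```python
class JumpIndex:
    """Append-only binary jump index over a strictly increasing sequence."""

    def __init__(self, max_bits: int = 32) -> None:
        self.max_bits = max_bits
        self.entries: list[int] = []
        self.pointers: list[list] = []  # pointers[n][i]: position of target, or None

    def append(self, value: int) -> None:
        if self.entries and value <= self.entries[-1]:
            raise ValueError("values must be strictly increasing")
        pos = len(self.entries)
        self.entries.append(value)
        self.pointers.append([None] * self.max_bits)
        # Simple O(n) scan for clarity: fill every still-empty pointer whose
        # range [L + 2**i, L + 2**(i + 1)) covers the new value.
        for n in range(pos):
            i = (value - self.entries[n]).bit_length() - 1  # 2**i <= gap < 2**(i+1)
            if i < self.max_bits and self.pointers[n][i] is None:
                self.pointers[n][i] = pos  # written once, never changed

    def contains(self, key: int) -> bool:
        if not self.entries or key < self.entries[0]:
            return False
        pos = 0
        while True:
            current = self.entries[pos]
            if current == key:
                return True
            i = (key - current).bit_length() - 1  # largest power of two <= gap
            if i >= self.max_bits:
                return False
            target = self.pointers[pos][i]
            if target is None or self.entries[target] > key:
                return False  # no entry in [current + 2**i, key]
            pos = target  # the remaining gap is now < 2**i
```

Each hop strictly lowers the pointer level used next, which gives the logarithmic lookup bound; for entries 1, 2, 5, 7, 10, 15, a lookup of 15 goes 1 to 10 to 15.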

Jump Indexes
By using a block-structured jump index, we can gain better space and
time efficiency.
In a block-structured jump index, p posting entries are stored
together in blocks of size L.

Pointers are associated with blocks, rather than with every entry.
Jump pointers are calculated using powers of B rather than two,
where p >= B.
Each pointer is uniquely identified by a pair (i, j), where
$0 \le i < \log_B(N)$ and $1 \le j < B$.
[Figure: Block-Structured Jump Index: Block 0 holds 1 2 5 7, Block 1 holds 8 10 15 19, Block 2 holds 21 22 25, each block with jump pointers labeled (i, j)]

In the trustworthy indexing approaches described before, an adversary
can insert malicious entries into the index.

Malicious entries fall into two categories:
1- Those that cause subsequent legitimate insertions to fail.
2- Those that will only be noticed when a lookup operation finds a
dangling pointer or returns a record that does not match the query.
Both events will draw immediate unwanted attention to the attack.

If an adversary gains physical access to storage, he may tamper
with the index contents.
If we have trusted hardware, such as an SCPU, that can periodically
sign portions of the index, then any discrepancy between the
signature and the current index contents can be detected.

Trustworthy migration:
It’s impractical to store records on a single server for decades, as the
server will become obsolete and too expensive to maintain.
When the records must be moved, the migration process needs to
be trustworthy even if a superuser adversary performs the
migration.

Researchers have developed two schemes for trustworthy
migration
of records between compliance storage servers.

First migration scheme
1- The migration is initiated by the system operator retrieving a
migration certificate (MC) from the regulatory authority (RA).
The MC is a signature on a message containing the timestamped
identities of SCPU1 and SCPU2.

2- Upon migration, the MC is presented to SCPU1 and SCPU2, which
authenticate the signature of the RA.
3- If this succeeds, SCPU1 is ready to mutually authenticate and
perform a key exchange with SCPU2 using their internally stored
key pairs and certificates.

4- If step 3 succeeds, SCPU1 is ready and willing to transfer a
description of the state of the compliance records and index
contents over a secure channel provided by an agreed-upon symmetric
key.

After the state information has been migrated, the actual records
and index contents can be transferred by the main CPU, without
SCPU involvement.
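Steps 1, 2 and 4 of the first migration scheme can be sketched as follows; step 3 (mutual authentication and key exchange) is omitted. All names are hypothetical. HMAC with a shared `RA_KEY` stands in for the RA's public-key signature, the freshness window `max_age_s` is an assumed rule for using the timestamp, and representing "a description of the state" as a digest over serial numbers and record hashes is likewise an assumption. SCPU2 would recompute the digest after the main CPU has copied the records and compare it with the value received over the secure channel.

```python
import hashlib
import hmac

# HYPOTHETICAL stand-in: a real RA signs with its private key and the SCPUs
# verify with the RA's public key; HMAC with a shared key keeps this stdlib-only.
RA_KEY = b"hypothetical-regulatory-authority-key"


def ra_issue_mc(scpu1_id: str, scpu2_id: str, now: int) -> tuple[bytes, bytes]:
    """Step 1: the RA signs the timestamped identities of SCPU1 and SCPU2."""
    msg = f"{scpu1_id}|{scpu2_id}|{now}".encode()
    return msg, hmac.new(RA_KEY, msg, hashlib.sha256).digest()


def scpu_accepts_mc(own_id: str, mc: tuple[bytes, bytes], now: int, max_age_s: int) -> bool:
    """Step 2: check the RA's signature, that this SCPU is named in the MC,
    and (an assumed freshness rule) that the timestamp is recent."""
    msg, tag = mc
    expected = hmac.new(RA_KEY, msg, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return False
    src, dst, ts = msg.decode().split("|")
    return own_id in (src, dst) and 0 <= now - int(ts) <= max_age_s


def state_digest(records: dict[int, bytes]) -> bytes:
    """Step 4 (assumed form): a compact description of the record state that
    SCPU1 sends over the secure channel and SCPU2 recomputes after the copy."""
    digest = hashlib.sha256()
    for serial in sorted(records):
        digest.update(serial.to_bytes(8, "big"))
        digest.update(hashlib.sha256(records[serial]).digest())
    return digest.digest()
```

A record dropped or altered by the main CPU during the bulk copy changes the recomputed digest, so the omission is detectable without the SCPUs handling the records themselves.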

Second migration scheme
This approach relies on the existence of a trusted third party such
as a storage system vendor.

The migration process is divided into three phases:
Phase1:
The party in charge of migration prepares a plan for the migration.
The log of this plan includes the policies governing the migration
and, in compact form, a representation of the list of files and
directories to be migrated.

Phase2:
The current storage server generates certificates that attest to the
current state of the directory and file contents and adds them to the
log.

Phase3:
Finally, the party in charge of migration moves the files to be
migrated and copies the log to the new server.
Using the public key of the organization’s series of storage
servers, together with validation routines, one can then check
whether the migration took place appropriately.

Trustworthy deletion:
The primary purpose of WORM devices is to prevent data deletion.
However, simple erasure is not enough for trustworthy deletion, as
an erased record can be recreated by reverse-engineering an index.

For the deletion of document d to be strongly secure, the presence
or absence of any word w in any reconstruction of d should not
convey any information about its presence in the original
document.

The trustworthy indexing schemes described earlier don’t support
strongly secure deletion.
* Generalized hash trees (GHTs) offer weakly secure deletion.
* Trustworthy inverted indexes and jump indexes are even more
problematic with respect to deletion.

In an inverted index, for example, when a record is deleted, it may
be possible to recreate the record exactly by looking at its index
entries. Therefore, the index entries must also be removed to ensure
that deleted records cannot be reproduced.
Also, the structural properties of the index may allow an adversary
to infer that the deleted index entries existed.

To address this problem, two options are proposed:
1- Divide expiration times into epochs, and keep a separate set of
indexes for records expiring in each epoch.
Then one could delete the entire epoch of indexes once the epoch
is over.
However, this option is impractical because litigation holds
may require a document to be retained even after its mandatory
retention period is over.

2- Rebuild the index in a trustworthy manner when records are
deleted.
However, today’s record arrival rate will be the required record
deletion rate in the future, so this option is too expensive to be
practical.

Using encryption:
Encrypting the document identifiers before they are stored in the
index is inadequate, because one can still perform a join on the
encrypted document identifiers to recover the document content.

Using an inverted index with encryption:
An alternative for a trustworthy inverted index is to merge posting
lists together as usual, then encrypt the term encoding associated
with each posting element and store it in the merged posting list
entries.

One possible encryption technique is to replace the keyword
encoding E in the posting element with its XOR with a random
secret, which can be stored with the record and deleted upon its
expiration.
[Figure: Supporting deletion from a trustworthy inverted index: merged terms T1 … Tk, Tk+1 … Tl; each posting element holds an encoding (e.g. 00101) XORed (⊕) with a random sequence r0, plus the document ID d]
After the secret is discarded, the adversary will not be able to
determine which of the q merged keywords corresponds to the
posting element.
The scheme does not achieve strongly secure deletion, though it is
immune to a variety of possible attacks.

Open Problems:
The biggest open issues and challenges in trustworthy record
retention:
1- Corrections
Current models for records retention don’t support corrections to
record content. An elegant, cost-effective approach is needed for
supporting corrections.

2- Deletions
No entirely satisfactory scheme exists for trustworthy deletion of
records. Traces of record metadata may remain in indexes or
migration logs, allowing an adversary to infer the contents of a
deleted record.

3- Structured Information
Database records need a similar level of protection, but no work to
date has addressed this problem.

4- Exploiting trusted hardware
It’s important to explore how to deploy trusted-hardware WORM to
achieve increased security and efficiency in the upper layers (e.g.,
indexing).
