Hot and cold data storage

HOT COLD
Unified Virtual File System
For Hot & Cold Data Storage
Aditya Ambre, Madhura S. Raghavan, Rohit Arora
ENTERPRISE STORAGE ARCHITECTURE
GROUP 2

CSC 568 Enterprise Storage Architecture (NC State University)
AGENDA
➔ Problem Statement
➔ Project Goals and Features
➔ Architecture and Workflow
➔ Verification Cases
➔ Summary

PROBLEM STATEMENT
➔ Lifecycle of Data.
◆ Access frequency.
◆ Storage capacity and hardware characteristics.
➔ User intervention - Running jobs/scripts.
➔ Acknowledging Data temperature
➔ Tight coupling needed between storage components
[Diagram: hot storage holds frequently accessed data; cold storage holds the least frequently accessed data.]

WHAT IS A HOT FILE?
A data file that:
➔ Is very frequently accessed.
➔ Mostly contains business-critical information.
➔ Needs to be accessed quickly.

WHAT IS A COLD FILE?
A data file that:
➔ Is infrequently accessed.
➔ Contains less important information.
➔ Need not be accessed quickly.

GOAL: WHAT IS OUR PROJECT?
➔ Move from decoupled storage components to a tightly coupled two-tiered storage system.
➔ Manage hot & cold data between primary and secondary storage.
➔ Manage primary storage space utilization.
➔ File transfers do not interrupt FS operations.
➔ Users need not know about file transfers or where a file is stored.
➔ Optimal storage of cold data.

FEATURES
➔ Illusion of infinite storage
➔ Automatic cold data identification and transfer
➔ Consistent CRUD operations for both hot and cold files
➔ Block-level storage
➔ On-the-fly deduplication
➔ Uninterrupted file access
➔ File-level consistency
➔ Optimal storage space utilization

OUR ARCHITECTURE
[Diagram: the APPLICATION reads and writes through FUSE OPERATIONS (Read, Write, Delete, Rename, etc.). The File Tracking Layer contains Hot File Tracking and Cold File Tracking. The Data Block Processing Layer performs De-duplication and exchanges blocks with COLD STORAGE ("Write block to cold", "Get block from cold"). Blocks are identified by hashes such as 2f0f3ff2c7439635e7faa85…, 3f35ec5fe4ae0b963779c8…, 4a8f9ec938243beac4b2d…]

HOT-TO-COLD WORKFLOW
➔ The APPLICATION writes through FUSE {WRITE} OPERATIONS.
➔ File Tracking Layer check: Storage > 70%.
➔ If so, Cold File Tracking selects files for the Data Block Processing Layer to move to COLD STORAGE.

File Tracking Layer: Cold File Tracking
1. List all the files.
2. Sort files by access time, oldest to newest.
3. Select files to be transferred (until storage <= 50%).
4. Sort the selected files by size, large to small.
5. Send the largest, least recently accessed files to the Data Block Processing Layer.
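
The five selection steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's actual implementation: the `FileInfo` type, the function name `select_cold_candidates`, and the 1400 KB capacity are hypothetical, and the four files reuse the access times and sizes from the example slide.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    atime_min: int  # last access time in minutes since midnight (hypothetical unit)
    size_kb: int


def select_cold_candidates(files, used_kb, capacity_kb, target=0.50):
    """Pick the least recently accessed files until projected usage <= target."""
    by_age = sorted(files, key=lambda f: f.atime_min)  # step 2: oldest first
    selected = []
    projected = used_kb
    for f in by_age:  # step 3: stop once usage would be <= 50%
        if projected <= target * capacity_kb:
            break
        selected.append(f)
        projected -= f.size_kb
    # step 4: hand over the largest selected files first
    return sorted(selected, key=lambda f: f.size_kb, reverse=True)


files = [
    FileInfo("File 1", 13 * 60 + 30, 100),
    FileInfo("File 2", 16 * 60 + 30, 500),
    FileInfo("File 3", 15 * 60 + 30, 250),
    FileInfo("File 4", 14 * 60 + 30, 350),
]
# Assumed capacity of 1400 KB, so the 1200 KB in use is above the 70% trigger.
order = select_cold_candidates(files, used_kb=1200, capacity_kb=1400)
print([f.name for f in order])  # ['File 4', 'File 3', 'File 1']
```

With these assumed numbers, File 2 (the most recently accessed) stays on hot storage, which matches the selection shown in the example.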

File Tracking Layer: example
File     Last access   Size
File 1   1:30 PM       100 KB
File 2   4:30 PM       500 KB
File 3   3:30 PM       250 KB
File 4   2:30 PM       350 KB
Selected for transfer (oldest access first): File 1, File 4, File 3.

Data Block Processing Layer: Write Block to Cold
1. Request the hashtable from COLD STORAGE.
2. Get the hashtable.
3. Calculate the block-level hash (Block 1, Block 2, Block 3, ...).
4. Check for de-duplication: is the block a duplicate?
5. Transfer the block if it is not a duplicate, and update the hashtable.
6. Free the block's memory.
7. Send the updated hashtable to COLD STORAGE.
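
A minimal Python sketch of the write-to-cold steps is shown below. Several details are assumptions made for illustration only: the slides do not name the hash function or the block size, so SHA-256 and 4096-byte blocks are stand-ins, and `ColdStore` is a hypothetical in-memory substitute for the real cold storage.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; the slides do not specify one


class ColdStore:
    """Hypothetical in-memory stand-in for the remote cold storage."""

    def __init__(self):
        self.blocks = {}        # hash -> block bytes
        self.hashtable = set()  # hashes of blocks already stored

    def get_hashtable(self):
        return set(self.hashtable)  # hand out a copy

    def put_block(self, digest, block):
        self.blocks[digest] = block

    def put_hashtable(self, table):
        self.hashtable = set(table)


def write_file_to_cold(data, cold, block_size=BLOCK_SIZE):
    """Send only blocks that cold storage does not already hold."""
    table = cold.get_hashtable()                    # steps 1-2
    recipe, sent = [], 0
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()  # step 3 (SHA-256 assumed)
        if digest not in table:                     # step 4: duplicate?
            cold.put_block(digest, block)           # step 5: transfer
            table.add(digest)                       # step 5: update hashtable
            sent += 1
        recipe.append(digest)
        # step 6: the real system frees the hot block here; Python's
        # garbage collector reclaims `block` on the next iteration.
    cold.put_hashtable(table)                       # step 7
    return recipe, sent


cold = ColdStore()
data = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"A" * BLOCK_SIZE
recipe, sent = write_file_to_cold(data, cold)
print(len(recipe), sent)  # 3 2
```

The returned `recipe` (the ordered list of block hashes) is also an assumption: some such per-file record is needed to rebuild the file later, but the slides do not show how it is kept.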

[Diagram: the hot-to-cold transfer ends with the File Tracking Layer check Storage <= 50%.]

COLD-TO-HOT WORKFLOW
➔ The APPLICATION sends a read request through FUSE {READ} OPERATIONS.
➔ File Tracking Layer check: Is the file on hot storage?
➔ No: the Data Block Processing Layer gets the file's blocks from COLD STORAGE ("Get block from cold").

Data Block Processing Layer: Get Block from Cold
1. Request a copy of the hashtable from COLD STORAGE.
2. Get the hashtable.
3. Check block presence on cold storage: is the block present?
4. Request and get the block from cold storage.
5. Write the transferred block's content to a memory block.
6. Construct the complete file (Block 1, Block 2, Block 3, ...).
7. Delete the copy of the hashtable.
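
The read path can be sketched the same way. Again this is a hedged illustration, not the project's code: `fetch_file_from_cold`, the per-file `recipe` of block hashes, and the plain dict standing in for cold storage are all hypothetical, and SHA-256 is assumed because the slides do not name the hash function.

```python
import hashlib


def fetch_file_from_cold(recipe, cold_blocks, cold_hashtable):
    """Rebuild a file from cold blocks, given its ordered list of block hashes."""
    table = set(cold_hashtable)           # steps 1-2: local copy of the hashtable
    parts = []
    for digest in recipe:
        if digest not in table:           # step 3: is the block present?
            raise KeyError(f"block {digest} is not on cold storage")
        block = cold_blocks[digest]       # step 4: request and get the block
        if hashlib.sha256(block).hexdigest() != digest:
            raise ValueError(f"block {digest} failed its hash check")
        parts.append(block)               # step 5: place content in memory
    data = b"".join(parts)                # step 6: construct the complete file
    table.clear()                         # step 7: discard the local copy
    return data


blocks = [b"first block", b"second block", b"first block"]
recipe = [hashlib.sha256(b).hexdigest() for b in blocks]
cold_blocks = dict(zip(recipe, blocks))   # duplicates collapse to one entry
restored = fetch_file_from_cold(recipe, cold_blocks, cold_blocks.keys())
print(restored == b"".join(blocks))  # True
```

The integrity check on each fetched block is an addition for safety and is not part of the seven steps on the slide.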

MINIMAL THRESHOLD WORKFLOW
[Diagram: on some operation from the APPLICATION, Hot File Tracking in the File Tracking Layer checks Storage <= 30%. If yes, "Get Cold File" sends a block read request to the Data Block Processing Layer, which gets blocks from COLD STORAGE.]

READ OPERATION WORKFLOW
[Diagram: on some operation from the APPLICATION, Hot File Tracking in the File Tracking Layer checks Storage > 30% & < 70%. If yes, "Get Cold File" sends a block read request to the Data Block Processing Layer, which gets blocks from COLD STORAGE.]

QUICK DEMO

SCENARIOS / VERIFICATION CASES
I. GENERAL
➔ File system more than 70% full -> Transfer to cold storage.
➔ File system usage drops below 30% -> Transfer from cold storage.
➔ File transfers -> Do not interrupt general FS operations.
➔ Redundant/duplicate blocks -> Not transferred.
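
The threshold checks can be expressed as a small decision function. This sketch assumes the comparisons shown on the workflow slides (Storage > 70% and Storage <= 30%); the function names and returned labels are hypothetical, and how exactly 70% should be treated is not stated in the slides, so it is handled here as "no action".

```python
import shutil


def used_fraction(path):
    """Fraction of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def decide_action(fraction, high=0.70, low=0.30):
    """Map current usage to a background action (labels are hypothetical)."""
    if fraction > high:
        return "move_to_cold"      # keep moving files until usage <= 50%
    if fraction <= low:
        return "fetch_from_cold"   # room to bring cold files back
    return "none"                  # between the thresholds: nothing to do


for f in (0.85, 0.50, 0.30):
    print(f, decide_action(f))
```

In a real deployment, `decide_action(used_fraction(mount_point))` would be evaluated for the hot storage mount point, where `mount_point` is whatever path the hot tier is mounted at.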

II. SPECIFIC
➔ Files transferred -> Based on access time and size.
➔ File removed from hot storage -> After the last block is transferred.
➔ File in transition accessed -> Abort transfer, access granted!
➔ File space reclamation and file access -> Synchronized.
➔ Only one background process runs at any given time.
➔ Delayed delete (rm) -> Transparent to user.
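
Three of these cases (abort on access, removal only after the last block, and a single background transfer) can be illustrated with a short Python sketch. Everything named here is hypothetical: `TransferJob`, `run_transfer`, and the use of a `threading.Event` and `threading.Lock` are one possible way to get this behaviour, not necessarily how the project does it.

```python
import threading

TRANSFER_LOCK = threading.Lock()  # allows one background transfer at a time


class TransferJob:
    def __init__(self, name, blocks):
        self.name = name
        self.blocks = blocks
        self.abort = threading.Event()  # set when the file is accessed


def run_transfer(job, send_block, hot_files):
    """Move a file block by block; return True only if it fully moved."""
    if not TRANSFER_LOCK.acquire(blocking=False):
        return False                     # another transfer is already running
    try:
        for block in job.blocks:
            if job.abort.is_set():
                return False             # accessed mid-transfer: keep it hot
            send_block(block)
        hot_files.discard(job.name)      # reclaim only after the last block
        return True
    finally:
        TRANSFER_LOCK.release()


hot_files = {"a.dat", "b.dat"}
sent = []

job_a = TransferJob("a.dat", [b"1", b"2", b"3"])


def send_and_touch(block):
    sent.append(block)
    job_a.abort.set()                    # simulate a read arriving mid-transfer


print(run_transfer(job_a, send_and_touch, hot_files))  # False
print(run_transfer(TransferJob("b.dat", [b"4"]), sent.append, hot_files))  # True
print(sorted(hot_files))  # ['a.dat']
```

Blocks already sent before an abort remain on cold storage in this sketch; cleaning them up would fall under the garbage collection item listed as future scope.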

ASSUMPTIONS
➔ Network is always available.
➔ Hot-Cold classification at file level
➔ Cold Storage is infinite.
➔ Files are not very small or very large.
➔ Delay is accepted for rarely accessed files.
➔ File access granularity – in seconds.

SUMMARY
➔ Acknowledged data temperatures - hot and cold
➔ Project Features
◆ Auto file identification.
◆ File transfer
◆ Deduplication
➔ Architecture and workflows in action.
➔ Design and implementation of file tracking layer
➔ Design and implementation of Data Block Processing Layer
➔ Design decisions for specific verification scenarios.

FUTURE SCOPE
➔ Variable block size and Block size specifications.
➔ Garbage collection on secondary/cold storage.
➔ Cold file identification parameters and profiles.
➔ Distributed cold storage.

QUESTIONS ?
