The hard disk-based file system is a core component of today's computer systems. As the principal means of sharing and storing data, the file system needs to be both reliable and fast. Unfortunately, improvements in file system performance have lagged improvements in processor performance and disk bandwidth. The performance gap is most pronounced when applications manipulate small files, and it is especially felt in the UNIX world, where small tools that manipulate small files are the norm.
Small file performance can be improved by grouping and clustering, which place files of the same directory, and blocks of the same file, at contiguous disk locations. These schemes make it possible for file systems to use large data transfers when accessing small files, reducing the number of disk accesses. However, as file systems age, disks become too fragmented to support the grouping and clustering of small files. This fragmentation makes it difficult for file systems to take advantage of large data transfers, increasing disk I/Os.
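The effect of fragmentation on I/O count can be illustrated with a small sketch. The function below is a hypothetical model, not part of DFFS: it counts how many disk transfers are needed to read a set of block numbers, coalescing contiguous runs into a single transfer of up to `max_transfer` blocks.

```python
def io_count(blocks, max_transfer=8):
    """Count the disk I/Os needed to read the given block numbers,
    merging runs of contiguous blocks into single large transfers
    of at most max_transfer blocks each."""
    blocks = sorted(blocks)
    ios = 0
    i = 0
    while i < len(blocks):
        run = 1
        while (i + run < len(blocks)
               and blocks[i + run] == blocks[i + run - 1] + 1
               and run < max_transfer):
            run += 1
        ios += 1          # one transfer covers this whole run
        i += run
    return ios

# A well-clustered 8-block file needs a single transfer,
# while the same file scattered across the disk needs eight.
print(io_count(range(100, 108)))                          # 1
print(io_count([3, 40, 77, 120, 512, 777, 900, 1024]))    # 8
```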
This dissertation introduces a novel file system called the De-Fragmented File System (DFFS), which offers a solution to the problems caused by file system aging. DFFS provides two adaptive mechanisms: IntrA-file De-fragmentation (IAD) and IntEr-file De-fragmentation (IED). These mechanisms dynamically relocate and cluster related data by using data cached in memory. First, IAD relocates and clusters the data blocks of small fragmented files. Second, IED clusters related small files in the same directory at contiguous disk locations.
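The core step both mechanisms share, moving already-cached data to one contiguous free extent, can be sketched as follows. This is a simplified illustration under assumed data structures (a per-file block map and a free-space bitmap), not the actual DFFS implementation:

```python
def find_free_extent(bitmap, length):
    """Return the start of the first run of `length` free blocks, or None."""
    run = 0
    for i, used in enumerate(bitmap):
        run = 0 if used else run + 1
        if run == length:
            return i - length + 1
    return None

def defragment_file(block_map, bitmap):
    """IAD sketch: move a file's scattered blocks (whose contents are
    already cached in memory) into one contiguous extent, so that a
    later read can fetch the whole file in a single large transfer."""
    start = find_free_extent(bitmap, len(block_map))
    if start is None:
        return block_map              # no contiguous space; leave file as-is
    for old in block_map:
        bitmap[old] = False           # release the old, scattered blocks
    new_map = list(range(start, start + len(block_map)))
    for new in new_map:
        bitmap[new] = True            # claim the contiguous extent
    return new_map

# A 3-block file scattered at blocks 0, 5 and 9 is
# relocated to the contiguous extent starting at block 2.
bitmap = [i in (0, 1, 5, 9) for i in range(16)]
print(defragment_file([0, 5, 9], bitmap))   # [2, 3, 4]
```

IED would apply the same relocation to the blocks of several small files from one directory, treating them as a single unit to cluster.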
DFFS exploits a technique called journaling to prevent data on disk from being lost due to system failures such as a power failure. Journaling records file system changes in an append-only log file. In particular, when a target region is de-fragmented, DFFS logs the blocks that will be overwritten by the relocated blocks. After a crash, DFFS replays the log to roll the logged blocks back to their original contents.