File Organization Storage
A database can store its data in several ways, and storing it in files is one of them. The files are organized logically as a sequence of records and reside permanently on disk. Each file is divided into fixed-length storage units known as blocks, which are the units of both storage allocation and data transfer. The default block size in most databases is 4 to 8 kilobytes, but many databases allow the block size to be specified when the database instance is created.
Usually, a record is smaller than a block, although large data items such as images can be much larger. For quick access, each record should reside entirely within one block and should not be split across two or more blocks. In an RDBMS, tuples of different relations generally have different sizes, so files must be structured to hold records of differing lengths. In file organization, there are two possible ways of representing the records:
- Fixed-length records
- Variable-length records
Let’s discuss each of them in detail.
Fixed-Length Records
In a fixed-length record organization, every record in the file occupies the same, predetermined number of bytes. Unless the block size happens to be a multiple of the record size, some records cross block boundaries and get divided between more than one block. This simple approach has two problems:
- A record whose parts are stored in more than one block requires access to every block containing a part whenever it is read or written.
- Deleting a record is difficult, because the space it occupied must either be filled by another record of the file or be marked so that it is ignored.
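One common way to avoid records that cross block boundaries is to place only whole records in each block and leave the leftover bytes unused. The following is a minimal sketch of that arithmetic. The 4096-byte block size, the 53-byte record size used in the usage note, and the function names are assumptions made for illustration only.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes


def records_per_block(record_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of whole records that fit in one block."""
    if record_size <= 0 or record_size > block_size:
        raise ValueError("record must be non-empty and fit in one block")
    return block_size // record_size


def wasted_bytes(record_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes left unused at the end of each block."""
    records_per_block(record_size, block_size)  # validates the sizes
    return block_size % record_size


def locate(record_no: int, record_size: int,
           block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Map a 0-based record number to (block number, byte offset in block)."""
    per_block = records_per_block(record_size, block_size)
    block_no, slot = divmod(record_no, per_block)
    return block_no, slot * record_size
```

With 53-byte records, `records_per_block(53)` gives 77 records per block, `wasted_bytes(53)` gives 15 unused bytes, and `locate(80, 53)` places record 80 in block 1 at byte offset 159. Every record can then be read or written with a single block access.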
Reserving a certain number of bytes at the beginning of the file, known as the file header, solves the deletion problem. The file header carries a variety of information about the file, such as the address of the first record whose contents have been deleted. That record in turn stores the address of the second available record, and so on. The stored addresses act as pointers, so the deleted records form a linked list, often called a free list. Insertion and deletion are easy with fixed-length records because the space freed by a deleted record is exactly the space required to insert a new one. This no longer holds for records of variable length.
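The free-list idea can be sketched as follows. This is an illustrative in-memory model and not any particular database's file format: the 16-byte record size, the 4-byte header, the use of slot numbers instead of byte addresses, and the class name `FixedLengthFile` are all assumptions.

```python
import struct

RECORD_SIZE = 16               # assumed fixed record size in bytes
HEADER = struct.Struct("<i")   # header: slot number of the first free record
NO_FREE = -1                   # marks the end of the free list


class FixedLengthFile:
    """In-memory stand-in for a file of fixed-length records."""

    def __init__(self):
        self.data = bytearray(HEADER.pack(NO_FREE))

    def _pos(self, slot: int) -> int:
        return HEADER.size + slot * RECORD_SIZE

    def slot_count(self) -> int:
        return (len(self.data) - HEADER.size) // RECORD_SIZE

    def insert(self, payload: bytes) -> int:
        if len(payload) > RECORD_SIZE:
            raise ValueError("payload exceeds the fixed record size")
        record = payload.ljust(RECORD_SIZE, b"\x00")
        slot = HEADER.unpack_from(self.data, 0)[0]
        if slot == NO_FREE:
            # No deleted record to reuse: append at the end of the file.
            slot = self.slot_count()
            self.data.extend(record)
        else:
            # Unlink the first free slot; it stores the next free slot number.
            next_free = HEADER.unpack_from(self.data, self._pos(slot))[0]
            HEADER.pack_into(self.data, 0, next_free)
            pos = self._pos(slot)
            self.data[pos:pos + RECORD_SIZE] = record
        return slot

    def delete(self, slot: int) -> None:
        # The caller must pass a live slot; this sketch does not track liveness.
        pos = self._pos(slot)
        old_head = HEADER.unpack_from(self.data, 0)[0]
        self.data[pos:pos + RECORD_SIZE] = bytes(RECORD_SIZE)
        HEADER.pack_into(self.data, pos, old_head)  # freed slot -> old head
        HEADER.pack_into(self.data, 0, slot)        # header -> freed slot

    def read(self, slot: int) -> bytes:
        # Only meaningful for live slots; padding zeros are stripped.
        pos = self._pos(slot)
        return bytes(self.data[pos:pos + RECORD_SIZE]).rstrip(b"\x00")
```

After inserting three records and deleting slots 1 and 0 in that order, the next two inserts reuse slots 0 and 1, and only a third insert grows the file. Because every record has the same size, a freed slot always fits the next inserted record exactly.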
Variable-Length Records
Variable-length records are records that vary in size. Because blocks have a fixed size, such records need a layout that can pack items of different sizes into a block. Variable-length records arise in a database system in the following ways:
- Storage of multiple record types in a file.
- Record types that allow repeating fields, such as multisets or arrays.
- Record types that allow variable lengths for one or more fields.
Variable-length records pose the following two problems:
- How to represent a single record so that its individual attributes can be extracted easily.
- How to store variable-length records within a block so that each record in the block can be extracted easily.
Thus, the representation of a variable-length record can be divided into two parts:
- An initial part with fixed-length information. Fixed-length attributes, such as numeric values, dates, and fixed-length character strings, are allocated as many bytes as are needed to store their values.
- The data for the variable-length attributes, such as varchar values, which follows the initial part. Each such attribute is represented in the initial part by an (offset, length) pair. The offset is the position within the record where the data for that attribute begins, and the length is the size of the attribute's data in bytes. Thus, the initial part stores a fixed amount of information about every attribute, whether the attribute itself is fixed-length or variable-length.
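The two-part layout above can be sketched as follows. The schema (a fixed-length integer id and floating-point salary, plus variable-length name and department strings), the 2-byte width of each offset and length, and the function names are assumptions chosen for illustration.

```python
import struct

# Hypothetical schema: id and salary are fixed-length,
# name and dept are variable-length (varchar-like).
PAIR = struct.Struct("<HH")    # (offset, length) for one variable attribute
FIXED = struct.Struct("<id")   # id (int32) and salary (float64), 12 bytes
N_VAR = 2
INITIAL_SIZE = N_VAR * PAIR.size + FIXED.size   # 8 + 12 = 20 bytes


def encode(rec_id: int, salary: float, name: str, dept: str) -> bytes:
    var_fields = [name.encode("utf-8"), dept.encode("utf-8")]
    out = bytearray(INITIAL_SIZE)
    offset = INITIAL_SIZE          # variable data starts after the initial part
    for i, data in enumerate(var_fields):
        PAIR.pack_into(out, i * PAIR.size, offset, len(data))
        out += data
        offset += len(data)
    FIXED.pack_into(out, N_VAR * PAIR.size, rec_id, salary)
    return bytes(out)


def decode(buf: bytes):
    values = []
    for i in range(N_VAR):
        offset, length = PAIR.unpack_from(buf, i * PAIR.size)
        values.append(buf[offset:offset + length].decode("utf-8"))
    rec_id, salary = FIXED.unpack_from(buf, N_VAR * PAIR.size)
    return rec_id, salary, values[0], values[1]
```

Encoding `(10101, 65000.0, "Srinivasan", "Comp. Sci.")` yields a 40-byte record whose first pair is (20, 10) and whose second pair is (30, 10). Any attribute can be read without scanning the others, because the initial part always has the same size.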
Slotted-page Structure
The remaining problem is how to store variable-length records within a block. Such records are commonly organized in a slotted-page structure within the block. In the slotted-page structure, a header is present at the beginning of each block. This header holds information such as:
- The number of record entries in the header
- The end of free space in the block
- An array containing the information on the location and size of the records.
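As a rough illustration, the header fields listed above could be serialized as shown below. The 2-byte field widths, the field order, and the function names are assumptions. Real systems use their own layouts.

```python
import struct

HEAD = struct.Struct("<HH")    # number of entries, end of free space
ENTRY = struct.Struct("<HH")   # (location, size) of one record


def pack_header(end_of_free: int, entries) -> bytes:
    """Serialize the header fields that sit at the start of a block."""
    buf = HEAD.pack(len(entries), end_of_free)
    for location, size in entries:
        buf += ENTRY.pack(location, size)
    return buf


def unpack_header(block: bytes):
    """Return (end of free space, list of (location, size) entries)."""
    count, end_of_free = HEAD.unpack_from(block, 0)
    entries = [ENTRY.unpack_from(block, HEAD.size + i * ENTRY.size)
               for i in range(count)]
    return end_of_free, entries


def free_space(block: bytes) -> int:
    """Free bytes between the last header entry and the first record."""
    end_of_free, entries = unpack_header(block)
    return end_of_free - (HEAD.size + len(entries) * ENTRY.size)
```

For a 4096-byte block holding a 30-byte record at location 4066 and a 50-byte record at location 4016, the end of free space is 4016, the header takes 12 bytes, and `free_space` reports 4004 bytes.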
Inserting and Deleting Method
The variable-length records are stored contiguously within the block, starting from the end of the block.
When a new record is inserted, space is allocated for it at the end of the free space, which is itself contiguous and lies between the final entry in the header array and the first record. An entry containing the size and location of the new record is added to the header.
When an existing record is deleted, the space it occupied is freed and its header entry is marked as deleted. The records stored before the deleted record in the block are then moved so that they occupy the freed space, their header entries are updated with the new locations, and the end-of-free-space pointer is updated. As a result, all the free space again lies between the final entry in the header array and the first record.
The key rule of the slotted-page structure is that no pointer should point directly to a record. Instead, pointers should point to the header entry that contains the record's actual location. This level of indirection allows records to be moved within the block to prevent fragmentation of space, while still supporting indirect pointers to the record.
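The insertion, deletion, and indirection described above can be sketched with a small in-memory model. The tiny 64-byte block, the assumed 4-byte header and 4-byte entry sizes, the reuse of deleted header entries, and the class name `SlottedPage` are illustrative assumptions, not a description of any specific system.

```python
BLOCK_SIZE = 64   # deliberately tiny block, for illustration
HEAD_SIZE = 4     # assumed bytes for entry count + end of free space
ENTRY_SIZE = 4    # assumed bytes per header entry


class SlottedPage:
    def __init__(self):
        self.data = bytearray(BLOCK_SIZE)
        self.slots = []                # [location, size], or None if deleted
        self.end_of_free = BLOCK_SIZE  # records grow down from the block end

    def free_space(self) -> int:
        return self.end_of_free - (HEAD_SIZE + ENTRY_SIZE * len(self.slots))

    def insert(self, record: bytes) -> int:
        if not record:
            raise ValueError("empty records are not supported in this sketch")
        if None in self.slots:
            slot, needed = self.slots.index(None), len(record)
        else:
            slot, needed = len(self.slots), len(record) + ENTRY_SIZE
        if needed > self.free_space():
            raise ValueError("not enough free space in the block")
        # Place the record at the end of the free space.
        self.end_of_free -= len(record)
        start = self.end_of_free
        self.data[start:start + len(record)] = record
        entry = [start, len(record)]
        if slot == len(self.slots):
            self.slots.append(entry)
        else:
            self.slots[slot] = entry
        return slot

    def read(self, slot: int) -> bytes:
        entry = self.slots[slot]
        if entry is None:
            raise KeyError(f"slot {slot} is deleted")
        location, size = entry
        return bytes(self.data[location:location + size])

    def delete(self, slot: int) -> None:
        entry = self.slots[slot]
        if entry is None:
            raise KeyError(f"slot {slot} is already deleted")
        location, size = entry
        # Shift the records stored below the deleted one into the hole.
        start = self.end_of_free
        self.data[start + size:location + size] = self.data[start:location]
        for other in self.slots:
            if other is not None and other[0] < location:
                other[0] += size       # record moved; slot number unchanged
        self.slots[slot] = None
        self.end_of_free += size
```

Callers hold only slot numbers returned by `insert`. When `delete` compacts the block, the moved records get new locations in their header entries, but their slot numbers stay the same, so `read` keeps working for every surviving record.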