Modes of file access
Serial file
A serial file is one in which records are stored, one after the other, in the order in which they are added, not in order of a key field. This means that new records are stored at the end of the file.
The following shows a serial file that is used to store the number of entries for EdExcel GCSE Mathematics. The entries were received in the order: Kettlewood, Queens Park, St Mary’s, Wilton High, West Orling.
Centre Number   Centre Name   No of Candidates
27102           Kettlewood    85
38240           Queens Park   103
64715           St Mary’s     121
30446           Wilton High   156
12304           West Orling   105
Note that the key field in this file would be Centre Number (it uniquely identifies each school).
Both disks and tapes can be used to store a file serially.
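The behaviour of a serial file can be sketched in Python. This is only an illustration: a list stands in for the file, and the names `serial_file`, `append_record` and `serial_search` are invented for the sketch rather than taken from the notes.

```python
# A list stands in for the serial file. Each record is
# (centre_number, centre_name, number_of_candidates).
serial_file = []


def append_record(file, record):
    """New records always go at the end, whatever their key."""
    file.append(record)


def serial_search(file, centre_number):
    """Read from the start of the file until the key is found.

    The keys are in no particular order, so an unsuccessful search
    has to read every record.
    """
    for position, record in enumerate(file):
        if record[0] == centre_number:
            return position, record
    return None


# The entries arrive in the order given in the text.
for entry in [
    (27102, "Kettlewood", 85),
    (38240, "Queens Park", 103),
    (64715, "St Mary's", 121),
    (30446, "Wilton High", 156),
    (12304, "West Orling", 105),
]:
    append_record(serial_file, entry)

print(serial_search(serial_file, 12304))
```

West Orling was added last, so the search has to pass the other four records before it reaches it.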
Sequential file
A sequential file is one in which the records are stored, one after the other, in the order of the key field.
The following shows a sequential file that is used to store the number of entries for EdExcel GCSE Mathematics. The entries were added in the order: Kettlewood, Queens Park, St Mary’s, Wilton High, West Orling, but they are stored in the order of the key field, Centre Number:
Centre Number   Centre Name   No of Candidates
12304           West Orling   105
27102           Kettlewood    85
30446           Wilton High   156
38240           Queens Park   103
64715           St Mary’s     121
As with a serial file, both tapes and disks can be used to store a file sequentially, and access to the records must take place from the beginning of the file.
Benefits
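Sequential files allow the records to be displayed in the order of the key field. Keeping this order makes the process of adding a record slower, because the new record must be placed at the correct position rather than simply at the end, but it significantly speeds up searches, because a search can stop as soon as it reaches a key larger than the one being looked for.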
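The trade-off between slower additions and faster searches can be sketched in Python. This is a minimal illustration, assuming a list stands in for the file; the names `insert_in_key_order` and `sequential_search` are invented for the sketch.

```python
import bisect

# A list stands in for the sequential file. Each record is
# (centre_number, centre_name, number_of_candidates).
sequential_file = []


def insert_in_key_order(file, record):
    """Place the record so that the file stays sorted by centre number.

    Tuples compare on their first item first, so insort orders the
    records by key. Later records have to be moved along to make room,
    which is why adding is slower than in a serial file.
    """
    bisect.insort(file, record)


def sequential_search(file, centre_number):
    """Read from the start, but give up once a larger key is reached."""
    for record in file:
        if record[0] == centre_number:
            return record
        if record[0] > centre_number:
            return None  # the key cannot appear later in a sorted file
    return None


# The entries arrive in the same order as before.
for entry in [
    (27102, "Kettlewood", 85),
    (38240, "Queens Park", 103),
    (64715, "St Mary's", 121),
    (30446, "Wilton High", 156),
    (12304, "West Orling", 105),
]:
    insert_in_key_order(sequential_file, entry)

print([record[0] for record in sequential_file])
print(sequential_search(sequential_file, 20000))
```

The search for the missing key 20000 stops at the second record (27102) instead of reading the whole file.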
Indexed sequential file
An indexed sequential file is one in which the records are stored, one after the other, in the order of the key field, but which also has an index that enables records to be accessed directly.
Index
An index is a file with two fields, created from the main file, which contains a list of:
• the key fields (sorted sequentially);
• pointers to where the records can be found in the main file.
Indexed sequential files are useful when:
• it is sometimes necessary to process all the records in sequential order; and
• it is sometimes necessary to access individual records randomly.
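Both kinds of access can be sketched in Python. This is an illustration under simplifying assumptions: the main file is a list already in key order, a pointer is just a position in that list, and the names `index_keys`, `index_pointers`, `fetch` and `total_candidates` are invented for the sketch.

```python
import bisect

# Main file, held in key order. Each record is
# (centre_number, centre_name, number_of_candidates).
main_file = [
    (12304, "West Orling", 105),
    (27102, "Kettlewood", 85),
    (30446, "Wilton High", 156),
    (38240, "Queens Park", 103),
    (64715, "St Mary's", 121),
]

# The index is created from the main file: the sorted key fields, and
# for each key a pointer (here, the record's position in main_file).
index_keys = [record[0] for record in main_file]
index_pointers = list(range(len(main_file)))


def fetch(centre_number):
    """Direct access: look the key up in the index, then follow the pointer."""
    i = bisect.bisect_left(index_keys, centre_number)
    if i < len(index_keys) and index_keys[i] == centre_number:
        return main_file[index_pointers[i]]
    return None


def total_candidates():
    """Sequential access: process every record in key order, ignoring the index."""
    return sum(record[2] for record in main_file)


print(fetch(38240))
print(total_candidates())
```

In this sketch `fetch` searches only the index and then reads one record, while `total_candidates` reads the main file from start to finish.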
Examples of indexed sequential files
Company employee file
At the end of each month all the records will be processed sequentially, in order to produce payslips. However, some records will need to be accessed randomly at other times, for example when an employee changes address.
A school’s student file
When an attendance report is printed, the file will be accessed sequentially, but when the details of an individual student are required the index will be used to find the required record quickly.
Random (direct) access file
A random access file is one in which a record can be written or retrieved without first examining other records. A random access file must be stored on disk and the disk address is calculated from the primary key. In its simplest form a record with a primary key of 1 will be stored at block 1, a record with a primary key of 2 will be stored at block 2, a record with a primary key of 3 will be stored at block 3, and so on.
It should be noted that this very simple method, where [disk address] = [primary key], is very inefficient in respect of disk space. For example:
• if the lowest primary key is 1001, then all the disk space below block 1001 will be wasted;
• if there are some values which the primary key never takes (for example odd values), these storage spaces will be wasted.
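The scale of the waste can be estimated with a short Python calculation. As an illustration only, it assumes the five centre numbers used elsewhere in these notes are the primary keys and that each record occupies one block.

```python
# Centre numbers from the examination entries example, used as primary keys.
centre_numbers = [27102, 38240, 64715, 30446, 12304]

# With [disk address] = [primary key], every block from 1 up to the
# largest key has to be set aside for the file.
blocks_reserved = max(centre_numbers)
blocks_used = len(centre_numbers)
blocks_wasted = blocks_reserved - blocks_used

print(blocks_reserved, blocks_used, blocks_wasted)
```

Under these assumptions only 5 of the 64715 reserved blocks would ever hold a record.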
In order to be more efficient with the use of disk space, random access files calculate disk addresses by using a hashing algorithm (also known as just hashing).
Hashing
Hashing is a calculation that is performed on a primary key in order to calculate the storage address of a record. A hashing algorithm will typically divide the primary key by the number of disk blocks that are available for storage, work out the remainder and add the start address. The answer will be the storage address of the record.
[disk address] = [primary key] MOD [number of blocks] + [start address]
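The formula can be written as a small Python function. This is a sketch, not a fixed implementation: the function name `disk_address` and the start address of 10000 used in the second call are assumptions made purely for illustration.

```python
def disk_address(primary_key, number_of_blocks, start_address=0):
    """[disk address] = [primary key] MOD [number of blocks] + [start address]"""
    # % gives the remainder after division, which is what MOD means here.
    return primary_key % number_of_blocks + start_address


# A file occupying 5000 blocks beginning at block 0.
print(disk_address(27102, 5000))

# The same file moved to an assumed start address of 10000.
print(disk_address(27102, 5000, start_address=10000))
```

The remainder always lies between 0 and one less than the number of blocks, so every record lands inside the area set aside for the file, wherever that area starts.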
Example
If a file was to be stored on the first 5000 blocks of a disk (taking the start address as 0) then:
[disk address] = [primary key] MOD 5000
That is, the primary key of each of the records would be divided by 5000 and the remainder would be the disk address for the record. This means that a record with primary key of 27102 would be stored at the disk address calculated as follows:
27102 ÷ 5000 = 5 remainder 2102
This means that the disk address for this record will be 2102. The table shows some other disk addresses calculated using the same hashing algorithm:
Centre Number   Centre Name   No of Candidates   Disk Address
27102           Kettlewood    85                 2102
38240           Queens Park   103                3240
64715           St Mary’s     121                4715
30446           Wilton High   156                446
12304           West Orling   105                2304
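The disk addresses in the table can be checked with Python's built-in `divmod`, which returns the quotient and the remainder together. The variable names are chosen for this sketch only.

```python
NUMBER_OF_BLOCKS = 5000

centres = [
    (27102, "Kettlewood", 85),
    (38240, "Queens Park", 103),
    (64715, "St Mary's", 121),
    (30446, "Wilton High", 156),
    (12304, "West Orling", 105),
]

addresses = {}
for centre_number, centre_name, candidates in centres:
    quotient, remainder = divmod(centre_number, NUMBER_OF_BLOCKS)
    addresses[centre_number] = remainder  # the remainder is the disk address
    print(f"{centre_number} / {NUMBER_OF_BLOCKS} = {quotient} remainder {remainder}")
```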
Problems with hashing
One problem that could occur with hashing is that a block may already contain a record and be full. For example, records with key fields of 38240 and 43240 will both be assigned a disk address of 3240. If this happens then the new record will need to be written somewhere else. Two common ways of determining this alternative location are:
• the record can be written to the next available block. Note that if it is the last address block which is full then the search for an available space will start from the first block;
• the record could be written to a separate ‘overflow’ area, with a tag placed in the calculated location to indicate exactly where in this overflow area the record can be found.
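The two ways of handling a clash can be sketched in Python. Several simplifying assumptions are made: each block holds exactly one record, a list stands in for the disk, and a tag is a position in a separate overflow list. The record with key 43240 is invented so that it clashes with Queens Park, and all function and variable names are illustrative.

```python
NUMBER_OF_BLOCKS = 5000  # as in the worked example; one record per block is assumed


def store_next_available(blocks, record):
    """Try the hashed block first, then each following block in turn."""
    home = record[0] % len(blocks)
    for step in range(len(blocks)):
        # After the last block the search carries on from the first block.
        address = (home + step) % len(blocks)
        if blocks[address] is None:
            blocks[address] = record
            return address
    raise RuntimeError("no free block left")


def store_with_overflow(blocks, overflow, record):
    """Use the hashed block if it is free; otherwise write the record to the
    overflow area and leave a tag (its overflow position) in the hashed block."""
    home = record[0] % len(blocks)
    if blocks[home] is None:
        blocks[home] = {"record": record, "tags": []}
        return ("main", home)
    overflow.append(record)
    blocks[home]["tags"].append(len(overflow) - 1)
    return ("overflow", len(overflow) - 1)


queens_park = (38240, "Queens Park", 103)
clashing = (43240, "Hypothetical Centre", 0)  # invented record; only its key matters

# Method 1: next available block.
blocks_a = [None] * NUMBER_OF_BLOCKS
first_a = store_next_available(blocks_a, queens_park)
second_a = store_next_available(blocks_a, clashing)

# Method 2: separate overflow area.
blocks_b = [None] * NUMBER_OF_BLOCKS
overflow_b = []
first_b = store_with_overflow(blocks_b, overflow_b, queens_park)
second_b = store_with_overflow(blocks_b, overflow_b, clashing)

print(first_a, second_a)
print(first_b, second_b)
```

With the first method the clashing record ends up in block 3241, the block after its calculated address. With the second it goes to the start of the overflow area and block 3240 keeps a tag pointing to it. Retrieval has to follow the same route: after hashing the key, a program would check the calculated block and then either read the following blocks or follow the tags.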