COBOL Using Files

• File Descriptors
• File Organizations and Access Modes
• File Open Modes and I/O Operations/Verbs
• I/O operations on SEQUENTIAL files
• I/O operations on INDEXED files
• Random Access form of READ, WRITE, and REWRITE for INDEXED files
• Sequential Access form of READ, WRITE, and REWRITE for INDEXED files
• Random observations based on program testing
File Descriptors

The DATA DIVISION of a COBOL (sub)program typically contains two sections, the FILE SECTION and the WORKING-STORAGE SECTION. The latter is used to describe, via "data description entries" (level numbers, PICTURE clauses, etc.), the hierarchical structure of data items that exist during execution of the program. The former is used to describe, in a similar way, the layout of records in any files that the program uses.

For each file that the program uses, the FILE SECTION contains a "file description entry", the beginning of which is signaled by the keyword FD. The typical form of such an entry (the general form includes a number of optional clauses not shown here) is as follows:

   FD <file-name>
      [RECORD CONTAINS <integer> CHARACTERS]
      [DATA RECORD IS <record-name>].
Note on notation: Square brackets surrounding an entity indicate that its appearance is optional.
Immediately after the file description entry comes the "data description entry" for the file's data record (beginning with the level number 01). Here is a typical example:

   FD Employee-File
      RECORD CONTAINS 65 CHARACTERS
      DATA RECORD IS Employee-Rec.
   01 Employee-Rec.
      02 Employee-ID        PIC X(10).
      02 Employee-Name.
         03 Last-Name       PIC X(20).
         03 First-Name      PIC X(12).
         03 Middle-Init     PIC X.
      02 Position.
         03 Job-Code        PIC X(4).
         03 Department      PIC X(3).
         03 Manager-ID      PIC X(10).
      02 Hourly-Pay         PIC 9(3)V99.
The above says that Employee-File is a file in which each record has a length of 65 characters, with the first ten containing an employee ID, the next twenty containing an employee's last name, and so on.

When a COBOL program executes, enough main memory is allocated to hold not only the data items described in the WORKING-STORAGE SECTION but also those described in the FILE SECTION (i.e., one data record from each file). Thus, one could view the data record of a file as being a one-record buffer for that file. When a record is retrieved from a file (via the READ verb), it is placed into the file's data record. Similarly, when a record is written to a file (via the WRITE verb), it is the contents of the file's data record that are written into the file.

Note: From class discussion, you should recall that there are also file buffers that are not directly accessible by the application programmer. An input buffer holds (typically) several records that already have been read in (physically) but are waiting to be read in logically (via READ) by the COBOL program. An output buffer holds (typically) several records that already have been written logically (via WRITE) by the COBOL program but are waiting to be written (physically) into a file.
File Organization and Access Modes

COBOL directly supports

1. three file organizations: SEQUENTIAL, INDEXED, and RELATIVE
2. three file access modes: SEQUENTIAL, RANDOM, and DYNAMIC
3. four file open modes: INPUT, OUTPUT, EXTEND, I-O
4. seven I/O operations: OPEN, CLOSE, READ, WRITE, REWRITE, DELETE, START
A file's organization (i.e., the way it is structured) imposes restrictions upon how it can be accessed (i.e., upon which access modes are applicable to it). A file whose organization is SEQUENTIAL (which is the default) allows only the SEQUENTIAL access mode, which means that its records may be accessed (i.e., read or written) only in logical order, one after another. (This restriction makes sense, as such a file has no index (or any other auxiliary fast-search-enabling structure) associated with it to allow for efficient access to arbitrary records.) An INDEXED file is one for which an index exists, thereby making it possible to locate a record quickly, given the value of its key field (i.e., the indexing field). A RELATIVE file is one that allows access by relative record number (RRN).

A file whose organization is INDEXED or RELATIVE allows any of the three access modes to be applied to it: SEQUENTIAL, RANDOM, or DYNAMIC. The notion of SEQUENTIAL access, as it applies to INDEXED and RELATIVE files, is the same as with SEQUENTIAL files: records are accessed in their logical order. In INDEXED files, the logical order of records corresponds to increasing order of key field value. (For example, if Employee-ID were the key field of the Employee-File described above, then the record containing 'Jones00001' in that field would occur before the record containing 'Simpson012', as the former value is less than the latter according to COBOL's rules for ordering character strings.) In RELATIVE files, the logical order of records corresponds to their RRN's, with record i coming before record j if and only if i < j.

As for RANDOM access, in the case of an INDEXED file it means access according to the value stored in the field that is specified as the key of the file (in the SELECT statement for the file). (Such a file has an index for which its key field is the indexing field.) For example, if Employee-File (see below) has as its key the field Employee-ID (an alphanumeric string of length ten), we have the ability to READ or WRITE a record whose Employee-ID field contains a specified value, such as 'Simpson032'. In the case of a RELATIVE file, RANDOM access means access according to the logical position of a record within the file. A position is given in terms of a relative record number (RRN), which is simply a positive integer. For example, we can issue a command to READ or WRITE the record in position 327.

DYNAMIC access mode is a combination of both SEQUENTIAL and RANDOM access. That is, if a program is to access records from some file both sequentially and randomly (e.g., the former in performing a range query and the latter in performing a single-record fetch), DYNAMIC access mode is appropriate.
The organization of a file and the access mode to be used on that file by a particular COBOL program are specified in a SELECT statement appearing in the FILE-CONTROL paragraph of the INPUT-OUTPUT SECTION in the ENVIRONMENT DIVISION. The form taken by the SELECT statement depends upon the file's organization. (Note: In order to keep things simple, we do not describe the SELECT statement in all its generality.) For a sequential file, it looks like this:

   SELECT [OPTIONAL] <file-name> ASSIGN TO <file-spec>
      [ORGANIZATION IS SEQUENTIAL]
      [ACCESS MODE IS SEQUENTIAL]
      [FILE STATUS IS <data-name>]
The default organization is SEQUENTIAL, so that if we omit the ORGANIZATION clause, COBOL will interpret this to mean that the file is SEQUENTIAL. The presence of the optional keyword OPTIONAL indicates that the file may or may not already exist (when the program begins execution). (OPTIONAL files may be opened in any mode except OUTPUT.)
The data item specified in the FILE STATUS clause should be one defined with a PIC X(2) picture clause. Each time an I/O operation is performed on the file, a two-digit code, called the file status code, is placed into this data item. The file status code indicates whether the operation completed successfully (value "00") or whether something "unusual" occurred (e.g., value "41" indicates an attempt to OPEN a file that was already open, and "10" indicates the end-of-file condition). For more details, see page 301 of Comprehensive COBOL.

The form taken by the SELECT statement when the file has INDEXED organization is this:

   SELECT [OPTIONAL] <file-name> ASSIGN TO <file-spec>
      ORGANIZATION IS INDEXED
      [ACCESS MODE IS {SEQUENTIAL, RANDOM, DYNAMIC}]
      RECORD KEY IS <data-name>
      [FILE STATUS IS <data-name>]
Note on notation: A list of items in curly braces indicates that exactly one of them is to be chosen.
For example, the SELECT statement for the Employee file mentioned above might look like this:

   SELECT Employee-File ASSIGN TO "Employees.dat"
      ORGANIZATION IS INDEXED
      ACCESS MODE IS RANDOM
      RECORD KEY IS Employee-ID.
The data-name specified in the RECORD KEY clause must be one of the fields within the file's data record; it must be (or becomes, if the file doesn't yet exist) an indexing field of the file (which is to say that, if the file already exists, so must an index on that field).

The form taken by the SELECT statement when the file has RELATIVE organization is one of these two:

   SELECT [OPTIONAL] <file-name> ASSIGN TO <file-spec>
      ORGANIZATION IS RELATIVE
      ACCESS MODE IS SEQUENTIAL
      [RELATIVE KEY IS <data-name>]
      [FILE STATUS IS <data-name>]

or

   SELECT [OPTIONAL] <file-name> ASSIGN TO <file-spec>
      ORGANIZATION IS RELATIVE
      ACCESS MODE IS {RANDOM, DYNAMIC}
      RELATIVE KEY IS <data-name>
      [FILE STATUS IS <data-name>]
That is, for a RELATIVE file, if SEQUENTIAL access mode is chosen, specifying its RELATIVE KEY is optional (and seemingly useless!), but specifying the RELATIVE KEY is mandatory if the access mode is RANDOM or DYNAMIC. Whenever random access is made to a RELATIVE file, the contents of the field that was identified as its RELATIVE KEY are taken to be the RRN of the record to be accessed.

Note: Simply including the clause ORGANIZATION IS INDEXED (or RELATIVE), when SELECT-ing a file, does not magically transform the specified file into one having the appropriate structure. If, for example, you created a file using a standard file editor and then tried to SELECT it using the ORGANIZATION IS INDEXED (or RELATIVE) clause within the SELECT statement, you would not achieve the desired results. Rather, to construct an INDEXED (or RELATIVE) file, you would create it via the execution of some COBOL program in which the file is opened for OUTPUT and records are written to that file.
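As an illustration of the second SELECT form and of the note above, here is a minimal sketch of a program that creates a RELATIVE file by writing to it and then fetches a record by RRN. It assumes a GnuCOBOL-style compiler and fixed-format source; the file name "slots.rel" and the names Slot-File, Slot-Rec, Slot-RRN, and Slot-Status are hypothetical and do not come from the course materials.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RelativeDemo.

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *    "slots.rel" is a hypothetical file name.
           SELECT Slot-File ASSIGN TO "slots.rel"
               ORGANIZATION IS RELATIVE
               ACCESS MODE IS RANDOM
               RELATIVE KEY IS Slot-RRN
               FILE STATUS IS Slot-Status.

       DATA DIVISION.
       FILE SECTION.
       FD  Slot-File.
       01  Slot-Rec            PIC X(20).

       WORKING-STORAGE SECTION.
      *    The RELATIVE KEY item lives outside the file's record.
       01  Slot-RRN            PIC 9(4).
       01  Slot-Status         PIC X(2).

       PROCEDURE DIVISION.
       Main-Para.
      *    Create the file by writing to it from COBOL.
           OPEN OUTPUT Slot-File
           MOVE 3 TO Slot-RRN
           MOVE "third slot" TO Slot-Rec
           WRITE Slot-Rec
               INVALID KEY DISPLAY "WRITE failed: " Slot-Status
           END-WRITE
           CLOSE Slot-File

      *    Random READ: the RRN is taken from Slot-RRN.
           OPEN INPUT Slot-File
           MOVE 3 TO Slot-RRN
           READ Slot-File
               INVALID KEY DISPLAY "No record at RRN " Slot-RRN
               NOT INVALID KEY DISPLAY Slot-RRN ": " Slot-Rec
           END-READ
           CLOSE Slot-File
           STOP RUN.
```

Changing the second MOVE to an RRN that was never written (say, 2) should exercise the INVALID KEY branch instead.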
File Open Modes and I/O Operations/Verbs

There are four "file open modes": INPUT, OUTPUT, EXTEND, and I-O. A COBOL program "announces its intention" to access a file by opening it, via the OPEN verb. When opening a file, one of these four modes must be specified, as in

   OPEN INPUT Course-File

When the program is finished using a file (perhaps only temporarily) it closes it via the CLOSE verb, as in

   CLOSE Course-File
A file opened in INPUT mode is one that may be accessed only via the READ verb (plus the START verb, if the file is INDEXED or RELATIVE).

A file opened in OUTPUT mode is one that may be accessed only via the WRITE verb; furthermore, if the file existed prior to being opened, its contents are destroyed (so that, when execution ends, the file contains only those records written to the file during execution of the program).

A file opened in EXTEND mode, which applies only to SEQUENTIAL files, is one that may be accessed only via the WRITE verb; furthermore, the file must have existed prior to being opened (unless the word OPTIONAL appeared in the SELECT statement for that file), and any records written to it during execution are placed after the ones already there.

A file opened in I-O mode is one on which both reading and writing of records may be carried out, via the READ and REWRITE verbs. (The WRITE and START verbs may be applied, too, if the file is INDEXED or RELATIVE.)

Note that a file may be opened more than once during execution of a program, possibly with different open modes each time. However, a file that is open must be closed (via the CLOSE verb) before it can be opened again. For example, a program may open a file for OUTPUT, write records into it, close it, open it for INPUT, and then read records from it.
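The open-write-close-reopen-read sequence just described can be sketched as a small complete program. This is only an illustration under stated assumptions: a GnuCOBOL-style compiler, fixed-format source, a hypothetical file name "courses.dat", and a made-up two-field layout for Course-Rec (the notes name Course-File but do not give its record layout).

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. OpenModesDemo.

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *    "courses.dat" is a hypothetical file name.
           SELECT Course-File ASSIGN TO "courses.dat"
               ORGANIZATION IS SEQUENTIAL
               FILE STATUS IS Course-Status.

       DATA DIVISION.
       FILE SECTION.
       FD  Course-File.
      *    Hypothetical record layout, for illustration only.
       01  Course-Rec.
           02  Course-ID       PIC X(8).
           02  Course-Title    PIC X(20).

       WORKING-STORAGE SECTION.
       01  Course-Status       PIC X(2).
       01  Eof-Flag            PIC X VALUE "N".
           88  Course-Eof      VALUE "Y".

       PROCEDURE DIVISION.
       Main-Para.
      *    OUTPUT mode: any previous contents are destroyed.
           OPEN OUTPUT Course-File
           MOVE "CMPS0134" TO Course-ID
           MOVE "Computer Science I" TO Course-Title
           WRITE Course-Rec
           MOVE "CMPS0144" TO Course-ID
           MOVE "Computer Science II" TO Course-Title
           WRITE Course-Rec
           CLOSE Course-File

      *    The file must be closed before it is opened again.
           OPEN INPUT Course-File
           PERFORM UNTIL Course-Eof
               READ Course-File
                   AT END SET Course-Eof TO TRUE
                   NOT AT END DISPLAY Course-ID " " Course-Title
               END-READ
           END-PERFORM
           CLOSE Course-File
           STOP RUN.
```

Replacing OPEN OUTPUT with OPEN EXTEND on a second run would, per the description above, append the two records after the existing ones rather than replacing them.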
I/O operations on SEQUENTIAL files

The OPEN and CLOSE verbs were described above (although not in full generality---see a COBOL reference for more details). Here we consider the remaining verbs that may be applied to a SEQUENTIAL file: READ, WRITE, and REWRITE. Which of these three operations are applicable to a file depends upon the mode into which the file was opened:

             +---------------------------------------+
             |                M o d e                |
   Operation |  INPUT  | OUTPUT  |  EXTEND  |  I-O   |
             +---------+---------+----------+--------+
   READ      |    x    |         |          |   x    |
             +---------+---------+----------+--------+
   WRITE     |         |    x    |    x     |        |
             +---------+---------+----------+--------+
   REWRITE   |         |         |          |   x    |
             +---------+---------+----------+--------+
   Allowed operations on a file declared to be accessed SEQUENTIAL-ly
Syntactic format of the READ verb applied to a file declared to be accessed in SEQUENTIAL mode (and thereby necessarily opened in INPUT or I-O mode):

   READ <file-name> [NEXT] [INTO data-name]
      AT END <imperative statement>
      [NOT AT END <imperative statement>]
   END-READ
Example:

   READ Employee-File
      AT END SET Employee-Eof TO TRUE
      NOT AT END PERFORM Process-Employee
   END-READ
The effect of this command is as follows:

1. If the end-of-file condition is off, then:
   1. If there is no "next" record (e.g., because the file is empty or its last record was previously read in), the end-of-file condition is turned on and the imperative statement following the AT END clause is executed.
   2. If there is a "next" record, it is read into the file's data record (and then copied into the data item specified in the INTO clause, if it is present). Also, if the NOT AT END clause is present, the imperative statement there is executed.
2. If the end-of-file condition is on (e.g., because a previous attempt at READing turned it on), the program aborts.
Note: As mentioned above, by the file's "data record" we mean the 01-level data item declared in the data description entry immediately following the file's file description entry (the stuff coming after the keyword FD). In the example above, the data record is Employee-Rec.
The presence or absence of the word NEXT within this form of the READ statement makes no difference.

Note that, in COBOL, the end-of-file condition does not become true until an attempt is made to READ beyond the last record in the file. (When this attempt is made, the imperative statement following the AT END clause of the READ statement is executed.) This is in contrast to Ada and Pascal, in which the end-of-file condition becomes true immediately after the last record has been read. For this reason, a typical file processing loop in COBOL has a somewhat different form than an equivalent loop in Ada or Pascal. Consider this Ada-like pseudocode:

   WHILE not End_of_File(f) LOOP
      Get(f, rec);   --read next record of file f into rec
      <process rec>
   END LOOP;
The "equivalent" code segment in COBOL would be written in either of these two forms (given in COBOL-like pseudocode):

   SET eof TO FALSE                |  SET eof TO FALSE
   READ f                          |  PERFORM UNTIL eof
      AT END SET eof TO TRUE       |     READ f
   END-READ                        |        AT END SET eof TO TRUE
   PERFORM UNTIL eof               |        NOT AT END <process rec>
      <process rec>                |     END-READ
      READ f                       |  END-PERFORM
         AT END SET eof TO TRUE    |
      END-READ                     |
   END-PERFORM                     |
In order to make the program on the left a little more concise, we could place the READ statement into a separate paragraph ---call it Read-f-Rec--- and then replace each of the two occurrences of the READ statement by PERFORM Read-f-Rec.

Syntactic format of the WRITE verb applied to a SEQUENTIAL file (necessarily opened in OUTPUT or EXTEND mode):

   WRITE <record-name> [FROM <data-name>]
Example:

   WRITE Employee-Rec FROM Temp-Empl-Rec

The effect is that the file's data record (after the specified data item has been copied into it, if the FROM clause is present) is written at the end of the file (i.e., after the last record in the file). Recall that opening a file in OUTPUT mode destroys the file's previous contents, whereas opening a file in EXTEND mode leaves its contents intact, allowing the program to write new records after the ones already there. Note that the WRITE verb cannot be applied to a SEQUENTIAL file opened in I-O mode, as this mode allows REWRITE-ing but not WRITE-ing.

Syntactic format of the REWRITE verb applied to a SEQUENTIAL file (necessarily opened in I-O mode):

   REWRITE <record-name> [FROM <data-name>]
The effect is that the file's data record (or, if the FROM clause is present, the specified data item) is written to the file, replacing the record most recently read from the file. An example program that uses the REWRITE verb appears within the course web pages. Note that, for reasons that I have never seen explained anywhere, the READ verb refers to the file whereas the WRITE and REWRITE verbs refer to the file's data record.
I/O operations on INDEXED files

As noted above, an INDEXED file may have any of three access modes ---SEQUENTIAL, RANDOM, or DYNAMIC--- and may be opened in any of three modes ---INPUT, OUTPUT, or I-O. Which I/O operations are applicable to an INDEXED file depends upon both its access mode and its open mode:

                        +-----------------------------+
   File                 |      O p e n   M o d e      |
   Access               |                             |
   Mode       | Verb    |  INPUT  | OUTPUT  |   I-O   |
              +---------+---------+---------+---------+
   SEQUENTIAL | READ    |    x    |         |    x    |  (sequential form only)
              | WRITE   |         |    x    |         |  (sequential form only)
              | REWRITE |         |         |    x    |  (sequential form only)
              | DELETE  |         |         |    x    |
              | START   |    x    |         |    x    |  (surprising!)
              +---------+---------+---------+---------+
   RANDOM     | READ    |    x    |         |    x    |  (random form only)
              | WRITE   |         |    ?    |    x    |  (random form only)
              | REWRITE |         |         |    x    |  (random form only)
              | DELETE  |         |         |    x    |
              | START   |         |         |         |
              +---------+---------+---------+---------+
   DYNAMIC    | READ    |    x    |         |    x    |  (either form)
              | WRITE   |         |    x    |    x    |  (either form)
              | REWRITE |         |         |    x    |  (either form)
              | DELETE  |         |         |    x    |
              | START   |    x    |         |    x    |
              +---------+---------+---------+---------+
As suggested in the remarks to the right of the table above, each of the READ, WRITE, and REWRITE verbs has two forms, one for sequential access and one for random access.
Random Access form of READ, WRITE, and REWRITE for INDEXED files

This section pertains to an INDEXED file for which, in the program under consideration, the ACCESS MODE has been specified to be either RANDOM or DYNAMIC.

To read a record ---with a specified value in its key field--- from an INDEXED file opened in either INPUT or I-O mode:

1. Place desired value into the key field (in the file's data record)
2. READ <file-name> [INTO data-name]
      [INVALID KEY <imperative statement>]
      [NOT INVALID KEY <imperative statement>]
   END-READ
For example,

   DISPLAY 'Enter course ID:' WITH NO ADVANCING
   ACCEPT Course-ID
   READ Course-File
      INVALID KEY DISPLAY 'No such record'
      NOT INVALID KEY PERFORM Display-Course-Rec
   END-READ
The effect is that, if a record with the specified value in the key field exists in the file, that record is read into the file's data record (and is then copied into the data item specified in the INTO clause, if present), and, if present, the imperative statement in the NOT INVALID KEY clause is executed. Otherwise, if the INVALID KEY clause is present, the imperative statement there is executed.

To write a record into an INDEXED file opened in I-O (or OUTPUT??) mode:

1. Place desired contents into file's data record (or the data item specified in the FROM clause).
2. WRITE <record-name> [FROM data-name]
      [INVALID KEY <imperative statement>]
      [NOT INVALID KEY <imperative statement>]
   END-WRITE
Example:

   WRITE Employee-Rec FROM Temp-Empl-Rec
      INVALID KEY DISPLAY 'Cannot WRITE; record with same key exists'
      NOT INVALID KEY DISPLAY 'WRITE is successful'
   END-WRITE
The effect is that, if the file contains no record whose key field matches that currently in the file's data record (or, in the case that the FROM clause is present, that currently in the specified data item), the data record is written, as a new record, into the file, and, if the NOT INVALID KEY clause is present, the imperative statement there is executed. Otherwise (i.e., there exists a record in the file whose key field equals that of the data record), if the INVALID KEY clause is present, the imperative statement there is executed.

To replace a record in an INDEXED file opened in I-O mode:

1. READ the record to be replaced (into the file's data record).
2. Change the contents of the data record (but not the key field).
3. REWRITE <record-name> [FROM data-name]
      [INVALID KEY <imperative statement>]
      [NOT INVALID KEY <imperative statement>]
   END-REWRITE
Example:

   REWRITE Employee-Rec
      INVALID KEY DISPLAY 'Cannot REWRITE; no record with that key exists'
      NOT INVALID KEY DISPLAY 'Record rewritten successfully'
   END-REWRITE
The effect is that, if there exists a record in the file having the same value in its key field as the file's data record (or, in the case that the FROM clause is present, the data item specified there), that record is replaced by the contents of the data record (or the FROM data item) and the NOT INVALID KEY clause's imperative statement is executed. Otherwise, the INVALID KEY clause's imperative statement is executed. The difference between REWRITE and WRITE is that the former can only replace an existing record whereas the latter can only insert a new record.

To delete a record in an INDEXED file opened in I-O mode:

1. READ the record (into the file's data record). (Question: Depending upon the implementation, it may suffice to place the desired value into the key field of the file's data record, without necessarily doing so by reading the corresponding record. However, it is a good idea to READ first anyway, just to verify that the record to be deleted is really there.)
2. DELETE <file-name>
      [INVALID KEY <imperative statement>]
      [NOT INVALID KEY <imperative statement>]
   END-DELETE
Example:

   DISPLAY 'Enter Course ID of course to be cancelled:'
   ACCEPT Course-ID
   DELETE Course-File
      INVALID KEY CONTINUE
      NOT INVALID KEY DISPLAY 'Record deleted successfully'
   END-DELETE
The effect is that, if the file contains a record whose key field matches that of the file's data record, that record is deleted from the file and the imperative statement in the NOT INVALID KEY clause, if present, is executed. Otherwise, the imperative statement in the INVALID KEY clause, if present, is executed.

Note that, in order to apply either the REWRITE or DELETE verb to a record, the most recent I/O operation must have been a successful READ of that record. (Warning: This statement may be incorrect. The COBOL standard imposes this requirement on files in SEQUENTIAL access mode; in RANDOM or DYNAMIC access mode, the record affected is the one identified by the key field.)
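The random-access forms of WRITE, READ, REWRITE, and DELETE described in this section can be sketched together in one small program. Treat it as an illustration under assumptions rather than a reference: it assumes a GnuCOBOL-style compiler built with indexed-file support, fixed-format source, a hypothetical file name "courses.idx", and a made-up layout for Course-Rec. It also assumes that a random-form WRITE is accepted on a file opened for OUTPUT (the "?" entry in the table above), which the COBOL standard permits but which is worth confirming on your compiler.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. IndexedRandomDemo.

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *    "courses.idx" is a hypothetical file name.
           SELECT Course-File ASSIGN TO "courses.idx"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS RANDOM
               RECORD KEY IS Course-ID
               FILE STATUS IS Course-Status.

       DATA DIVISION.
       FILE SECTION.
       FD  Course-File.
      *    Hypothetical record layout, for illustration only.
       01  Course-Rec.
           02  Course-ID       PIC X(8).
           02  Course-Title    PIC X(20).

       WORKING-STORAGE SECTION.
       01  Course-Status       PIC X(2).

       PROCEDURE DIVISION.
       Main-Para.
      *    Create the indexed file by writing to it from COBOL.
           OPEN OUTPUT Course-File
           MOVE "CMPS0134" TO Course-ID
           MOVE "Computer Science I" TO Course-Title
           WRITE Course-Rec
               INVALID KEY DISPLAY "WRITE failed: " Course-Status
           END-WRITE
           CLOSE Course-File

           OPEN I-O Course-File
      *    Random READ: put the sought value into the key field.
           MOVE "CMPS0134" TO Course-ID
           READ Course-File
               INVALID KEY
                   DISPLAY "No such record"
               NOT INVALID KEY
      *            Change non-key fields only, then REWRITE.
                   MOVE "Intro to Computing" TO Course-Title
                   REWRITE Course-Rec
                       INVALID KEY DISPLAY "REWRITE failed"
                   END-REWRITE
           END-READ

      *    DELETE names the file; the key field picks the record.
      *    No record has this key, so INVALID KEY should fire.
           MOVE "CMPS9999" TO Course-ID
           DELETE Course-File
               INVALID KEY DISPLAY "Nothing to delete"
               NOT INVALID KEY DISPLAY "Record deleted"
           END-DELETE
           CLOSE Course-File
           STOP RUN.
```

Note how READ and DELETE name the file while WRITE and REWRITE name the record, as observed earlier for SEQUENTIAL files.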
Sequential Access form of READ, WRITE, and REWRITE for INDEXED files

This section pertains to an INDEXED file for which, in the program under consideration, the ACCESS MODE has been specified to be either SEQUENTIAL or DYNAMIC.

To position the file pointer (i.e., to seek) to the first record satisfying a specified condition in an INDEXED file opened in I-O or INPUT mode:

   START <file-name> KEY IS { =, >, NOT <, >= } <data-name>
      [INVALID KEY <imperative statement>]
      [NOT INVALID KEY <imperative statement>]
   END-START
NOTE: Some compilers may require that data-name be declared as the RECORD KEY of the file (in the SELECT clause in the ENVIRONMENT DIVISION). Some compilers require the INVALID KEY clause to be present.
Example:

   MOVE 'Jones00001' TO Employee-ID
   START Employee-File KEY IS NOT < Employee-ID
      INVALID KEY DISPLAY 'something wrong'
      NOT INVALID KEY CONTINUE
   END-START
The effect is to position the file pointer at the first record (i.e., the one having smallest key value) satisfying the condition specified, so that a sequential READ will cause that to be the record read in. If no record satisfies the specified condition (e.g., the key value sought is larger than any in the file), the imperative statement in the INVALID KEY clause, if present, is executed. Otherwise, the imperative statement in the NOT INVALID KEY clause, if present, is executed.
To read "the next" record (i.e., the one following the record most recently read, or the one "found" by an application of the START verb) in an INDEXED file opened in either INPUT or I-O mode:

   READ <file-name> NEXT RECORD [INTO data-name]
      [AT END <imperative statement>]
      [NOT AT END <imperative statement>]
   END-READ
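Example (a range query over the keys 'Jones00001' through 'Smith99999'; the upper bound is tested right after each READ, so that a record lying beyond it is not processed):

   MOVE 'Jones00001' TO Employee-ID
   START Employee-File KEY IS NOT < Employee-ID
      INVALID KEY
         DISPLAY '*** Error ***'
      NOT INVALID KEY
         PERFORM UNTIL Finished
            READ Employee-File NEXT RECORD
               AT END
                  SET Finished TO TRUE
               NOT AT END
                  IF Employee-ID > 'Smith99999'
                     SET Finished TO TRUE
                  ELSE
                     PERFORM Process-Empl-Rec
                  END-IF
            END-READ
         END-PERFORM
   END-START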
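To see START and READ NEXT working together, the following is a minimal self-contained sketch of a range query. It is an illustration under assumptions: a GnuCOBOL-style compiler with indexed-file support, fixed-format source, a hypothetical file name "range.idx", and made-up names (Course-File, New-Course, Add-Course, Finished) and sample keys that are not taken from the course materials.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RangeQueryDemo.

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *    "range.idx" is a hypothetical file name.
           SELECT Course-File ASSIGN TO "range.idx"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS DYNAMIC
               RECORD KEY IS Course-ID.

       DATA DIVISION.
       FILE SECTION.
       FD  Course-File.
       01  Course-Rec.
           02  Course-ID       PIC X(8).
           02  Course-Title    PIC X(20).

       WORKING-STORAGE SECTION.
       01  New-Course.
           02  New-ID          PIC X(8).
           02  New-Title       PIC X(20) VALUE "Sample course".
       01  Done-Flag           PIC X VALUE "N".
           88  Finished        VALUE "Y".

       PROCEDURE DIVISION.
       Main-Para.
      *    Build a three-record indexed file to query.
           OPEN OUTPUT Course-File
           MOVE "CMPS0134" TO New-ID
           PERFORM Add-Course
           MOVE "CMPS0144" TO New-ID
           PERFORM Add-Course
           MOVE "MATH0114" TO New-ID
           PERFORM Add-Course
           CLOSE Course-File

      *    Range query: keys from CMPS0140 through CMPS9999.
           OPEN INPUT Course-File
           MOVE "CMPS0140" TO Course-ID
           START Course-File KEY IS NOT < Course-ID
               INVALID KEY SET Finished TO TRUE
           END-START
           PERFORM UNTIL Finished
               READ Course-File NEXT RECORD
                   AT END SET Finished TO TRUE
                   NOT AT END
                       IF Course-ID > "CMPS9999"
                           SET Finished TO TRUE
                       ELSE
                           DISPLAY Course-ID
                       END-IF
               END-READ
           END-PERFORM
           CLOSE Course-File
           STOP RUN.

       Add-Course.
           WRITE Course-Rec FROM New-Course
               INVALID KEY DISPLAY "Cannot add " New-ID
           END-WRITE.
```

With the three sample keys above, START should position the file at CMPS0144 (the first key not less than CMPS0140), so the loop is expected to display that key and then stop when it reads MATH0114.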
To replace the record most recently read from an INDEXED file opened in I-O mode:

   REWRITE <record-name> [FROM data-name]
To write a new record (necessarily having a larger key than any already in the file??) into an INDEXED file opened in I-O (or OUTPUT?) mode:

   WRITE <record-name> [FROM data-name]
????
Random observations based on program testing

A syntax error occurs if you attempt to open an INDEXED file in EXTEND mode.

When a file whose SELECT clause specifies DYNAMIC ACCESS mode is opened in OUTPUT or I-O mode:

• SEQUENTIAL WRITE seems to place the record in the "right" place (according to its key value), rather than at the end of file.
• Sequential REWRITE replaces the record at the current file pointer position (i.e., the position of the last record read in by either sequential or random READ, or the one found via START).
• If sequential REWRITE immediately follows START, then a sequential READ will read in the rewritten record! However, if REWRITE follows a READ (seq or ran), the next record read by a SEQ READ is the following one.
• REWRITE (both kinds) need not be preceded by a READ of the record to be rewritten. There seems to be no difference between the two kinds of REWRITEs!
• Oddly, trying to REWRITE with a non-present key field causes a run-time error, rather than the INVALID KEY clause being fired.

When in SEQUENTIAL ACCESS mode, for some reason the program must include a random WRITE and/or REWRITE, if there is a sequential one. However, I could not get WRITE (sequential or random) to work at all. Each time, a run-time error occurred. However, both random and sequential REWRITE work! (However, the key of the new record must match that of the record just read in. Having seeked to a record via START is not enough.)