Intel 8086 microprocessor architecture

Memory

Program, data and stack memories occupy the same memory space. The total addressable memory size is 1 MB. Because most processor instructions use 16-bit pointers, the processor can effectively address only 64 KB of memory at a time. To access memory outside of 64 KB the CPU uses special segment registers to specify where the code, stack and data 64 KB segments are positioned within 1 MB of memory (see the "Registers" section below).

16-bit pointers and data are stored as:
address: low-order byte
address+1: high-order byte

32-bit addresses are stored in "segment:offset" format, with the offset word at the lower address:
address: low-order byte of offset
address+1: high-order byte of offset
address+2: low-order byte of segment
address+3: high-order byte of segment

The physical memory address pointed to by a segment:offset pair is calculated as:
address = (segment * 16) + offset

Program memory - the program can be located anywhere in memory. Jump and call instructions can be used for short jumps within the currently selected 64 KB code segment, as well as for far jumps anywhere within 1 MB of memory. All conditional jump instructions can be used to jump within approximately -128 to +127 bytes of the current instruction.

Data memory - the processor can access data in any one of 4 available segments, which limits the size of accessible memory to 256 KB (if all four segments point to different 64 KB blocks). Data in the Data, Code, Stack or Extra segments can usually be accessed by prefixing instructions with DS:, CS:, SS: or ES: (some registers and instructions may use the ES or SS segments by default instead of the DS segment). Word data can be located at odd or even byte boundaries. The processor uses two memory accesses to read a 16-bit word located at an odd byte boundary. Reading word data from an even byte boundary requires only one memory access.

Stack memory can be placed anywhere in memory. The stack can be located at odd memory addresses, but this is not recommended for performance reasons (see "Data memory" above).

Reserved locations:
• 0000h - 03FFh are reserved for interrupt vectors. Each interrupt vector is a 32-bit pointer in segment:offset format.
• FFFF0h - FFFFFh - after RESET the processor always starts program execution at the FFFF0h address.
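The address arithmetic and byte ordering described in this section can be sketched in a few lines of Python. This is only an illustrative model, not part of the original notes: the function names and the `bytearray` standing in for memory are assumptions made for the example.

```python
def physical_address(segment, offset):
    """20-bit physical address for a segment:offset pair (hypothetical helper).

    The segment value is multiplied by 16 (shifted left 4 bits), the offset is
    added, and the sum is truncated to the 20 address lines of the 8086.
    """
    return ((segment << 4) + offset) & 0xFFFFF


def store_word(memory, address, value):
    """Store a 16-bit value: low-order byte first, high-order byte at address+1."""
    memory[address] = value & 0xFF
    memory[address + 1] = (value >> 8) & 0xFF


def load_word(memory, address):
    """Read back a 16-bit value stored low-order byte first."""
    return memory[address] | (memory[address + 1] << 8)


# A small made-up memory image, just large enough for the demonstration.
memory = bytearray(16)
store_word(memory, 4, 0x35DA)
```

With these helpers, `physical_address(0xFFFF, 0x0000)` gives FFFF0h, the address at which execution starts after RESET, and the word 35DAh ends up in memory as the bytes DAh, 35h.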
Interrupts

The processor has the following interrupts:

INTR is a maskable hardware interrupt. The interrupt can be enabled/disabled using the STI/CLI instructions, or by the more complicated method of updating the FLAGS register with the help of the POPF instruction. When an interrupt occurs, the processor stores the FLAGS register and the return address (CS and IP) on the stack, disables further interrupts, fetches from the bus one byte representing the interrupt type, and jumps to the interrupt processing routine, the address of which is stored in location 4 * <interrupt type>. The interrupt processing routine should return with the IRET instruction.

NMI is a non-maskable interrupt. It is processed in the same way as the INTR interrupt. The interrupt type of the NMI is 2, i.e. the address of the NMI processing routine is stored in location 0008h. This interrupt has higher priority than the maskable interrupt.

Software interrupts can be caused by:
• INT 3 instruction - breakpoint interrupt. This is a type 3 interrupt.
• INT <interrupt number> instruction - any one interrupt from the available 256 interrupts.
• INTO instruction - interrupt on overflow.
• Single-step interrupt - generated if the TF flag is set. This is a type 1 interrupt. When the CPU processes this interrupt it clears the TF flag before calling the interrupt processing routine.
• Processor exceptions: divide error (type 0), unused opcode (type 6) and escape opcode (type 7).
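The vector lookup described above (handler address stored at location 4 * interrupt type) can be sketched as follows. This is a minimal Python model, assuming the offset word is stored at the lower address and the segment word after it, both low-order byte first. The function names and the handler address F000:E2C3 are made up for the illustration.

```python
import struct


def vector_entry_address(int_type):
    """Address of the 4-byte vector for an interrupt type (0..255)."""
    if not 0 <= int_type <= 255:
        raise ValueError("interrupt type must be in the range 0..255")
    return 4 * int_type


def read_vector(memory, int_type):
    """Return (segment, offset) of the handler stored in the vector table."""
    base = vector_entry_address(int_type)
    # Two little-endian 16-bit words: offset first, then segment.
    offset, segment = struct.unpack_from("<HH", memory, base)
    return segment, offset


def handler_physical_address(memory, int_type):
    """Physical address the CPU would jump to for this interrupt type."""
    segment, offset = read_vector(memory, int_type)
    return ((segment << 4) + offset) & 0xFFFFF


# Hypothetical 1 KB vector table (0000h - 03FFh) with one entry filled in:
# the NMI vector (type 2) points at the made-up handler address F000:E2C3.
memory = bytearray(1024)
struct.pack_into("<HH", memory, vector_entry_address(2), 0xE2C3, 0xF000)
```

Here `vector_entry_address(2)` is 8, matching the location 0008h given for the NMI vector.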
Software interrupt processing is the same as for the hardware interrupts.

I/O ports

65536 8-bit I/O ports. These ports can also be addressed as 32768 16-bit I/O ports.

Registers

Most of the registers contain data/instruction offsets within a 64 KB memory segment. There are four different 64 KB segments for instructions, stack, data and extra data. To specify where in the 1 MB of processor memory these 4 segments are located, the processor uses four segment registers:

Code segment (CS) is a 16-bit register containing the address of the 64 KB segment with processor instructions. The processor uses the CS segment for all accesses to instructions referenced by the instruction pointer (IP) register. The CS register cannot be changed directly. It is automatically updated during far jump, far call and far return instructions.

Stack segment (SS) is a 16-bit register containing the address of the 64 KB segment with the program stack. By default, the processor assumes that all data referenced by the stack pointer (SP) and base pointer (BP) registers is located in the stack segment. The SS register can be changed directly using the POP instruction.

Data segment (DS) is a 16-bit register containing the address of the 64 KB segment with program data. By default, the processor assumes that all data referenced by the general registers (AX, BX, CX, DX) and index registers (SI, DI) is located in the data segment. The DS register can be changed directly using the POP and LDS instructions.

Extra segment (ES) is a 16-bit register containing the address of a 64 KB segment, usually with program data. By default, the processor assumes that the DI register references the ES segment in string manipulation instructions. The ES register can be changed directly using the POP and LES instructions.

It is possible to change the default segments used by general and index registers by prefixing instructions with a CS, SS, DS or ES prefix.

All general registers of the 8086 microprocessor can be used for arithmetic and logic operations. The general registers are:

Accumulator register consists of two 8-bit registers AL and AH, which can be combined together and used as a 16-bit register AX. AL in this case contains the low-order byte of the word, and AH contains the high-order byte. The accumulator can be used for I/O operations and string manipulation.

Base register consists of two 8-bit registers BL and BH, which can be combined together and used as a 16-bit register BX. BL in this case contains the low-order byte of the word, and BH contains the high-order byte. The BX register usually contains a data pointer used for based, based indexed or register indirect addressing.

Count register consists of two 8-bit registers CL and CH, which can be combined together and used as a 16-bit register CX. When combined, the CL register contains the low-order byte of the word, and CH contains the high-order byte. The count register can be used as a counter in string manipulation and shift/rotate instructions.

Data register consists of two 8-bit registers DL and DH, which can be combined together and used as a 16-bit register DX. When combined, the DL register contains the low-order byte of the word, and DH contains the high-order byte. The data register can be used as a port number in I/O operations. In integer 32-bit multiply and divide instructions the DX register contains the high-order word of the initial or resulting number.

The following registers are both general and index registers:

Stack Pointer (SP) is a 16-bit register pointing to the program stack.

Base Pointer (BP) is a 16-bit register pointing to data in the stack segment. The BP register is usually used for based, based indexed or register indirect addressing.

Source Index (SI) is a 16-bit register. SI is used for indexed, based indexed and register indirect addressing, as well as a source data address in string manipulation instructions.

Destination Index (DI) is a 16-bit register. DI is used for indexed, based indexed and register indirect addressing, as well as a destination data address in string manipulation instructions.

Other registers:

Instruction Pointer (IP) is a 16-bit register.

Flags is a 16-bit register containing nine 1-bit flags:
• Overflow Flag (OF) - set if the result is too large a positive number, or too small a negative number, to fit into the destination operand.
• Direction Flag (DF) - if set then string manipulation instructions will auto-decrement index registers. If cleared then the index registers will be auto-incremented.
• Interrupt-enable Flag (IF) - setting this bit enables maskable interrupts.
• Single-step Flag (TF) - if set then a single-step interrupt will occur after the next instruction.
• Sign Flag (SF) - set if the most significant bit of the result is set.
• Zero Flag (ZF) - set if the result is zero.
• Auxiliary carry Flag (AF) - set if there was a carry from or borrow to bits 0-3 in the AL register.
• Parity Flag (PF) - set if parity (the number of "1" bits) in the low-order byte of the result is even.
• Carry Flag (CF) - set if there was a carry from or borrow to the most significant bit during the last result calculation.
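To make the status flag definitions above concrete, here is a small Python sketch that computes the six status flags for an 8-bit addition. It is only a model written for this illustration (the function name `add8_flags` is an assumption), and it covers addition only, not subtraction or the control flags DF, IF and TF.

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, flags) as the 8086 would set them."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        # Carry out of the most significant bit (bit 7).
        "CF": int(total > 0xFF),
        # Result is zero.
        "ZF": int(result == 0),
        # Most significant bit of the result.
        "SF": result >> 7,
        # Carry out of bits 0-3 into bit 4.
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),
        # Even number of 1 bits in the low-order byte of the result.
        "PF": int(bin(result).count("1") % 2 == 0),
        # Both operands have the same sign and the result's sign differs.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags
```

For example, adding 7Fh and 01h gives 80h with OF and SF set (the signed result does not fit in a byte), while adding FFh and 01h gives 00h with CF and ZF set.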
Instruction Set

The 8086 instruction set consists of the following instructions:
• Data moving instructions.
• Arithmetic - add, subtract, increment, decrement, convert byte/word and compare.
• Logic - AND, OR, exclusive OR, shift/rotate and test.
• String manipulation - load, store, move, compare and scan for byte/word.
• Control transfer - conditional, unconditional, call subroutine and return from subroutine.
• Input/Output instructions.
• Other - setting/clearing flag bits, stack operations, software interrupts, etc.

Addressing modes

Implied - the data value/data address is implicitly associated with the instruction.

Register - references the data in a register or in a register pair.

Immediate - the data is provided in the instruction.

Direct - the instruction operand specifies the memory address where data is located.

Register indirect - instruction specifies a register containing an address, where data is located. This addressing mode works with SI, DI, BX and BP registers.

Based - 8-bit or 16-bit instruction operand is added to the contents of a base register (BX or BP), the resulting value is a pointer to location where data resides.

Indexed - 8-bit or 16-bit instruction operand is added to the contents of an index register (SI or DI), the resulting value is a pointer to location where data resides.

Based Indexed - the contents of a base register (BX or BP) is added to the contents of an index register (SI or DI), the resulting value is a pointer to location where data resides.

Based Indexed with displacement - 8-bit or 16-bit instruction operand is added to the contents of a base register (BX or BP) and index register (SI or DI), the resulting value is a pointer to location where data resides.
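The memory addressing modes listed above all reduce to one sum: an optional base register, plus an optional index register, plus an optional displacement, truncated to 16 bits. The Python sketch below models that calculation. The function name, the register dictionary and the register values are assumptions for the example, and the default segment rule (SS when BP is the base, DS otherwise) follows the register descriptions given earlier.

```python
def effective_address(regs, base=None, index=None, disp=0):
    """Return (default_segment, 16-bit offset) for a memory operand.

    base  -- None, "BX" or "BP"
    index -- None, "SI" or "DI"
    disp  -- displacement taken from the instruction (may be negative)
    """
    if base not in (None, "BX", "BP"):
        raise ValueError("base register must be BX or BP")
    if index not in (None, "SI", "DI"):
        raise ValueError("index register must be SI or DI")
    ea = disp
    if base is not None:
        ea += regs[base]
    if index is not None:
        ea += regs[index]
    ea &= 0xFFFF  # offsets wrap around within the 64 KB segment
    default_segment = "SS" if base == "BP" else "DS"
    return default_segment, ea


# Made-up register contents for the demonstration.
regs = {"BX": 0x1000, "BP": 0x2000, "SI": 0x0010, "DI": 0xFFF0}
```

Direct addressing corresponds to a displacement alone, register indirect to a single register with no displacement, and based indexed with displacement to all three terms together.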
8086 CPU ARCHITECTURE

The microprocessor functions as the CPU in the stored program model of the digital computer. Its job is to generate all system timing signals and synchronize the transfer of data between memory, I/O, and itself. It accomplishes this task via the three-bus system architecture previously discussed.

The microprocessor also has a S/W function. It must recognize, decode, and execute program instructions fetched from the memory unit. This requires an Arithmetic-Logic Unit (ALU) within the CPU to perform arithmetic and logical (AND, OR, NOT, compare, etc.) functions.

The 8086 CPU is organized as two separate processors, called the Bus Interface Unit (BIU) and the Execution Unit (EU). The BIU provides H/W functions, including generation of the memory and I/O addresses for the transfer of data between the outside world (outside the CPU, that is) and the EU.

The EU receives program instruction codes and data from the BIU, executes these instructions, and stores the results in the general registers. By passing the data back to the BIU, data can also be stored in a memory location or written to an output device. Note that the EU has no connection to the system buses. It receives and outputs all its data through the BIU.
The only difference between an 8088 microprocessor and an 8086 microprocessor is the BIU. In the 8088, the BIU data bus path is 8 bits wide versus the 8086's 16-bit data bus, and the 8088 instruction queue is four bytes long instead of six. The important point to note, however, is that because the EU is the same for each processor, the programming instructions are exactly the same for each. Programs written for the 8086 can be run on the 8088 without any changes.

FETCH AND EXECUTE

Although the 8086/88 still functions as a stored program computer, organization of the CPU into a separate BIU and EU allows the fetch and execute cycles to overlap. To see this, consider what happens when the 8086 or 8088 is first started.

1. The BIU outputs the contents of the instruction pointer register (IP) onto the address bus, causing the selected byte or word to be read into the BIU.
2. Register IP is incremented by 1 to prepare for the next instruction fetch.
3. Once inside the BIU, the instruction is passed to the queue. This is a first-in, first-out storage register sometimes likened to a "pipeline".
4. Assuming that the queue is initially empty, the EU immediately draws this instruction from the queue and begins execution.
5. While the EU is executing this instruction, the BIU proceeds to fetch a new instruction. Depending on the execution time of the first instruction, the BIU may fill the queue with several new instructions before the EU is ready to draw its next instruction.
The BIU is programmed to fetch a new instruction whenever the queue has room for one (with the 8088) or two (with the 8086) additional bytes. The advantage of this pipelined architecture is that the EU can execute instructions almost continually instead of having to wait for the BIU to fetch a new instruction.

There are three conditions that will cause the EU to enter a "wait" mode. The first occurs when an instruction requires access to a memory location not in the queue. The BIU must suspend fetching instructions and output the address of this memory location. After waiting for the memory access, the EU can resume executing instruction codes from the queue (and the BIU can resume filling the queue).

The second condition occurs when the instruction to be executed is a "jump" instruction. In this case control is to be transferred to a new (nonsequential) address. The queue, however, assumes that instructions will always be executed in sequence and thus will be holding the "wrong" instruction codes. The EU must wait while the instruction at the jump address is fetched. Note that any bytes presently in the queue must be discarded (they are overwritten).

One other condition can cause the BIU to suspend fetching instructions. This occurs during execution of instructions that are slow to execute. For example, the instruction AAM (ASCII Adjust for Multiplication) requires 83 clock cycles to complete. At four cycles per instruction fetch, the queue will be completely filled during the execution of this single instruction. The BIU will thus have to wait for the EU to pull one or two bytes from the queue before resuming the fetch cycle.

A subtle advantage of the pipelined architecture should be mentioned. Because the next several instructions are usually in the queue, the BIU can access memory at a somewhat "leisurely" pace. This means that slow memory parts can be used without affecting overall system performance.

PROGRAMMING MODEL

As a programmer of the 8086 or 8088 you must become familiar with the various registers in the EU and BIU.
The data group consists of the accumulator and the BX, CX, and DX registers. Note that each can be accessed as a byte or a word. Thus BX refers to the 16-bit base register but BH refers only to the higher 8 bits of this register. The data registers are normally used for storing temporary results that will be acted on by subsequent instructions.

The pointer and index group are all 16-bit registers (you cannot access the low or high bytes alone). These registers are used as memory pointers. Sometimes a pointer register will be interpreted as pointing to a memory byte and at other times a memory word. As you will see, the 8086/88 always stores words with the high-order byte in the high-order word address.

Register IP could be considered in the previous group, but this register has only one function: to point to the next instruction to be fetched to the BIU. Register IP is physically part of the BIU and not under direct control of the programmer as are the other pointer registers.

Six of the flags are status indicators, reflecting properties of the result of the last arithmetic or logical instructions. The 8086/88 has several instructions that can be used to transfer program control to a new memory location based on the state of the flags. Three of the flags can be set or reset directly by the programmer and are used to control the operation of the processor. These are TF, IF, and DF.

The final group of registers is called the segment group. These registers are used by the BIU to determine the memory address output by the CPU when it is reading or writing from the memory unit. To fully understand these registers, we must first study the way the 8086/88 divides its memory into segments.

SEGMENTED MEMORY

Even though the 8086 is considered a 16-bit processor (it has a 16-bit data bus width), its memory is still thought of in bytes. At first this might seem a disadvantage: Why saddle a 16-bit microprocessor with an 8-bit memory? Actually, there are a couple of good reasons. First, it allows the processor to work on bytes as well as words. This is especially important with I/O devices such as printers, terminals, and modems, all of which are designed to transfer ASCII-encoded (7- or 8-bit) data. Second, many of the 8086's (and 8088's) operation codes are single bytes. Other instructions may require anywhere from two to seven bytes. By being able to access individual bytes, these odd-length instructions can be handled.

We have already seen that the 8086/88 has a 20-bit address bus, allowing it to output 2^20, or 1,048,576, different memory addresses. As you can see, 524,288 words can also be visualized.
As mentioned, the 8086 reads 16 bits from memory by simultaneously reading an odd-addressed byte and an even-addressed byte. For this reason the 8086 organizes its memory into an even-addressed bank and an odd-addressed bank.

With regard to this, you might wonder if all words must begin at an even address. Well, the answer is no: a word may also begin at an odd address. However, there is a penalty to be paid. For a word at an odd address the CPU must perform two memory read cycles: one to fetch the low-order byte and a second to fetch the high-order byte. This slows down the processor but is transparent to the programmer.

The last few paragraphs apply only to the 8086. The 8088 with its 8-bit data bus interfaces to the 1 MB of memory as a single bank. When it is necessary to access a word (whether on an even- or an odd-addressed boundary) two memory read (or write) cycles are performed. In effect, the 8088 pays a performance penalty with every word access. Fortunately for the programmer, except for the slightly slower performance of the 8088, there is no difference between the two processors.

MEMORY MAP

Still another view of the 8086/88 memory space could be as 16 64K-byte blocks beginning at hex address 00000h and ending at address FFFFFh. This division into 64K-byte blocks is an arbitrary but convenient choice. This is because the most significant hex digit increments by 1 with each additional block. That is, address 20000h is 65,536 bytes higher in memory than address 10000h. Be sure to note that five hex digits are required to represent a memory address.
The diagram is called a memory map. This is because, like a road map, it is a guide showing how the system memory is allocated. This type of information is vital to the programmer, who must know exactly where his or her programs can be safely loaded.

Note that some memory locations are marked reserved and others dedicated. The dedicated locations are used for processing specific system interrupts and the reset function. Intel has also reserved several locations for future H/W and S/W products. If you make use of these memory locations, you risk incompatibility with these future products.

SEGMENT REGISTERS

Within the 1 MB of memory space the 8086/88 defines four 64K-byte memory blocks called the code segment, stack segment, data segment, and extra segment. Each of these blocks of memory is used differently by the processor.

The code segment holds the program instruction codes. The data segment stores data for the program. The extra segment is an extra data segment (often used for shared data). The stack segment is used to store interrupt and subroutine return addresses.

You should realize that the concept of segmented memory is a unique one. Older-generation microprocessors such as the 8-bit 8080 or Z-80 could access only one 64K-byte segment. This meant that the program's instructions, data and subroutine stack all had to share the same memory. This limited the amount of memory available for the program itself and led to disaster if the stack should happen to overwrite the data or program areas.
The four segment registers (CS, DS, ES, and SS) are used to "point" at location 0 (the base address) of each segment. This is a little "tricky" because the segment registers are only 16 bits wide, but the memory address is 20 bits wide. The BIU takes care of this problem by appending four 0's to the low-order bits of the segment register. In effect, this multiplies the segment register contents by 16.
The point to note is that the beginning segment address is not arbitrary: it must begin at an address divisible by 16. Another way of saying this is that the low-order hex digit must be 0.

Also note that the four segments need not be defined separately. Indeed, it is allowable for all four segments to completely overlap (CS = DS = ES = SS).

Memory locations not defined to be within one of the current segments cannot be accessed by the 8086/88 without first redefining one of the segment registers to include that location. Thus at any given instant a maximum of 256 K (64K * 4) bytes of memory can be utilized. As we will see, the contents of the segment registers can only be specified via S/W. As you might imagine, instructions to load these registers should be among the first given in any 8086/88 program.

LOGICAL AND PHYSICAL ADDRESS

Addresses within a segment can range from address 0000h to address FFFFh. This corresponds to the 64K-byte length of the segment. An address within a segment is called an offset or logical address. A logical address gives the displacement from the base address of the segment to the desired location within it, as opposed to its "real" address, which maps directly anywhere into the 1 MB memory space. This "real" address is called the physical address.

What is the difference between the physical and the logical address? The physical address is 20 bits long and corresponds to the actual binary code output by the BIU on the address bus lines. The logical address is an offset from location 0 of a given segment.
When two segments overlap it is certainly possible for two different logical addresses to map to the same physical address. This can have disastrous results when the data begins to overwrite the subroutine stack area, or vice versa. For this reason you must be very careful when segments are allowed to overlap.

You should also be careful when writing addresses on paper to do so clearly. To specify the logical address XXXX in the stack segment, use the convention SS:XXXX, which is equal to [SS] * 16 + XXXX.

ADVANTAGES OF SEGMENTED MEMORY

Segmented memory can seem confusing at first. What you must remember is that the program op-codes will be fetched from the code segment, while program data variables will be stored in the data and extra segments. Stack operations use registers BP or SP and the stack segment. As we begin writing programs the consequences of these definitions will become clearer.

An immediate advantage of having separate data and code segments is that one program can work on several different sets of data. This is done by reloading register DS to point to the new data.

Perhaps the greatest advantage of segmented memory is that programs that reference logical addresses only can be loaded and run anywhere in memory. This is because the logical addresses always range from 0000h to FFFFh, independent of the code segment base. Such programs are said to be relocatable, meaning that they will run at any location in memory. The requirements for writing relocatable programs are that no references be made to physical addresses, and no changes to the segment registers are allowed.
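The point about overlapping segments can be checked numerically with the SS:XXXX = [SS] * 16 + XXXX convention given above. The Python sketch below is an illustration only; the helper names and the example addresses are assumptions, not part of the original notes.

```python
def physical_address(segment, offset):
    """20-bit physical address: segment * 16 + offset, truncated to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def same_location(a, b):
    """True if two (segment, offset) pairs refer to the same physical byte."""
    return physical_address(*a) == physical_address(*b)


def normalize(segment, offset):
    """Rewrite a logical address so that the offset is in the range 0..15.

    This gives one canonical segment:offset spelling for each physical address.
    """
    physical = physical_address(segment, offset)
    return physical >> 4, physical & 0x000F
```

For instance, 1000:0010 and 1001:0000 both name physical address 10010h, so a write through one logical address is visible through the other.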
Introduction to 8086 Assembly Language CS 272 Sam Houston State University Dr. Tim McGuire
Structure of an assembly language program
• Assembly language programs divide roughly into five sections
  o header
  o equates
  o data
  o body
  o closing

The Header
• The header contains various directives which do not produce machine code
• Sample header:

    %TITLE "Sample Header"
    .8086
    .model small
    .stack 256

Named Constants
• Symbolic names associated with storage locations represent addresses
• Named constants are symbols created to represent specific values determined by an expression
• Named constants can be numeric or string
• Some named constants can be redefined
• No storage is allocated for these values

Equates
• Constant values are known as equates
• Sample equate section:

    Count    EQU 10
    Element  EQU 5
    Size     = Count * Element
    MyString EQU "Maze of twisty passages"
    Size     = 0

• = is used for numeric values only
• Cannot change value of EQU symbol
• EQUated symbols are not variables
• EQU expressions are evaluated where used; = expressions are evaluated where defined

The Data Segment
• Begins with the .data directive
• Two kinds of variables, initialized and uninitialized.
• Initialized variables take up space in the program's code file
• Declare uninitialized variables after initialized ones so they do not take up space in the program's code file

Reserving space for variables
• Sample DATA SEGMENT

    .data
    numRows    DB 25
    numColumns DB ?
    videoBase  DW 0800h

• DB and DW are common directives (define byte) and (define word)
• The symbols associated with variables are called labels
• Strings may be declared using the DB directive:

    aTOm DB "ABCDEFGHIJKLM"

Program Data and Storage
• Pseudo-ops to define data or reserve storage
  o DB - byte(s)
  o DW - word(s)
  o DD - doubleword(s)
  o DQ - quadword(s)
  o DT - tenbyte(s)
• These directives require one or more operands
  o define memory contents
  o specify amount of storage to reserve for runtime data

Defining Data
• Numeric data values
  o 100 - decimal
  o 100b - binary
  o 100h - hexadecimal
  o '100' - ASCII
  o "100" - ASCII
• Use the appropriate DEFINE directive (byte, word, etc.)
• A list of values may be used - the following creates 4 consecutive words

    DW 40Ch,10b,-13,0

• A ? represents an uninitialized storage location

    DB 255,?,-128,'X'

Naming Storage Locations
• Names can be associated with storage locations

    ANum DB -4
         DW 17
    ONE
    UNO  DW 1
    X    DD ?

• These names are called variables
• ANum refers to a byte storage location, initialized to FCh
• The next word has no associated name
• ONE and UNO refer to the same word
• X is an uninitialized doubleword
Arrays
• Any consecutive storage locations of the same size can be called an array

    X DW 040Ch,10b,-13,0
    Y DB 'This is an array'
    Z DD -109236, 0FFFFFFFFh, -1, 100b

• Components of X are at X, X+2, X+4, X+6
• Components of Y are at Y, Y+1, …, Y+15
• Components of Z are at Z, Z+4, Z+8, Z+12

DUP
• Allows a sequence of storage locations to be defined or reserved
• Only used as an operand of a define directive

    DB 40 DUP(?)
    DW 10h DUP(0)
    DB 3 DUP("ABC")
    DB 4 DUP(3 DUP (0,1), 2 DUP('$'))

Word Storage
• Word, doubleword, and quadword data are stored in reverse byte order (in memory)

    Directive        Bytes in Storage
    DW 256           00 01
    DD 1234567h      67 45 23 01
    DQ 10            0A 00 00 00 00 00 00 00
    X DW 35DAh       DA 35

  Low byte of X is at X, high byte of X is at X+1

The Program Body
• Also known as the code segment
• Divided into four columns: labels, mnemonics, operands, and comments
• Labels refer to the positions of variables and instructions, represented by the mnemonics
• Operands are required by most assembly language instructions
• Comments aid in remembering the purpose of various instructions
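The byte orders in the Word Storage table and the expansion of nested DUP operands shown earlier can be checked with a short Python sketch. The helpers `dw`, `dd`, `dq` and `dup` are hypothetical names that only mimic what the assembler directives emit. They are not part of TASM, and `?` (uninitialized storage) is not modeled.

```python
import struct


def dw(value):
    """Bytes emitted for one DW operand (16 bits, low-order byte first)."""
    return struct.pack("<H", value & 0xFFFF)


def dd(value):
    """Bytes emitted for one DD operand (32 bits, low-order byte first)."""
    return struct.pack("<I", value & 0xFFFFFFFF)


def dq(value):
    """Bytes emitted for one DQ operand (64 bits, low-order byte first)."""
    return struct.pack("<Q", value & 0xFFFFFFFFFFFFFFFF)


def dup(count, *items):
    """Model of 'count DUP(items)': the item list repeated count times."""
    return b"".join(items) * count


# DB 4 DUP(3 DUP (0,1), 2 DUP('$')) from the DUP examples:
nested = dup(4, dup(3, b"\x00", b"\x01"), dup(2, b"$"))
```

Each pass of the outer DUP produces 6 + 2 = 8 bytes, so `nested` is 32 bytes long, and `dw(0x35DA)` yields the bytes DAh, 35h as in the table.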
An example

    Label   Mnemonic  Operand      Comment
    ---------------------------------------------------------
            .data
    exCode  DB        0            ;A byte variable
    myWord  DW        ?            ;Uninitialized word var.
            .code
    MAIN:   mov       ax,@data     ;Initialize DS to address
            mov       ds,ax        ; of data segment
            jmp       Exit         ;Jump to Exit label
            mov       cx,10        ;This line skipped!
    Exit:   mov       ah,04Ch      ;DOS function: Exit prog
            mov       al,exCode    ;Return exit code value
            int       21h          ;Call DOS. Terminate prog
            END       MAIN         ;End Program and specify entry point

The Label Field
• Labels mark places in a program which other instructions and directives reference
• Labels in the code segment always end with a colon
• Labels in the data segment never end with a colon
• Labels can be from 1 to 31 characters long and may consist of letters, digits, and the special characters ? . @ _ $ %
• If a period is used, it must be the first character
• Labels must not begin with a digit
• The assembler is case insensitive

Legal and Illegal Labels
• Examples of legal names
  o COUNTER1
  o @character
  o SUM_OF_DIGITS
  o $1000
  o DONE?
  o .TEST
• Examples of illegal names
  o TWO WORDS - contains a blank
  o 2abc - begins with a digit
  o A45.28 - . not first character
  o YOU&ME - contains an illegal character

The Mnemonic Field
• For an instruction, the operation field contains a symbolic operation code (opcode)
• The assembler translates a symbolic opcode into a machine language opcode
• Examples are: ADD, MOV, SUB
• In an assembler directive, the operation field contains a directive (pseudo-op)
• Pseudo-ops are not translated into machine code; they tell the assembler to do something

The Operand Field
• For an instruction, the operand field specifies the data that are to be acted on by the instruction. May have zero, one, or two operands
• •
NOP ;no operands -- does • END is a pseudo-op; the single "operand" is the label nothing specifying the beginning of execution, usually the first INC AX ;one operand -- adds 1 instruction after the .code pseudo-op to the contents of AX ADD WORD1,2 ;two operands -Assembling a Program adds 2 to the contents ; of memory word WORD1 • The source file of an assembly language program is usually named with an extension of .asm In a two-operand instruction, the first operand is the destination operand. The second operand is edit myprog.asm the source operand. For an assembler directive, the operand field • The source file is processed (assembled) by the usually contains more information about the assembler (TASM) to produce an object file (.obj) directive. tasm myprog produces myprog.obj
The Comment Field
• The object file must be linked by the linker (TLINK) A semicolon marks the beginning of a comment to produce an executable file (.exe) field • The assembler ignores anything typed after the tlink myprog produces myprog.exe semicolon on that line Dealing with Errors • It is almost impossible to understand an assembly language program without good comments • TASM will report the line number and give an error message for each error it finds • Good programming practice dictates a comment on almost every line • Sometimes it is helpful to have a listing file (.lst), created by using TASM with the -l option • The .lst file contains a complete listing of the program, Good and Bad Comments along with line numbers, object code bytes, and the symbol table • Don't say something obvious, like •
MOV CX,0 ;move 0 to CX •
Instead, put the instruction into the context of the program MOV CX,0 ;CX counts terms, initially 0
•
The Closing
•
• • • •
Useful for logic errors that the assembler misses See the text for a complete tutorial You do not need to use the TDH386.SYS driver or the TD386.EXE debugger with the latest version of the assembler To use the debugger on myprog.asm
An entire line can be a comment, or be used to create visual space in a program ; ; Initialize registers ; MOV AX,0 MOV BX,0
•
Using the Debugger
The last lines of an assembly language program are the closing Indicates to assembler that it has reached the end of the program and where the entry point is MAIN ENDP ;End of program END MAIN ; entry point for linker use
tasm /zi myprog tlink /v myprog td myprog .COM and .EXE files • • • • • •
The .COM code file format is a relic of the first version of MS-DOS Not recommended for general purposes All code, data, and the stack occupy one 64K segment (Borland's "tiny" model) .EXE code files are more efficient in use of RAM Data and code occupy separate segments The programmer is responsible for setting up the data and code segments properly
Ending a Program • • •
All programs, upon termination, must return control back to another program -- the operating system Under MS-DOS, this is COMMAND.COM This is done by doing a DOS system call
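As a minimal sketch of that system call, the program below does nothing except hand control back to DOS. It assumes the small memory model and uses DOS interrupt 21h, function 4Ch (terminate with exit code), the same call that appears in the example programs of these notes. The exit code value 0 is an illustrative choice.

```asm
        .MODEL small
        .STACK 100h
        .CODE
MAIN:
        mov ah,4Ch      ; DOS function 4Ch: terminate program
        mov al,0        ; exit code handed back to DOS (0 chosen for illustration)
        int 21h         ; call DOS; control returns to the program that started this one
        END MAIN        ; end of source file, entry point is MAIN
```

Without these three instructions the processor would simply keep executing whatever bytes follow the program in memory.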
Data Transfer Instructions
• MOV destination,source
    o reg, reg
    o mem, reg
    o reg, mem
    o mem, immed
    o reg, immed
• Sizes of both operands must be the same
• reg can be any non-segment register, except that IP cannot be the target register
• MOV's between a segment register and memory or a 16-bit register are possible
• XCHG (eXCHanGe) destination,source
    o reg, reg
    o reg, mem
    o mem, reg
• MOV and XCHG cannot perform memory to memory moves
• XCHG provides an efficient means to swap the operands
    o No temporary storage is needed
    o Sorting often requires this type of operation
    o This works only with the general registers

Examples
• mov ax, word1
    o "Move word1 to ax"
    o Contents of register ax are replaced by the contents of the memory location word1
• xchg ah, bl
    o Swaps the contents of ah and bl
• Illegal: mov word1, word2
    o can't have both operands be memory locations

Sample MOV Instructions
    b db 4Fh
    w dw 2048
    mov bl,dh
    mov ax,w
    mov ch,b
    mov al,255
    mov w,-100
    mov b,0
• When a variable is created with a define directive, it is assigned a default size attribute (byte, word, etc)
• You can assign a size attribute using LABEL
    LoByte LABEL BYTE
    aWord  DW 97F2h

Addresses with Displacements
    b db 4Fh, 20h, 3Ch
    w dw 2048, -100, 0
    mov bx, w+2
    mov b+1, ah
    mov ah, b+5
    mov dx, w-3
• Type checking is still in effect
• The assembler computes an address based on the expression
• NOTE: These are address computations done at assembly time
    mov al,b-1 will not subtract 1 from the value stored at b; it loads the byte at the address one below b

Arithmetic Instructions
    ADD dest, source
    SUB dest, source
    INC dest
    DEC dest
    NEG dest
• Operands must be of the same size
• source can be a general register, memory location, or constant
• dest can be a register or memory location
    o except operands cannot both be memory

ADD and INC
• ADD is used to add the contents of
    o two registers
    o a register and a memory location
    o a register and a constant
• INC is used to add 1 to the contents of a register or memory location

Examples
• add ax, word1
    o "Add word1 to ax"
    o Contents of register ax and memory location word1 are added, and the sum is stored in ax
• inc ah
    o Adds one to the contents of ah
• Illegal: add word1, word2
    o can't have both operands be memory locations

SUB, DEC, and NEG
• SUB is used to subtract the contents of
    o one register from another register
    o a register from a memory location, or vice versa
    o a constant from a register
• DEC is used to subtract 1 from the contents of a register or memory location
• NEG is used to negate the contents of a register or memory location

Examples
• sub ax, word1
    o "Subtract word1 from ax"
    o Contents of memory location word1 are subtracted from the contents of register ax, and the difference is stored in ax
• dec bx
    o Subtracts one from the contents of bx
• Illegal: sub byte1, byte2
    o can't have both operands be memory locations
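The usual way around the "no two memory operands" rule is to route one operand through a register. The sketch below is one way to get the effect of the illegal sub byte1, byte2. The variable values (50 and 8) and the choice of AL as the scratch register are assumptions made for illustration.

```asm
        .MODEL small
        .STACK 100h
        .DATA
byte1   DB 50           ; illustrative value
byte2   DB 8            ; illustrative value
        .CODE
MAIN:
        mov ax,@data    ; initialize DS to the address
        mov ds,ax       ; of the data segment
        mov al,byte2    ; memory to register: AL = 8
        sub byte1,al    ; register from memory: byte1 = 50 - 8 = 42
        mov ah,4Ch      ; DOS function: exit program
        mov al,0        ; exit code
        int 21h
        END MAIN
```

The same pattern applies to mov and add: load one operand into a register of matching size, then use the register as the source.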
Type Agreement of Operands
• The operands of two-operand instructions must be of the same type (byte or word)
    o mov ax, bh     ;illegal
    o mov ax, byte1  ;illegal
    o mov ah,'A'     ;legal -- moves 41h into ah
    o mov ax,'A'     ;legal -- moves 0041h into ax
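When a byte variable really does need to end up in a 16-bit register, one common approach is to move it into the low half and clear the high half. The sketch below assumes the byte holds an unsigned value; byte1 and its initial value 'A' are illustrative.

```asm
        .MODEL small
        .STACK 100h
        .DATA
byte1   DB 'A'          ; illustrative value, 41h
        .CODE
MAIN:
        mov ax,@data    ; initialize DS to the address
        mov ds,ax       ; of the data segment
        mov al,byte1    ; byte to byte: legal, AL = 41h
        mov ah,0        ; clear the high byte, so AX = 0041h
        mov ah,4Ch      ; DOS function: exit program
        mov al,0        ; exit code
        int 21h
        END MAIN
```

Both moves satisfy type agreement because each one pairs an 8-bit destination with an 8-bit source.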
Translation of HLL Instructions
• B = A
    mov ax,A
    mov B,ax
    o memory-memory moves are illegal
• A = B - 2*A
    mov ax,B
    sub ax,A
    sub ax,A
    mov A,ax

Program Segment Structure
• Data Segments
    o Storage for variables
    o Variable addresses are computed as offsets from start of this segment
• Code Segment
    o contains executable instructions
• Stack Segment
    o used to set aside storage for the stack
    o Stack addresses are computed as offsets into this segment
• Segment directives
    .DATA
    .CODE
    .STACK size

Memory Models
• .Model memory_model
    o tiny: code+data <= 64K (.com program)
    o small: code<=64K, data<=64K, one of each
    o medium: data<=64K, one data segment
    o compact: code<=64K, one code segment
    o large: multiple code and data segments
    o huge: allows individual arrays to exceed 64K
    o flat: no segments, 32-bit addresses, protected mode only (80386 and higher)

Program Skeleton
• Select a memory model
• Define the stack size
• Declare variables
• Write code
    o organize into procedures
• Mark the end of the source file
    o define the entry point

    .MODEL small
    .STACK 100h
    .DATA
    ;declarations
    .CODE
    MAIN:
    ;main proc code
    ;return to DOS
    ;other procs (if any) go here
    end MAIN
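To show how the skeleton and the translation of A = B - 2*A fit together, here is a hedged sketch of a complete program. The names varA and varB stand in for A and B, and their initial values (5 and 20) are assumptions chosen so the arithmetic is easy to follow.

```asm
        .MODEL small            ; select a memory model
        .STACK 100h             ; define the stack size
        .DATA                   ; declare variables
varA    DW 5                    ; illustrative value for A
varB    DW 20                   ; illustrative value for B
        .CODE                   ; write code
MAIN:
        mov ax,@data            ; initialize DS to the address
        mov ds,ax               ; of the data segment
        mov ax,varB             ; AX = B = 20
        sub ax,varA             ; AX = B - A = 15
        sub ax,varA             ; AX = B - 2*A = 10
        mov varA,ax             ; A = 10
        mov ah,4Ch              ; return to DOS
        mov al,0
        int 21h
        END MAIN                ; mark the end of the source, define the entry point
```

Subtracting varA twice avoids a multiply, and keeping the running result in AX means only one operand of each instruction is in memory.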
Input and Output Using 8086 Assembly Language
• Most input and output is not done directly via the I/O ports, because
    o port addresses vary among computer models
    o it's much easier to program I/O with the service routines provided by the manufacturer
• There are BIOS routines (which we'll look at later) and DOS routines for handling I/O (using interrupt number 21h)
Interrupts
• The interrupt instruction is used to cause a software interrupt (system call)
    o An interrupt interrupts the current program and executes a subroutine, eventually returning control to the original program
    o Interrupts may be caused by hardware or software
• int interrupt_number ;software interrupt
Output to Monitor
• DOS Interrupts: interrupt 21h
    o This interrupt invokes one of many support routines provided by DOS
    o The DOS function is selected via AH
    o Other registers may serve as arguments
• AH = 2, DL = ASCII of character to output
    o Character is displayed at the current cursor position, the cursor is advanced, AL = DL
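A minimal sketch of the character output call described above: AH selects function 2, DL carries the character, and int 21h performs the call. The character '?' is an arbitrary choice for illustration.

```asm
        .MODEL small
        .STACK 100h
        .CODE
MAIN:
        mov ah,2        ; DOS function 02h: display character
        mov dl,'?'      ; DL = ASCII code of the character to output
        int 21h         ; '?' is displayed at the current cursor position
        mov ah,4Ch      ; DOS function: exit program
        mov al,0        ; exit code
        int 21h
        END MAIN
```

No data segment is needed here because the character is an immediate operand, so DS is never used.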
Output a String
• Interrupt 21h, function 09h
    o DX = offset to the string (in data segment)
    o The string is terminated with the '$' character
• To place the address of a variable in DX, use one of the following
    o lea DX,theString ;load effective address
    o mov DX, offset theString ;immediate data
Print String Example
    %TITLE "First Program -- HELLO.ASM"
    .8086
    .MODEL small
    .STACK 256
    .DATA
    msg DB "Hello, World!$"
    .CODE
    MAIN:
        mov ax,@data    ;Initialize DS to address
        mov ds,ax       ; of data segment
        lea dx,msg      ;get message
        mov ah,09h      ;display string function
        int 21h         ;display message
    Exit:
        mov ah,4Ch      ;DOS function: Exit program
        mov al,0        ;Return exit code value
        int 21h         ;Call DOS. Terminate program
        END MAIN        ;End program/entry point

Input a Character
• Interrupt 21h, function 01h
• Filtered input with echo
    o This function returns the next character in the keyboard buffer (waiting if necessary)
    o The character is echoed to the screen
    o AL will contain the ASCII code of the character
    o AL = 0 if a non-character key (such as a function or arrow key) was pressed

An Example Program
    %TITLE "Case Conversion"
    .8086
    .MODEL small
    .STACK 256
    .DATA
    MSG1   DB 'Enter a lower case letter: $'
    MSG2   DB 0Dh,0Ah,'In upper case it is: '
    CHAR   DB ?,'$'
    exCode DB 0
    .CODE
    MAIN:
    ;initialize ds
        mov ax,@data        ; Initialize DS to address
        mov ds,ax           ; of data segment
    ;print user prompt
        mov ah,9            ; display string fcn
        lea dx,MSG1         ; get first message
        int 21h             ; display it
    ;input a character and convert to upper case
        mov ah,1            ; read char fcn
        int 21h             ; input char into AL
        sub al,20h          ; convert to upper case
        mov CHAR,al         ; and store it
    ;display on the next line
        mov dx,offset MSG2  ; get second message
        mov ah,9            ; display string function
        int 21h             ; display message and upper case
    ;return to DOS
    Exit:
        mov ah,4Ch          ; DOS function: Exit program
        mov al,exCode       ; Return exit code value
        int 21h             ; Call DOS. Terminate program
        END MAIN            ; End of program / entry point
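The case conversion program subtracts 20h from whatever key is typed, so it only gives a sensible result for 'a' through 'z'. The sketch below is one possible variant that leaves any other character unchanged. It goes slightly beyond the instructions covered so far by using CMP with the conditional jumps JB and JA; the labels PROMPT, RESULT and StoreIt are hypothetical names introduced for illustration.

```asm
        .MODEL small
        .STACK 100h
        .DATA
PROMPT  DB 'Enter a letter: $'
RESULT  DB 0Dh,0Ah,'Result: '   ; no '$' here: the string runs on into CHAR
CHAR    DB ?,'$'
        .CODE
MAIN:
        mov ax,@data    ; initialize DS to the address
        mov ds,ax       ; of the data segment
        mov ah,9        ; display string function
        lea dx,PROMPT
        int 21h
        mov ah,1        ; read character function
        int 21h         ; AL = character typed
        cmp al,'a'
        jb  StoreIt     ; below 'a': not a lower case letter
        cmp al,'z'
        ja  StoreIt     ; above 'z': not a lower case letter
        sub al,20h      ; 'a'..'z' becomes 'A'..'Z'
StoreIt:
        mov CHAR,al     ; store the (possibly converted) character
        mov ah,9        ; display string function
        lea dx,RESULT
        int 21h         ; prints "Result: " followed by the character
        mov ah,4Ch      ; DOS function: exit program
        mov al,0        ; exit code
        int 21h
        END MAIN
```

JB and JA treat the operands as unsigned values, which is the appropriate comparison for ASCII codes.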