WRITTEN EXAM
2B1447 SOC Architectures 2004-12-13, 9.00-13.00
Material: The student may use the following material in the exam:
• Calculator

• Write your name on top of each paper.
• Do not use a paper for more than one task.
• Write only on the front side of each paper.
• Motivate your answers. A correct answer with a wrong motivation or without motivation can result in 0 points!
• The exam consists of 4 tasks on 5 pages, which together give 32 points.
• 16 points are required to pass the exam!
GOOD LUCK!
1. (8 POINTS) Given is the following floating-point format $b_5 b_4 b_3 b_2 b_1 b_0$, where the value is calculated as
$(-1)^{b_5} \cdot (1.b_1 b_0)_2 \cdot 2^{(b_4 b_3 b_2)_2 - 2}$.
(a) Give a representation for the decimal numbers 0.375 and -27.7. If a number cannot be represented, give the representation of the closest number and the quantization error.
(b) Add the following numbers using the floating-point representation above: $b_5 b_4 b_3 b_2 b_1 b_0 = 010001$ and $b_5 b_4 b_3 b_2 b_1 b_0 = 001011$. Give the quantization error if the result is stored with the same number of bits.
(c) In a specific DSP application, all decimal numbers in the range from -30 to +30 have to be represented with a maximal quantization error of 0.1. Do you need more bits if you use a floating-point representation or if you use a scaled fixed-point representation? Give a convincing motivation for your answer.
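The 6-bit format of Task 1 is small enough to explore exhaustively. The Python sketch below decodes each of the 64 bit patterns with the formula above and picks the pattern closest to a given value, which can serve as a check for parts (a) and (b). It assumes round-to-nearest when a value does not fit; the helper names `decode` and `nearest` are hypothetical.

```python
def decode(bits: int) -> float:
    """Value of the 6-bit word b5 b4 b3 b2 b1 b0 in the Task 1 format."""
    sign = (bits >> 5) & 0b1        # b5
    exponent = (bits >> 2) & 0b111  # (b4 b3 b2)_2, biased by 2
    fraction = bits & 0b11          # b1 b0
    mantissa = 1 + fraction / 4     # (1.b1 b0)_2
    return (-1) ** sign * mantissa * 2.0 ** (exponent - 2)


def nearest(value: float) -> tuple[int, float, float]:
    """Closest representable value, assuming round-to-nearest.

    Returns (bit pattern, represented value, quantization error).
    """
    bits = min(range(64), key=lambda b: abs(decode(b) - value))
    represented = decode(bits)
    return bits, represented, abs(represented - value)


if __name__ == "__main__":
    for x in (0.375, -27.7):
        bits, v, err = nearest(x)
        print(f"{x:>7}: {bits:06b} -> {v} (error {err:.3g})")
```

For part (b), `nearest(decode(0b010001) + decode(0b001011))` rounds the exact sum 6.75 to the closest representable value; an adder that truncates instead of rounding would keep the next smaller representable value and therefore a larger error.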
2. (8 POINTS) Three processors P1, P2, P3 with their individual write-back, write-allocate caches are connected via a bus to a shared memory.
(a) In the initial state, memory location x has the value 2, and the caches are empty and in the invalid state. Given is the following sequence of operations:
i. P3 reads location x.
ii. P1 writes 3 into location x.
iii. P2 reads location x.
iv. P2 writes 6 into location x.
v. P1 writes 5 into location x.
Give the bus action, the state of the cache controllers (if a protocol is used), and the contents of the caches and the memory (x) after each step, if the MSI write-back invalidation protocol from Figure 1 is used. Assume that a Flush also updates the memory.

Figure 1: MSI protocol (transitions labelled observed event/bus action):
M: PrRd/−, PrWr/− (remain in M)
M → S: BusRd/Flush
M → I: BusRdX/Flush
S → M: PrWr/BusRdX
S → I: BusRdX/−
I → S: PrRd/BusRd
I → M: PrWr/BusRdX

Write your solution in a form as in Table 1.

Processor Action | Bus Action | State P1 | State P2 | State P3 | Cache P1 | Cache P2 | Cache P3 | Memory x | Data supplied by
Initial          |            | I        | I        | I        | -        | -        | -        | 2        | -
1. P1 reads x    |            |          |          |          |          |          |          |          |
...              |            |          |          |          |          |          |          |          |

Table 1: Format for solution of Task 2

(b) Develop a state diagram for a cache coherence protocol that assumes that all processors have write-through, write-allocate caches.
(c) Where is the directory located in a directory-based cache coherence protocol? What kind of information is stored in this directory?
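For part (a), the transitions of Figure 1 can be traced mechanically. The Python sketch below models the single block x in three caches. It assumes, as the task states, that a Flush also updates memory; that a read miss issues BusRd and a write to a line not in M issues BusRdX, as in Figure 1; and that the block comes from memory unless another cache flushes it. Showing invalid lines as "-" is a presentation choice, and the names `OPS` and `msi_trace` are hypothetical.

```python
# Trace of the MSI write-back invalidation protocol for one memory block x.
# Each operation: (processor number 1..n, "R" or "W", value written or None).
OPS = [(3, "R", None), (1, "W", 3), (2, "R", None), (2, "W", 6), (1, "W", 5)]


def msi_trace(ops, memory, n=3):
    state = ["I"] * n   # cache controller state per processor
    data = [None] * n   # cached copy of x (meaningful only if state != "I")
    rows = []
    for proc, kind, value in ops:
        me = proc - 1
        bus, supplier = "-", "-"   # read/write hits cause no bus action
        if kind == "R" and state[me] == "I":
            bus, supplier = "BusRd", "Memory"
            for j in range(n):
                if j != me and state[j] == "M":
                    memory = data[j]                  # M --BusRd/Flush--> S
                    state[j], supplier = "S", f"P{j + 1}"
            state[me], data[me] = "S", memory         # I --PrRd/BusRd--> S
        elif kind == "W":
            if state[me] != "M":
                bus, supplier = "BusRdX", "Memory"
                for j in range(n):
                    if j == me or state[j] == "I":
                        continue
                    if state[j] == "M":
                        memory = data[j]              # M --BusRdX/Flush--> I
                        supplier = f"P{j + 1}"
                    state[j] = "I"                    # S --BusRdX/- --> I
                state[me] = "M"                       # I or S --PrWr/BusRdX--> M
            data[me] = value                          # write-back: memory unchanged
        shown = tuple(d if s != "I" else "-" for s, d in zip(state, data))
        rows.append((bus, tuple(state), shown, memory, supplier))
    return rows


if __name__ == "__main__":
    for op, row in zip(OPS, msi_trace(OPS, memory=2)):
        print(op, *row)
```

Each returned row holds the bus action, the three controller states, the three cache contents, the memory value of x and the supplier of the data, i.e. the columns of Table 1 after the processor action.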
3. (8 POINTS) Figure 2 shows the topology of a direct network. A packet needs 1 cycle to travel from a node to a neighboring node. All connections between the nodes are bidirectional channels, where each channel has a bandwidth b.
Figure 2: Topology (nodes 00 to 51 in six rows and two columns; top row 50 51, then 40 41, 30 31, 20 21, 10 11, and bottom row 00 01)
(a) Determine the bisection bandwidth of the network.
(b) Give the diameter of the network.
(c) Assuming perfect routing and perfect load balancing, determine the ideal throughput of the network for uniform random traffic (each node sends to every node, including itself, with the same probability).
(d) Assume a traffic pattern where each node sends packets only to the node two hops to the "north", i.e. node 00 sends to node 20, node 01 to node 21, ..., node 40 to node 00, node 41 to node 01, node 50 to node 10, and node 51 to node 11. Propose an oblivious routing algorithm that achieves a good utilization of the channels in the network. What is the maximum throughput you can achieve with that algorithm? You can assume perfect flow control!
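Because the links of Figure 2 are not recoverable, the following Python sketch rests on a hypothetical reading of the topology: a 6 x 2 torus in which each column of six nodes forms a north-south ring (suggested by the wrap-around in part (d), where node 40 sends two hops north to node 00) and the two nodes of each row are joined by a single channel. Under that assumption, it computes the diameter with breadth-first search and, by brute force, the minimum number of bidirectional channels crossing any split of the nodes into two equal halves. All function names are illustrative.

```python
from collections import deque
from itertools import combinations


def torus_6x2():
    """Hypothetical reading of Figure 2: node (row, col) is labelled f"{row}{col}"."""
    nodes = [(r, c) for r in range(6) for c in range(2)]
    edges = set()
    for r, c in nodes:
        edges.add(frozenset({(r, c), ((r + 1) % 6, c)}))  # north link, row 5 wraps to 0
        edges.add(frozenset({(r, 0), (r, 1)}))            # link between the two columns
    return nodes, edges


def diameter(nodes, edges):
    """Largest shortest-path hop count, via BFS from every node."""
    adj = {n: [] for n in nodes}
    for a, b in map(tuple, edges):
        adj[a].append(b)
        adj[b].append(a)
    worst = 0
    for src in nodes:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


def min_bisection_channels(nodes, edges):
    """Fewest channels cut by any split into two equal halves (brute force)."""
    best = len(edges)
    for half in combinations(nodes, len(nodes) // 2):
        side = set(half)
        best = min(best, sum(1 for e in edges if len(e & side) == 1))
    return best


if __name__ == "__main__":
    nodes, edges = torus_6x2()
    print("diameter:", diameter(nodes, edges))
    print("bisection channels:", min_bisection_channels(nodes, edges))
```

Multiplying the channel count by b, or by 2b if each direction of a bidirectional channel is counted separately, turns it into a bisection bandwidth for this assumed topology; a different reading of Figure 2 only requires changing `torus_6x2`.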
4. (8 POINTS) The function $w(x, y, z) = f_4(f_2(f_1(x, y)), f_3(z))$, which is composed of several sub-functions, shall be implemented in such a way that the total execution time is not more than 800 ns. Before execution, x, y and z are stored in memory, and the result w also has to be stored in memory. Each memory access (Load and Store) takes 50 ns. The function can be implemented either on a single processor or on a combination of a processor and a hardware accelerator (Figure 3).

Figure 3: Implementation Alternatives (panels: "Single Processor" with Proc and Mem; "Processor and Accelerator" with Proc, Acc and Mem)

Table 2 shows the execution time of each sub-function on the processor, which depends on the processor speed S, and on the accelerator. The speed S can have one of the following values: 0.67, 1.0, 1.5, 2.0, 2.5.

Function    | f1    | f2    | f3    | f4
Processor   | 200/S | 400/S | 400/S | 200/S
Accelerator | 100   | 200   | 200   | not available

Table 2: Execution Times

The cost of a processor is $200 \cdot S^2$ and the cost of the accelerator is 200. Each processor and accelerator has a sufficient number of registers. A register access is assumed to take zero time.
(a) Give the single-processor solution that satisfies the requirements at minimal cost.
(b) Give the solution that makes use of a processor and the accelerator, which satisfies the requirements at minimal cost.
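Part (a) amounts to a small design-space search over the speeds S. The Python sketch below assumes that all times in Table 2 are in ns and that the three loads and the store are performed one after another without overlapping the computation; the constant and function names are hypothetical. Part (b) is not modelled, since it needs further assumptions about how the accelerator exchanges data with the processor and the memory.

```python
SPEEDS = [0.67, 1.0, 1.5, 2.0, 2.5]
# Processor execution time at S = 1 (ns); the actual time is t / S (Table 2).
PROC_TIME_NS = {"f1": 200, "f2": 400, "f3": 400, "f4": 200}
MEM_ACCESS_NS = 50
DEADLINE_NS = 800


def single_processor_time(speed):
    """Four memory accesses (load x, y, z; store w) plus f1..f4 executed sequentially."""
    memory = 4 * MEM_ACCESS_NS
    compute = sum(t / speed for t in PROC_TIME_NS.values())
    return memory + compute


def cheapest_single_processor():
    """Return (cost, speed) of the cheapest speed meeting the deadline, or None."""
    feasible = [
        (200 * s ** 2, s) for s in SPEEDS if single_processor_time(s) <= DEADLINE_NS
    ]
    return min(feasible) if feasible else None


if __name__ == "__main__":
    for s in SPEEDS:
        print(f"S={s}: time {single_processor_time(s):.1f} ns, cost {200 * s ** 2:.1f}")
    print("cheapest feasible (cost, S):", cheapest_single_processor())
```

Because the processor cost grows with $S^2$ while the execution time only shrinks with $1/S$, the search simply keeps the slowest speed that still meets the 800 ns bound.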