Digital System Design with PLDs and FPGAs
Field Programmable Gate Arrays
Kuruvilla Varghese, DESE, Indian Institute of Science

Topics
• FPGA Architecture (Xilinx, Altera, Actel)
• FPGA related Design issues
• FPGA related Timing issues
• Tool Flow
• FPGA Configuration
• SoPC
• Debugging
• Case Study
Field Programmable Gate Arrays
• ASIC, MPGA/Standard Cell, FPGA
• Volumes, NRE cost, Turn around time
• Array of logic resources with programmable interconnection
• Logic resources (Combinational, Flip-flops)
• Combinational: LUT, Multiplexers, Gates
• Programmable interconnections: SRAM, Flash, Anti-fuse
• Special Resources: PLL/DLL, RAMs, FIFOs, Memory Controllers, Network Interfaces, Processors
Commercial FPGAs
• Xilinx
– Spartan-3, Spartan-6
– Virtex-4, Virtex-5, Virtex-6
– Artix-7, Kintex-7, Virtex-7, Zynq
• Altera
– Cyclone, Cyclone II, Cyclone III, Cyclone IV, Cyclone V
– Arria II, Arria V
– Stratix II, Stratix III, Stratix IV, Stratix V
• Actel
– Axcelerator (Anti-fuse)
– IGLOO, IGLOOE (Flash)
– ProASIC Plus (Flash)
– ProASIC3, ProASIC3E (Flash)
– RTAX (Radiation Tolerant, Anti-fuse)
– RTSX-SU (Radiation Tolerant, Anti-fuse)
– SmartFusion, SmartFusion2 (ARM Cortex-M3)
Structure of an FPGA
[Figure: structure of an FPGA. Source: Xilinx Data Sheets]
Detailed View
[Figure: array of CLBs interconnected through switch blocks (SB)]
Switch Block
[Figure: switch block]
Types of switch blocks
[Figure: types of switch blocks]
FPGA
[Figure: FPGA architecture. Source: Xilinx Data Sheets]
• I/O Blocks (Tri-state output / Input, Synchronizing Flip-flops)
• Array of Configurable Logic Blocks
• Horizontal and Vertical wires with programmable switches in between
• Single length, Double length, Quad, Hex and Long lines
• Resources available to user
• Resources for configuring programmable switches in the interconnect structures and Logic blocks
Programmable Connections
• SRAM (Pass Transistor)
• Flash
• Antifuse
SRAM (Pass Transistor)
[Figure: SRAM-controlled pass transistor. Source: Xilinx Data Sheets]
Pass Transistor with configuration cell
[Figure: configuration flip-flop, write transistor and pass transistor]
• Flip-Flop to store the switch status (4 Transistors)
• Write Transistor to write Configuration status
• Total: 6 Transistors
• FFs controlling the Switches are organized as SRAM, hence the name
Flash Transistor
• MOS transistor with a floating gate
• Conducts when not programmed off
• Can be electrically programmed ‘off’ or ‘on’
[Figure: flash transistor]
Flash Cell Write
[Figure: flash cell write]
Flash Cell Erase
[Figure: flash cell erase]
Anti-fuse
[Figure: anti-fuse]
Programmable Connections

Name        Volatile   Re-programmable   Delay   Area
Flash       No         In-circuit        Large   Medium
SRAM        Yes        In-circuit        Large   Large
Anti-fuse   No         No                Small   Small
Logic Block size
• Coarse grain
– Owing to the SRAM interconnection area (6 transistors), the Logic Blocks are made large in SRAM based FPGAs
– Utilization is kept high through configurability within the logic block
• Fine Grain
– Since the antifuse occupies less area and has less time delay, antifuse based FPGAs employ smaller logic blocks
Logic Cell Structure – Coarse Grain
[Figure: coarse grain logic cell. Source: Xilinx Data Sheets]
Logic Cell Structure – Fine Grain
[Figure: fine grain logic cell]
Design Methodology
[Figure: design flow with the blocks HDL Source, Functional Simulation, Synthesis, Logic Simulation, Equations/Netlists, Static Timing Analysis, PAR/Fitting, Constraints, Configuration File, Timing Model, Timing Simulation, Programming]
Structure of an FPGA
[Figure: structure of an FPGA. Source: Xilinx Data Sheets]
Commercial Tools
• Simulators
– ModelSim (Mentor Graphics)
– Active HDL (Aldec)
• Synthesis Tools
– Synplify Pro (Synopsys)
– Precision Synthesis (Mentor Graphics)
• Vendor Tools
– Xilinx ISE (Synthesis, Simulation, PAR, Programming, …)
– Xilinx Vivado (Synthesis, Simulation, PAR, Programming, …)
– Altera Quartus II (Synthesis, Simulation, PAR, Programming, …)
– Actel Libero (Synthesis, Simulation, PAR, Programming, …)
• Cadence Suite
• Synopsys Suite
• Mentor Graphics Suite
Xilinx Virtex FPGA
• SRAM based programmable connections, configuration
• LUT based combinational Logic
• Flip-Flops with sync/async reset/preset
• Large Configurable Logic Cells (CLBs)
• Block RAM (SPRAM, DPRAM, FIFO)
• LUT as Distributed RAM
• Low skew clock trees, DLL, Tri-state gates for Buses
• Carry Chains / Cascade Chains
• JTAG, Serial, and Parallel Configuration schemes
• I/O Blocks (Registered / Non-registered)
• Multiple I/O standards
[Figure: Xilinx Virtex FPGA architecture. Source: Xilinx Data Sheets]
Present day FPGAs use PLLs instead of DLLs and have DSP blocks for fixed point arithmetic.
Virtex CLB
[Figure: Virtex CLB. Source: Xilinx Data Sheets]
LUT
[Figure: 2-input LUT. X drives address line A1, Y drives address line A0, and data output D0 gives X XOR Y]

A1 A0 (X Y)   D0
00            0
01            1
10            1
11            0

• Address lines as inputs, data line as output (read mode)
• Truth table written during configuration (write)
• 4 input, 6 input LUTs
• Fixed AND, Programmable OR
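The LUT slide can be pictured as a small memory addressed by the logic inputs. The following is a minimal behavioural sketch, not vendor code: the entity name lut2_xor and the constant INIT are illustrative assumptions, and the stored pattern is the X XOR Y truth table from the slide.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Behavioural model of a 2-input LUT: the inputs form the address,
-- and the stored truth table supplies the output.
entity lut2_xor is
  port (x, y : in  std_logic;
        d0   : out std_logic);
end entity lut2_xor;

architecture rtl of lut2_xor is
  -- Truth table for X xor Y, indexed by address (A1 A0) = (X Y).
  -- Bit 3 is the leftmost character, bit 0 the rightmost.
  constant INIT : std_logic_vector(3 downto 0) := "0110";
  signal addr   : unsigned(1 downto 0);
begin
  addr <= x & y;                    -- X is A1, Y is A0
  d0   <= INIT(to_integer(addr));   -- read the addressed truth table bit
end architecture rtl;
```

Changing INIT changes the implemented function without touching the structure, which is what configuration does to a real LUT.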
FPGA Configuration / Programming
• Writing to configuration memory
• Configuring options in Logic blocks
– Writing LUTs with truth tables
– Combining LUTs
– Using LUTs as memory
– Selecting clocks, Set/Reset for FFs
– Configuring Various Muxes in Slices
• Using special resources (RAM, FIFOs, PLLs)
• Programming Switch matrices
• Programming I/O blocks
Virtex Family
[Table: Virtex family devices. Source: Xilinx Data Sheets]
Important Specifications
• CLB Array, Block RAM Bits
• User I/O, Differential I/O
• Distributed RAM Bits can be calculated from number of CLBs (multiply by 4 x 64)
• System gate and logic gate figures are equivalent gate counts, so they are not useful for comparing devices across vendors
Structure of an FPGA
[Figure: structure of an FPGA. Source: Xilinx Data Sheets]
Virtex CLB
[Figure: Virtex CLB. Source: Xilinx Data Sheets]
4 input LUT and Flip-Flops
[Figure: two 4-input LUTs (inputs I3 to I0, output O), each followed by a select multiplexer (S) and a flip-flop (D, Q, CK, AR)]
• Use LUT and FFs independently
• Use LUT followed by FFs
• Independent LUT Outputs: X, Y
• Dedicated inputs to FF: BX, BY
5 input LUT
[Figure: two 4-input LUTs (inputs I3 to I0) combined by the F5 multiplexer, with I4 as the select line]
• Two 4-input LUTs are muxed by the F5 mux to form a 5-input LUT. The select line is connected to BX, so the bottom FF cannot be used independently. The F5 mux output is connected to this FF.
6 Input LUT
[Figure: four 4-input LUTs combined in pairs by two F5 multiplexers (select I4), whose outputs are combined by the F6 multiplexer (select I5)]
• Two 5-input LUTs are muxed by the F6 mux to form a 6-input LUT. The select line is connected to BY, so the top FF cannot be used independently. The F6 mux output is connected to this FF.
Cascading LUTs
• In the most general case, considering all possible minterms, 5-input and 6-input functions need LUTs built with the F5 and F6 muxes
• In specific cases, however, a 6-input function can be implemented using a cascade of two LUTs
6 inputs using 2 LUTs
Y = ABCDE or ABCDF
Y = (ABCD) and (E or F)
ABCD = X
Y = X and (E or F)
[Figure: first LUT (inputs A, B, C, D; truth table ABCD) produces X; second LUT (inputs X, E, F; truth table X and (E or F)) produces Y]
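The two-LUT cascade for Y = ABCDE or ABCDF can be written so that the intermediate term is explicit. This is a sketch only: the entity name cascade6 is an assumption, and a synthesis tool is free to map the logic differently from how the source is written.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity cascade6 is
  port (a, b, c, d, e, f : in  std_logic;
        y                : out std_logic);
end entity cascade6;

architecture rtl of cascade6 is
  signal x : std_logic;   -- output of the first LUT
begin
  x <= a and b and c and d;   -- first 4-input LUT: truth table ABCD
  y <= x and (e or f);        -- second LUT: truth table X and (E or F)
end architecture rtl;
```

The second stage uses only three of its four LUT inputs, so the whole 6-input function fits in two LUTs without the F5 or F6 muxes.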
5 inputs using 2 cascaded LUTs
Y = ABCDE
Y = (ABCD) and E
ABCD = X
Y = X and E
[Figure: first LUT (inputs A, B, C, D; truth table ABCD) produces X; second LUT (inputs X, E; truth table X and E) produces Y]
5 inputs using 2 cascaded LUTs
Y = ABCDE or AB/CDE/
Y = (ABCD) and E
ABCD = X
Y = X and E
[Figure: first LUT (inputs A, B, C, D; truth table ABCD) produces X; second LUT (inputs X, E; truth table X and E) produces Y]
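5 inputs using 5 input LUT
Y = ABCD xor E
ABCD = Z
Y = ZE/ or Z/E (a trailing / denotes the complement)
[Figure: two 4-input LUTs, each with inputs A, B, C, D on I3 to I0, combined by the F5 multiplexer with E as the select line to produce Y]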
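The F5 arrangement is an instance of Shannon expansion about the fifth input. As a worked sketch (the symbols $f_0$ and $f_1$ are introduced here for illustration and do not appear in the slides):

```latex
\begin{aligned}
f(I_4, I_3, I_2, I_1, I_0) &= \overline{I_4}\, f_0(I_3,\dots,I_0) + I_4\, f_1(I_3,\dots,I_0) \\
\text{For } Y = (ABCD) \oplus E,\ Z = ABCD,\ I_4 = E: \quad f_0 &= Z, \qquad f_1 = \overline{Z} \\
Y &= \overline{E}\, Z + E\, \overline{Z}
\end{aligned}
```

One 4-input LUT therefore stores the truth table of ABCD and the other stores its complement, with E choosing between them at the F5 mux.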
Virtex CLB: LUT
• LUT and FF can be used separately or together
• Four 4-input LUTs
• 5-input LUT from two 4-input LUTs using the F5 Mux
• 6-input LUT from two 5-input LUTs through the F6 Mux
• Four 4-input LUTs / Two 5-input LUTs / One 6-input LUT
• FF: Sync/Async Set-Reset, Clock Enable
– Since both set and reset are available, registers can be initialized to any value without extra overhead
LUT as RAM
[Figure: 4-input LUT (inputs I3 to I0, output O) with the LUT RAM write control circuit]
• General routing lines can be used to write the LUT through the LUT RAM write control circuit, so that the LUT can be used as Distributed RAM
Virtex CLB: LUT as distributed RAM
• When used for logic implementation, the LUT is written while configuring the FPGA.
• Write control signals can be connected to routing wires, so that the LUT can be used as RAM when it is not used for logic implementation.
• Four 16x1 distributed RAMs per CLB
• These can be combined to make various memory sizes and data widths.
• Since it is spread across CLBs, it is called Distributed RAM.
• Because it is spread across CLBs, access latency can vary, so take care if you use it without registering the read data.
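A 16x1 memory with a synchronous write and an asynchronous read is the usual coding style from which synthesis tools can infer LUT-based distributed RAM. The sketch below assumes that style; the entity name ram16x1 and its port names are illustrative, and whether the tool actually maps it to a LUT depends on the tool and device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ram16x1 is
  port (clk  : in  std_logic;
        we   : in  std_logic;
        addr : in  std_logic_vector(3 downto 0);
        din  : in  std_logic;
        dout : out std_logic);
end entity ram16x1;

architecture rtl of ram16x1 is
  -- 16 one-bit locations, matching one 4-input LUT
  signal mem : std_logic_vector(15 downto 0) := (others => '0');
begin
  -- synchronous write through the write control signals
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        mem(to_integer(unsigned(addr))) <= din;
      end if;
    end if;
  end process;

  -- asynchronous read: the LUT output follows the address
  dout <= mem(to_integer(unsigned(addr)));
end architecture rtl;
```

Registering dout in the surrounding design is one way to address the varying access latency noted in the slide.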
Carry Chain
• Adder
  $S_i = A_i \oplus B_i \oplus C_i$
  $C_{i+1} = A_i B_i + (A_i \oplus B_i) C_i$
• Requires two lookup tables ($S_i$ and $C_{i+1}$) at each stage.
• This, along with routing, makes the adder big and slow.
• Hence a dedicated carry chain implements $C_{i+1}$ to make the adder faster.
[Figure: carry chain stage with a LUT (inputs $A_i$, $B_i$), a carry multiplexer (inputs 0 and 1) producing $C_{i+1}$, and an XOR gate producing $S_i$ from $C_i$]
$C_{i+1} = A_i B_i + (A_i \oplus B_i) C_i$
[Figure: carry chain in the slice. Source: Xilinx Data Sheets]
• For adders, use the operator ‘+’ so that carry chains can be used.
• For higher level functions such as counters, synthesis tools infer and use carry chains.
• The AND gate combining Ai and Bi shown in the Slice diagram is for partial product generation in multipliers.
• In some FPGAs, the carry chain has features to cascade (AND/OR) the LUT outputs.
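Writing an adder with the ‘+’ operator, as the carry chain slide recommends, might look like the sketch below. The entity name add8 and the 8-bit width are assumptions for illustration; mapping onto the dedicated carry chain is done by the synthesis tool and is not guaranteed by the source code itself.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity add8 is
  port (a, b : in  std_logic_vector(7 downto 0);
        sum  : out std_logic_vector(7 downto 0);
        cout : out std_logic);
end entity add8;

architecture rtl of add8 is
  signal r : unsigned(8 downto 0);   -- one extra bit holds the carry out
begin
  -- extend both operands to 9 bits so the carry is not lost
  r    <= resize(unsigned(a), 9) + resize(unsigned(b), 9);
  sum  <= std_logic_vector(r(7 downto 0));
  cout <= r(8);
end architecture rtl;
```

Describing the same adder as per-bit sum and carry equations would be functionally equivalent, but it gives the tool less opportunity to recognise the arithmetic and use the carry chain.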
Control of Sequential Circuits
[Figure: FSM / Controller driving the enable en (RA_L) of a Reg / Counter / …, with a common clk]
Clock Gating
[Figure: register (D7:0, output RA_E) clocked by CLK’, which is derived by gating CLK with RA_L; waveforms CLK, RA_L, CLK’]
Re-circulating Multiplexer
[Figure: register (D7:0, output RA_E) with a 2:1 multiplexer at its D input selected by RA_L, clocked by CLK; waveforms CLK, RA_L]
Register write on the clock edge
[Figure: flip-flop with a re-circulating 2:1 multiplexer, and the equivalent flip-flop with clock enable (CE)]
if (clk'event and clk = '1') then
  if (cntrl_sig = '1') then
    q <= d;
  end if;
end if;
Clock Gating for low power
[Figure: clock gating circuit for low power; signals RA_L, D7:0, RA_E, CLK, CLK1, CLK2; waveforms CLK, RA_L, CLK1, CLK2]
Combinational Circuit Mapping
[Figure: combinational block (Comb) mapped to one or more LUTs]
Sequential Circuit Mapping
[Figure: FF, Comb, FF; the combinational block maps to one or more LUTs and the registers to one or more flip-flops]
Counter, FSM Mapping
[Figure: NSL, FF, OL; the next state logic and output logic map to one or more LUTs and the state register to one or more FFs]
Virtex IOB
[Figure: Virtex IOB. Source: Xilinx Data Sheets]
• Three paths: Output, Input, Tri-state enable
• Direct or Through Flip-flops (synchronization)
• Flip-Flops: Set/reset, Clock enable, Clock selection
• Programmable delay at input to make hold time zero (not an issue once registered at IOB, as tcq > th)
• Programmable pull-up, pull-down, hold, slew rate
• PAR tool may move some of the input/output registers to IOB
• Various IO standards
– LVTTL
– LVCMOS33, LVCMOS25
– LVCMOS18, LVCMOS15, LVCMOS12
– …
– PCI33, PCI66 …
• Some IO standards require a Reference voltage for Inputs
• Banks of I/O pins support some of the IO standards
Weak keeper (Hold)
[Figure: weak keeper circuit on a bus]
• The hold circuit holds the previous state of the bus, but provides only a weak drive, so that the bus can still be driven to ‘0’ or ‘1’.
• This avoids the unnecessary switching of inputs by noise that would occur if the bus were left in high impedance.
Detailed View
[Figure: array of CLBs interconnected through switch blocks (SB)]
Virtex Routing
[Figure: Virtex routing. Source: Xilinx Data Sheets]
• Direct connection to adjacent CLB
• 24 single length lines (per GRM in each direction)
• 72 buffered Hex lines (per 6th GRM in each direction)
• 12 buffered long lines (horizontal & vertical)
• 4 tri-state lines (horizontal & vertical)
Bus Lines
• For Busing and Multiplexing it is better to use tri-state gates than multiplexers
[Figure: tri-state bus lines. Source: Xilinx Data Sheets]
Fitting Example: FSM
• FSM with 2 inputs, 3 states, and 2 Mealy outputs. How many CLBs are needed to fit it?
– State Variables: 2 flip-flops (3 states)
– NSL: 2 state variables + 2 inputs = 4 inputs
– OL: 2 inputs + 2 state variables = 4 inputs
– 2 LUTs for NSL
– 2 FFs for state variables, 2 LUTs for OL
– This requires 1 CLB minus two FFs
– In fact, even if the outputs are registered, it can still be accommodated in one CLB
CLBs, FSM
[Figure: mapping of NSL, FF and OL onto the CLB. Source: Xilinx Data Sheets]
Fitting Example: Counter
• 8 bit up counter with parallel load feature
– State Variables: 8 Flip-flops
– Incrementer uses carry chain
– NSL: 1 state variable + load + 1 din = 3 inputs per state variable
– NSL requires 8 LUTs
– This requires 2 CLBs (4 Slices)
CLB, Counter
[Figure: mapping of the counter onto the CLB, with the incrementer (+1) followed by the FF]
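The counter in this fitting example could be described as in the sketch below. The entity name counter8, the port names, and the asynchronous reset rst are assumptions added for illustration; the slide itself only specifies an 8 bit up counter with parallel load.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity counter8 is
  port (clk, rst, load : in  std_logic;
        din            : in  std_logic_vector(7 downto 0);
        q              : out std_logic_vector(7 downto 0));
end entity counter8;

architecture rtl of counter8 is
  signal count : unsigned(7 downto 0);   -- 8 state flip-flops
begin
  process (clk, rst)
  begin
    if rst = '1' then                    -- assumed asynchronous reset
      count <= (others => '0');
    elsif rising_edge(clk) then
      if load = '1' then
        count <= unsigned(din);          -- parallel load
      else
        count <= count + 1;              -- incrementer, a carry chain candidate
      end if;
    end if;
  end process;

  q <= std_logic_vector(count);
end architecture rtl;
```

Per bit, the next state logic sees the incremented bit, load and one din bit, which matches the three inputs per state variable counted in the slide.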
Signal Paths in CLB
library ieee;
use ieee.std_logic_1164.all;

entity test is
  port (a, b, c, d, e, f, g, h: in std_logic;
        z: out std_logic);
end entity test;

architecture arch_test of test is
begin
  -- a: asynchronous reset, b: clock, c: clock enable
  process (a, b)
  begin
    if (a = '1') then
      z <= '0';
    elsif (b'event and b = '1') then
      if (c = '1') then
        z <= (d and e and f and g) xor h;
      end if;
    end if;
  end process;
end arch_test;
[Figure: signal paths in the CLB for this design, with inputs d, e, f, g, h, control signals a, b, c, and output z]
Virtex DPRAM
[Figure: Virtex dual port RAM. Source: Xilinx Data Sheets]
• True Dual port Memory
• Each port can be read/write, read or write
• Synchronous reads and writes
• Can be combined for larger widths and depths
• Instantiated through Core Generator Tool
• Conflict on simultaneous read/write to a location, read data could be wrong
• Can be initialized in VHDL code
Metastability
[Figure: D flip-flop (D, CLK, Q) and timing diagram showing ts, th and tco relative to the active clock edge]
• ts: Setup time: Minimum time the input must be valid before the active clock edge
• th: Hold time: Minimum time the input must be valid after the active clock edge
• tco: Propagation delay for the input to appear at the output, measured from the active clock edge
Minimum Clock period
[Figure: data path from a first flip-flop through combinational logic (Comb) to a second flip-flop, both clocked by clk]
$t_{clk} > t_{co} + t_{comb} + t_{setup}$
$t_{co(min)} + t_{comb(min)} > t_{h(max)}$
Here we consider the data path from the first flip-flop to the next, and estimate the minimum clock period for proper latching of data into the second flip-flop.
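As a worked sketch of these two conditions, assume purely illustrative delays (not taken from any data sheet): $t_{co} = 1.0$ ns, $t_{comb} = 6.0$ ns, $t_{setup} = 1.0$ ns, $t_{co(min)} = 0.5$ ns, $t_{comb(min)} = 1.0$ ns and $t_{h(max)} = 0.3$ ns.

```latex
\begin{aligned}
t_{clk} &> t_{co} + t_{comb} + t_{setup} = 1.0 + 6.0 + 1.0 = 8.0\ \text{ns} \\
f_{max} &< \frac{1}{8.0\ \text{ns}} = 125\ \text{MHz} \\
t_{co(min)} + t_{comb(min)} &= 0.5 + 1.0 = 1.5\ \text{ns} > t_{h(max)} = 0.3\ \text{ns}
\end{aligned}
```

With these numbers the hold condition is met with margin, and the setup condition limits the clock to below 125 MHz.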
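Minimum Clock period
• Sequential Circuit / FSM
[Figure: FSM with combinational logic (Comb) taking Inputs and the present state PS, producing Outputs and the next state NS, and a state register (D, Q, CK, AR) driven by Clock and Reset]
$t_{clk} > t_{co} + t_{comb} + t_{setup}$
$t_{co(min)} + t_{comb(min)} > t_{h(max)}$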
Clock skew
• The previous analysis assumes that the clock reaches all flip-flops at the same time. This is not true in practice, as wire delays and buffer delays get added.
• This creates relative delays between pairs of flip-flops or registers.
• For analysis, it is important to consider the clock skew between flip-flops/registers that have a data path between them.
• Clock Skew:
– Difference in arrival time of the clock at the flip-flops
Max Path and Min Path
[Figure: chip with a clock input, showing a Min Path and a Max Path]
Clock Skew: Max path
[Figure: first flip-flop (clock CLK1), combinational logic (Comb), second flip-flop (clock CLK2), both derived from clk; timing diagram showing tclk, tco, tcomb, ts, tskew and slack]
$t_{clk} - t_{skew} > t_{co(max)} + t_{comb(max)} + t_{setup}$
$t_{clk} > t_{co(max)} + t_{comb(max)} + t_{setup} + t_{skew}$
$slack = t_{clk} - (t_{co(max)} + t_{comb(max)} + t_{setup} + t_{skew})$
• Analysis for the data path from the first flip-flop to the next
• We assume $t_{co} + t_{comb}$ is greater than the hold time of the flip-flop
• Hence, when a clock edge reaches both flip-flops, new data from the first flip-flop arrives at the second flip-flop after the clock edge, and after the hold time, so it is not latched in the second flip-flop on that edge
• We estimate the clock period such that, when the next clock edge reaches the second flip-flop, the data launched from the first flip-flop by the current clock edge gets latched in the second flip-flop
• Since the clock to the second flip-flop is skewed, that is, it arrives early compared to the first, the clock period has to accommodate this skew, requiring a larger clock period than there would be with no skew
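A worked sketch of the max path slack, using assumed values chosen only for illustration: $t_{clk} = 10$ ns, $t_{co(max)} = 1.0$ ns, $t_{comb(max)} = 6.0$ ns, $t_{setup} = 1.0$ ns and $t_{skew} = 0.5$ ns.

```latex
\begin{aligned}
slack &= t_{clk} - \left(t_{co(max)} + t_{comb(max)} + t_{setup} + t_{skew}\right) \\
      &= 10 - (1.0 + 6.0 + 1.0 + 0.5) = 1.5\ \text{ns}
\end{aligned}
```

A positive slack means the path meets timing at this clock period; the smallest workable period under these assumptions is 8.5 ns, which is 0.5 ns longer than it would be without skew.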
Clock Skew: Min path
[Figure: first flip-flop (clock CLK1), combinational logic (Comb), second flip-flop (clock CLK2), both derived from clk; timing diagram showing tclk, tco, tcomb, th and tskew]
• Same edge
  $t_{co(min)} + t_{comb(min)} > t_{skew(max)} + t_{hold}$
• Next edge
  $t_{clk} > t_{co} + t_{comb} + t_{setup} - t_{skew}$
• Here, an analysis like the one for the max path (i.e. from one clock edge at the first flip-flop to the next clock edge at the second flip-flop) would result in a smaller clock period, as the clock edge arrives late at the second flip-flop
• The real danger now is that the data launched from the first flip-flop by the current edge appears within the hold time window of the same edge at the second flip-flop
• If that happens, the only solutions are to add extra delay to the data path between these flip-flops, or to route the clock in the opposite direction
• In practice, this can happen in shift registers, as there may be no combinational delay between the flip-flops
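The same-edge (hold) condition can be checked numerically. The values below are assumptions chosen only to illustrate the shift register case, where $t_{comb(min)} = 0$: $t_{co(min)} = 0.5$ ns, $t_{skew(max)} = 0.4$ ns and $t_{hold} = 0.2$ ns.

```latex
\begin{aligned}
t_{co(min)} + t_{comb(min)} &= 0.5 + 0 = 0.5\ \text{ns} \\
t_{skew(max)} + t_{hold} &= 0.4 + 0.2 = 0.6\ \text{ns} \\
0.5\ \text{ns} &\not> 0.6\ \text{ns} \quad \text{(hold violation)} \\
\text{with } 0.3\ \text{ns added to the data path:} \quad 0.5 + 0.3 = 0.8\ \text{ns} &> 0.6\ \text{ns}
\end{aligned}
```

Note that increasing the clock period does not help here, because the same-edge condition does not contain $t_{clk}$.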
Clock routing
• Requirement
– Minimum relative delay between any 2 flip-flops, at least between flip-flops where there is a data path
• Solution
– Balance the number of buffers and the approximate length of wire from the clock input to the flip-flops
– H Clock Tree
Virtex Clock Tree
[Figure: Virtex clock tree. Source: Xilinx Data Sheets]
DLL
[Figure: DLL with ports CLKIN, CLKOUT and CLKFB (CLKI, CLKO); waveforms CLKIN and CLKOUT showing tskew and tadd]
The DLL delays CLKOUT by “tadd” so that the clock edges of CLKIN and CLKOUT match.
DLL / PLL
• In a DLL, the input clock is delayed for de-skew.
• In a PLL, a VCO synthesizes a clock synchronous to the input clock.
• A DLL adjusts the phase of the input clock.
• A PLL synthesizes a clock of the same phase and frequency as the input clock.
• A PLL works only over a limited range of frequencies, but in FPGAs the clock frequency does not change in most cases.
• A PLL also cleans up input jitter.
• Xilinx Virtex 5 has PLL blocks in addition to the DLL in the DCM.
Current FPGAs
• PLL
• Digital Clock Manager (DCM)
– DLL for de-skewing
– Phase shifter
– Frequency multiplication / division
• Clock Buffers, Muxes (Glitchless)
• All these can be connected in the clock path
– Clock pins, Clock tree
Special Resources Usage
• Resources
– Buffers
– DLL / PLL
– Block RAMs
– DSP Blocks
• Usage
– Vendor library components
– Inferred by synthesis tool, when possible
– VHDL attributes with code
Virtex Configuration
• JTAG: Prototyping (PC to Board)
• Master Serial:
– Configuring from a Serial PROM
– Embedded boards
• Slave Serial
– Works as a slave to a master FPGA connected to a serial PROM
• SelectMAP
– 8/16 bit wide synchronous slave configuration of FPGA
– Suitable for FPGA Interfaces to a CPU
Virtex Configuration: Serial PROM
[Figure: serial PROM configuration scheme. Source: Xilinx Data Sheets]
Serial Configuration
• Multiple FPGAs are configured from a single serial (Flash) PROM.
• The master FPGA supplies the clock to the PROM and the slave FPGAs.
• Master and slave FPGAs are daisy chained.
• After power on or after a PROGRAM request, the configuration memory of all FPGAs is cleared.
• Init phase synchronization is done through the INIT I/O pin.
• The master FPGA is programmed first, sending out ‘1’s on DOUT while the slave FPGAs wait.
• Once the master FPGA is configured, it passes on the configuration stream for the first slave, and so on.
• DONE synchronization is done through the open drain output DONE, forming a wired AND.
SelectMAP Scheme
[Figure: SelectMAP configuration scheme. Source: Xilinx Data Sheets]
SelectMAP Configuration: Timing
[Figure: SelectMAP configuration timing. Source: Xilinx Data Sheets]
FPGA Controls while configuring
• While the FPGA is being configured, its internal state and pin levels are not defined.
• Xilinx FPGAs have two internal signals that keep the FPGA state sane during and after configuration.
• GTS: This signal drives all FPGA outputs to tri-state.
• GSR: This signal goes to the set/reset of every flip-flop and holds each flip-flop set or reset, according to its specified reset state.
• Once the FPGA is configured, these signals are released.
• Use separate user resets for normal reset operation.
Spartan 6: Configuration
• Boundary Scan / JTAG / TAP / IEEE 1149.1
– Single Device, Chain
• Master Serial (Chain, Ganged) (SPI: x1, x2, x4)
• Slave Serial (SPI: x1, x2, x4)
• Master SelectMAP (x8, x16)
– Single Device, Chain, Ganged
• Slave SelectMAP (x8, x16)
Spartan 6: Bit Stream encryption
• The bit stream is AES encrypted with a 256 bit key using the BitGen tool
• The encryption key is programmed into the FPGA device through JTAG for decryption
• Once the key is programmed, the FPGA can be configured to disallow read back
• The configuration also cannot be read back
• The AES key can be permanently fused in the FPGA, or held in an SRAM with external battery backup
Spartan 6: Bit Stream compression
• The bit stream can be compressed when a lot of resources are unused
• Less memory for storage
• Less configuration time
Spartan 6: Multi Boot
• Multiple Configuration Images in Program Flash
• At least one Main configuration and one fallback/golden configuration
• During configuration, if a CRC error occurs in the bit stream, or sync word detection times out (WDT), the FPGA tries the fallback configuration
• Supported in SPI (x1, x2, x4) and BPI Modes
Spartan 6: DSP Slices
• Slices to support DSP computations
• 18 bit 2’s complement pre-adder
• 18 x 18 bit Multiplier, 36 bit result
• Result is sign extended to 48 bit
• 48 bit 2’s complement adder/subtracter
Spartan 6: DSP48A1 Slice
[Figure: DSP48A1 slice. Source: Xilinx Data Sheets]
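The arithmetic listed for the DSP slice (pre-adder, 18 x 18 multiplier, sign extension to 48 bits, 48 bit adder) can be sketched behaviourally as below. The entity name preadd_mac, the port names and the single output register are assumptions made for illustration; this is not the DSP48A1 primitive interface, and whether a tool maps the code onto a DSP slice depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity preadd_mac is
  port (clk     : in  std_logic;
        a, b, d : in  signed(17 downto 0);
        c       : in  signed(47 downto 0);
        p       : out signed(47 downto 0));
end entity preadd_mac;

architecture rtl of preadd_mac is
begin
  process (clk)
    variable pre  : signed(17 downto 0);
    variable prod : signed(35 downto 0);
  begin
    if rising_edge(clk) then
      pre  := d + b;                  -- 18 bit pre-adder (wraps on overflow)
      prod := a * pre;                -- 18 x 18 multiply, 36 bit result
      p    <= resize(prod, 48) + c;   -- sign extend to 48 bits, then 48 bit add
    end if;
  end process;
end architecture rtl;
```

For signed operands, resize performs the sign extension described in the slide, so the 36 bit product keeps its value when widened to 48 bits.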
Debug: Internal Signal Probing
• Probing the internal signals in FPGA for debug
• Signal Probe / Logic Analysis
• Use a Signal Capture IP
• Interface this IP to the JTAG port
• PC based software to configure the signal capture IP and display the signal waveforms
• Xilinx: ChipScope Pro
• Altera: Signal Probe
Xilinx ChipScope Pro
[Figure: Xilinx ChipScope Pro. Source: Xilinx Data Sheets]
Virtex Pins
[Figure: Virtex pin descriptions. Source: Xilinx Data Sheets]
One hot encoding
[Figure: FSM with Next State Logic (Inputs, PS to NS), state register (D, Q, CK, AR; Clock, Reset) and Output Logic producing Outputs; and the same FSM drawn with a single Logic block]
$t_{clk} > t_{co} + t_{logic} + t_{setup}$
• e.g. FSM with 5 inputs, 18 states, and 6 outputs
• NSL: 5 + 5 = 10 inputs (worst case)
• For Virtex (Worst Case)
– Basic block: 4 input LUT
– 1 CLB: 6 input LUT
– 16 CLBs for a 10 input LUT
• The NSL would be distributed, increasing the delay and bringing down the clock frequency of the FSM.
• Solution: one hot encoding, where each state is encoded using one flip-flop.
One hot encoding
[Figure: state diagram fragment with states Si and Sj and transition conditions condi and condj]
$D_j = cond_i \cdot Q_i + cond_j \cdot Q_j$
NSL: 5 + 2 inputs (Worst Case)
One-hot encoding: Output logic
• Most Moore outputs are a direct decode of one state or of more than one state
• If an output is a decode of a single state, then that state flip-flop output is the output signal
• If multiple states produce an output, the output signal is the logical OR of all those state flip-flops
• Thus, one-hot encoding also reduces the output logic, at the cost of extra state flip-flops
One hot encoding
• State encoding
– Sequential, gray, one-hot-one, one-hot-zero
• User defined attributes (state encoding)
– attribute state_encoding of type-name: type is value; (sequential, gray, one-hot-one, one-hot-zero)
  attribute state_encoding of statetype: type is gray;
– attribute enum_encoding of type-name: type is "string";
  attribute enum_encoding of statetype: type is "00 01 11 10";
One-hot one, One-hot zero
• One-hot one
  00001
  00010
  00100
  01000
  10000
• One-hot zero (Almost one-hot)
  0000
  0001
  0010
  0100
  1000
• Easy to initialize (reset all flip-flops)
• Starting state is never revisited
One hot encoding
• Explicit declaration of states
signal pr_state, nx_state: std_logic_vector(3 downto 0);
constant a: std_logic_vector(3 downto 0) := "0001";
constant b: std_logic_vector(3 downto 0) := "0010";
constant c: std_logic_vector(3 downto 0) := "0100";
constant d: std_logic_vector(3 downto 0) := "1000";
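A one-hot state register with one flip-flop per state might be used as in the sketch below, where each next state bit is written as a sum of (condition AND source state) terms. The entity name onehot_fsm, the input go, the output busy and the four state sequence a, b, c, d are hypothetical, chosen only to illustrate the idea.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity onehot_fsm is
  port (clk, reset, go : in  std_logic;
        busy           : out std_logic);
end entity onehot_fsm;

architecture rtl of onehot_fsm is
  signal pr_state, nx_state : std_logic_vector(3 downto 0);
  -- state a is bit 0, b is bit 1, c is bit 2, d is bit 3
  constant a : std_logic_vector(3 downto 0) := "0001";
begin
  -- state register, initialized to state a
  process (clk, reset)
  begin
    if reset = '1' then
      pr_state <= a;
    elsif rising_edge(clk) then
      pr_state <= nx_state;
    end if;
  end process;

  -- next state logic: each bit depends only on a few state bits and inputs
  nx_state(0) <= (pr_state(0) and not go) or pr_state(3);  -- stay in a, or d to a
  nx_state(1) <= pr_state(0) and go;                       -- a to b when go = '1'
  nx_state(2) <= pr_state(1);                              -- b to c
  nx_state(3) <= pr_state(2);                              -- c to d

  -- Moore output active in states b, c and d: OR of those state flip-flops
  busy <= pr_state(1) or pr_state(2) or pr_state(3);
end architecture rtl;
```

Every next state equation here fits in a single 4-input LUT, which is the reduction in logic depth that motivates one-hot encoding.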
Altera Stratix
• Two levels of interconnections
• SRAM based programmable connections
• Logic Array Block (10 LEs)
• LUT as combinational Logic
• Flip-Flops with sync/async reset/preset
• RAM Block (SPRAM, DPRAM, FIFO)
• Low skew clock trees, PLL
• Carry, Cascade chains
• DSP Blocks (Multipliers, Shift Registers)
• I/O Blocks (Registered / Non-registered)
• Multiple I/O standards
• JTAG, Parallel, and Serial Configurations
[Figure: Altera Stratix architecture. Source: Altera Data Sheets]
Actel 54SX-A
• Antifuse based programmable interconnections
• Simple Combinational and Registered cells
• Simple I/O Blocks
• Low skew Clock trees
• Multiple I/O standards
• Hardware probe pins
Actel 54SX-A, C Cell
[Figure: Actel 54SX-A C cell. Source: Actel Data Sheets]
Actel 54SX-A, R Cell
[Figure: Actel 54SX-A R cell. Source: Actel Data Sheets]
Actel 54SX-A
[Figure: Actel 54SX-A. Source: Actel Data Sheets]
Actel 54SX-A Routing
[Figure: Actel 54SX-A routing. Source: Actel Data Sheets]
Actel 54SX-A Probe
[Figure: Actel 54SX-A probe. Source: Actel Data Sheets]
Actel ProASIC Plus
[Figure: Actel ProASIC Plus. Source: Actel Data Sheets]
ProASIC Plus, Logic Tile
[Figure: ProASIC Plus logic tile. Source: Actel Data Sheets]
Latch / FF
[Figure: latch built with a 2:1 multiplexer (inputs D and fed-back Q, selected by clk), and a flip-flop built from two latches clocked by CLK]
ProASIC Plus Routing
• Fast Connect
• Short Lines (1, 2, 4), Long Lines
• Clock Tree
• Pad Ring (Pin Locking)
• SRAM Blocks
• Programming Tech: Flash
• Non-volatile
CPLD vs FPGA

Features                  CPLD      FPGA
Logic                     AND-OR    Mux / LUT / Gates
Register to Logic ratio   Small     Large
Timing                    Simple    Complex
Architecture Variation    Small     Large
Programming Technology    Flash     SRAM, Anti-Fuse, Flash
Capacity                  10 K      2 M LUT + RAM
Static Timing Analysis (STA)
• Timing simulation: simulates the real time operation of the circuit for the specified test vectors, using timing models of the blocks
• Exhaustive timing simulation is time consuming
• Static Timing Analysis analyzes the various path delays from block and wire delays
• It can make mistakes, as it is not aware of the real time behavior of the circuit (inputs, FSM/Controller behavior)
• A path that is never used in circuit operation may be reported (False paths)
• Paths to registers that are not enabled every clock cycle may be reported (Multi-cycle paths)
STA: Sequential Circuit
[Figure: Input to first register (Setup to clock), register to register path through Comb (Clock to setup), second register to Output (Clock to output), common CLK]
• The register to register path decides the clock frequency. However, if either of the other two delays exceeds it, the maximum of the three must be taken as the minimum clock period.
• In real life this is often not a great concern, as we are frequently designing IPs that go inside the chip and interface to other blocks close by. Even when inputs and outputs are brought to external pins, proper placement should take care of these delays.
Static Timing Analysis: Sequential Circuit
• Clock to Setup: Register to register path with longest delay
– Clock to Setup on destination clock
• Clock to Pad: FF output delay, from FF output to chip output pin
– Clock to Pad
• Setup to Clock: Setup / Hold time of FF with respect to input pin/pad
– Setup/Hold to clock
Static Timing Analysis
• Take the maximum of the three delays to find the minimum clock period, and hence the maximum clock frequency, for timing simulation
• The actual throughput, however, is given by Clock to Setup (the register to register path with longest delay)
• In most cases, the Clock to Pad delay of a module is not of consequence, as these outputs, when the module is used in a top level design, go as inputs to a nearby module
False Paths
• Improbable Paths
• Static Paths (e.g. Input Registers)
• Paths between clock domains
Multi-cycle path
[Figure: first flip-flop with clock enable CE1, combinational logic (Comb), second flip-flop with clock enable CE2, common clk]
Clock Enable CE2 comes 3 clock cycles after CE1
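As a sketch of what this means for timing, assume CE2 is asserted exactly three clock cycles after CE1 and ignore clock skew. The path then has three clock periods available instead of one. The numbers in the second line are illustrative assumptions only.

```latex
\begin{aligned}
t_{co} + t_{comb} + t_{setup} &< 3\, t_{clk} \\
\text{e.g. } t_{clk} = 5\ \text{ns},\quad t_{co} + t_{comb} + t_{setup} = 12\ \text{ns}: \quad 12\ \text{ns} &< 15\ \text{ns}
\end{aligned}
```

A single-cycle analysis would flag this 12 ns path as failing against the 5 ns period, which is why such paths are declared as multi-cycle paths to the timing analyzer.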
Critical Path
[Figure: FF1 (clock enable CE1), combinational blocks C1 and C2, FF2 (clock enable CE2), common clk]
Critical path delay = $t_{CO} + t_{C1} + t_{C2} + t_{S}$
Constraint driven PAR
• Constraint editor
• I/O constraints
– I/O locations
– I/O standards (LVTTL, PCI66-3, LVDS ..)
– Drive strength (current)
– Slew rate
– I/O termination (pull up, pull down, hold)
– Input delay
Timing constraints
• Global
– Clock period, pad to setup, clock to pad
• Per port
– Pad to setup, clock to pad
• Per group (by net and clock)
– Pad to setup, Clock to pad
– FROM – TO, FROM – THRU – TO
• False Paths
• Multi-cycle paths