Hochschule Kempten
Faculty of Electrical Engineering
Microelectronics, Electronics Department, Prof. Vollrath

Microelectronics

10 Area, uP, Register, Scan cell

Prof. Dr. Jörg Vollrath



Previous: 09 VHDL, FPGA and Delay


Video of lecture 11 (9.06.2021)


Length: 1:30:00
0:0:15 Review Delay, Today: Registers, test

0:3:0 D-Flip-Flop

0:5:10 Function of a D-Flip-Flop

0:11:45 positive edge, rising clock triggered D-Flip-Flop

0:13:50 Setup and hold time

0:18:55 Slew rate measurement

0:22:15 Clock synchronous operation minimum cycle time

0:25:55 Measuring, simulating setup and hold time

0:29:25 Shmoo

0:34:0 Negative setup time with delay on clk

0:34:58 Finite state machine

0:38:3 state transition table

0:38:32 State transition table

0:3:20 VHDL code

0:7:19 Test

0:12:50 Truth table and test

0:16:16 NAND Gate simulation with error

0:20:30 Pure logic test

0:24:0 Number of test vectors

0:26:10 Scan cell D-Flip-Flop

0:29:50 Scan cell in Tiny FPGA

0:32:7 Textual scan cell specification

0:34:0 State table

0:36:0 Scan cell schematic

Overview

Review:

Today:

Microelectronics System Design

Boundary conditions


Strategy


Use a different technology and perhaps wait for new choices according to the roadmap.
Reviewing microelectronics system design identifies the most important boundary conditions and the most successful strategies for the task.

Area estimation

2 λ = F (feature size)
Typical layout densities
Element                        Area
random logic (2-level metal)   1000-1500 λ² / transistor
datapath                       250-750 λ² / transistor
SRAM                           1000 λ² / bit
DRAM (in a DRAM process)       100 λ² / bit
ROM                            100 λ² / bit
Source: CMOS VLSI Design, Weste, Harris, Addison Wesley
Typical examples with synthesis in Electric?
muddlib.jelib; sclib.jelib;

MIPS R3000A

32-bit 2nd generation commercial processor (1988)
Led by John Hennessy (Stanford, MIPS Founder)
external L1 Caches
1.2 µm process
115 k transistors
20..33.3 MHz
48 mm2 die
145 I/O Pins
VDD = 5 V
4 Watts
SGI Workstations
Wikipedia
MIPS R3000A die
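
The layout densities from the area estimation table can be cross-checked against the MIPS R3000A numbers listed here. The following is a minimal sketch, assuming that the whole chip is treated as random logic and that λ = F/2 with F = 1.2 µm; the function name `area_mm2` is only illustrative.

```python
# Rough area estimate from layout density (values taken from the lecture notes).
FEATURE_SIZE_UM = 1.2      # MIPS R3000A process, F = 2 * lambda
TRANSISTORS = 115_000      # MIPS R3000A transistor count

def area_mm2(transistors, lambda2_per_transistor, feature_size_um):
    """Return the estimated area in mm^2 for a given layout density."""
    lam_um = feature_size_um / 2.0                    # lambda = F / 2
    area_um2 = transistors * lambda2_per_transistor * lam_um ** 2
    return area_um2 / 1e6                             # 1 mm^2 = 1e6 um^2

# Assumption: every transistor is counted as random logic (1000-1500 lambda^2).
low = area_mm2(TRANSISTORS, 1000, FEATURE_SIZE_UM)
high = area_mm2(TRANSISTORS, 1500, FEATURE_SIZE_UM)
print(f"random logic estimate: {low:.1f} .. {high:.1f} mm^2")
```

Under these assumptions the estimate is roughly 41 to 62 mm², which brackets the 48 mm² die quoted above. Pads and the denser datapath are ignored in this sketch.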

Book: The Pentium Chronicles

Robert Colwell, John Wiley & Sons, 32.99 Euro

A landmark chip like the P6 or Pentium 4 doesn't just happen. It takes a confluence of brilliant minds, dedication far beyond the ordinary, and management that nurtures the vision while keeping a firm hand on the project tiller.

As chief architect of the P6, Robert Colwell offers a unique perspective as he unfolds the saga of a project that ballooned from a few architects to hundreds of engineers, many just out of school.

Intel processor

Processor      Year  Transistors  Word width  Clock    Feature size  Die area
Intel 4004     1971  2300         4 bit       500 kHz  10 µm         -
Intel 8086     1978  29 k         16 bit      8 MHz    3.2 µm        33 mm²
Intel 386      1985  275 k        32 bit      33 MHz   1.5 µm        -
Intel Pentium  1993  3.1 M        32 bit      300 MHz  0.8 µm        294 mm²
Die images: Intel 4004 CPU, Intel 8086 CPU die, Intel A80386DX-20 CPU die, Intel Pentium P54C die.
In the chip top view pictures memory looks regular, while logic has a random pattern.
Over time more and more area was used for memory.
Due to miniaturization and the minimum supply voltage limit (1 V), power density limits the maximum operating speed.
Since Vth varies over the chip and the wafer, it is difficult to realize Vth < 300 mV. Therefore the minimum Vdd is 1 V.
Latest integrated circuits have more cores: a block is copied a couple of times and glue logic is added.

Microelectronics Limits

  • Feature size: wavelength of the light source, a few nm
  • Feature size: number of atoms needed for creating n, p regions with doping (100..1000), a few nm
  • Supply voltage: dielectric breakdown and statistical Vth variation, VDD = 1 V
  • Power density: clock rate limit 3 GHz

D-Flip-Flop


This circuit shows a positive edge triggered D-Flip-Flop (DFF).
Data at input D is latched at a rising clock edge (CLK), and Q and Qb hold this data until the next rising clock edge.
This is the only relevant flip-flop type used in digital circuits.
A reset input and testability inputs can be added as shown later.
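
The edge triggered behaviour described above can be illustrated with a small behavioural model. This is only a sketch at logic level, without setup, hold or propagation delay; the class name `DFlipFlop` and the method `step` are illustrative and not part of the lecture material.

```python
class DFlipFlop:
    """Behavioural model of a positive edge triggered D flip-flop."""

    def __init__(self):
        self.q = 0        # stored value
        self._clk = 0     # previous clock level, used for edge detection

    @property
    def qb(self):
        """Inverted output Qb."""
        return 1 - self.q

    def step(self, clk, d):
        """Apply new CLK and D levels; Q changes only on a rising CLK edge."""
        if self._clk == 0 and clk == 1:
            self.q = d
        self._clk = clk
        return self.q

ff = DFlipFlop()
# (CLK, D) pairs: D is sampled only where CLK goes from 0 to 1.
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]
trace = [ff.step(clk, d) for clk, d in stimulus]
print(trace)
```

Q follows D at the second and fifth step only, because these are the rising clock edges; changes of D while CLK stays high or low have no effect.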

Setup and Hold Time



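Important timing parameters of a DFF are setup and hold time.
Setup time is the time the input must be stable before the clock edge. It is needed to transfer the input level to the internal inverters.
Hold time is the time the input must remain stable after the clock edge.
Setup time and propagation delay limit the maximum clock frequency.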

Setup and Hold Time, SPICE Simulation

Vary the positive CLK edge with respect to the input data.
Not enough setup time: bad output.

Minimum setup time
Register delay from rising clock to output

Shmoo plot
VDD
1.2V  |xxxxx......|
1.1V  |xxxxx......|
1.0V  |xxxxx......|
0.9V  |xxxxx......|
0.8V  |xxxxx......|
    -0.5n        0.5n
tsetup
x: Fail
.: Pass

Simulation waveform DFF_test.asc
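
A shmoo plot like the one above is a sweep over two parameters with a pass/fail decision per point. The sketch below reproduces the text layout; the pass criterion in `passes` (setup time of at least 0 ns, independent of VDD) is a hypothetical stand-in for a real simulation or measurement result.

```python
def passes(vdd, tsetup_ns):
    # Hypothetical criterion standing in for a SPICE run or a measurement.
    return tsetup_ns >= 0.0

def shmoo(vdd_values, steps):
    """Return one text row per VDD; steps are setup times in units of 0.1 ns."""
    rows = []
    for vdd in vdd_values:
        cells = "".join("." if passes(vdd, s / 10) else "x" for s in steps)
        rows.append(f"{vdd:.1f}V  |{cells}|")
    return rows

# Sweep VDD from 1.2 V down to 0.8 V and tsetup from -0.5 ns to 0.5 ns.
rows = shmoo([1.2, 1.1, 1.0, 0.9, 0.8], range(-5, 6))
print("\n".join(rows))
```

In practice `passes` would start one simulation per grid point and compare the latched output with the expected value.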

Medwedew Finite State Machine

Maximum clock frequency:
tCLK > tpdLogic + tsetup + tpdRegister
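
The inequality can be turned into a maximum clock frequency. The delay values in this sketch are hypothetical example numbers, not results from the lecture.

```python
def max_clock_mhz(tpd_logic_ns, tsetup_ns, tpd_register_ns):
    """The minimum cycle time is the sum of the three delays; return f_max in MHz."""
    t_clk_min_ns = tpd_logic_ns + tsetup_ns + tpd_register_ns
    return 1000.0 / t_clk_min_ns

# Hypothetical delays: 5 ns logic, 0.5 ns setup, 3 ns clock-to-output.
f_max = max_clock_mhz(tpd_logic_ns=5.0, tsetup_ns=0.5, tpd_register_ns=3.0)
print(f"f_max = {f_max:.1f} MHz")
```

With these example values the minimum cycle time is 8.5 ns, so the clock must stay below about 118 MHz.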



In a Medvedev (Medwedew) machine the state register outputs are used directly as outputs.
There are also Mealy and Moore state machines, which make testing more complicated.

State diagram, state transition table



State Diagram
  • Each state is drawn as a circle
  • Each transition is shown as an arrow
    If a signal is involved, its name and state are shown.
State Transition Table
  • Old State, input signals, new state
SnUDSn+1
00010
01000
10001
11010
00101
01110
10100
11100

VHDL state machine entity

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity STATEM is
  port ( RESET: in  STD_LOGIC;
         CLK  : in  STD_LOGIC;
         X    : in  STD_LOGIC;
         R    : out STD_LOGIC_VECTOR(2 downto 0));
end STATEM;

FPGA: no reset but initialization: Q := '0';
ASIC: reset

VHDL state machine architecture

architecture BEHAVE of STATEM is
  signal RINT : std_logic_vector(2 downto 0) := "000";  -- FPGA: initial value
  signal state: std_logic_vector(3 downto 0);           -- input X & current state
begin
  state <= X & RINT;

  SYN_STATE: process (CLK, RESET)
  begin
    if RESET = '1' then                 -- ASIC: asynchronous reset
      RINT <= "000";
    elsif rising_edge(CLK) then         -- positive edge
      case state is                     -- state table
        when "0000" => RINT <= "001";
        when "1001" => RINT <= "010";
        when "0100" => RINT <= "000";
        when others => RINT <= "000";
      end case;
    end if;                             -- incomplete if: RINT is a register
  end process SYN_STATE;

  R <= RINT;                            -- connect output
end BEHAVE;
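
The case statement of the architecture can be checked with a small software model before simulation. This is a sketch that copies the state table from the VHDL listing; the names `NEXT_STATE` and `run` are illustrative, and reset is not modelled.

```python
# Next-state table copied from the VHDL case statement; the key is X & RINT.
NEXT_STATE = {"0000": "001", "1001": "010", "0100": "000"}

def run(x_values, rint="000"):
    """Return the value of R after each rising clock edge."""
    trace = []
    for x in x_values:
        state = f"{x}{rint}"                    # same concatenation as in VHDL
        rint = NEXT_STATE.get(state, "000")     # 'when others' branch
        trace.append(rint)
    return trace

trace = run([0, 1, 0, 0])
print(trace)
```

Running the model also suggests that the entry "0100" is never reached from the initial state, because no branch assigns "100" to RINT.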

Overview: Test


Half of the cost of integrated circuits is spent on test.

Truth table and test

Inputs Output
B A Y
0 0 1
0 1 1
1 0 1
1 1 0

Each line in the truth table corresponds to one column, that is, a certain time in the simulation or measurement.
All possible input combinations have to be tested to prove correct function.
Due to propagation delay, output signals are shifted to the right.
The expected output signal is compared to the real output signal at certain times (arrows).
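
The comparison of expected and real output for every line of the truth table can be sketched as follows. The gate models are pure logic functions, and the faulty gate `stuck_at_1` is a hypothetical example of a defect.

```python
from itertools import product

def nand(b, a):
    """Fault-free NAND gate."""
    return 0 if (a and b) else 1

def stuck_at_1(b, a):
    """Hypothetical defect: the output is stuck at '1'."""
    return 1

# Expected outputs from the truth table, keyed by (B, A).
EXPECTED = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def run_test(gate):
    """Apply all input combinations and return the failing (B, A) vectors."""
    return [v for v in product((0, 1), repeat=2) if gate(*v) != EXPECTED[v]]

good = run_test(nand)
bad = run_test(stuck_at_1)
print(good, bad)
```

The stuck-at fault is only visible for the vector B = 1, A = 1, which shows why all combinations have to be applied.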

NAND Gate Simulation with Error

Observability of errors requires real voltage sources with a series resistance (Rser).
Missing wires, missing connections and high-resistance connections are simulated.
Bad behaviour can be seen. The analog output value has to be converted to '0' or '1' using the mid level.
It can also be seen that the output level sometimes changes over time.
This can be a delay fault or a memory effect.
Whenever a switch and a capacitor are available, a memory is created.
It is more difficult to test memories.

Pure Logic Test

  • Static Test: Functional Test (Truth Table)
    • All input combinations give the correct output
    • N inputs, m outputs: 2^N tests
  • Dynamic Test: Delay Test
    • All transitions can occur and have a different delay
Transitions
N N+1
0000
0001
0010
0011
0100
0101
0110
0111
Transitions
N N+1
1000
1001
1010
1011
1100
1101
1110
1111
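
The number of static and dynamic test vectors can be enumerated directly. The sketch assumes that each row of the transition table is the input vector at time N followed by the input vector at time N+1, here for a gate with two inputs.

```python
from itertools import product

def static_vectors(n):
    """All 2**n input combinations as bit strings."""
    return ["".join(bits) for bits in product("01", repeat=n)]

def transitions(n):
    """All ordered pairs (vector at N, vector at N+1), written as one string."""
    vecs = static_vectors(n)
    return [a + b for a in vecs for b in vecs]

n_static = len(static_vectors(2))
n_dynamic = len(transitions(2))
print(n_static, n_dynamic)
print(transitions(2)[:4])
```

For two inputs this gives 4 static vectors and 16 transitions, so the delay test grows with 2^(2N) instead of 2^N.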

State transition table

InputOld State (n)Output, new state (n+1)
UDSnSn+1
00010
00100
01001
01110
10000
10110
11000
11100

Test with Latches and Registers

Each latch acts as an input for logic circuits
Each internal latch can only be set indirectly with logic functions

Scannable D-Flip-Flop

Additional pins:
SE/TE: scan/test enable
SDI/TDI: scan/test data in
SDO/TDO: scan/test data out

Shift register to set and read register

Scan cell specification


Input, Output, States

Text


During test, test data is written with a rising clock edge into the latch.
At any time the output can be reset with an extra signal.
At any time during normal operation the clock signal can be disabled with an extra signal.

Text with signal names and states


During test (TE='1'), test data (TDI) is written with a rising clock (CLK) edge into the latch (Q(n+1)=TDI(n), Qb(n+1)=!TDI(n), TDO(n+1)=TDI(n)).
At any time the output (Q, Qb) can be reset ('0') with an extra signal (RSTb).
At any time during normal operation (TE='0') the clock signal (CLK) can be disabled with an extra signal (CE).

State table

Scan Flip Flop: State table

Positive edge triggered flip flop.
CLK not in state table.
Asynchronous reset: RSTb is '1' for normal operation.
TE disables CE and RSTb operation.
TE RSTb CE D TDI | Q(n)   Qb(n)   TDO(n)   | Comment
0  0    X  X X   | 0      1       0        | Normal operation, reset to '0'
0  1    0  X X   | Q(n-1) Qb(n-1) TDO(n-1) | Normal operation, CE='0' disables clock
0  1    1  0 X   | 0      1       0        | Normal operation, Q(n) = D
0  1    1  1 X   | 1      0       1        |
1  0    X  X X   | TDI    /TDI    TDI      | Test, TDO(n) = TDI
1  1    X  X 0   | 0      1       0        |
1  1    X  X 1   | 1      0       1        |

Serial test data is loaded into all FFs with TE='1' and CLK toggling.
Then the logic is evaluated with TE='0' and a rising CLK edge.
Results are shifted out serially with TE='1' and CLK toggling.
Since Q is toggling during the serial shift operation, no worst case timing behaviour is measured.
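
The state table can be written as a next-state function, which is convenient for generating expected values for test vectors. This is a sketch under one simplifying assumption: the asynchronous reset is only evaluated at the clock edge. The function names are illustrative.

```python
def scan_ff_next(q, te, rstb, ce, d, tdi):
    """Next Q after a rising CLK edge, following the rows of the state table."""
    if te == 1:        # test mode: TE disables CE and RSTb
        return tdi
    if rstb == 0:      # normal operation, reset active (low)
        return 0
    if ce == 0:        # clock disabled: keep the old state
        return q
    return d           # normal operation: Q = D

def outputs(q):
    """Return (Q, Qb, TDO); TDO equals Q in every row of the table."""
    return q, 1 - q, q

q = 0
q = scan_ff_next(q, te=1, rstb=1, ce=0, d=0, tdi=1)   # scan in a '1'
q = scan_ff_next(q, te=0, rstb=1, ce=0, d=0, tdi=0)   # CE='0': state is kept
print(outputs(q))
```

The order of the checks mirrors the priority in the table: TE first, then RSTb, then CE.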


Scan Flip Flop: Schematic

This cell is included in sclib.jelib as Scan_cell.
Implementation of CE is difficult. It is assumed that CE is only going low during CLK high.
CDL affects propagation delay.
CDL = 1 pF: 36 ns CLK to Q, 57 ns RSTb to Q
CDL = 0.1 pF: 6 ns CLK to Q
CDL = 0.01 pF: 3 ns CLK to Q

Scan Flip Flop: Simulation

800ns simulation time with CLK cycle 20ns gives 40 test vectors.
Nr CE TE TDI D RSTb xTDO xQ xQB
 0  0  0   1 0 1    1    1  0   Start 6x
 6  0  0   1 1 1    1    1  0   Keep state 
 7  0  0   1 0 1    1    1  0   Keep state
 8  0  1   1 1 1    1    1  0   TE active   

LTSPICE simulation code:
.global VDD
.option TEMP 90
VDD VDD 0 DC 1
VCLK CLK 0 PULSE(0 1 0n 1n 1n 8n 20n)
VCE CE 0 PULSE(0 1 300n 1n 1n 380n 600n)
VCLR RSTb 0 PULSE(1 0 705n 1n 1n 8n 2800n)
VD D 0 PULSE(0 1 117.2n 1n 1n 18n 40.05n)
VTDI TDI 0 PULSE(1 0 185n 1n 1n 38n 400n)
VTE TE 0 PULSE(0 1 165n 1n 1n 108n 2800n)
.include cmosedu_models.txt
.tran 0 800n

Each row in the state table is a CLK cycle column in the simulation.
Simulation starts with CE low.
Between 166 ns and 274 ns TE operation is checked.
At 300 ns CE goes active. D is transferred into the D-FF.
At 705 ns reset is applied and the output Q goes to 0.

Is the testing complete?
Is the circuit working correctly?

Test coverage:
2^5 = 32 input combinations (vectors).
TE,RSTb,CE,D,TDI: 010X1, 110XX, 011XX, 00001
11 combinations tested
TC = 11/32 = 34.4%
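
The count of tested combinations can be reproduced by expanding the don't-care positions of the listed patterns. This is a minimal sketch; the helper name `expand` is illustrative.

```python
from itertools import product

def expand(pattern):
    """Expand a pattern with don't-care 'X' positions into concrete vectors."""
    choices = ["01" if c == "X" else c for c in pattern]
    return {"".join(bits) for bits in product(*choices)}

# Patterns from the simulation, signal order TE, RSTb, CE, D, TDI.
patterns = ["010X1", "110XX", "011XX", "00001"]
tested = set().union(*(expand(p) for p in patterns))
coverage = len(tested) / 2 ** 5
print(len(tested), f"{coverage:.1%}")
```

Using a set removes vectors that are covered by more than one pattern, so overlapping patterns are not counted twice.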

JavaScript generating test vectors: TestJS.html

Scan Flip Flop: Schematic improved CE

This cell is not yet included in sclib.jelib.
CDL affects propagation delay.

Scan chain operation

Initial data is scanned in with SE/TE active and CLK toggling.
SE/TE is then disabled and normal operation is done for one CLK cycle.
The resulting data is scanned out using SE/TE and CLK.
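
The three steps can be sketched with a list as shift register. The chain length of three flip-flops and the logic function `invert_all` are hypothetical and only serve as an example of logic under test.

```python
def shift(chain, tdi_bits):
    """TE='1': shift bits in at the chain input; return (new chain, bits shifted out)."""
    chain = list(chain)
    out = []
    for bit in tdi_bits:
        out.append(chain[-1])         # the last flip-flop drives TDO
        chain = [bit] + chain[:-1]    # every FF takes the value of its predecessor
    return chain, out

def capture(chain, logic):
    """TE='0': one normal CLK cycle, the logic output is stored in the flip-flops."""
    return list(logic(chain))

def invert_all(bits):
    """Hypothetical logic under test: inverts every register bit."""
    return [1 - b for b in bits]

chain, _ = shift([0, 0, 0], [1, 0, 1])     # scan in
chain = capture(chain, invert_all)         # one functional clock cycle
chain, result = shift(chain, [0, 0, 0])    # scan out
print(result)
```

The value of the last flip-flop in the chain appears first at TDO, so the scanned-out bits have to be mapped back to the registers in reverse order.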

Application: Tiny FPGA

How many test vectors are needed?
How many subcircuits to test?
How to document test vectors?
This was a laboratory task.
This circuit demonstrates an FPGA with 2 inputs (I0, I1), lookup table LUT2, switches (MUX) and outputs O0, O1.
Internal output signals Y can be fed back with Y0, Y1 to the input MUXes for state machines.
The TE, TDI, TDO interface configures MUX and LUT2 circuits.
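
A 2-input lookup table stores one output bit per input combination, so four configuration bits have to be loaded through the scan interface. The sketch below assumes a hypothetical bit order (index = 2*I1 + I0); the real order in the Tiny FPGA depends on the scan chain wiring.

```python
def lut2(config, i1, i0):
    """2-input lookup table: four configuration bits select the output."""
    return config[2 * i1 + i0]

# Hypothetical configuration for a NAND function, index = 2*I1 + I0.
NAND_CONFIG = [1, 1, 1, 0]
table = [lut2(NAND_CONFIG, i1, i0) for i1 in (0, 1) for i0 in (0, 1)]
print(table)
```

Testing one LUT2 configuration needs all four input combinations, and each MUX setting adds further vectors.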

Dynamic Scan Flip Flop: Schematic

A separate path for test data is needed with a separate clock.
Test data is loaded serially, while normal data is still kept at the output Q and Qb.
Disabling TE and applying a clock transfers test data to the Q output and then through logic to the next FF.
This makes timing measurements possible.

Summary

Next: 11 Memories