Hochschule Kempten
Faculty of Electrical Engineering
Microelectronics, Electronics Department, Prof. Vollrath

Microelectronics Laboratory

2026 FPU Investigation

Prof. Dr. Jörg Vollrath






Outline

Numbers and Number Formats



Minifloat


3-bit (1.1.1)

Sign  Exponent  Significand (Mantissa)
S     E         M

S \ EM    .00   .01   .10   .11
0...        0     1   inf   NaN
1...       -0    -1  -inf   NaN

4-bit (1.2.1)

Sign  Exponent  Significand (Mantissa)
S     EE        M

S \ EEM   .000  .001  .010  .011  .100  .101  .110  .111
0...         0   0.5     1   1.5     2     3   inf   NaN
1...        -0  -0.5    -1  -1.5    -2    -3  -inf   NaN

Machine learning


Nvidia "fp8" format (1.5.2, E5M2)
Nvidia: FP8-E4M3 (1.4.3)
FP4-E2M1 (1.2.1)

Operations


Size of truth table:


Truth Table Operations 4-Bit




Floating point calculation


Add, subtract

Convert to fixed point (shift).
Add or subtract (for subtraction, add the inverted operand with cin = "1").
Convert to floating point (shift).

Multiply

Add exponents.
Multiply mantissas.
Adjust exponent (add) and shift mantissa.
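
The three multiply steps above can be sketched in Verilog for the 4-bit (1.2.1) format. This is a minimal sketch, not the lab's reference design. Assumptions that are not stated on the slides: exponent bias 1, truncation of the product, and inputs limited to positive normal values (exponent codes 01 and 10, that is 1, 1.5, 2 and 3). The module name fp4_mul_norm is hypothetical. The sign bit is left out; it would be the XOR of the two operand signs.

```verilog
// Sketch: multiply two positive, normal 1.2.1 minifloat magnitudes.
// Assumed (not from the slides): bias 1, truncation, normal inputs only.
module fp4_mul_norm (
  input  [2:0] a,   // {EE, M} of operand A
  input  [2:0] b,   // {EE, M} of operand B
  output [2:0] y    // {EE, M} of the product, 110 = inf on overflow
);
  // Step 1: add exponents and remove one bias.
  wire [2:0] esum = a[2:1] + b[2:1] - 3'd1;
  // Step 2: multiply mantissas with hidden one (1.M * 1.M), in units of 1/4.
  wire [3:0] prod = {1'b1, a[0]} * {1'b1, b[0]};   // 4, 6 or 9
  // Step 3: if the product reached 2.0 (prod >= 8), shift right by one
  // and add 1 to the exponent; keep the bit below the leading one as M.
  wire [2:0] e = esum + prod[3];
  wire       m = prod[3] ? prod[2] : prod[1];
  // Exponent code 11 is reserved for inf/NaN, so saturate to inf.
  assign y = (e >= 3'd3) ? 3'b110 : {e[1:0], m};
endmodule
```

With these assumptions, 1 * 1.5 (010 * 011) gives 011 (1.5), and 2 * 2 (100 * 100) overflows to 110 (inf).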

Division

Multiply by max/x.
Sign  Exponent  Significand (Mantissa)
S     EE        M

Add:
   SEEM
   +010  :   1
 + +001  :   0.5
--- Fixed point --------
   +01.0 :   1.0
 + +00.1 :   0.5
--- Add ----------------
   +01.1 :   1.5
--- Floating point -----
   SEEM
   +011  :   1.5
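
The worked addition (convert to fixed point, add, convert back) can be sketched in Verilog as follows. This is an illustration under stated assumptions, not the lab's reference design: exponent bias 1, both operands positive and finite, truncation when converting back, and saturation to inf (110) on overflow. The module names fp4_to_fix and fp4_add_pos are hypothetical.

```verilog
// Sketch: add two positive, finite 1.2.1 minifloat magnitudes.
// Assumed (not from the slides): bias 1, truncation, overflow -> inf.

// Convert {EE, M} to fixed point in units of 0.5 by shifting.
module fp4_to_fix (input [2:0] em, output [3:0] fix);
  wire [1:0] e   = em[2:1];
  wire       m   = em[0];
  wire [3:0] sig = {2'b00, 1'b1, m};   // 1.M with hidden one
  // Exponent 00 is subnormal (0 or 0.5); otherwise shift by (E - 1).
  assign fix = (e == 2'b00) ? {3'b000, m} : (sig << (e - 2'b01));
endmodule

module fp4_add_pos (input [2:0] a, input [2:0] b, output reg [2:0] y);
  wire [3:0] fa, fb;
  fp4_to_fix ca (.em(a), .fix(fa));
  fp4_to_fix cb (.em(b), .fix(fb));
  wire [4:0] sum = fa + fb;            // fixed-point sum, units of 0.5

  // Convert back to floating point: find the leading one, keep one bit.
  always @* begin
    if      (sum >= 5'd8) y = 3'b110;            // >= 4.0 : inf
    else if (sum >= 5'd4) y = {2'b10, sum[1]};   // 2 or 3
    else if (sum >= 5'd2) y = {2'b01, sum[0]};   // 1 or 1.5
    else                  y = {2'b00, sum[0]};   // 0 or 0.5
  end
endmodule
```

For the example above, a = 010 (1) and b = 001 (0.5) become 2 and 1 in fixed point, the sum is 3, and the result is 011 (1.5).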

FPU Intel 8087 8-Bit


addition, subtraction, multiplication, division, square root
exponential, logarithmic, trigonometric
50,000 FLOPS
100..1000 cycles
65k transistors
CORDIC algorithm

Multi cycle serialization


Scalability for number of bits and time (pipeline possibility)

Arithmetic truth table implementation



Example full adder


3 Inputs: a,b,cin
2 outputs: s,cout

Standard cell: 28 Transistors per Bit

LUT/MUX: 2 * 24 transistors = 48 transistors

LUT2, MUX: 30 Transistors per Adder Bit

a  b  cin  s  cout
0  0  0    0  0
0  0  1    1  0
0  1  0    1  0
0  1  1    0  1
1  0  0    1  0
1  0  1    0  1
1  1  0    0  1
1  1  1    1  1

s = LUT3_69 = MUX2(LUT2_6,LUT2_9) = MUX2(NOT(LUT2_9),LUT2_9): 12 transistors
cout = LUT3_87 = MUX2(LUT2_8, LUT2_E) = MUX2(NOT(NAND(I0,I1)),NOT(NOR(I0,I1))): 18 transistors
Sum: 30 transistors
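
The LUT2/MUX2 decomposition of the full adder can be written as a small Verilog sketch. Assumptions that are not stated on the slide: I0 = a, I1 = b, and cin drives the MUX2 select input. The module name fa_lut_mux is hypothetical.

```verilog
// Sketch: full adder built from LUT2 functions and MUX2 cells.
// Assumed (not from the slide): I0 = a, I1 = b, MUX2 select = cin.
module fa_lut_mux (
  input  a,
  input  b,
  input  cin,
  output s,
  output cout
);
  wire l9 = ~(a ^ b);   // LUT2_9: XNOR
  wire l6 = ~l9;        // LUT2_6: XOR, obtained as NOT(LUT2_9)
  wire l8 = a & b;      // LUT2_8: AND = NOT(NAND(I0,I1))
  wire le = a | b;      // LUT2_E: OR  = NOT(NOR(I0,I1))

  // MUX2(d0, d1) with cin as select: d0 when cin = 0, d1 when cin = 1.
  assign s    = cin ? l9 : l6;
  assign cout = cin ? le : l8;
endmodule
```

An exhaustive check over all 8 input combinations against a + b + cin reproduces the truth table above.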


Truth table implementation


How many LUT2s and MUX2, MUX4 are needed for realization of a truth table?


Number of inputs       Size of truth table   LUT2   INV   MUX2   MUX4   Total transistors
Number of transistors                          36     2      4     12
2                      2^2 = 4                  1     0      0      0    12
3                      2^3 = 8                  2     1      1      0    24
4                      2^4 = 16                 4     2      0      1    34
5                      2^5 = 32                 8     3      3      2    70
6                      2^6 = 64                16     4      0      5   104
7                      2^7 = 128               16     5      1     10   170
8                      2^8 = 256               16     6      0     21   300

2026 Objective


How many transistors are needed for a typical truth table?


4-Bit Floating point number: Size of truth table is 16
4-Bit Division max/x: Size of truth table is 16
Adder: Size of truth table is 256
Multiply: Size of truth table is 256

Implement 3 truth tables size 16 with LUTs and Verilog.
Make layout and simulate with LTSPICE.
(Expand to larger truth tables)

4-Bit Truth table Division


Map 0 Divide  |     0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15  Student
Original Code |     0  0.25   0.5  0.75     1     2     3  +inf   NaN -0.25  -0.5 -0.75    -1    -2    -3  -inf
Max/x Code    |  +inf     3     2     1     1   0.5  0.25     0   NaN    -3    -2    -1    -1  -0.5 -0.25     0
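
The Map 0 division table can be transcribed directly into a Verilog case statement. The sketch below is combinational and uses the code numbering of the table (0 = 0, 1 = 0.25, ..., 7 = +inf, 8 = NaN, 9 = -0.25, ..., 15 = -inf). The module name div4_lut and the choice of NaN for the default branch are assumptions, not part of the slides.

```verilog
// Sketch: 4-bit max/x lookup, transcribed from the Map 0 Divide table.
// Input and output are table codes, not IEEE-style bit fields.
module div4_lut (input [3:0] x, output reg [3:0] y);
  always @* begin
    case (x)
      4'd0  : y = 4'd7;    //  0     -> +inf
      4'd1  : y = 4'd6;    //  0.25  ->  3
      4'd2  : y = 4'd5;    //  0.5   ->  2
      4'd3  : y = 4'd4;    //  0.75  ->  1
      4'd4  : y = 4'd4;    //  1     ->  1
      4'd5  : y = 4'd2;    //  2     ->  0.5
      4'd6  : y = 4'd1;    //  3     ->  0.25
      4'd7  : y = 4'd0;    // +inf   ->  0
      4'd8  : y = 4'd8;    //  NaN   ->  NaN
      4'd9  : y = 4'd14;   // -0.25  -> -3
      4'd10 : y = 4'd13;   // -0.5   -> -2
      4'd11 : y = 4'd12;   // -0.75  -> -1
      4'd12 : y = 4'd12;   // -1     -> -1
      4'd13 : y = 4'd10;   // -2     -> -0.5
      4'd14 : y = 4'd9;    // -3     -> -0.25
      4'd15 : y = 4'd0;    // -inf   ->  0
      default: y = 4'd8;   // x/z input -> NaN (assumption)
    endcase
  end
endmodule
```

Each of the four output bits is one truth table of size 16, which is the unit counted in the transistor estimate.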

4-Bit Truth table Adder


Map 0 Add   |     0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15  Student
Code  Value |     0  0.25   0.5  0.75     1     2     3  +inf   NaN -0.25  -0.5 -0.75    -1    -2    -3  -inf
   0      0 |     0  0.25   0.5  0.75     1     2     3  +inf   NaN -0.25  -0.5 -0.75    -1    -2    -3  -inf
   1   0.25 |  0.25   0.5  0.75     1     1     2     3  +inf   NaN     0 -0.25  -0.5 -0.75    -2    -3  -inf
   2    0.5 |   0.5  0.75     1     1     2     3  +inf  +inf   NaN  0.25     0 -0.25  -0.5    -2    -3  -inf
   3   0.75 |  0.75     1     1     2     2     3  +inf  +inf   NaN   0.5  0.25     0 -0.25  -0.5    -2  -inf
   4      1 |     1     1     2     2     2     3  +inf  +inf   NaN  0.75   0.5  0.25     0    -1    -2  -inf
   5      2 |     2     2     3     3     3  +inf  +inf  +inf   NaN     2     2     1     1     0    -1  -inf
   6      3 |     3     3  +inf  +inf  +inf  +inf  +inf  +inf   NaN     3     3     2     2     1     0  -inf
   7   +inf |  +inf  +inf  +inf  +inf  +inf  +inf  +inf  +inf   NaN  +inf  +inf  +inf  +inf  +inf  +inf     0
   8    NaN |   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
   9  -0.25 | -0.25     0  0.25   0.5  0.75     2     3  +inf   NaN  -0.5 -0.75    -1    -1    -2    -3  -inf
  10   -0.5 |  -0.5 -0.25     0  0.25   0.5     2     3  +inf   NaN -0.75    -1    -1    -2    -3  -inf  -inf
  11  -0.75 | -0.75  -0.5 -0.25     0  0.25     1     2  +inf   NaN    -1    -1    -2    -2    -3  -inf  -inf
  12     -1 |    -1 -0.75  -0.5 -0.25     0     1     2  +inf   NaN    -1    -2    -2    -2    -3  -inf  -inf
  13     -2 |    -2    -2    -2  -0.5    -1     0     1  +inf   NaN    -2    -3    -3    -3  -inf  -inf  -inf
  14     -3 |    -3    -3    -3    -2    -2    -1     0  +inf   NaN    -3  -inf  -inf  -inf  -inf  -inf  -inf
  15   -inf |  -inf  -inf  -inf  -inf  -inf  -inf  -inf     0   NaN  -inf  -inf  -inf  -inf  -inf  -inf  -inf

4-Bit Truth table Multiply


Map 0 Mul   |     0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15  Student
Code  Value |     0  0.25   0.5  0.75     1     2     3  +inf   NaN -0.25  -0.5 -0.75    -1    -2    -3  -inf
   0      0 |     0     0     0     0     0     0     0     1   NaN     0     0     0     0     0     0    -1
   1   0.25 |     0     0     0  0.25  0.25   0.5  0.75  +inf   NaN     0     0 -0.25 -0.25  -0.5 -0.75  -inf
   2    0.5 |     0     0  0.25   0.5   0.5     1     2  +inf   NaN     0 -0.25  -0.5  -0.5    -1    -2  -inf
   3   0.75 |     0  0.25   0.5   0.5  0.75     2     2  +inf   NaN -0.25  -0.5  -0.5 -0.75    -2    -2  -inf
   4      1 |     0  0.25   0.5  0.75     1     2     3  +inf   NaN -0.25  -0.5 -0.75    -1    -2    -3  -inf
   5      2 |     0   0.5     1     2     2  +inf  +inf  +inf   NaN  -0.5    -1    -2    -2  -inf  -inf  -inf
   6      3 |     0  0.75     2     2     3  +inf  +inf  +inf   NaN -0.75    -2    -2    -3  -inf  -inf  -inf
   7   +inf |     1  +inf  +inf  +inf  +inf  +inf  +inf  +inf   NaN  -inf  -inf  -inf  -inf  -inf  -inf  -inf
   8    NaN |   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
   9  -0.25 |     0     0     0 -0.25 -0.25  -0.5 -0.75  -inf   NaN     0     0  0.25  0.25   0.5  0.75  +inf
  10   -0.5 |     0     0 -0.25  -0.5  -0.5    -1    -2  -inf   NaN     0  0.25   0.5   0.5     1     2  +inf
  11  -0.75 |     0 -0.25  -0.5  -0.5 -0.75    -2    -2  -inf   NaN  0.25   0.5   0.5  0.75     2     2  +inf
  12     -1 |     0 -0.25  -0.5 -0.75    -1    -2    -3  -inf   NaN  0.25   0.5  0.75     1     2     3  +inf
  13     -2 |     0  -0.5    -1    -2    -2  -inf  -inf  -inf   NaN  0.25     1     2     2  +inf  +inf  +inf
  14     -3 |     0 -0.75    -2    -2    -3  -inf  -inf  -inf   NaN   0.5     2     2     3  +inf  +inf  +inf
  15   -inf |    -1  -inf  -inf  -inf  -inf  -inf  -inf  -inf   NaN  0.75  +inf  +inf  +inf  +inf  +inf  +inf

Lookup table synthesis


Example:
Operation: 1/x
Codes: 0,0.25,0.5,0.75,1,2,3,+inf,NaN,-0.25,-0.5,-0.75,-1,-2,-3,-inf
Output: +inf,3,2,1,1,0.5,0.25,0,NaN,-3,-2,-1,-1,-0.5,-0.25,0
Symmetry: yes (no)

Codes:
Output:



Truth table:
Equations:
Structural lookup MUX VHDL:
Verilog:
Simulation:

VHDL LUT Implementation


Example:
output2: 37E54251
  MUX4_0: MUX4 port map(net_L1, net_L5, net_L2, net_L4, I2, I3, Y2MA0);
  MUX4_1: MUX4 port map(net_L5, net_LE, net_L7, net_L3, I2, I3, Y2MA1);
  MUX2_0: MUX2 port map(Y2MA0, Y2MA1, I4, Y2);   -- Y2MB0

Signal names:
I0,..,In: inputs
L0,..,LF: lookup outputs
Y0MA0,..,YnMAi: first multiplexer stage output
Y0MB0,..,YnMBi: next multiplexer stage output
..
Y0,..Yn:output

-------------------- Cell LUT4_37E5{sch} --------------------
entity LUT4_37E5 is port(I0, I1, I2, I3: in BIT; Y: out BIT);
  end LUT4_37E5;

architecture LUT4_37E5_BODY of LUT4_37E5 is
  component LUT2E port(I0, I1: in BIT; O: out BIT);
    end component;
  component LUT23 port(I0, I1: in BIT; O: out BIT);
    end component;
  component LUT25 port(I0, I1: in BIT; O: out BIT);
    end component;
  component LUT27 port(I0, I1: in BIT; O: out BIT);
    end component;
  component MUX4 port(I0, I1, I2, I3, I4, I5: in BIT; O: out BIT);
    end component;

  signal net_L5, net_L3, net_L7, net_LE: BIT;

begin
  LUT2E_0: LUT2E port map(I0, I1, net_LE);
  LUT23_0: LUT23 port map(I0, I1, net_L3);
  LUT25_0: LUT25 port map(I0, I1, net_L5);
  LUT27_0: LUT27 port map(I0, I1, net_L7);
  MUX4_0: MUX4 port map(net_L5, net_LE, net_L7, net_L3, I2, I3, Y);
end LUT4_37E5_BODY;
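
A Verilog counterpart of the LUT4_37E5 cell can be sketched with a parameterized LUT2 and a MUX4. Assumptions that are not confirmed by the slides: the hex digit of a LUT2 name is its 4-bit table indexed by {I1, I0}, and the MUX4 select is {I3, I2} with data inputs in ascending order. The module names lut2, mux4 and lut4_37e5 are hypothetical.

```verilog
// Sketch: LUT4_37E5 as four LUT2 cells feeding one MUX4.
// Assumed (not from the slides): LUT2 table index = {I1, I0},
// MUX4 select = {I3, I2}, data inputs d0..d3 in ascending select order.
module lut2 #(parameter [3:0] INIT = 4'h0) (input i0, input i1, output o);
  wire [3:0] tbl = INIT;          // truth table as a 4-bit vector
  assign o = tbl[{i1, i0}];       // look up the addressed entry
endmodule

module mux4 (input d0, input d1, input d2, input d3,
             input s0, input s1, output o);
  assign o = s1 ? (s0 ? d3 : d2) : (s0 ? d1 : d0);
endmodule

module lut4_37e5 (input i0, input i1, input i2, input i3, output y);
  wire net_l5, net_le, net_l7, net_l3;
  lut2 #(.INIT(4'hE)) lut2e_0 (.i0(i0), .i1(i1), .o(net_le));
  lut2 #(.INIT(4'h3)) lut23_0 (.i0(i0), .i1(i1), .o(net_l3));
  lut2 #(.INIT(4'h5)) lut25_0 (.i0(i0), .i1(i1), .o(net_l5));
  lut2 #(.INIT(4'h7)) lut27_0 (.i0(i0), .i1(i1), .o(net_l7));
  // Same data order as the VHDL port map: L5, LE, L7, L3.
  mux4 mux4_0 (.d0(net_l5), .d1(net_le), .d2(net_l7), .d3(net_l3),
               .s0(i2), .s1(i3), .o(y));
endmodule
```

Under these assumptions the output for input {I3, I2, I1, I0} equals bit number {I3, I2, I1, I0} of the constant 16'h37E5, which can be checked exhaustively over the 16 input combinations.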

Verilog Counter Implementation



Verilog Truth Table Implementation


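module tt4(clk,a,b);
  input clk;
  input  [3:0] a;
  output [3:0] b;
  reg [3:0] newstate;
  always @(posedge clk)   // clk sync truth table
  begin
   case (a)
    4'b0000 : newstate <= 4'b0001;   // non-blocking assignment in clocked block
    // .. (remaining table entries)
    4'b1111 : newstate <= 4'b0000;
    default : newstate <= 4'b0000;   // unknown input (x/z): start at 0
   endcase
  end
  assign b = newstate;
endmodule
// Instantiate inside a parent module; feeding output b back to input a makes a counter:
tt4 tt40(.clk(clk),.a(c),.b(c)); // makes counter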

4-Bit counter truth table
module Counter4tt(clk,a,b);
  input clk;
  input  [3:0] a;
  output [3:0] b;
  reg [3:0] newstate;
  
  always @(posedge clk)   // clocked truth table
  begin
   case (a)
    4'b0000 : newstate <= 4'b0001;
    4'b0001 : newstate <= 4'b0010;
    4'b0010 : newstate <= 4'b0011;
    4'b0011 : newstate <= 4'b0100;
    4'b0100 : newstate <= 4'b0101;
    4'b0101 : newstate <= 4'b0110;
    4'b0110 : newstate <= 4'b0111;
    4'b0111 : newstate <= 4'b1000;
    4'b1000 : newstate <= 4'b1001;
    4'b1001 : newstate <= 4'b1010;
    4'b1010 : newstate <= 4'b1011;
    4'b1011 : newstate <= 4'b1100;
    4'b1100 : newstate <= 4'b1101;
    4'b1101 : newstate <= 4'b1110;
    4'b1110 : newstate <= 4'b1111;
    4'b1111 : newstate <= 4'b0000;
    default : newstate <= 4'b0000; // unknown input (x/z): start at 0
   endcase
  end 
  assign b = newstate;
endmodule

Optimization


Deliverables


Final layout with area and number of transistors
Verification simulation (LTSPICE, Verilog)
Suggestions for optimization
Date for report: xx.xx.2026