A physical unclonable function (PUF) provides a digital fingerprint that can be used as unique identifier for
security-related applications like authentication. The fpga_puf
hardware module provides an unclonable 96-bit unique identifier (ID)
that is defined by the target chip's semiconductor characteristics. It is implemented in a technology-independent
way that does not use any device-specific macros, primitives or attributes so it can be implemented on any FPGA
(verified using Intel Quartus Prime, Lattice Radiant and Xilinx Vivado).
🔑 The PUF ID is unique for the combination of bitstream and one specific FPGA. The same bitstream will lead to a different PUF ID on a different FPGA of same type. If the bitstream of a specific FPGA is changed (by changing design logic) the PUF ID of that specific FPGA will also change.
Key features
- technology-, vendor- and platform-agnostic implementation
- easy to use
- tiny hardware footprint (less than 200 LUTs)
- secure and reliable (:construction: work-in-progress)
📢 This is an ongoing research / proof-of-concept project. Feedback from the community is highly appreciated (see notes in section "TODO")!
Each bit of the 96-bit PUF ID is generated by an individual PUF cell. Each cell consists of an asynchronous element that is based on a simple ring oscillator. The oscillator is constructed from a single inverter that negates it's own output signal. This feedback loop is interrupted by a latch. If the latch is open (=transparent) the oscillator starts oscillating. If the latch is closed, it stores the last state of the oscillator (high or low). The latch also provides a reset to bring each cell into a defined state.
The frequency of each oscillator is defined by the mapping of the logic cell and the according routing. Both factors are constant for a specific bitstream. Furthermore, the frequency is also randomly (but constantly for a given setup) "tuned" by the chip's semiconductor characteristics. These are caused by tiny productions fluctuation (for example the capacitance / delay of a routing wire is affected by variation in oxide thickness). The frequency of each oscillator is considered to be fixed. Hence, sampling several periods within a fixed time window will always end in the same state (oscillator output is high or low). Temperature drift disturbs all oscillators in the (nearly) same way and has to be computationally eliminated/compensated by the post-processing.
Since fpga_puf
does not use any device-specific attributes or primitives, there is a risk that the synthesis toolchain will
remove the asynchronous PUF cells or collapse them into a single PUF cell ("optimize away"). Therefore, the design
uses a shift register to control reset and the latch open/close phase of each cell individually and distributed over time. This concept is
based on the neoTRNG True Random Number Generator and allows a platform-independent implementation.
Whenever a new sampling of the PUF ID is started, a 96+1 bit wide shift register. A single '1' is applied to the
least significant bit that travels throughout the whole shift register chain during operation. The PUF cell's control signals (reset and
latch control) are connected to this shift register. Each shift register bit i
controls the reset of PUF cell
i
and the latch control of PUF cell i+1
. Thus, a cell's reset is active for one clock cycle. In the next clock
cycle the cell's latch is opened for exactly one clock cycle allowing the oscillator to run. In the next clock cycle
the latch is closed again and has captured the oscillator's last state. By this scheme only one oscillator is active
at a time reducing potential distortion by oscillator cross-talk (i.e. the oscillation of one osc. affect the
oscillations of another osc.).
When the single one-bit reaches the most significant bit (bit 96) of the shift register the sampling process is completed and the latch states are sampled into a register that provides the obtained raw PUF ID.
The following figure is taken from the Intel Quartus Prime technology viewer showing a single PUF cell (cell #44). The oscillator as well as the latch are mapped to a single LUT3 element in the FPGA. The output of this asynchronous element is sampled by a simple register. Hence, a single PUF cell only requires one LUT and one FF.
(DATAA
is the latch reset (high-active), DATAC
is the latch control (open when set), DATAD
is the
feedback of the inverter)
Some bits of the raw PUF ID might be quite noisy, which complicates defining a stable ID that does not change over time and over a broad range of operation conditions (e.g. temperature). A lot(!) of different approaches can be found in literatures. One promising approach is the use of error-correction codes to "stabilizes" an initially determined ID.
However, these concepts are out of (my) scope so I used a simple "averaging" concept for this example. My approach samples
the raw PUF ID several times (for example 4096 times) and checks how often each bit of the ID is set across
all sampled IDs. If an ID bit is set in more then half of the samples it is considered to be a static 1
, otherwise it is considered to
be a static 0
. Since a few bits of the raw ID might be quite noisy, a hysteresis is used to eliminate those bits from the final PUF:
if a bit is set or cleared more often than a certain threshold it is considered "stable" in the final ID. These lower and
upper hysteresis threshold define an uncertain band between them. For the final ID all bits tha fall
into this uncertainty band are masked to be always zero.
My post-processing concept can be found in sw/main.c
. The
source file provides more-detailed comments.
The top entity of the PUF IP module is fpga_puf
(rtl/fpga_puf.vhd
).
It can be directly instantiated without the need for any special libraries.
entity fpga_puf is
port (
clk_i : in std_ulogic; -- global clock line
rstn_i : in std_ulogic; -- SYNC reset, low-active
trig_i : in std_ulogic; -- set high for one clock to trigger ID sampling
busy_o : out std_ulogic; -- busy when set (sampling ID)
id_o : out std_ulogic_vector(95 downto 0) -- PUF ID (valid after sampling is done)
);
end fpga_puf;
To evaluate this concept and the quality/reliability of the PUF IDs I am using the NEORV32
as processor platform. The fpga_puf
IP module is added to the processor's "Custom Functions Subsystem (CFS)",
which is a blank template for implementing custom application-specific co-processors.
The setup is synthesized for different FPGAs using different toolchains (tested with Intel Quartus Prime, Lattice Radiant and Xilinx Vivado). A specific bitstream generated and programmed into several FPGAs of the same type to check for chip-specific ID variations. On one chip the ID is generated several times to check if it is "stable over time" and thus, reliable.
The application-specific PUF-wrapping CFS can be found in
rtl/neorv32_cfs.vhd
. Make sure to use this
CFS file instead of the default NEORV32 CFS source file when reproducing this setup.
The CFS uses four memory-mapped registers to interface the PUF ID module:
CSF register address (C access macro) | Access | Function |
---|---|---|
NEORV32_CFS.REG[0] |
r/w |
Control register |
NEORV32_CFS.REG[1] |
r/- |
PUF ID bits 31:0 |
NEORV32_CFS.REG[2] |
r/- |
PUF ID bits 63:32 |
NEORV32_CFS.REG[3] |
r/- |
PUF ID bits 95:64 |
Bit 0 of the control register (PUF_CTRL_EN
) control the synchronous reset signal of the fpga_puf
module
(rstn_i
). Setting this bit will activate the module, clearing it will put the module into reset state.
Bit 1 of the control register (PUF_CTRL_SAMPLE
) is used to start the sampling of the ID. Writing one to it
will set the trigger signal (trig_i
) high for one cycle. Reading this control register bit will return
the busy state of the fpga_puf
module (busy_o
).
💾 The low-level hardware-accessing functions are implemented in
sw/fpga_puf_neorv32_cfs.c
. The header file
sw/fpga_puf_neorv32_cfs.h
provides the function prototypes, a PUF ID data type and also the NEORV32 CFS register mappings and bit definitions.
The PUF test program (sw/main.c
) is used to sample
the chip's PUF ID and transmit it via UART to a terminal program:
Physical Unclonable Functions <fpga_puf> Test
PUF implemented as NEORV32 Custom Functions Subsystem (CFS)
Press any key to start PUF test (8 runs with 4096 samples each).
Starting test...
Run 0 ID: 0x37c0480063021011988c0095
Run 1 ID: 0x37c0480063021011988c0095
Run 2 ID: 0x37c0480063021011988c0095
Run 3 ID: 0x37c0480063021011988c0095
Run 4 ID: 0x37c0480063021011988c0095
Run 5 ID: 0x37c0480063021011988c0095
Run 6 ID: 0x37c0480063021011988c0095
Run 7 ID: 0x37c0480063021011988c0095
Test completed.
💾 A pre-compiled NEORV32 executable of this test program is also available in this
repository: sw/neorv32_exe.bin
(compiled for a minimal rv32i
NEORV32 CPU)
So far I have tested the PUF module on three Lattice iCE40 UltraPlus FPGAs and four Intel Cyclone IV FPGAs (shout-out to @emb4fun - thank you for your help!) and just one Xilinx Artix-7 FPGA. The ID was generated 8 times on each chip to "check" if it is reproducible (drawback: just in a very short time window...). The specific FPGA types are shown in section Hardware Utilization.
FPGA | PUF ID ("Fingerprint Key") |
---|---|
Lattice iCE40 UltraPlus - 1 | 0x37c0480063021011988c0095 |
Lattice iCE40 UltraPlus - 2 | 0x592063e50118040c2000112b |
Lattice iCE40 UltraPlus - 3 | 0x941c82505112323600b0c221 |
Intel Cyclone IV - 1 | 0x9fbe33dc9021be3156cae31b |
Intel Cyclone IV - 2 | 0xa726db495a2a8346b30f2000 |
Intel Cyclone IV - 3 | 0x413fe25557311be7f9f5edfb |
Intel Cyclone IV - 4 | 0xca8ee8dc7945fa60f770e1fa |
Xilinx Artix-7 - 1 | 0x6faaf93af77cf77fef91fe79 |
So far, the IDs are unique for each tested FPGA!
🚧 work in progress 🚧
A long-time test is used to sample and check raw IDs (not the pre-processed ones) for stability over time. Note that my setup is placed in pretty stable environment conditions.
The test generates an initital ID I
right at the beginning of execution. In an endless loop two consecutive
IDs A
and B
are sampled (a single "Run"). To evaluate the the "instability" the Hamming distance (number of bits that are
not identical across two samples) is computed. This test computes the relative Hamming distance of the two
consecutive samples A
and B
(= H(A,B)
) and also the relative Hamming distance between the initial sample
and sample A
(= H(I,A)
) from the current run. Furthermore, the maximum ob both distances is computed over
all runs (H_max(A,B)
and H_max(I,A)
).
To identify noisy bits an "accumulated bit-change-mask" F
is computed. This mask is computed for every run
by XOR-ing the obtained samples A
and B
with the initial ID I
. The mask from a run is OR-ed with the mask from
the previous run to accumulate the noisy bits over time.
Cut-out of the test log:
...
Run 43373: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9423beb156cae31b, B=0xbfbe3bfcb021beb156cae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=3, H_max(A,B)=9 - H(I,A)=19, H_max(I,A)=23
Run 43374: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9435beb156eae31b, B=0xbfbe3bfcb431beb156eae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=2, H_max(A,B)=9 - H(I,A)=21, H_max(I,A)=23
Run 43375: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9425beb156caeb1b, B=0xbfbe3bdcb421beb156cae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=4, H_max(A,B)=9 - H(I,A)=20, H_max(I,A)=23
Run 43376: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9425beb156eaeb1b, B=0xbfbe3bfcb021beb156eae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=4, H_max(A,B)=9 - H(I,A)=21, H_max(I,A)=23
Run 43377: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9021beb156caeb1b, B=0xbfbe3bdc9425beb156cae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=4, H_max(A,B)=9 - H(I,A)=18, H_max(I,A)=23
...
This very first test was run for approx. half an hours making ~20 runs per second.
It shows a maximal Hamming distance of 23 bits while 33 bits (F
) tend to be noisy (compared to the initial ID I
sampled once right at the beginning of the test). The number of noisy bits increases slowly over time, probably
because of increasing chip temperature. It tends to increase slower over time, so there might be a saturation at some point.
The temperature of the PUF cells is most important because that impacts the oscillator frequencies.
The PUF cells heat up due to the continuous PUF operation.
The PUF from this test setup provides 96-33=63 bits that seem to be stable over the observed time. This also means that the usable key space is reduced (63-bit instead of 96-bit).
Mapping results for the custom function subsystem (CFS), the fpga_puf
module and a single PUF cell.
Lattice ice40 UltraPlus iCE40UP5K-SG48I @24MHz |
Logic Cells | Logic Registers |
---|---|---|
neorv32_cfs_inst_true.neorv32_cfs_inst | 171 (71) | 232 (35) |
-fpga_puf_inst | 100 (4) | 197 (101) |
--fpga_puf_cell_inst[0].fpga_puf_cell_inst_i | 1 (1) | 1 (1) |
Intel Cyclone IV EP4CE22F17C6N @100MHz |
Logic Cells | Logic Registers |
---|---|---|
neorv32_cfs:\neorv32_cfs_inst_true:neorv32_cfs_inst | 241 (43) | 232 (35) |
-fpga_puf:fpga_puf_inst | 198 (102) | 197 (101) |
--fpga_puf_cell:\fpga_puf_cell_inst:0:fpga_puf_cell_inst_i | 1 (1) | 1 (1) |
Xilinx Artix-7 XC7A35TICSG324-1L @100MHz |
Logic Cells | Logic Registers |
---|---|---|
neorv32_cfs_inst_true.neorv32_cfs_inst (neorv32_cfs) | 133 | 328 |
-fpga_puf_inst (fpga_puf) | 133 | 293 |
--fpga_puf_cell_inst[0].fpga_puf_cell_inst_i (fpga_puf_cell) | 1 | 2 (Latch + FF) |
ℹ️ Xilinx Vivado can detect the PUF cell's latch and infers a FF primitive, which also provides a latch mode. This does not compromise the functionality of the PUF.
The PUF IDs are unique for each FPGA and can be successfully implemented on different FPGAs (Xilinx, Intel, Lattice).
The raw PUF ID is only partly reproducible, because some bits tend to be quite noisy (see results above). A better post-processing algorithm using error-correction codes should be able to compensate for that. This is work in progress.
The latest test showed that there is a certain number of bits that are quite noisy so they cannot be used to determine the PUF IF. Obviously, this reduces the maximum key space (for example only 63-bit of the 96-bit ID are usable). To circumvent this, the PUF can be made arbitrarily wide (for example providing a 256-bit raw ID). Of course this will also introduce additional unusable noisy bits, but we expect that the percentage of noisy bits within the PUF ID is a device-specific constant. Each additional PUF ID bit adds 2 FFs (one for the SREG and one for the PUFF cell) and 1 LUT. For each additional bit the ID sampling time is increased by one clock cycle.
📢 If you have any kind of ideas or feedback (for example how to improve reliability) feel free to open a new issue. I am also happy to get more data, so if you have ported the design on another FPGA you can open a pull request and add your results.
- test more FPGAs
- check stability in a controlled environment (e.g. temperature chamber)
- evaluate more sophisticated post-processing algorithms
- sample more data from more long-time test
- faulty bits over time (and temperature)
- hamming distance over time (and temperature)
- ...
- to be continued...