This repository accompanies the paper "ZEKRA: Zero-Knowledge Control-Flow Attestation".
The directory structure of the ZEKRA repository is as follows:
- embench-iot-applications: contains a select subset of applications from embench-iot
- scripts: contains a set of helper scripts written in Python 3
- poseidon: module is a Python implementation of the Poseidon hashing function [1]
- circuit_input_formatter.py: script to format inputs for the ZEKRA circuit (including the hashing)
- compile_circuit.py: script to compile a ZEKRA circuit
- extractor.py: script to (1) compile an application and then (2) extract its CFG and a sample execution path
- zekra_java: contains the produced java files for ZEKRA after transforming the xJsnark code to java code using xJsnark's front-end extension of the MPS Framework.
- zekra_xjsnark: contains the high-level ZEKRA program code written in the xJsnark framework
- readable: contains the ZEKRA program code in a readable format (without having to install xjsnark and JetBrains MPS 3.3)
- zekra.mps: the actual ZEKRA program code written using the xjsnark library
- setup.sh: shell script to install all dependencies
- xjsnark_backend.jar: the xJsnark backend for compiling circuits
- ZEKRA
- Directory Structure
- Table of Contents
- Setting up the environment
- Extracting Control-Flow Graphs and sample execution paths
- Preparing inputs to the ZEKRA circuit
- Compiling the ZEKRA circuit
- Executing/Profiling the ZEKRA circuit using jsnark's inferface to libsnark's implementation of Groth16
- Editing the ZEKRA xjsnark program code
- References
- Disclaimer
To generate a circuit from the zekra_java code using the xjsnark_backend.jar, it is necessary to have Java installed. Furthermore, to work with the generated circuit using the libsnark/jsnark_interface (e.g., to generate the proving and verification keys for the circuit and also generate and verify proofs), both jsnark (libsnark) and its dependencies must be properly setup as described here. However, to make things easier, the shell script setup.sh includes the necessary commands to setup jsnark, its dependencies, a simple java environment, and the Python 3 packages necessary to run the helper scripts, such as the angr package to extract the static CFG from binaries and also to extract an example execution path through symbolic execution.
To run the script, execute the following commands (tested on Ubuntu 20.04):
chmod +x setup.sh
./setup.sh
Usage: scripts/extractor.py [options]
Options:
-h This help message
-v Verbose output
-d <path/to/dir> Directory containing target applications organized into folders (default is to consider the './embench-iot-applications' directory)
-a <path> Path to specific target application
-e <name1,name2> Comma separated list of application folders to exclude
We used the demonstrative application from the open source benchmarking suite embench-iot (with minor modifications) in our paper to demonstrate our circuit's applicability for such deeply embedded applications. We provide the set of applications used in the embench-iot-applications directory. To reproduce the datasets used in our paper, we recommend using our extractor.py script to compile each application binary and extract its CFG and example execution path, as follows:
python3 scripts/extractor.py -d ./embench-iot-applications
It is also possible to target only a specific application, as follows:
python3 scripts/extractor.py -a ./embench-iot-applications/crc32
For each application, the extractor.py script will generate the following files in the targeted application's directory:
adjlist
: containing the raw adjacency list (hex addresses)numified_adjlist
: containing an example transformation of the raw adjacency list to a numerically labeled adjacency listnumified_path
: containing the labeled execution path as translated using the translator mappingrecorded_path
: containing the raw execution path (hex addresses)translator
: containing the mapping between the raw and numified adjacency liststats.log
: containing a copy of the stats which are printed to stdout during command execution
Below you can see the contents of ./embench-iot-applications/crc32/stats.log
after executing the command python3 scripts/extractor.py -a ./embench-iot-applications/crc32
:
./embench-iot-applications/crc32
Min addr: 0x400000
Max addr: 0x807fff (bitwidth=24)
CFG has 88 nodes and 106 edges
Adjacency list max_neighbors: 2 ['3', '4']
Bucket-rems pairs (levels) required to express the encoded adjacency list: 2
Execution path died because it: reached an exit
Execution path length pre compression: 3090
Number of consecutively repeated sequences: 1
Average (mean) length of sequences: 3.00 (std=0.00)
Average (mean) number of sequence repetitions: 1023.00 (std=0.00)
Execution path length post compression: 24
Max stack depth: 6
Execution path is valid according to the adjlist: True
Usage: scripts/circuit_input_formatter.py -a <dir> [options]
Options:
-h This help message
-a <dir> Path to specific target application's directory containing the 'adjlist', 'numified_adjlist', 'translator', 'recorded_path', and 'numified_path' files
--pad-adjlist-to <len> Pad the adjlist and translator with zeros (empty entries) until their length is <len> (default is to not pad)
--pad-path-to <len> Pad the execution path with zeros (empty moves) until its length is <len> (default is to not pad)
--adjlist-levels <num> Use <num> levels (pairs of bucket-rems) for encoding the adjacency list. Note that this primarily affects the adjacency list compression which is done prior to hashing to reduce the number of calls to Poseidon (default is to use the minimum levels necessary to maximize compression and minimize the size of the generated circuit)
--output-dir <dir> Store the circuit input files in <dir> (default is to store the files in the same directory as the targeted application)
--nonce-verifier <num> Verifier's nonce to hash with the execution path (default 0)
--nonce-path <num> Blinding factor (nonce) to hash with the execution path (default 0)
--nonce-translator <num> Blinding factor (nonce) to hash with the address-to-label translator (default 0)
--nonce-adjlist <num> Blinding factor (nonce) to hash with the encoded adjacency list (default 0)
--label-bitwidth <num> Use <num> bits to represent each numified destination address when compressing/hashing the numified execution path (default is to use the minimum number of bits as determined by the size of the adjacency list).
--bucket-bitwidth <num> Use <num> bits to represent each quotient (bucket) in the adjacency list encoding (default is to use the miminum number of bits as determined by the length of the adjacency list).
--address-bitwidth <num> Use <num> bits to represent each destination address when compressing/hashing the raw/recorded execution path (default is to use the minimum number of bits as determined by the recorded execution path provided as input).
For example, to prepare the circuit inputs adjacency list to fit a circuit data structure that supports up to 500 nodes, use 15 levels to represent each node's encoded neighbors (with 7-bit buckets on each level), support execution paths comprising 500 transitions (with 24-bit addresses), and supporting a shadow stack depth of 15, execute the following command:
python3 scripts/circuit_input_formatter.py \
-a ./embench-iot-applications/crc32/ \
--pad-adjlist-to 500 \
--pad-path-to 500 \
--adjlist-levels 15 \
--nonce-verifier 12353 \
--nonce-path 123 \
--nonce-translator 123 \
--nonce-adjlist 123 \
--label-bitwidth 10 \
--bucket-bitwidth 7 \
--address-bitwidth 24
Below is an example of the above command's output:
The input adjacency list contains 500 entries/nodes (i.e., max label is 500, which occupies 9 bits)
The maximum address in the input adjacency list is 0x601060 and occupies 23 bits
Minimum: ADJLIST_LEVELS=2 LABEL_BITWIDTH=9 BUCKET_BITWIDTH=6 ADDR_BITWIDTH=23
Considering: ADJLIST_LEVELS=15 LABEL_BITWIDTH=10 BUCKET_BITWIDTH=7 ADDR_BITWIDTH=24
The encoded adjacency list contains 500 nodes (88 without padding)
Wrote encoded adjacency list to file './embench-iot-applications/crc32/in_encoded_adjlist'
Starting hashing of the encoded adjacency list
neighbors_bitwidth: 225 (15 levels * (7-bit buckets + 8-bit rems))
elems_per_field_element: 1 (using 254-bit p)
occupies 500 field elements after compression
padding compressed path with 4 additional field elements (Poseidon call has arity 8 and we need to reserve 1 element for the adjacency list nonce)
calls to Poseidon needed: 63
Wrote encoded adjlist digest to file './embench-iot-applications/crc32/in_encoded_adjlist_digest'
Encoded adjacency list hash: 5985449892039608890456185764561858730723264824691156509074284478893635440892
The translator contains 501 addresses (88 without padding)
Wrote translator to file './embench-iot-applications/crc32/in_translator'
Starting hashing of the translator
ADDR_BITWIDTH: 24
elems_per_field_element: 10 (using 254-bit p)
occupies 51 field elements after compression
padding translator with 5 additional field elements (Poseidon call has arity 8 and we need to reserve 1 element for the translator nonce)
calls to Poseidon needed: 7
Wrote translator digest to file './embench-iot-applications/crc32/in_translator_digest'
Translator hash: 5551575237474407726413860559067084002486780251770875449597188721978322980717
The numified execution path contains 500 transitions (24 without padding)
Wrote numified execution path to file './embench-iot-applications/crc32/in_numified_path'
Wrote initial node = 0 to file './embench-iot-applications/crc32/in_initial_node'
Wrote final node = 72 to file './embench-iot-applications/crc32/in_final_node'
Wrote verifier's nonce to file './embench-iot-applications/crc32/in_nonce_verifier'
Wrote execution path's nonce to file './embench-iot-applications/crc32/in_nonce_path'
Wrote translator's nonce to file './embench-iot-applications/crc32/in_nonce_translator'
Wrote adjacency list's nonce to file './embench-iot-applications/crc32/in_nonce_adjlist'
The recorded execution path contains 500 transitions (24 without padding)
Wrote recorded execution path to file './embench-iot-applications/crc32/in_recorded_path'
Starting hashing of the execution path
transition_bitwidth: 50 (2-bit jumpkind||24-bit dest address||24-bit ret address)
elems_per_field_element: 5 (using 254-bit p)
occupies 100 field elements after compression
padding compressed path with 4 additional field elements (Poseidon call has arity 8 and we need to reserve 2 elements for the verifier and execution path nonces)
calls to Poseidon needed: 13
Wrote recorded execution path digest to file './embench-iot-applications/crc32/in_recorded_path_digest'
Recorded execution path hash: 21393937064302472340191694070938458033674045799645644822430973995137600407803
As shown in the above command output, for each application, the circuit_input_formatter.py script will store the following, additional files to output-dir/
:
output-dir/in_encoded_adjlist
: the encoded adjacency listoutput-dir/in_encoded_adjlist_digest
: Poseidon hash of the encoded adjacency list (post padding)output-dir/in_final_node
: containing the final (exit) node of the input execution pathoutput-dir/in_initial_node
: containing the initial (starting) node of the input execution pathoutput-dir/in_nonce_adjlist
: containing the encoded adjacency list's nonce (blinding factor)output-dir/in_nonce_path
: containing the execution path's nonce (blinding factor)output-dir/in_nonce_translator
: containing the address-to-label translator's nonce (blinding factor)output-dir/in_nonce_verifier
: containing the verifier's nonceoutput-dir/in_numified_path
: the execution path already translated into its numeric form (which is verified against the translator inside the circuit)output-dir/in_recorded_path
: containing the recorded (raw) execution pathoutput-dir/in_recorded_path_digest
: Poseidon hash of the recorded execution path (post padding)output-dir/in_translator
: the translatoroutput-dir/in_translator_digest
: Poseidon hash of the address-to-label translator (post padding)
Note: The compile_circuit.py Python script compiles the ZEKRA circuit using the xjsnark_backend.jar and thus requires an existing Java environment (see preceding section). Furthermore, the command requires xjsnark_backend.jar to be in the current working directory.
Usage: scripts/compile_circuit.py --adjlist-len <num> --adjlist-levels <num> --path-len <num> --stack-depth <num> --label-bitwidth <num> --bucket-bitwidth <num> --address-bitwidth <num> [options]
Options:
-h This help message
-v Output also how the workload (in number of constraints) is distributed among the six individual ZEKRA components/gadgets
--zekra-dir <dir> Path to the directory containing the ZEKRA Java files (default is ./zekra) --adjlist-len <num> Set <num> as the number of nodes in the adjacency list (adjacency lists < max must be padded before being passed as input)
--adjlist-levels <num> Set <num> as the number of levels used to represent each node's neighbors in the adjacency list
--path-len <num> Set <num> as the number of transitions in the execution path (execution paths < max must be padded before being passed as input)
--stack-depth <num> Set <num> as the maximum depth of the shadow stack data structure (execution paths that surpass this upper bound will cause the proof to become rejected so it must be set appropriately)
--label-bitwidth <num> Set <num> as the number of bits to represent each numified destination address when compressing/hashing the numified execution path
--bucket-bitwidth <num> Set <num> as the number of bits to represent each quotient (bucket) in the adjacency list encoding
--address-bitwidth <num> Set <num> as the number of bits to represent each destination address when compressing/hashing the raw/recorded execution path
--input-dir <dir> Directory containing the output files from 'circuit_input_formatter.py' (default is to check the current working directory)
--output-dir <dir> Store the <zekra.arith> and <zekra_Sample_Run1> files in <dir> (default is to store the files in the current working directory)
--components-dir <dir> If -v is used, then <dir> is the path to the directory containing the different ZEKRA components 'zekra_c1,...,zekra_c6' (default is ./components)
For example, to prepare the ZEKRA circuit with an adjacency list data structure to support 500 nodes, use 15 levels to represent each node's encoded neighbors, support execution paths comprising 500 transitions, and supporting a shadow stack depth of 15, execute the following command (xjsnark will also prepare the circuit inputs from input-dir
to be supplied to libsnark):
python3 scripts/compile_circuit.py \
--zekra-dir zekra_java/zekra \
--adjlist-len 500 \
--adjlist-levels 15 \
--path-len 500 \
--stack-depth 15 \
--label-bitwidth 10 \
--bucket-bitwidth 7 \
--address-bitwidth 24 \
--input-dir ./embench-iot-applications/crc32 \
--components-dir zekra_java/components \
-v
Below is an example of the above command's output:
Successfully compiled the ZEKRA circuit.
Total constraints: 336230
Arithmetic circuit stored in: zekra.arith
Formatted inputs stored in: zekra_Sample_Run1.in
Compiling individual components to find workload distribution.
Component #1's constraints: 25513 (7.6% of total)
Component #2's constraints: 5263 (1.6% of total)
Component #3's constraints: 2833 (0.8% of total)
Component #4's constraints: 128668 (38.3% of total)
Component #5's constraints: 108089 (32.1% of total)
Component #6's constraints: 65864 (19.6% of total)
The compile_circuit.py script will store the following files to output-dir/
:
output-dir/zekra.arith
: containing the generated ZEKRA arithmetic circuitoutput-dir/zekra_Sample_Run1.in
: containing the formatted inputs to the circuit
Executing/Profiling the ZEKRA circuit using jsnark's inferface to libsnark's implementation of Groth16
Usage:
./jsnark/libsnark/build/libsnark/jsnark_interface/run_ppzksnark gg \
zekra.arith \
zekra_Sample_Run1.in
Where, given a circuit definition <zekra.arith>
and formatted inputs <zekra_Sample_Run1.in>
(from Compiling the ZEKRA circuit), the called procedure will profile the execution of the following algorithms: key generation, proof generation, and proof verification.
Editing the ZEKRA xjsnark program code
To directly edit or compile the zekra.mps program code, follow these instructions:
- First download (or clone) xjsnark and follow its installation instructions, which includes an installation of JetBrains MPS 3.3.
- Copy the zekra.mps file to
xjsnark/languages/xjsnark/sandbox/models/xjsnark
. - Open the xjsnark project in JetBrains MPS 3.3.
- Right-click
xjsnark
in the project viewer and click "Rebuild Language 'xjsnark'". - Expand the
zekra
module underxjsnark.sandbox
and open thezekra.zekra
program. - Perform any modifications/improvements/expansions:
- After performing any modications, you must right-click the
zekra
module and click "Rebuild Model 'xjsnark.zekra'". - Then, to compile the modified ZEKRA program code, right-click the
zekra.zekra
program and click "Run 'Class zekra'", which will generate the same output files as before in a directory of your choosing, namely:zekra.arith
: containing the ZEKRA arithmetic circuit.zekra_Sample_Run1.in
: containing the formatted inputs to the ZEKRA arithmetic circuit.
[1] POSEIDON: A New Hash Function for Zero-Knowledge Proof Systems , Lorenzo Grassi, Dmitry Khovratovich, Christian Rechberger, Arnab Roy, and Markus Schofnegger, USENIX Security Symposium 2021
This is an early release that could contain issues and inconsistencies. The implementations provided in this repository are currently only research prototypes.