reference/BPFVM.txt

Copyright 2020 Marco Merlini. This file is part of the fpga-bpf project,
whose license information can be found at 
https://github.com/UofT-HPRC/fpga-bpf/blob/master/LICENSE


The machine has two 32 bit registers, A (the accumulator) and X (used mostly for
addressing).

The "main memory" is the packet we are testing against. It is read-only. There 
is also a small, constant-sized scratch memory with read/write access. In the 
reference implementation, there are 16 words in the scratch memory.

There are eight classes of opcodes: 
 - LD:	Loading into the Accumulator
 - LDX:	Loading into the X register
 - ST:	Storing from the Accumulator
 - STX:	Storing from the X register
 - ALU:	Arithmetic & Logic
 - JMP:	Branching
 - RET:	Returning?
 - MISC:Other?

For LD instructions, you can use a Word (32 bits), half-word, or byte.
For LD instructions, there are the following addressing modes:
 - IMM: Use an immediate (specified in instruction)
 - ABS: Use absolute address (specified in instruction)
 - IND: Indirect (mem[X + value in instruction])
 - MEM: Instead of indexing into the packet itself, use scratch memory
	(address is always the immediate in the instruction)
 - LEN: Load the packet's length into a register
 - MSH: Some kind of weird shifting?

ST instructions only use scratch memory, and only support absolute addressing

For ALU instructions, the following operations are available:
 - ADD: A += X
 - SUB: A -= X
 - MUL: etc.
 - DIV: (integer division)
 - OR:  
 - AND: 
 - LSH: 
 - RSH: 
 - NEG: 
 - MOD: 
 - XOR: 

For JMP instructions, the following tests are available:

 - JA:   "Jump Always" (unconditional jump)
 - JEQ:  Jump if EQual
 - JGT:  Jump if Greater Than
 - JGE:  Jump if Greater than or Equal
 - JSET: Jump if A masked (i.e. ANDed) with a constant or X is nonzero
The instruction contains "jt" and "jf" fields, which say where to jump to. When
testing for equal, gt, or ge, you can specify it to test A (>|>=|=) constant
(where the constant is stored in the instruction), or A (>|>=|=) X. Note that 
this adds an offset to PC; it is not an absolute instruction index.

I'm not sure I understand RET instructions. I thought BPF programs were supposed
to return booleans? Anyway, you can either return A or a constant given in the
instruction.

There seem to only be two MISC instructions:
 - TAX: A = X
 - TXA: X = A
(just like in the 6502)

----------------------------

The instruction category is stored in the least-significant 3 bits (i.e. bits 2,
1, and 0).

If the instruction is LD or LDX, the size is in bits 4 and 3, and the addressing
type is in bits 7, 6, and 5. Where applicable, these are the same positions for 
the ST instructions.

If the instruction is an ALU instruction, the operation is in bits 7-4, and the
source for the second operand is in bit 3.

If the instruction is a JMP instruction, the type is in bits 6-4. The value to 
use for comparison (either an immediate or the X register) is indicated with bit
3.

For RET instructions, the value to return is specified by bits 4 and 3.

For MISC, the specific instruction is selected with bits 7-3. Right now, there 
is only 0x00 for TAX, and 0x80 for TXA. 


For complete reference, the exact values for each are given here, copied out of
pcap/bpf.h:

/*
 * The instruction encodings.
 *
 * Please inform tcpdump-workers@lists.tcpdump.org if you use any
 * of the reserved values, so that we can note that they're used
 * (and perhaps implement it in the reference BPF implementation
 * and encourage its implementation elsewhere).
 */

/*
 * The upper 8 bits of the opcode aren't used. BSD/OS used 0x8000.
 */

/* instruction classes */
#define BPF_CLASS(code) ((code) & 0x07)
#define		BPF_LD		0x00
#define		BPF_LDX		0x01
#define		BPF_ST		0x02
#define		BPF_STX		0x03
#define		BPF_ALU		0x04
#define		BPF_JMP		0x05
#define		BPF_RET		0x06
#define		BPF_MISC	0x07

/* ld/ldx fields */
#define BPF_SIZE(code)	((code) & 0x18)
#define		BPF_W		0x00 //Word, half-word, and byte?
#define		BPF_H		0x08
#define		BPF_B		0x10
/*				0x18	reserved; used by BSD/OS */
#define BPF_MODE(code)	((code) & 0xe0)
#define		BPF_IMM 	0x00
#define		BPF_ABS		0x20
#define		BPF_IND		0x40
#define		BPF_MEM		0x60
#define		BPF_LEN		0x80
#define		BPF_MSH		0xa0
/*				0xc0	reserved; used by BSD/OS */
/*				0xe0	reserved; used by BSD/OS */

/* alu/jmp fields */
#define BPF_OP(code)	((code) & 0xf0)
#define		BPF_ADD		0x00
#define		BPF_SUB		0x10
#define		BPF_MUL		0x20
#define		BPF_DIV		0x30
#define		BPF_OR		0x40
#define		BPF_AND		0x50
#define		BPF_LSH		0x60
#define		BPF_RSH		0x70
#define		BPF_NEG		0x80
#define		BPF_MOD		0x90
#define		BPF_XOR		0xa0
/*				0xb0	reserved */
/*				0xc0	reserved */
/*				0xd0	reserved */
/*				0xe0	reserved */
/*				0xf0	reserved */

#define		BPF_JA		0x00
#define		BPF_JEQ		0x10
#define		BPF_JGT		0x20
#define		BPF_JGE		0x30
#define		BPF_JSET	0x40
/*				0x50	reserved; used on BSD/OS */
/*				0x60	reserved */
/*				0x70	reserved */
/*				0x80	reserved */
/*				0x90	reserved */
/*				0xa0	reserved */
/*				0xb0	reserved */
/*				0xc0	reserved */
/*				0xd0	reserved */
/*				0xe0	reserved */
/*				0xf0	reserved */
#define BPF_SRC(code)	((code) & 0x08)
#define		BPF_K		0x00
#define		BPF_X		0x08

/* ret - BPF_K and BPF_X also apply */
#define BPF_RVAL(code)	((code) & 0x18)
#define		BPF_A		0x10
/*				0x18	reserved */

/* misc */
#define BPF_MISCOP(code) ((code) & 0xf8)
#define		BPF_TAX		0x00
/*				0x08	reserved */
/*				0x10	reserved */
/*				0x18	reserved */
/* #define	BPF_COP		0x20	NetBSD "coprocessor" extensions */
/*				0x28	reserved */
/*				0x30	reserved */
/*				0x38	reserved */
/* #define	BPF_COPX	0x40	NetBSD "coprocessor" extensions */
/*					also used on BSD/OS */
/*				0x48	reserved */
/*				0x50	reserved */
/*				0x58	reserved */
/*				0x60	reserved */
/*				0x68	reserved */
/*				0x70	reserved */
/*				0x78	reserved */
#define		BPF_TXA		0x80
/*				0x88	reserved */
/*				0x90	reserved */
/*				0x98	reserved */
/*				0xa0	reserved */
/*				0xa8	reserved */
/*				0xb0	reserved */
/*				0xb8	reserved */
/*				0xc0	reserved; used on BSD/OS */
/*				0xc8	reserved */
/*				0xd0	reserved */
/*				0xd8	reserved */
/*				0xe0	reserved */
/*				0xe8	reserved */
/*				0xf0	reserved */
/*				0xf8	reserved */

/*
 * The instruction data structure.
 */
struct bpf_insn {
	u_short	code;
	u_char 	jt;
	u_char 	jf;
	bpf_u_int32 k;
};

---------------------------------------------------------------

In some sense, the types of packets that BPF is working on are "stateless". That
is, even if the state of your machine matters when dealing with TCP packets, 
figuring out whether or not a new packet is TCP doesn't depend on any previously 
received packets.

To put in different words, BPF is only designed to work on one packet at time 
while throwing away anything it discovered about the previous packet(s).

This may cause a few difficulties for me, since Galapagos kernels are typically
streaming, and may not feel the need to tack a header on everything.