Technical details
The `bin` training data format used for this trainer is a modified version of the `bin` training data format used by the official Stockfish team. The original format uses a block of 256 bit to store a position. However, for most variants with large boards and/or many piece types, 256 bit are not sufficient to store a position, or it would at least require a very specialized format for each variant, which is undesirable in this context where a generic solution/format is more suitable.
In principle, with up to 26 different piece types and up to 120 squares (12x10 board) plus arbitrarily many pieces in hand, a training data format able to store positions of any configurable variant in Fairy-Stockfish would need to be very big. However, this is not very practical, and >95% of variants in practice fit into 512 bit. Therefore, as a pragmatic decision, the training data format uses 512 bit per position. The current code from the training data generator that packs a position into 512 bit is at https://github.com/ianfab/Fairy-Stockfish/blob/a7f2df622b1e5719307265f492786d7709dd8085/src/tools/sfen_packer.cpp#L169-L230. It is structured as follows:
- 1 bit for the side to move
- 2 * 7 bit for the king squares (7 bit are required to encode one square on a 12x10 board)
- 6 bit per non-king piece on the board (5 bit for the piece type and 1 bit for the color) and 1 bit per empty square, using Huffman encoding (https://github.com/ianfab/Fairy-Stockfish/blob/a7f2df622b1e5719307265f492786d7709dd8085/src/tools/sfen_packer.cpp#L144-L163)
- 5 bit per piece type and color to store the number of pieces in hand, i.e., a number between 0 and 31
- 4 bit for castling rights
- 1 or 8 bit for the en passant square
- 6 bit for the half-move clock
- 8 bit for the full move counter
- 9 more bit related to the half-move clock and the full move counter
As this breakdown shows, there is no easy rule for when the total stays below 512 bit. You need to consider the board size, the number of piece types, and the number of pieces on the board to calculate whether this is the case. However, the format is designed in such a way that e.g. shogi, despite its large board, large number of pieces and piece types, as well as pieces in hand, still fits. The "worst case" example outlined in https://github.com/ianfab/Fairy-Stockfish/blob/a7f2df622b1e5719307265f492786d7709dd8085/src/tools/sfen_packer.cpp#L127-L136 would be shogi on a 12x10 board.
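The field list above can be turned into a rough size check. The following is a minimal sketch, not the packer's actual logic: the function name `packed_bits` and its parameters are hypothetical, the caller is assumed to supply the number of empty squares that the board section encodes, and `hand_slots` (the number of piece type and color combinations stored for pieces in hand) is an assumption that depends on the variant.

```python
def packed_bits(non_king_pieces, empty_squares, hand_slots=0, en_passant=False):
    """Rough bit count of one packed position, following the field list above.

    All parameter names are illustrative. `hand_slots` is the number of
    (piece type, color) pairs whose hand count is stored (an assumption).
    """
    bits = 1                          # side to move
    bits += 2 * 7                     # two king squares, 7 bit each
    bits += 6 * non_king_pieces       # 5 bit piece type + 1 bit color
    bits += 1 * empty_squares         # 1 bit per empty square
    bits += 5 * hand_slots            # hand counts, 0..31 each
    bits += 4                         # castling rights
    bits += 8 if en_passant else 1    # en passant square: 1 or 8 bit
    bits += 6 + 8 + 9                 # half-move clock, full move counter, extra bits
    return bits


# Chess start position: 30 non-king pieces, 32 empty squares
# (assuming the two king squares are not encoded again in the board section).
chess_start = packed_bits(non_king_pieces=30, empty_squares=32)

# Shogi start position: 38 non-king pieces, 41 empty squares, and (assumed)
# 7 droppable piece types per color, i.e. 14 hand slots.
shogi_start = packed_bits(non_king_pieces=38, empty_squares=41, hand_slots=14)
```

Under these assumptions both start positions stay below 512 bit, which is consistent with the statement that shogi still fits.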
The Half(K)Av2 architecture in Fairy-Stockfish is based on HalfKAv2 from official Stockfish with several generalizations in the feature transformer:
As explained in the aforementioned link, the HalfKAv2 architecture has 64 * 11 input features per king square, i.e., one per piece type (6), color (2), and square (64), but with the two kings merged into one (2*6-1 = 11). In Fairy-Stockfish this is generalized for arbitrary board sizes and numbers of piece types, so the number of features per king square is `RANKS * FILES * (2 * PIECE_TYPES - 1)`.
On top of that, there can be additional features in case a variant uses pieces in hand. If this is the case, for each color and piece type except kings there are an additional `2 * FILES` features representing the count of pieces in hand. This choice was made since most drop variants have one pawn-like piece on each file, so even when one side has all pawns in hand (both their own and their opponent's), the count equals twice the number of files. E.g., in shogi one can have at most 18 (2*9) pawns, or in crazyhouse at most 16 (2*8). So in the example of crazyhouse this adds `2 (colors) * 5 (non-king piece types) * 16 (max hand count)`, i.e., 160 features.
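The hand feature count described above is simple arithmetic. A minimal sketch, using a hypothetical helper name `hand_features`:

```python
def hand_features(files, non_king_piece_types, colors=2):
    """Number of extra input features for pieces in hand.

    Each color and non-king piece type gets 2 * FILES features,
    one per possible hand count up to twice the number of files.
    """
    max_hand_count = 2 * files
    return colors * non_king_piece_types * max_hand_count


# Crazyhouse: 8 files, 5 non-king piece types
crazyhouse_hand = hand_features(files=8, non_king_piece_types=5)
```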
Since not only the number of features for a given king square, but also the number of king squares can vary, this is also taken into account. Usually the number of king squares simply equals the number of squares on the board. However, for variants where the king piece is limited to a certain area of the board, these accessible squares are mapped to a range corresponding to their number. E.g., for Xiangqi the palace has 9 squares, so the number of king squares is 9.
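The mapping of accessible king squares to a compact range can be illustrated as follows. This is a sketch under assumptions, not the Fairy-Stockfish implementation: the helper `king_square_index` is hypothetical, and squares are numbered as `rank * FILES + file`, which may differ from the engine's internal square numbering.

```python
def king_square_index(accessible_squares):
    """Map each accessible king square to a contiguous index 0..N-1."""
    return {sq: i for i, sq in enumerate(sorted(accessible_squares))}


# Xiangqi palace of one side: files d-f (3..5) on the first three ranks
# of a board with 9 files (illustrative square numbering).
FILES = 9
palace = [rank * FILES + file for rank in range(3) for file in range(3, 6)]
palace_index = king_square_index(palace)
num_king_squares = len(palace_index)
```

With such a mapping, the feature transformer only needs one block of features per palace square instead of one per board square.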
For variants that do not have any king-like piece type with a constant count of 1, the number of king squares is set to 1. Therefore, this is referred to as Half(K)Av2, since in such a case the overparametrization with king squares disappears. One additional modification is required in this scenario, because the optimization of merging the two king feature planes is no longer possible, since they would otherwise be indistinguishable due to the missing overparametrization. Therefore the `RANKS * FILES * (2 * PIECE_TYPES - 1)` features are extended to `RANKS * FILES * (2 * PIECE_TYPES)`.
So overall the number of input features for a given variant is:
INPUT_FEATURES = KING_SQUARES * [RANKS * FILES * (KING_SQUARES != 1 ? 2 * PIECE_TYPES - 1 : 2 * PIECE_TYPES) + (2 * FILES) * (2 * NON_KING_PIECE_TYPES)]
Also see the same calculation in https://github.com/ianfab/variant-nnue-pytorch/blob/master/halfka_v2.py.
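For a quick sanity check, the formula can be written as a small function. This is a sketch only; the function name and the `uses_hand` flag are hypothetical and are not taken from `halfka_v2.py`. The hand term is applied only when the variant uses pieces in hand, as described in the text.

```python
def input_features(ranks, files, piece_types, king_squares,
                   non_king_piece_types=0, uses_hand=False):
    """Number of input features following the formula above."""
    # Merged king planes are only possible when king squares are used.
    planes = 2 * piece_types - 1 if king_squares != 1 else 2 * piece_types
    board = ranks * files * planes
    # Extra features for hand counts, only for variants with pieces in hand.
    hand = (2 * files) * (2 * non_king_piece_types) if uses_hand else 0
    return king_squares * (board + hand)


chess = input_features(ranks=8, files=8, piece_types=6, king_squares=64)
crazyhouse = input_features(ranks=8, files=8, piece_types=6, king_squares=64,
                            non_king_piece_types=5, uses_hand=True)
# A hypothetical variant without a king-like piece: one "king square",
# and the king planes are not merged.
no_king = input_features(ranks=8, files=8, piece_types=6, king_squares=1)
```

For chess this gives 64 * 704 = 45056 features, and for crazyhouse 64 * (704 + 160) = 55296.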
Since the feature transformer weights consume a large part of the size of an NNUE file, the file size can be roughly estimated as the number of input features times the size of the first layer, which is 512 + 8 (for the PSQT part), times the size of one weight, which is 2 bytes (16 bit integer). Therefore, multiplying the above number of input features by 520 and by 2 bytes per weight gives a lower bound of the file size which should be close to the actual size. Neglecting the features of pieces in hand and the minor adjustments due to the merging of king features, this can be simplified to `SIZE ≈ SQUARES * KING_SQUARES * PIECE_TYPES * 2080` as the size in bytes. Note that this simplified form counts `2 * PIECE_TYPES` planes per square, so for variants with merged king planes and no pieces in hand it slightly overestimates the detailed bound.
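The size estimate can be sketched in a few lines. The function names are hypothetical; the constants 512, 8, and 2 are the first layer size, PSQT part, and bytes per weight stated above.

```python
def estimate_nnue_size_bytes(num_input_features, l1_size=512, psqt=8,
                             bytes_per_weight=2):
    """Feature transformer size in bytes: features * (512 + 8) * 2."""
    return num_input_features * (l1_size + psqt) * bytes_per_weight


def simplified_size_bytes(squares, king_squares, piece_types):
    """Simplified form: SQUARES * KING_SQUARES * PIECE_TYPES * 2080."""
    return squares * king_squares * piece_types * 2080


# Chess: 45056 input features (64 king squares * 64 squares * 11 planes)
chess_detailed = estimate_nnue_size_bytes(45056)
chess_simplified = simplified_size_bytes(squares=64, king_squares=64,
                                         piece_types=6)
```

For chess the detailed estimate is 46,858,240 bytes (about 47 MB) and the simplified form gives 51,118,080 bytes. The difference comes from the simplified form counting 12 instead of 11 piece planes per square.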