
Main Controller

Jordi Fornt edited this page Oct 1, 2024 · 2 revisions

The Main Controller is the block that orchestrates the data flows in the accelerator, controls the execution of the various counters and stalls the pipeline when necessary.

The module is composed of three distinct blocks. The Feeders FSM controls the execution of the Data Feeder and the Weight Fetcher and coordinates their interaction: it stalls the feeding process when any FIFO becomes full and stalls the pipeline when any FIFO becomes empty. The Context FSM controls the interaction between the full computation pipeline (Data Feeder, Weight Fetcher and systolic array) and the output extraction flow. Finally, the Context Switch Controller precisely generates the context switch signals towards the systolic array, which trigger the swap between the accumulator and the reserve register (see Systolic Array).

Feeders FSM

The Feeders FSM organizes the execution of the feeders as follows. The Data Feeder and the Weight Fetcher are activated at the beginning of the computation, after the Output Buffer has shifted the first preload values into the array. Because the feeders take some cycles to push values into the FIFOs, the FSM waits until all FIFOs of both feeders are non-empty, and then waits a few additional cycles before raising the pipeline enable signal. After this, the Feeders FSM waits until both feeders have finished their respective readout sequences (see Data Feeder). These sequences include the tiling loops, so when the feeders are done, there are no more inputs in the SRAM to be used.

After the Final Push stage is complete, the Feeders FSM waits until all values have been consumed by the array and all FIFOs are empty. It then waits until the Context FSM resets it to start a new computation. The figure below shows the flow diagram of the FSM, which implements all the aforementioned steps.
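The sequence above can be sketched as a small cycle-level state machine. This is a minimal Python sketch under stated assumptions, not the RTL: only the FEEDING state name comes from this page, while the other state names (IDLE, FILL, WARMUP, DRAIN, DONE), the warmup_cycles parameter and all input names are hypothetical. The drained input stands for "all FIFOs are empty and all values have been consumed by the array".

```python
from enum import Enum, auto


class FeedersState(Enum):
    # Hypothetical state names; only FEEDING is named on this page.
    IDLE = auto()     # waiting for start (first preload already in the array)
    FILL = auto()     # feeders running, waiting for every FIFO to be non-empty
    WARMUP = auto()   # a few extra cycles before raising the pipeline gate
    FEEDING = auto()  # pipeline gate high, feeders still reading from SRAM
    DRAIN = auto()    # feeders done, waiting until FIFOs are empty and consumed
    DONE = auto()     # idle until the Context FSM resets the block


class FeedersFSM:
    def __init__(self, warmup_cycles: int = 2) -> None:
        self.warmup_cycles = warmup_cycles  # hypothetical parameter
        self.state = FeedersState.IDLE
        self._remaining = 0

    @property
    def pipeline_gate(self) -> bool:
        # Assumption: the gate stays high while draining so the array can
        # consume the values still sitting in the FIFOs.
        return self.state in (FeedersState.FEEDING, FeedersState.DRAIN)

    def step(self, *, start=False, fifos_nonempty=False, feeders_done=False,
             drained=False, reset=False) -> FeedersState:
        """Advance one clock cycle and return the new state."""
        s = self.state
        if reset:
            self.state = FeedersState.IDLE
        elif s is FeedersState.IDLE and start:
            self.state = FeedersState.FILL
        elif s is FeedersState.FILL and fifos_nonempty:
            if self.warmup_cycles > 0:
                self._remaining = self.warmup_cycles
                self.state = FeedersState.WARMUP
            else:
                self.state = FeedersState.FEEDING
        elif s is FeedersState.WARMUP:
            # Stay in WARMUP for exactly warmup_cycles cycles.
            self._remaining -= 1
            if self._remaining == 0:
                self.state = FeedersState.FEEDING
        elif s is FeedersState.FEEDING and feeders_done:
            self.state = FeedersState.DRAIN
        elif s is FeedersState.DRAIN and drained:
            self.state = FeedersState.DONE
        return self.state
```

Keeping the gate high in the hypothetical DRAIN state is a design choice of the sketch: the array must keep running for the FIFOs to empty.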

On top of controlling the feeding stages, the Feeders FSM module also includes a lower-level control circuit that stalls the computational pipeline when necessary. As a general rule, the pipeline is enabled only when the pipeline gate signal from the Context FSM is high, the pipeline gate from the Feeders FSM is high, and no FIFO is empty. If any of these conditions does not hold, the computation is stalled. There are, however, some exceptions:

  • During the FIFO emptying phases, the state of the FIFOs is not checked (there can and should be empty FIFOs).
  • If any feeder finishes reading all its data before the FSM reaches the FEEDING state, the FIFOs of that feeder are allowed to be empty to prevent deadlock. This happens when there is very little data to read, in which case it is fine to let the pipeline continue and introduce some zeros after the last values.
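The enable rule and its two exceptions reduce to a small combinational function. The Python sketch below assumes one empty flag per FIFO and hypothetical inputs (emptying, data_done_early, weight_done_early) for the emptying phase and for a feeder that finished before the FEEDING state; the real signal names may differ.

```python
from typing import Sequence


def pipeline_enable(ctx_gate: bool, feeders_gate: bool,
                    data_fifo_empty: Sequence[bool],
                    weight_fifo_empty: Sequence[bool],
                    emptying: bool = False,
                    data_done_early: bool = False,
                    weight_done_early: bool = False) -> bool:
    """Hypothetical model of the pipeline enable rule described above."""
    # Both FSM gates must be high in every case.
    if not (ctx_gate and feeders_gate):
        return False
    # Exception 1: while the FIFOs are being emptied, their state is not checked.
    if emptying:
        return True
    # Exception 2: a feeder that finished before FEEDING may leave its FIFOs empty.
    data_ok = data_done_early or not any(data_fifo_empty)
    weight_ok = weight_done_early or not any(weight_fifo_empty)
    return data_ok and weight_ok
```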

Context Switch Controller

The Context Switch Controller module fulfills two critical roles. First, it counts the number of MAC operations performed by the array. When the count reaches the total number of MACs per context (passed as a configuration parameter), it signals the Context FSM that a context switch is imminent. To perform the count correctly, the block emulates the behavior of the PE at position [0,0], taking pipeline_en and other control signals into account. Second, upon a MAC counter overflow and when enabled by the Context FSM, it generates the sequence that drives the context switch signals. The detailed schematic of this module can be found at the end of this page.
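A behavioral sketch of both roles, written in Python under simplifying assumptions: pe00_valid is a hypothetical input standing in for "the other control signals" that PE[0,0] depends on, cs_enable stands for the enable from the Context FSM, and the context switch sequence is collapsed into a single-cycle pulse, which the real module does not necessarily do.

```python
class ContextMacCounter:
    """Hypothetical behavioral model of the Context Switch Controller counter."""

    def __init__(self, macs_per_context: int) -> None:
        if macs_per_context <= 0:
            raise ValueError("macs_per_context must be positive")
        self.macs_per_context = macs_per_context  # configuration parameter
        self.count = 0
        self.cdone = False  # pending context switch, visible to the Context FSM

    def step(self, pipeline_en: bool, pe00_valid: bool, cs_enable: bool) -> bool:
        """Advance one clock cycle; return True when a context switch is issued."""
        # Count a MAC only on cycles where PE[0,0] would actually compute.
        if pipeline_en and pe00_valid:
            self.count += 1
            if self.count == self.macs_per_context:
                self.count = 0      # overflow: start counting the next context
                self.cdone = True
        # Issue the switch (a single pulse in this sketch) once it is enabled.
        if self.cdone and cs_enable:
            self.cdone = False
            return True
        return False
```

Stalled cycles (pipeline_en low) are not counted, which is why the count follows PE[0,0] rather than the raw clock.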

Context FSM

The Context FSM integrates the array computation, preload insertion, output extraction and context switch. It controls and monitors the Feeders FSM, the Scan FSM and the Context Switch Controller. The figure below depicts the main idea behind the control sequence for these elements, as well as the most important control and feedback signals.

At the very start of a computation, triggered by the host, the first preload values are read, shifted into the reserve registers of the array and propagated to the accumulators. When this is done (signaled by first_rdy from the Scan FSM), the Feeders FSM is started, and the array starts computing as soon as enough values are available at the FIFOs. Two more preload reads are triggered at this point in order to fill all the register levels with inputs (see Partial Sums Manager). When the current context finishes (detected by the Context Switch Controller, which raises the cdone signal), and the context switch has propagated through the whole array, the Scan FSM is triggered to perform the output extraction and preload insertion procedure.
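The smooth-path sequence can be summarized as a minimal Python sketch that returns the actions issued on each cycle. The phase names, action strings and the switch_done input (context switch propagated through the whole array) are hypothetical; first_rdy and cdone are the signals named above. Contingencies are deliberately ignored, and the Scan FSM is assumed to run in parallel once started while the array keeps computing.

```python
from enum import Enum, auto


class Phase(Enum):
    # Hypothetical phase names for the smooth (no-contingency) path.
    IDLE = auto()
    FIRST_PRELOAD = auto()  # first preloads shifted in and propagated
    RUN = auto()            # array computing the current context
    SWITCHING = auto()      # cdone seen, waiting for the switch to propagate


class ContextSequencer:
    def __init__(self) -> None:
        self.phase = Phase.IDLE

    def step(self, start=False, first_rdy=False, cdone=False, switch_done=False):
        """Advance one cycle; return the list of actions issued this cycle."""
        actions = []
        if self.phase is Phase.IDLE and start:
            actions.append("start_first_preload")
            self.phase = Phase.FIRST_PRELOAD
        elif self.phase is Phase.FIRST_PRELOAD and first_rdy:
            # Start the feeders and issue two more preload reads.
            actions += ["start_feeders", "preload_read", "preload_read"]
            self.phase = Phase.RUN
        elif self.phase is Phase.RUN and cdone:
            self.phase = Phase.SWITCHING
        elif self.phase is Phase.SWITCHING and switch_done:
            # Output extraction and preload insertion run alongside computation.
            actions.append("start_scan")
            self.phase = Phase.RUN
        return actions
```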

If everything runs smoothly, the array computes all contexts without any stall caused by the Context FSM (it can still stall for other reasons, mainly because of the Feeders FSM), as shown in the figure below. This is generally the case when the computation has a large enough number of operands. In general, however, there are many contingencies to take into account. Consider, for example, the following cases:

  1. The current context finishes before the Scan FSM is done fetching preloads.
  2. The current context finishes in the middle of a shift.
  3. The current context finishes in the middle of a context switch.

These are some of the difficult cases that the Context FSM must handle without breaking the computation. For instance, (1) could be solved by stalling the computational pipeline until the Scan FSM is done. However, (2) and (3) require a different response, since both the shift chain and the context switch propagation are also gated by the pipeline enable. In these cases, a "soft-stall" is performed: the FIFOs stop popping values and their outputs are overridden with zeros. This lets the pipeline keep shifting without modifying the current MAC results. Even though most of these cases correspond to inefficient workloads that one would avoid in practice, correctness requires the accelerator to work in all cases, even when the workload is inefficient.
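To see why the soft-stall is harmless, the Python sketch below models a single MAC fed by a data FIFO and a weight FIFO. It is a simplified illustration that assumes a soft-stalled cycle still executes a MAC; the function names are hypothetical. Because the overridden operands are zero, the accumulator ends with the same value whether or not soft-stall cycles are inserted, and no FIFO value is lost.

```python
from collections import deque
from typing import Deque, Sequence, Tuple


def drive(fifo: Deque[int], soft_stall: bool) -> int:
    """Value a FIFO presents to the array edge in one cycle (hypothetical helper)."""
    if soft_stall:
        return 0              # soft-stall: no pop, output overridden with zero
    return fifo.popleft()     # normal cycle: pop the next value


def run(data: Sequence[int], weights: Sequence[int],
        soft_stall_pattern: Sequence[bool]) -> Tuple[int, int]:
    """Accumulate data*weight over the given cycles; return (acc, values left)."""
    a_fifo, w_fifo = deque(data), deque(weights)
    acc = 0
    for soft in soft_stall_pattern:
        a = drive(a_fifo, soft)
        w = drive(w_fifo, soft)
        acc += a * w          # the MAC still runs, but 0 * 0 adds nothing
    return acc, len(a_fifo)
```

For example, run([1, 2, 3], [4, 5, 6], [False, True, True, False, False]) returns (32, 0), the same result as three uninterrupted cycles.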

The control sequence and all the necessary contingencies are enforced by two interacting FSMs. The Stalls FSM (see figure below) is a helper FSM that decides whether a "hard-stall" of the array and feeders (setting the pipeline enable low) must be applied, and holds it until the array is ready to accept more data.
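As a hedged illustration of the "decide and hold" behavior, the Python sketch below handles only case (1) above: if a context finishes (cdone) before the Scan FSM has the preloads ready, the pipeline gate is pulled low and held until a hypothetical preloads_ready input rises. The class and input names are assumptions, and the real Stalls FSM covers more conditions.

```python
class StallsFSM:
    """Hypothetical model of case (1): hard-stall until the preloads are ready."""

    def __init__(self) -> None:
        self.stalled = False

    def step(self, cdone: bool, preloads_ready: bool) -> bool:
        """Advance one cycle; return the pipeline gate (True = run, False = stall)."""
        if not self.stalled and cdone and not preloads_ready:
            self.stalled = True    # context done too early: freeze the pipeline
        elif self.stalled and preloads_ready:
            self.stalled = False   # Scan FSM caught up: release the stall
        return not self.stalled
```

The stall is held even after cdone drops, which is the "holds it" part of the description: only the readiness condition releases it.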

On the other hand, the Main FSM (or simply Context FSM, see figure below) keeps track of the system state and enforces the required actions and contingencies in every case considered.
