The objective of this exercise is to write a P4 program that enables a host to monitor the utilization of all links in the network. This exercise builds upon the basic IPv4 forwarding exercise so be sure to complete that one before attempting this one. Specifically, we will modify the basic P4 program to process a source routed probe packet such that it is able to pick up the egress link utilization at each hop and deliver it to a host for monitoring purposes.
Our probe packet will contain the following three header types:
// Top-level probe header, indicates how many hops this probe
// packet has traversed so far.
header probe_t {
    bit<8> hop_cnt;
}
// The data added to the probe by each switch at each hop.
header probe_data_t {
    bit<1>  bos;
    bit<7>  swid;
    bit<8>  port;
    bit<32> byte_cnt;
    time_t  last_time;
    time_t  cur_time;
}
// Indicates the egress port the switch should send this probe
// packet out of. There is one of these headers for each hop.
header probe_fwd_t {
    bit<8> egress_spec;
}
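To make the wire layout of the per-hop data concrete, here is a minimal Python sketch that packs and unpacks a single probe_data_t header. It assumes that time_t is a 48-bit value (its typedef is not shown above) and that fields are serialized in declaration order, most significant bit first. The helper names pack_probe_data and unpack_probe_data are hypothetical and are not part of the exercise's scripts.

```python
import struct

# Assumption: time_t is 48 bits wide, so one probe_data_t occupies
# 1 (bos + swid) + 1 (port) + 4 (byte_cnt) + 6 + 6 = 18 bytes.
PROBE_DATA_LEN = 18


def pack_probe_data(bos, swid, port, byte_cnt, last_time, cur_time):
    """Serialize one probe_data_t header in network byte order."""
    first = ((bos & 0x1) << 7) | (swid & 0x7F)  # bos is the top bit
    return (
        struct.pack("!BBI", first, port, byte_cnt)
        + last_time.to_bytes(6, "big")
        + cur_time.to_bytes(6, "big")
    )


def unpack_probe_data(raw):
    """Parse the 18 bytes produced by pack_probe_data into a dict."""
    first, port, byte_cnt = struct.unpack("!BBI", raw[:6])
    return {
        "bos": first >> 7,
        "swid": first & 0x7F,
        "port": port,
        "byte_cnt": byte_cnt,
        "last_time": int.from_bytes(raw[6:12], "big"),
        "cur_time": int.from_bytes(raw[12:18], "big"),
    }
```

Because bos and swid share one byte, a header with bos = 1 and swid = 3 starts with the byte 0x83.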
We will use the pod-topology for this exercise, which consists of four hosts connected to four switches that are wired up as they would be in a single pod of a fat tree topology.
In order to monitor the link utilization, our switch will maintain two register arrays:
- byte_cnt_reg - counts the number of bytes transmitted out of each port since the last probe packet was transmitted out of the port.
- last_time_reg - stores the last time that a probe packet was transmitted out of each port.
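The behavior of the two registers can be modeled in a few lines of Python. This is only a toy model of one plausible reading of the description above, not the P4 solution: the class PortMonitor and its method on_egress are hypothetical names, and the sketch assumes that a probe's own bytes are included in the count it picks up.

```python
class PortMonitor:
    """Toy model of byte_cnt_reg and last_time_reg for one switch."""

    def __init__(self, num_ports):
        self.byte_cnt_reg = [0] * num_ports
        self.last_time_reg = [0] * num_ports

    def on_egress(self, port, pkt_len, now, is_probe):
        """Update the registers for a packet leaving `port`.

        Returns (byte_cnt, last_time, cur_time) for a probe packet,
        and None for any other packet.
        """
        # Every packet, probe or not, adds to the per-port byte count.
        self.byte_cnt_reg[port] += pkt_len
        if not is_probe:
            return None
        # A probe picks up the current sample, then resets the state
        # so the next probe measures a fresh interval.
        sample = (self.byte_cnt_reg[port], self.last_time_reg[port], now)
        self.byte_cnt_reg[port] = 0
        self.last_time_reg[port] = now
        return sample
```

Each probe therefore reports the bytes sent since the previous probe, together with the two timestamps that bound that interval.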
Our P4 program will be written for the V1Model architecture implemented on P4.org's bmv2 software switch. The architecture file for the V1Model can be found at: /usr/local/share/p4c/p4include/v1model.p4. This file describes the interfaces of the P4 programmable elements in the architecture, the supported externs, as well as the architecture's standard metadata fields. We encourage you to take a look at it.
Spoiler alert: There is a reference solution in the solution sub-directory. Feel free to compare your implementation to the reference.
The directory with this README contains a skeleton P4 program, link_monitor.p4, which implements basic IPv4 forwarding, as well as source routing of the probe packets. Your job will be to extend this skeleton program to fill out the fields in the probe packet.
Before that, let's compile and test the incomplete link_monitor.p4 program:
- In your shell, run:
  make run
  This will:
  - compile link_monitor.p4, and
  - start the pod-topo in Mininet and configure all switches with the link_monitor.p4 program and table entries, and
  - configure all hosts with the commands listed in pod-topo/topology.json
- You should now see a Mininet command prompt. Open two terminals on h1:
  mininet> xterm h1 h1
- In one of the xterms, run the send.py script to start sending probe packets every second. Each of these probe packets takes the path indicated in link-monitor-topo.png.
  ./send.py
- In the other terminal, run the receive.py script to start receiving and parsing the probe packets. This allows us to monitor the link utilization within the network.
  ./receive.py
The reported link utilization and the switch port numbers will always be 0 because the probe fields have not been filled out yet.
- Run an iperf flow between h1 and h4:
mininet> iperf h1 h4
- Type exit to leave each xterm and the Mininet command line. Then, to stop mininet:
  make stop
  And to delete all pcaps, build files, and logs:
  make clean
The measured link utilizations will not agree with what iperf reports because the probe packet fields have not been populated yet. Your goal is to fill out the probe packet fields so that the two measurements agree.
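Once the fields are populated, the utilization of a link can be derived from a single probe_data entry: it is the number of bits sent during the interval divided by the length of the interval. The sketch below shows that arithmetic. It assumes the two timestamps are in microseconds, so that bits per microsecond equals Mbit/s. The function name link_utilization_mbps is hypothetical and is not taken from receive.py.

```python
def link_utilization_mbps(byte_cnt, last_time_us, cur_time_us):
    """Average egress rate in Mbit/s over the interval between two probes.

    Assumption: both timestamps are in microseconds, so
    bits / microseconds is numerically equal to Mbit/s.
    """
    interval_us = cur_time_us - last_time_us
    if interval_us <= 0:
        # No measurable interval (for example, the very first probe).
        return 0.0
    return 8.0 * byte_cnt / interval_us
```

For example, 125000 bytes sent over a one-second interval (1000000 microseconds) corresponds to 1.0 Mbit/s.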
A P4 program defines a packet-processing pipeline, but the rules within each table are inserted by the control plane. When a rule matches a packet, its action is invoked with parameters supplied by the control plane as part of the rule.
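The split between the pipeline (fixed by the P4 program) and the rules (inserted by the control plane) can be illustrated with a small Python model. Everything here is hypothetical and chosen for illustration: the Table class, the drop and ipv4_forward functions, and the use of exact matching, which keeps the sketch short even though a real table may use other match kinds such as longest-prefix match.

```python
class Table:
    """Toy exact-match table: the program fixes the shape, rules come later."""

    def __init__(self, default_action):
        self.entries = {}
        self.default_action = default_action

    def insert(self, key, action, **params):
        """Control plane: bind a match key to an action and its parameters."""
        self.entries[key] = (action, params)

    def apply(self, key, pkt):
        """Data plane: look up the key and invoke the matching action."""
        action, params = self.entries.get(key, (self.default_action, {}))
        return action(pkt, **params)


def drop(pkt):
    """Default action: mark the packet as dropped."""
    pkt["dropped"] = True
    return pkt


def ipv4_forward(pkt, port, dst_mac):
    """Action whose parameters are supplied by the installed rule."""
    pkt["egress_port"] = port
    pkt["dst_mac"] = dst_mac
    return pkt
```

A rule installed with table.insert("10.0.1.1", ipv4_forward, port=1, dst_mac="08:00:00:00:01:11") causes ipv4_forward to run with those parameters whenever that key matches. Any other key falls through to the default action.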
In this exercise, we have already implemented the control plane logic for you. As part of bringing up the Mininet instance, the make run command will install packet-processing rules in the tables of each switch. These are defined in the sX-runtime.json files, where X corresponds to the switch number.
Important: We use P4Runtime to install the control plane rules. The contents of the sX-runtime.json files refer to specific names of tables, keys, and actions, as defined in the P4Info file produced by the compiler (look for the file build/link_monitor.p4.p4info.txt after executing make run). Any changes in the P4 program that add or rename tables, keys, or actions will need to be reflected in these sX-runtime.json files.
The link_monitor.p4 file contains a skeleton P4 program with key pieces of logic replaced by TODO comments. Your implementation should follow the structure given in this file: replace each TODO with logic implementing the missing piece.
Here are a few more details about the design:
Parser
- The parser has been extended to support parsing of the source routed probe packets. The parser is the most complicated part of the design, so spend a bit of time reading over it. Note that it does not contain any TODO comments, so there is nothing you need to change here.
- To parse the probe packets, we use hdr.probe.hop_cnt to determine how many hops the packet has traversed prior to reaching the switch. If this is the first hop, then there will not be any probe_data in the packet, so we skip that state and transition directly to the parse_probe_fwd state. In the parse_probe_fwd state, we use the hdr.probe.hop_cnt field to figure out which egress_spec header field to use to perform forwarding, and we save that port value into a metadata field which is subsequently used to perform forwarding.
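The parsing steps described above can be walked through in Python on a raw byte string. This is a sketch of the described logic, not a translation of the actual parser. It assumes that each probe_data header is 18 bytes long (which holds if time_t is 48 bits), that the bos bit is the top bit of its first byte, and that the probe_fwd header for the current hop sits at index hop_cnt. The function name parse_probe is hypothetical.

```python
def parse_probe(raw, probe_data_len=18):
    """Return (hop_cnt, probe_data_blobs, egress_spec) for a raw probe.

    Assumption: `raw` starts at the probe header and each probe_data
    header is `probe_data_len` bytes long.
    """
    hop_cnt = raw[0]
    offset = 1
    probe_data = []
    if hop_cnt > 0:
        # parse_probe_data: extract headers until the bottom of stack.
        while True:
            blob = raw[offset:offset + probe_data_len]
            offset += probe_data_len
            probe_data.append(blob)
            if blob[0] >> 7:  # bos bit set: last probe_data header
                break
    # parse_probe_fwd: step through hop_cnt + 1 forwarding headers;
    # the last one read is the egress_spec for this hop.
    egress_spec = None
    for _ in range(hop_cnt + 1):
        egress_spec = raw[offset]
        offset += 1
    return hop_cnt, probe_data, egress_spec
```

On the first hop (hop_cnt = 0) the probe_data loop is skipped entirely, which mirrors the direct transition to the parse_probe_fwd state.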
Ingress Control
- The ingress control block looks very similar to the basic exercise. The only difference is that the apply block contains another condition to forward probe packets using the egress_spec field extracted by the parser. It also increments the hdr.probe.hop_cnt field.
Egress Control
- This is where the interesting stateful processing occurs. It uses the byte_cnt_reg register to count the number of bytes that have passed through each port since the last probe packet passed through the port.
- It adds a new probe_data header to the packet and fills out the bos (bottom of stack) field, as well as the swid (switch ID) field.
- TODO: your job is to fill out the rest of the probe packet fields in order to ensure that you can properly measure link utilization.
Deparser
- Simply emits all headers in the correct order.
- Note that emitting a header stack will only emit the headers within the stack that are actually marked as valid.
Follow the instructions from Step 1. This time, the measured link utilizations should agree with what iperf reports.
There are several problems that might manifest as you develop your program:
- link_monitor.p4 might fail to compile. In this case, make run will report the error emitted from the compiler and halt.
- link_monitor.p4 might compile but fail to support the control plane rules in the s1-runtime.json through s4-runtime.json files that make run tries to install using P4Runtime. In this case, make run will report errors if control plane rules cannot be installed. Use these error messages to fix your link_monitor.p4 implementation.
- link_monitor.p4 might compile, and the control plane rules might be installed, but the switch might not process packets in the desired way. The logs/sX.log files contain detailed logs describing how each switch processes each packet. The output is detailed and can help pinpoint logic errors in your implementation.
In the latter two cases above, make run may leave a Mininet instance running in the background. Use the following command to clean up these instances:
make stop
Now that you've implemented this basic monitoring framework, can you think of ways to leverage this information about link utilization within the core of the network? For instance, how might you use this data, either at the hosts or at the switches, to make real-time load-balancing decisions?
The documentation for P4_16 and P4Runtime is available here
All exercises in this repository use the v1model architecture, the documentation for which is available at: