Downstream next event tag (DNET), a new signal for more efficient centralized federated execution #349

byeonggiljun · 2024-02-05T20:33:07Z

This PR replaces #176 and #337.

Relevant discussion and issue: lf-lang/lingua-franca#1626, #264

A companion PR in lingua-franca: lf-lang/lingua-franca#2400
A companion PR in reactor-ts: lf-lang/reactor-ts#296

This PR introduces a new message type, downstream next event tag (DNET), to reduce unnecessary NET signals. The RTI sends this signal to a federate to notify it that NET signals with tags at or before a given threshold are not needed by its downstream federates.

Suppose the RTI is computing the DNET tag for a federate A that has one downstream federate B. The RTI uses the minimum delay between the federates and B's NET value to compute the DNET tag. For example, suppose the delay from A to B is NEVER and A has an event every 10 ms (0, 10 ms, ...). Also, assume B's next event is at (100 ms, 0). The RTI determines B's earliest next event tag by taking the minimum of the most recent NET from B and the tag at the head of B's in-transit message queue; here, that minimum is (100 ms, 0).

In this case, the RTI can send DNET (100 ms, 0) to A. Based on the DNET signal, if A is not producing outputs, A does not need to send any NET signals with tags earlier than or equal to (100 ms, 0) (NET with 10 ms, 20 ms, ..., 100 ms) because the RTI cannot send TAG (100 ms, 0) using those NET signals. The RTI can grant TAG (100 ms, 0) to B after receiving NET (110 ms, 0) from A.
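The federate-side rule in this example can be sketched in C as follows. This is a minimal illustration, not the code in this PR: `sketch_tag_t`, `SKETCH_MSEC`, `sketch_tag_compare`, `sketch_should_send_net`, and `sketch_dnet_example` are hypothetical names, and the sketch deliberately ignores any other conditions under which a federate still has to send a NET.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tag type for this sketch: logical time in nanoseconds plus a microstep. */
typedef struct {
  int64_t time;
  uint32_t microstep;
} sketch_tag_t;

#define SKETCH_MSEC(t) ((int64_t)(t) * 1000000LL)

/* Return -1, 0, or 1 if a is earlier than, equal to, or later than b. */
static int sketch_tag_compare(sketch_tag_t a, sketch_tag_t b) {
  if (a.time != b.time) return a.time < b.time ? -1 : 1;
  if (a.microstep != b.microstep) return a.microstep < b.microstep ? -1 : 1;
  return 0;
}

/*
 * Federate-side decision (assumed, simplified): should NET(next_event) be sent?
 * A federate that is producing outputs always sends. Otherwise the NET is
 * only useful downstream if its tag is strictly later than the last DNET.
 */
static bool sketch_should_send_net(sketch_tag_t next_event, sketch_tag_t dnet, bool producing_output) {
  if (producing_output) {
    return true;
  }
  return sketch_tag_compare(next_event, dnet) > 0;
}

/* Replays the example above: DNET (100 ms, 0), events every 10 ms, no outputs. */
static void sketch_dnet_example(void) {
  sketch_tag_t dnet = {SKETCH_MSEC(100), 0};
  for (int ms = 10; ms <= 100; ms += 10) {
    sketch_tag_t next_event = {SKETCH_MSEC(ms), 0};
    assert(!sketch_should_send_net(next_event, dnet, false)); /* skipped */
  }
  sketch_tag_t after = {SKETCH_MSEC(110), 0};
  assert(sketch_should_send_net(after, dnet, false)); /* NET (110 ms, 0) is sent */
}
```

With DNET (100 ms, 0), the ten NET signals at 10 ms through 100 ms are suppressed and only NET (110 ms, 0) reaches the RTI.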

Additionally, in this PR, the RTI uses the earliest incoming message tag (EIMT) when computing the tag of a tag advance grant (TAG) signal, so that it does not block a federate that is eligible to advance its tag. Currently, if a federate has its next event scheduled at (100 ms, 0) and its EIMT is (200 ms, 1), the RTI sends TAG (100 ms, 0). However, this incurs unnecessary NET signals if the federate also has an event at the tag (200 ms, 0). With this PR, the RTI sends TAG (200 ms, 0), the latest tag earlier than the EIMT, so that the federate can execute any events with tags earlier than (200 ms, 1), including the event at (100 ms, 0).
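The "latest tag earlier than" step can be sketched in C as below. The type and function names are hypothetical and this is not the implementation of `lf_tag_latest_earlier` in this PR. It assumes tags are ordered by time and then by microstep, and it omits any special handling of sentinel tags such as NEVER or FOREVER.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag type for this sketch: logical time in nanoseconds plus a microstep. */
typedef struct {
  int64_t time;
  uint32_t microstep;
} sketch_tag_t;

#define SKETCH_MSEC(t) ((int64_t)(t) * 1000000LL)

/*
 * Return the latest tag that is strictly earlier than the given tag.
 * If the microstep is positive, step back one microstep. Otherwise step
 * back one time unit and use the largest representable microstep.
 */
static sketch_tag_t sketch_latest_earlier(sketch_tag_t tag) {
  if (tag.microstep > 0) {
    tag.microstep -= 1;
  } else {
    assert(tag.time > INT64_MIN); /* avoid signed overflow in this sketch */
    tag.time -= 1;
    tag.microstep = UINT32_MAX;
  }
  return tag;
}
```

For the example above, applying `sketch_latest_earlier` to an EIMT of (200 ms, 1) yields (200 ms, 0), which is the tag granted in the TAG signal.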

This PR also refactors the data structure that the RTI uses to store delays between federates: a two-dimensional matrix now stores the delays.
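As a rough illustration of why a matrix helps, the C sketch below stores one minimum delay per ordered pair of federates, so the same structure answers both "who is downstream of this federate" (scan a row) and "who is upstream of it" (scan a column). All names here are hypothetical. The sketch stores plain 64-bit delays with a sentinel for "no path", which is a simplification and not necessarily how the RTI represents delays. Memory is quadratic in the number of federates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_NO_PATH INT64_MAX /* sentinel: no path from one federate to another */

/* Hypothetical n-by-n matrix, stored row-major: delay[from * n + to]. */
typedef struct {
  size_t n;
  int64_t* delay;
} sketch_delay_matrix_t;

/* Allocate the matrix with every entry set to "no path". delay is NULL on failure. */
static sketch_delay_matrix_t sketch_matrix_create(size_t n) {
  sketch_delay_matrix_t m = {n, malloc(n * n * sizeof(int64_t))};
  if (m.delay != NULL) {
    for (size_t i = 0; i < n * n; i++) m.delay[i] = SKETCH_NO_PATH;
  }
  return m;
}

static void sketch_matrix_set(sketch_delay_matrix_t* m, size_t from, size_t to, int64_t d) {
  assert(from < m->n && to < m->n);
  m->delay[from * m->n + to] = d;
}

static int64_t sketch_matrix_get(const sketch_delay_matrix_t* m, size_t from, size_t to) {
  assert(from < m->n && to < m->n);
  return m->delay[from * m->n + to];
}

/* Count federates reachable from `from` by scanning its row. */
static size_t sketch_count_downstreams(const sketch_delay_matrix_t* m, size_t from) {
  size_t count = 0;
  for (size_t to = 0; to < m->n; to++) {
    if (sketch_matrix_get(m, from, to) != SKETCH_NO_PATH) count++;
  }
  return count;
}

/* Count federates that can reach `to` by scanning its column. */
static size_t sketch_count_upstreams(const sketch_delay_matrix_t* m, size_t to) {
  size_t count = 0;
  for (size_t from = 0; from < m->n; from++) {
    if (sketch_matrix_get(m, from, to) != SKETCH_NO_PATH) count++;
  }
  return count;
}
```

The caller owns the buffer and releases it with `free(m.delay)`.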

Use a matrix instead of multiple arrays to store minimum delays of every possible connection. This is a preparation step for the DNET message calculation. This matrix enables each federate to search min_delays from upstreams as well as to downstreams.

edwardalee

I need more time to review this, but here is a starting point. This is looking very good so far. I have a few suggestions, some of which are just nits and you should feel free to ignore them. I will follow up with a more complete review as soon as I can.

core/federated/RTI/main.c

core/federated/RTI/rti_common.c

core/federated/RTI/rti_common.h

Co-authored-by: Edward A. Lee <eal@eecs.berkeley.edu>

core/tag.c

include/core/federated/network/net_common.h

edwardalee · 2024-10-13T15:45:44Z

The example given in the PR comment is confusing. It says:

For example, if the delay from A to B is NEVER and B's next event is scheduled at (100 ms, 0), the RTI can send DNET (100 ms, 0) to A; if A sends NET (100 ms, 0) to the RTI, the RTI cannot send TAG (100 ms, 0), and if A sends NET (100 ms, 1), the RTI can grant TAG (100 ms, 0) to B.

What does it mean "B's next event is scheduled at (100 ms, 0)"? I think you mean something like "The most recently received NET from B is (100 ms, 0)"? Also, doesn't this have to be transitive NET?

The rest of the example is irrelevant to this PR. The statement it makes is true without this PR and has always been true. Can you instead give an example that actually uses this PR?

edwardalee

I'm worried about the n^2 complexity and memory cost. I've added some suggestions, but I think the all_upstream, all_downstream, and min_delay arrays all have n^2 complexity in the worst case and they seem redundant. I'm suggesting consolidating to just use the min_delay array for all these purposes.

core/federated/RTI/main.c

core/federated/RTI/rti_common.c

core/federated/RTI/rti_common.h

byeonggiljun · 2024-10-14T00:05:08Z

Hello, @edwardalee.

What does it mean "B's next event is scheduled at (100 ms, 0)"? I think you mean something like "The most recently received NET from B is (100 ms, 0)"? Also, doesn't this have to be transitive NET?

What I meant was "the minimum tag of the most recent NET_B and the head of the in-transit message queue of B is (100 ms, 0)", as the RTI uses the in-transit message queue as well as the NET from B to predict B's earliest next event. I'll modify the comment to clarify it.
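A minimal C sketch of that estimate, with hypothetical names and the assumption that the in-transit queue is kept sorted with the earliest tag first:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag type for this sketch: logical time in nanoseconds plus a microstep. */
typedef struct {
  int64_t time;
  uint32_t microstep;
} sketch_tag_t;

#define SKETCH_MSEC(t) ((int64_t)(t) * 1000000LL)

/* Return -1, 0, or 1 if a is earlier than, equal to, or later than b. */
static int sketch_tag_compare(sketch_tag_t a, sketch_tag_t b) {
  if (a.time != b.time) return a.time < b.time ? -1 : 1;
  if (a.microstep != b.microstep) return a.microstep < b.microstep ? -1 : 1;
  return 0;
}

/*
 * Estimate a federate's earliest next event tag as the minimum of its most
 * recent NET and the head of its in-transit message queue. `in_transit`
 * is assumed sorted by tag, earliest first; `count` may be zero.
 */
static sketch_tag_t sketch_next_event_estimate(sketch_tag_t last_net, const sketch_tag_t* in_transit, size_t count) {
  assert(count == 0 || in_transit != NULL);
  if (count > 0 && sketch_tag_compare(in_transit[0], last_net) < 0) {
    return in_transit[0];
  }
  return last_net;
}
```

If B last reported NET (150 ms, 0) but a message tagged (100 ms, 0) is still in transit to B, the estimate is (100 ms, 0).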

Also, doesn't this have to be transitive NET?

I don't get what transitive NET is. Could you please elaborate on it?

The rest of the example is irrelevant to this PR. The statement it makes is true without this PR and has always been true. Can you instead give an example that actually uses this PR?

I tried to explain how a DNET signal is used in the example. Could you read it and share your thoughts?

edwardalee · 2024-10-14T15:56:05Z

OK, the description is much better now. I would suggest a slight enhancement. Instead of:

"Based on the DNET signal, A does not send any NET signals with tags earlier than or equal to (100 ms, 0) (NET with 10 ms, 20 ms, ..., 100 ms)"

you could say:

"Based on the DNET signal, if A is not producing outputs, A does not need to send any NET signals with tags earlier than or equal to (100 ms, 0) (NET with 10 ms, 20 ms, ..., 100 ms)"

byeonggiljun and others added 21 commits January 31, 2024 14:14

Prepare for adding a new message type, Downstream Next Event Tag (DNET)

98a569d

Rename upstream and downstream to immediate_upstreams and immediate_downstreams

af3d083

Merge branch 'main' into rti-DNET

86f3d50

Save IDs of all downstreams for faster access to the min_delays matrix

6b37e36

Add a function for subtracting tags

eea1edf

Calculate and send DNET messages

b4900f3

Merge branch 'main' into rti-DNET

83fafe2

Remove unnecessary LTC and NET messages by DNET

42fe08f

Sends last skipped DNETs when sending T_MSGs

d3e462c

Merge branch 'main' into rti-DNET

3d4653c

Merge branch 'main' into rti-DNET

4f2eb91

Do not send DNET to upstreams of ZDC

9967d9a

Disable the RTI unit tests until it reflects the changes of the RTI

400d138

Send NET regardless of DNET if it needs TAG from the RTI

1719951

Using ID of the federate which sends NET to calculate DNET

a6e2b5d

Move DNET calculation function from tag.h to rti_common.h

8664909

Fix the calculation of DNET candidates

ccc9b8b

Re-enable the RTI unit tests

e9ae464

Merge branch 'main' into rti-DNET

24e1e3b

Merge main into branch rti-DNET

800b570

byeonggiljun added the enhancement, feature, and federated labels Apr 22, 2024

Merge branch main into branch 'rti-DNET'

1c2777d

byeonggiljun force-pushed the rti-DNET branch from f266230 to 1c2777d Compare April 22, 2024 18:50

Fix a trace point for DNET

4ebd09e

byeonggiljun force-pushed the rti-DNET branch from 1241f87 to 8bda724 Compare April 23, 2024 20:59

Skip sending LTCs if a network input reaction has been scheduled at the current tag

9a9d120

byeonggiljun force-pushed the rti-DNET branch from 8bda724 to 9a9d120 Compare April 23, 2024 22:03

byeonggiljun changed the title ~~A new signal downstream next event tag (DNET) for more efficient centralized federated execution~~ Downstream next event tag (DNET), a new signal for more efficient centralized federated execution Aug 14, 2024

byeonggiljun added 2 commits August 14, 2024 16:19

Add some comments and remove a FIXME

a2ab5bc

Minor refactoring

5d52945

byeonggiljun force-pushed the rti-DNET branch 2 times, most recently from 121e21f to 79e2fdb Compare August 15, 2024 18:16

byeonggiljun marked this pull request as ready for review August 15, 2024 18:48

Update comments.

5e09b52

byeonggiljun force-pushed the rti-DNET branch from 79e2fdb to 5e09b52 Compare August 15, 2024 21:27

byeonggiljun requested review from edwardalee and lhstrh August 15, 2024 22:21

edwardalee requested changes Aug 23, 2024

byeonggiljun and others added 3 commits August 23, 2024 15:38

Apply suggestions from code review

530b0cd

Co-authored-by: Edward A. Lee <eal@eecs.berkeley.edu>

Move function lf_tag_latest_earlier to tag.c

2857ab8

Invalidate every node's delay information in invalidate_min_delays

4507043

edwardalee reviewed Aug 24, 2024

core/tag.c (outdated, resolved)

edwardalee reviewed Aug 24, 2024

include/core/federated/network/net_common.h (outdated, resolved)

byeonggiljun and others added 4 commits August 24, 2024 21:59

Apply suggestions from @edwardalee and minor fix

896e152

Run Clang format

0fab3b6

Merge branch 'main' into rti-DNET

2bfd833

Merge branch 'main' into rti-DNET

da80702

edwardalee requested changes Oct 13, 2024

byeonggiljun and others added 6 commits October 17, 2024 13:48

Merge branch 'main' into rti-DNET

05280d0

Remove the fields all_upstream and all_downstream

a757105

Clang format

adc0a1d

Turn on DNET signals by default and make the option to turn off the feature

c2f5316

Merge branch 'main' into rti-DNET

2b7cae5

Exclude the target node itself when computing DNET

7f81c7c
