Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid race condition when deleting HNS networks #5336

Merged
merged 1 commit into from
Feb 8, 2024

Conversation

manuelbuil
Copy link
Contributor

@manuelbuil manuelbuil commented Jan 30, 2024

Proposed Changes

Check that interfaces are back if the HNS network was deleted before moving to HNS Network creation

Types of Changes

Bugfix

Verification

This is a race condition that really depends on how fast your windows system is able to "recover" the interface after deleting the HNS Network. For me it was impossible to reproduce. However, a basic verification would be:

1 - Deploy rke2 server with calico
2 - Deploy rke2-agent on windows
3 - Once everything is up, Stop-Service rke2 and C:\usr\local\bin\rke2.exe agent service --delete
4 - Verify that there is at least one HNS Network: get-hnsnetwork
5 - Start the rke2-agent on windows again with debug: true (remember to remove the node first or it will complain about password already there)

You should at least see the messages:

 Deleting network: XXXXXX before starting calico"

And

Calico is waiting for the interface with ip: XXXXXX to come back

And it should finally succeed to connect to the server

Testing

Linked Issues

#5335

User-Facing Change


Further Comments

@manuelbuil manuelbuil requested a review from a team as a code owner January 30, 2024 18:03
@@ -283,7 +283,7 @@ func (c *Calico) Start(ctx context.Context) error {
// generateCalicoNetworks creates the overlay networks for internode networking
func (c *Calico) generateCalicoNetworks() error {
if err := deleteAllNetworks(); err != nil {
return err
return fmt.Errorf("error deleting all networks before bootstrapping calico: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For wrapping errors, the convention in this file seems to be:

Suggested change
return fmt.Errorf("error deleting all networks before bootstrapping calico: %v", err)
return errors.Wrap(err, "failed to delete all networks before bootstrapping calico")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are both options present in this file... let me change all to errors.Wrap

@manuelbuil manuelbuil force-pushed the avoidRaceConditionHNS branch 2 times, most recently from 5b20bfa to d6180d9 Compare January 31, 2024 10:00
Signed-off-by: Manuel Buil <mbuil@suse.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants