Flake e2e: ACL Logging for NetworkPolicy when the namespace's ACL logging annotation is updated #4392

Closed
flavio-fernandes opened this issue May 24, 2024 · 0 comments · Fixed by #4393

flavio-fernandes commented May 24, 2024

Which jobs are flaking?

ACL Logging for NetworkPolicy when the namespace's ACL logging annotation is updated

[It] the ACL logs are updated accordingly

.../ovn-kubernetes/test/e2e/acl_logging.go:121

Which tests are flaking?

• [FAILED] [21.581 seconds]
ACL Logging for NetworkPolicy when the namespace's ACL logging annotation is updated [It] the ACL logs are updated accordingly
/home/vagrant/ovn-kubernetes/test/e2e/acl_logging.go:121

  [FAILED] May 24 06:09:25.765: Timed out after 15.000s.
  Expected
      <bool>: false
  to be true
  In [It] at: /home/vagrant/ovn-kubernetes/test/e2e/acl_logging.go:130 @ 05/24/24 06:09:25.765

Since when has it been flaking?

Sorry, I don't know.

Reason for failure (if possible)

The test does this:

BeforeEach(func() {
	By("poking some more...")
	clientPod := pods[pokerPodIndex]
	pokedPod := pods[pokedPodIndex]
	framework.Logf(
		"Poke pod %s (on node %s) from pod %s (on node %s)",
		pokedPod.GetName(),
		pokedPod.Spec.NodeName,
		clientPod.GetName(),
		clientPod.Spec.NodeName)
	// The pod is poked once here, before the It block starts waiting.
	Expect(
		pokePod(fr, clientPod.GetName(), pokedPod.Status.PodIP)).To(HaveOccurred(),
		"traffic should be blocked since we only use a deny all traffic policy")
})
It("the ACL logs are updated accordingly", func() {
	clientPodScheduledPodName := pods[pokerPodIndex].Spec.NodeName
	composedPolicyNameRegex := fmt.Sprintf("NP:%s:%s", nsName, egressDefaultDenySuffix)
	// Each retry only re-reads the logs; no new traffic is generated while waiting.
	Eventually(func() (bool, error) {
		return assertACLLogs(
			clientPodScheduledPodName,
			composedPolicyNameRegex,
			denyACLVerdict,
			updatedAllowACLLogSeverity)
	}, maxPokeRetries*pokeInterval, pokeInterval).Should(BeTrue())
})

  • calls setNamespaceACLLogSeverity to update the namespace's ACL logging annotation
  • pokes the pods to generate the log
  • loops, waiting for the log entries to show up in ovn-controller

The problem is that there is no delay between setting the ACL log severity and poking. On a slow VM, OVN
can take a while to be fully configured with the new severity, and that may only happen after the poke
has already taken place, so no matching log entry is ever generated.

A proposed solution is to also poke while waiting, so that the log entry is generated once the new
configuration is in effect.
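The race and the proposed fix can be illustrated with a small, self-contained Go sketch that does not use the e2e framework. fakeOVN, eventually, pokeOnce and pokeWhileWaiting are hypothetical stand-ins, and the 50 ms apply delay and the wait budget are made-up values chosen only to expose the timing. A poke is logged only if the new severity is already in effect, so poking once right after the change misses it, while poking on every retry eventually succeeds.

```go
package main

import (
	"fmt"
	"time"
)

// fakeOVN is a hypothetical stand-in for ovn-controller: a new ACL log
// severity only takes effect applyDelay after it is set, and a poke is
// logged only if the new severity is already in effect at poke time.
type fakeOVN struct {
	configuredAt time.Time
	applyDelay   time.Duration
	logged       bool
}

// newFakeOVN models setNamespaceACLLogSeverity being called "now".
func newFakeOVN(applyDelay time.Duration) *fakeOVN {
	return &fakeOVN{configuredAt: time.Now(), applyDelay: applyDelay}
}

// poke models sending one packet that the deny-all policy drops.
func (o *fakeOVN) poke() {
	if time.Since(o.configuredAt) >= o.applyDelay {
		o.logged = true
	}
}

func (o *fakeOVN) hasLog() bool { return o.logged }

// eventually polls cond every interval until it returns true or the
// timeout expires, similar in spirit to Gomega's Eventually.
func eventually(cond func() bool, timeout, interval time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}

// Hypothetical timings, not taken from the real test.
const (
	configDelay  = 50 * time.Millisecond
	waitTimeout  = 200 * time.Millisecond
	pollInterval = 10 * time.Millisecond
)

// pokeOnce mirrors the racy flow: poke right after setting the severity,
// then only wait for the log to show up.
func pokeOnce() bool {
	ovn := newFakeOVN(configDelay)
	ovn.poke()
	return eventually(ovn.hasLog, waitTimeout, pollInterval)
}

// pokeWhileWaiting mirrors the proposed fix: poke on every retry, then check.
func pokeWhileWaiting() bool {
	ovn := newFakeOVN(configDelay)
	return eventually(func() bool {
		ovn.poke()
		return ovn.hasLog()
	}, waitTimeout, pollInterval)
}

func main() {
	fmt.Println("poke once:         ", pokeOnce())
	fmt.Println("poke while waiting:", pokeWhileWaiting())
}
```

With these timings, pokeOnce should report false and pokeWhileWaiting true. In the real test, the analogous change would be to poke inside the function passed to Eventually, before checking the logs.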

Anything else we need to know?

It is a race in the test. I have found the issue and will be making a PR for it shortly. :)

To reproduce, these are the steps I took:

# Bring up a cluster using kind.sh or kind-helm.sh

# It may help to open a second shell and watch the ovn-controller log.
# This particular test produces its ACL logs on ovn-worker2.

$ docker exec ovn-worker2 tail -F /var/log/openvswitch/ovn-controller.log

# In another shell, run this test in a loop. It should fail after a few
# iterations:

$ cd test/e2e && \
  while : ; do \
  go test -v . -ginkgo.v \
  -ginkgo.focus 'the\sACL\slogs\sare\supdated\saccordingly' \
  -ginkgo.flake-attempts 1 -provider skeleton \
  -kubeconfig ${KUBECONFIG} --num-nodes=2 || break ; \
  echo --- ; done
@flavio-fernandes flavio-fernandes added the kind/ci-flake Flakes seen in CI label May 24, 2024
@flavio-fernandes flavio-fernandes self-assigned this May 24, 2024
flavio-fernandes added a commit to flavio-fernandes/ovn-kubernetes that referenced this issue May 24, 2024
Fixes waiting for ACL logging in a test where the namespace's
ACL logging level is updated.

To reproduce, use these steps:

  cd test/e2e && \
  while : ; do \
  go test -v . -ginkgo.v \
  -ginkgo.focus 'the\sACL\slogs\sare\supdated\saccordingly' \
  -ginkgo.flake-attempts 1 -provider skeleton \
  -kubeconfig ${KUBECONFIG} --num-nodes=2 || break ; \
  echo --- ; done

Fixes: ovn-org#4392

Signed-off-by: Flavio Fernandes <ffernandes@nvidia.com>
pperiyasamy pushed a commit to pperiyasamy/ovn-kubernetes that referenced this issue Jul 2, 2024