Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logging Quality-of-Life Issues #907

Closed
coryrc opened this issue Nov 26, 2019 · 13 comments
Closed

Logging Quality-of-Life Issues #907

coryrc opened this issue Nov 26, 2019 · 13 comments
Labels
area/test-and-release enhancement New feature or request kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@coryrc
Copy link
Contributor

coryrc commented Nov 26, 2019

A while ago, it was chosen to use testing.T logging for our tests, primarily for aesthetic reasons.

Current Problems

  1. Cannot add trace information without polluting the logs
  2. Log output from logging library should be optional #441 (Irrelevant logging)
  3. eventing#1894 (Logs lost on timeouts)
  4. The libraries our dependencies use do not log to the same places or the same way
  5. We cannot get debug information for parallel test runs appropriately

Proposed Solution

Tests should log with logr semantics into the Zap logger, along with minimal wrappers to cut down on verbosity in tests.

Why?

  1. We already have the Zap logger used for some parts and it provides good backend support
  2. The Zap logger is not global by default, like glog & klog
  3. go-logr semantics are minimalist and fitting with the Go philosophy

It addresses the five pain points:

  1. Verbosity is just a scale from 0 to 10. Trace messages can be added to aid debugging but not harm normal usage.
  2. Verbosity level again.
  3. Zap logger syncs output appropriately
  4. Calls to klog & glog libraries in our dependencies can be redirected to zap just like Istio.
  5. Zap logger outputs to multiple sinks simultaneously, printing immediately while conserving the ability to save output for JUnit.

Enables More Improvement

I wish to expand and improve the conformance tests. Our current logging limitations make it very difficult to have the tests present conformance results well while also making them useful to determine how to change an implementation to make it conform.

@knative-prow-robot knative-prow-robot added area/test-and-release kind/bug Categorizes issue or PR as related to a bug. labels Nov 26, 2019
@coryrc
Copy link
Contributor Author

coryrc commented Nov 26, 2019

Solution Implementation

Demonstration unit test.

Prints something like this if --verbosity=5: [0]

=== RUN   TestTLogger
2019-11-26T15:47:32.543-0800	LEVEL(-4)	TestTLogger	Should be printed!
=== RUN   TestTLogger/A-Nice-Subtest
2019-11-26T15:47:32.543-0800	INFO	TestTLogger.TestTLogger/A-Nice-Subtest	This is pretty important; everyone needs to see it!	{"some pointer": {}, "some number": 42}
=== RUN   TestTLogger/A-Nested-Subtest
--- PASS: TestTLogger (0.00s)
    --- PASS: TestTLogger/A-Nested-Subtest (0.00s)
    --- PASS: TestTLogger/A-Nice-Subtest (0.00s)
PASS
ok  	knative.dev/pkg/test/logging	0.046s

The very light wrapper around Zap and testing.T.

Test changes needed:

  1. Each test must initialize a logger and it is used instead of t.
  2. In tests, t.Log/t.Logf changes to t.V(level).Log() and uses structured logging instead of format strings.
  3. Lots of 3-line if err != nil { t.Error/Fatal(...) } turn into one-line t.Error/Fatal(err,...)

[0] Note it doesn't exactly do this in PoC right now

@coryrc
Copy link
Contributor Author

coryrc commented Nov 26, 2019

/cc @dgerd
/cc @mattmoor
/cc @vaikas
/cc @chizhg
/cc @chaodaiG
/assign @coryrc

@mattmoor
Copy link
Member

cc @Harwayne

@mattmoor
Copy link
Member

A while ago, it was chosen to use testing.T logging for our tests, primarily for aesthetic reasons.

This isn't accurate, we did it partly because the alternative that existed couldn't run in t.Parallel without tripping -race all over.

@coryrc
Copy link
Contributor Author

coryrc commented Nov 27, 2019

A while ago, it was chosen to use testing.T logging for our tests, primarily for aesthetic reasons.

This isn't accurate, we did it partly because the alternative that existed couldn't run in t.Parallel without tripping -race all over.

I didn't find any reasoning in the change PRs/issues, so I DM'd the author and that's what they remembered. Provided this issue can be addressed, how do you feel about this proposal?

@mattmoor
Copy link
Member

Well, then perhaps we're talking about different changes because I switched all the e2e tests to use t.Logf in serving 🤷‍♂

Honestly, so long as we can maintain t.Parallel, I don't particularly mind.

@coryrc
Copy link
Contributor Author

coryrc commented Dec 4, 2019

Note: out of scope is capturing klog output to redirect to JUnit (because klog is global and Go does not make it easy to make threadlocal variables for goroutines).

@adrcunha adrcunha added enhancement New feature or request and removed kind/bug Categorizes issue or PR as related to a bug. labels Dec 5, 2019
@adrcunha
Copy link
Contributor

adrcunha commented Dec 5, 2019

Regarding addressing of pain point 2, how can one use that on Prow? Does it need to be hardcoded in the job?

Regarding addressing of pain point 3, do we have a PoC that if a test times out on Prow we really don't loose logs?

Note: out of scope is capturing klog output to redirect to JUnit (because klog is global and Go does not make it easy to make threadlocal variables for goroutines).

How does it affect the proposed addressing of pain points 4 and 5?

@adrcunha adrcunha changed the title Fix logging Quality of Life issues: standardize on go-logr usage. Standardize on go-logr usage Dec 5, 2019
@knative-prow-robot knative-prow-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 5, 2019
@adrcunha adrcunha removed the kind/bug Categorizes issue or PR as related to a bug. label Dec 5, 2019
@coryrc
Copy link
Contributor Author

coryrc commented Dec 5, 2019

Regarding addressing of pain point 2, how can one use that on Prow? Does it need to be hardcoded in the job?

All our logging libraries allow setting verbosity levels, they just weren't plumbed together. I'm making a flag that controls it so users (including Prow) can set verbosity at whatever level they wish.

Regarding addressing of pain point 3, do we have a PoC that if a test times out on Prow we really don't loose logs?

It's an explicit feature of testing.T.Log() method that it doesn't print anything until the test completes, so pretty easy to avoid.

Note: out of scope is capturing klog output to redirect to JUnit (because klog is global and Go does not make it easy to make threadlocal variables for goroutines).

How does it affect the proposed addressing of pain points 4 and 5?

All logging will be copied to a separate file (optionally, and intended for Prow), so it's at least collected. Non-parallel runs (and single-test debugging sessions) will have all their logging consecutive.
Using gotestsum to collect results on Prow may get all the output into JUnit and I intend to try it, but this PR is more about the test code framework rather than what to do with the output.

@coryrc coryrc changed the title Standardize on go-logr usage Logging Quality-of-Life Issues Dec 10, 2019
@knative-prow-robot knative-prow-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 10, 2019
coryrc added a commit to coryrc/pkg that referenced this issue Dec 10, 2019
Working to introduce structured logging to our tests. See knative#907

This work allows these test functions to be called by objects other
than *testing.T. The t.Error() calls are made compatible with
structured logging (wrapping Zap sugared logger calls) or code using
testing.T.
knative-prow-robot pushed a commit that referenced this issue Dec 13, 2019
…935)

Working to introduce structured logging to our tests. See #907

This work allows these test functions to be called by objects other
than *testing.T. The t.Error() calls are made compatible with
structured logging (wrapping Zap sugared logger calls) or code using
testing.T.
@grantr
Copy link
Contributor

grantr commented Jan 14, 2020

@coryrc is this fixed now with #935?

@coryrc
Copy link
Contributor Author

coryrc commented Jan 14, 2020

No, 935 is just a small step toward enabling it. I'm using the serving conformance tests to validate the idea, see knative/serving#6264 and intend to have something to present to serving working group Real Soon Now (probably next week).

@coryrc
Copy link
Contributor Author

coryrc commented May 28, 2020

knative/serving#6976 and knative/serving#6978 were to be big steps toward doing this.

Using gotestsum would also address a subset of these issues; install it in https://github.com/knative/test-infra/blob/master/images/prow-tests/Dockerfile then update the shell scripts where they call go test to call gotestsum instead.
/unassign

@github-actions
Copy link
Contributor

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/test-and-release enhancement New feature or request kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
Projects
None yet
Development

No branches or pull requests

5 participants