Logging Quality-of-Life Issues #907

coryrc · 2019-11-26T23:31:36Z

A while ago, it was chosen to use testing.T logging for our tests, primarily for aesthetic reasons.

Current Problems

Cannot add trace information without polluting the logs
Log output from logging library should be optional #441 (Irrelevant logging)
eventing#1894 (Logs lost on timeouts)
The libraries our dependencies use do not log to the same places or the same way
We cannot get debug information for parallel test runs appropriately

Proposed Solution

Tests should log with logr semantics into the Zap logger, along with minimal wrappers to cut down on verbosity in tests.

Why?

We already have the Zap logger used for some parts and it provides good backend support
The Zap logger is not global by default, like glog & klog
go-logr semantics are minimalist and fitting with the Go philosophy

It addresses the five pain points:

Verbosity is just a scale from 0 to 10. Trace messages can be added to aid debugging but not harm normal usage.
Verbosity level again.
Zap logger syncs output appropriately
Calls to klog & glog libraries in our dependencies can be redirected to zap just like Istio.
Zap logger outputs to multiple sinks simultaneously, printing immediately while conserving the ability to save output for JUnit.

Enables More Improvement

I wish to expand and improve the conformance tests. Our current logging limitations make it very difficult to have the tests present conformance results well while also making them useful to determine how to change an implementation to make it conform.

coryrc · 2019-11-26T23:51:56Z

Solution Implementation

Demonstration unit test.

Prints something like this if --verbosity=5: [0]

=== RUN   TestTLogger
2019-11-26T15:47:32.543-0800	LEVEL(-4)	TestTLogger	Should be printed!
=== RUN   TestTLogger/A-Nice-Subtest
2019-11-26T15:47:32.543-0800	INFO	TestTLogger.TestTLogger/A-Nice-Subtest	This is pretty important; everyone needs to see it!	{"some pointer": {}, "some number": 42}
=== RUN   TestTLogger/A-Nested-Subtest
--- PASS: TestTLogger (0.00s)
    --- PASS: TestTLogger/A-Nested-Subtest (0.00s)
    --- PASS: TestTLogger/A-Nice-Subtest (0.00s)
PASS
ok  	knative.dev/pkg/test/logging	0.046s

The very light wrapper around Zap and testing.T.

Test changes needed:

Each test must initialize a logger and it is used instead of t.
In tests, t.Log/t.Logf changes to t.V(level).Log() and uses structured logging instead of format strings.
Lots of 3-line if err != nil { t.Error/Fatal(...) } turn into one-line t.Error/Fatal(err,...)

[0] Note it doesn't exactly do this in PoC right now

coryrc · 2019-11-26T23:54:14Z

/cc @dgerd
/cc @mattmoor
/cc @vaikas
/cc @chizhg
/cc @chaodaiG
/assign @coryrc

mattmoor · 2019-11-26T23:54:42Z

cc @Harwayne

mattmoor · 2019-11-26T23:55:55Z

A while ago, it was chosen to use testing.T logging for our tests, primarily for aesthetic reasons.

This isn't accurate, we did it partly because the alternative that existed couldn't run in t.Parallel without tripping -race all over.

coryrc · 2019-11-27T00:31:07Z

A while ago, it was chosen to use testing.T logging for our tests, primarily for aesthetic reasons.

This isn't accurate, we did it partly because the alternative that existed couldn't run in t.Parallel without tripping -race all over.

I didn't find any reasoning in the change PRs/issues, so I DM'd the author and that's what they remembered. Provided this issue can be addressed, how do you feel about this proposal?

mattmoor · 2019-11-27T02:29:54Z

Well, then perhaps we're talking about different changes because I switched all the e2e tests to use t.Logf in serving 🤷‍♂

Honestly, so long as we can maintain t.Parallel, I don't particularly mind.

coryrc · 2019-12-04T05:00:12Z

Note: out of scope is capturing klog output to redirect to JUnit (because klog is global and Go does not make it easy to make threadlocal variables for goroutines).

adrcunha · 2019-12-05T16:07:34Z

Regarding addressing of pain point 2, how can one use that on Prow? Does it need to be hardcoded in the job?

Regarding addressing of pain point 3, do we have a PoC that if a test times out on Prow we really don't loose logs?

Note: out of scope is capturing klog output to redirect to JUnit (because klog is global and Go does not make it easy to make threadlocal variables for goroutines).

How does it affect the proposed addressing of pain points 4 and 5?

coryrc · 2019-12-05T18:29:08Z

Regarding addressing of pain point 2, how can one use that on Prow? Does it need to be hardcoded in the job?

All our logging libraries allow setting verbosity levels, they just weren't plumbed together. I'm making a flag that controls it so users (including Prow) can set verbosity at whatever level they wish.

Regarding addressing of pain point 3, do we have a PoC that if a test times out on Prow we really don't loose logs?

It's an explicit feature of testing.T.Log() method that it doesn't print anything until the test completes, so pretty easy to avoid.

Note: out of scope is capturing klog output to redirect to JUnit (because klog is global and Go does not make it easy to make threadlocal variables for goroutines).

How does it affect the proposed addressing of pain points 4 and 5?

All logging will be copied to a separate file (optionally, and intended for Prow), so it's at least collected. Non-parallel runs (and single-test debugging sessions) will have all their logging consecutive.
Using gotestsum to collect results on Prow may get all the output into JUnit and I intend to try it, but this PR is more about the test code framework rather than what to do with the output.

Working to introduce structured logging to our tests. See knative#907 This work allows these test functions to be called by objects other than *testing.T. The t.Error() calls are made compatible with structured logging (wrapping Zap sugared logger calls) or code using testing.T.

…935) Working to introduce structured logging to our tests. See #907 This work allows these test functions to be called by objects other than *testing.T. The t.Error() calls are made compatible with structured logging (wrapping Zap sugared logger calls) or code using testing.T.

grantr · 2020-01-14T01:03:21Z

@coryrc is this fixed now with #935?

coryrc · 2020-01-14T01:50:04Z

No, 935 is just a small step toward enabling it. I'm using the serving conformance tests to validate the idea, see knative/serving#6264 and intend to have something to present to serving working group Real Soon Now (probably next week).

coryrc · 2020-05-28T03:53:56Z

knative/serving#6976 and knative/serving#6978 were to be big steps toward doing this.

Using gotestsum would also address a subset of these issues; install it in https://github.com/knative/test-infra/blob/master/images/prow-tests/Dockerfile then update the shell scripts where they call go test to call gotestsum instead.
/unassign

github-actions · 2020-08-26T03:59:17Z

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

knative-prow-robot added area/test-and-release kind/bug Categorizes issue or PR as related to a bug. labels Nov 26, 2019

knative-prow-robot assigned coryrc Nov 26, 2019

adrcunha added enhancement New feature or request and removed kind/bug Categorizes issue or PR as related to a bug. labels Dec 5, 2019

adrcunha changed the title ~~Fix logging Quality of Life issues: standardize on go-logr usage.~~ Standardize on go-logr usage Dec 5, 2019

knative-prow-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 5, 2019

adrcunha removed the kind/bug Categorizes issue or PR as related to a bug. label Dec 5, 2019

coryrc changed the title ~~Standardize on go-logr usage~~ Logging Quality-of-Life Issues Dec 10, 2019

knative-prow-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 10, 2019

coryrc mentioned this issue Dec 10, 2019

Switch *testing.T uses to equivalent interface + structured logging #935

Merged

coryrc mentioned this issue Jan 30, 2020

Backend for improving test logging #1022

Merged

coryrc mentioned this issue Feb 23, 2020

Developer productivity: logs lost when timeout in tests knative/eventing#1894

Closed

knative-prow-robot unassigned coryrc May 28, 2020

github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 26, 2020

github-actions bot closed this as completed Sep 26, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Logging Quality-of-Life Issues #907

Logging Quality-of-Life Issues #907

coryrc commented Nov 26, 2019 •

edited

Loading

coryrc commented Nov 26, 2019

coryrc commented Nov 26, 2019

mattmoor commented Nov 26, 2019

mattmoor commented Nov 26, 2019

coryrc commented Nov 27, 2019

mattmoor commented Nov 27, 2019

coryrc commented Dec 4, 2019

adrcunha commented Dec 5, 2019

coryrc commented Dec 5, 2019

grantr commented Jan 14, 2020

coryrc commented Jan 14, 2020

coryrc commented May 28, 2020

github-actions bot commented Aug 26, 2020

Logging Quality-of-Life Issues #907

Logging Quality-of-Life Issues #907

Comments

coryrc commented Nov 26, 2019 • edited Loading

Current Problems

Proposed Solution

Enables More Improvement

coryrc commented Nov 26, 2019

Solution Implementation

coryrc commented Nov 26, 2019

mattmoor commented Nov 26, 2019

mattmoor commented Nov 26, 2019

coryrc commented Nov 27, 2019

mattmoor commented Nov 27, 2019

coryrc commented Dec 4, 2019

adrcunha commented Dec 5, 2019

coryrc commented Dec 5, 2019

grantr commented Jan 14, 2020

coryrc commented Jan 14, 2020

coryrc commented May 28, 2020

github-actions bot commented Aug 26, 2020

coryrc commented Nov 26, 2019 •

edited

Loading