
Developer productivity: logs lost when timeout in tests #1894

Closed

aslom opened this issue Sep 16, 2019 · 24 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
priority/awaiting-more-evidence: Lowest priority. Possibly useful, but not yet enough support to actually get it done.

Comments

@aslom (Member) commented Sep 16, 2019

Describe the bug

Logs are lost when there is a timeout in tests: currently all I get is a lot of stack traces, but the logging is lost - see the stack traces in the context below. If I suspect which test failed, I can figure out the file and get the line, but log files help to see what was actually happening and what the input was.

Expected behavior

I was expecting to see log output such as

--- FAIL: TestMustPassTracingHeaders (0.25s)
--- FAIL: TestMustPassTracingHeaders/TestMustPassTracingHeaders-InMemoryChannel (0.41s)
channel_header_single_event_helper.go:43: Running header conformance test with channel "InMemoryChannel"
test_runner.go:80: namespace is : "test-must-pass-tracing-headers-in-memory-channel"
channel_header_single_event_helper.go:48: Creating channel
...
Instead I am only getting stack traces because of the timeout, with no idea how far the test got and where it got stuck, as there is no logging ...

To Reproduce
Run a test that blocks; to reproduce quickly, set a short timeout:

 go test -v -timeout 1s -tags e2e ./test/conformance -run ^TestMustPassTracingHeaders$ --kubeconfig $KUBECONFIG

Knative release version
0.8

Additional context

(base) aslom@m eventing (conformance-test-channel-headers-2) $ go test -v -timeout 60s -tags e2e ./test/conformance -run ^TestMustPassTracingHeaders$ --kubeconfig $KUBECONFIG
2019-09-05T16:55:18.280-0400	info	logging/config.go:43	Successfully created the logger.	{"knative.dev/jsonconfig": "{\n\t  \"level\": \"info\",\n\t  \"encoding\": \"console\",\n\t  \"outputPaths\": [\"stdout\"],\n\t  \"errorOutputPaths\": [\"stderr\"],\n\t  \"encoderConfig\": {\n\t    \"timeKey\": \"ts\",\n\t    \"messageKey\": \"message\",\n\t    \"levelKey\": \"level\",\n\t    \"nameKey\": \"logger\",\n\t    \"callerKey\": \"caller\",\n\t    \"messageKey\": \"msg\",\n\t    \"stacktraceKey\": \"stacktrace\",\n\t    \"lineEnding\": \"\",\n\t    \"levelEncoder\": \"\",\n\t    \"timeEncoder\": \"iso8601\",\n\t    \"durationEncoder\": \"\",\n\t    \"callerEncoder\": \"\"\n\t  }\n\t}"}
2019-09-05T16:55:18.280-0400	info	logging/config.go:43	Logging level set to info
2019-09-05T16:55:18.280-0400	info	logging/config.go:45	Fetch GitHub commit ID from kodata failed: "KO_DATA_PATH" does not exist or is empty
=== RUN   TestMustPassTracingHeaders
=== PAUSE TestMustPassTracingHeaders
=== CONT  TestMustPassTracingHeaders
=== RUN   TestMustPassTracingHeaders/TestMustPassTracingHeaders-InMemoryChannel
=== PAUSE TestMustPassTracingHeaders/TestMustPassTracingHeaders-InMemoryChannel
=== CONT  TestMustPassTracingHeaders/TestMustPassTracingHeaders-InMemoryChannel
panic: test timed out after 1m0s

goroutine 113 [running]:
testing.(*M).startAlarm.func1()
	/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:1334 +0xdf
created by time.goFunc
	/usr/local/Cellar/go/1.12.1/libexec/src/time/sleep.go:169 +0x44

goroutine 1 [chan receive, 1 minutes]:
testing.tRunner.func1(0xc0006aa700)
	/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:841 +0x203
testing.tRunner(0xc0006aa700, 0xc00065fe30)
	/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:869 +0xca
testing.runTests(0xc0000d5720, 0x31bd7d0, 0x1, 0x1, 0xc00065fef0)
	/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:1155 +0x2a9
testing.(*M).Run(0xc00036aa00, 0x0)
	/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:1072 +0x162
main.main()
	_testmain.go:42 +0x13e

goroutine 4 [chan receive]:
knative.dev/eventing/vendor/github.com/golang/glog.(*loggingT).flushDaemon(0x31d5fa0)
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/vendor/github.com/golang/glog/glog.go:882 +0x8b
created by knative.dev/eventing/vendor/github.com/golang/glog.init.0
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/vendor/github.com/golang/glog/glog.go:410 +0x272

goroutine 54 [select]:
knative.dev/eventing/vendor/go.opencensus.io/stats/view.(*worker).start(0xc000104be0)
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/vendor/go.opencensus.io/stats/view/worker.go:154 +0x100
created by knative.dev/eventing/vendor/go.opencensus.io/stats/view.init.0
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/vendor/go.opencensus.io/stats/view/worker.go:32 +0x57

goroutine 66 [chan receive]:
knative.dev/eventing/vendor/k8s.io/apimachinery/pkg/util/wait.WaitFor(0xc000829740, 0xc000837700, 0xc0000e94a0, 0x2450320, 0xc000254120)
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:330 +0x58
knative.dev/eventing/vendor/k8s.io/apimachinery/pkg/util/wait.pollInternal(0xc000829740, 0xc000837700, 0x0, 0x0)
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:227 +0x8a
knative.dev/eventing/vendor/k8s.io/apimachinery/pkg/util/wait.pollImmediateInternal(0xc000829740, 0xc000837700, 0xc000829740, 0xc000837700)
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:252 +0x70
knative.dev/eventing/vendor/k8s.io/apimachinery/pkg/util/wait.PollImmediate(0x3b9aca00, 0x37e11d6000, 0xc000837700, 0x6d, 0x0)
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:241 +0x4d
knative.dev/eventing/test/base.WaitForResourceReady(0x2450320, 0xc0004ba938, 0xc000254120, 0x0, 0x0)
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/test/base/resource_checks.go:53 +0x1b5
knative.dev/eventing/test/common.(*Tracker).WaitForKResourcesReady(0xc00011ba90, 0xc0003405a0, 0xc0004e2080)
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/test/common/tracker.go:146 +0x141
knative.dev/eventing/test/common.(*Client).WaitForAllTestResourcesReady(0xc0004be4c0, 0xc0004de180, 0x27)
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/test/common/operation.go:142 +0x33
knative.dev/eventing/test/conformance/helpers.SingleEventHelperForChannelTestHelper.func1(0xc0002be600, 0x21e6500, 0xf)
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/test/conformance/helpers/channel_header_single_event_helper.go:65 +0x3a1
knative.dev/eventing/test/common.(*ChannelTestRunner).RunTests.func1(0xc0002be600)
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/test/common/test_runner.go:59 +0x44
testing.tRunner(0xc0002be600, 0xc000195320)
	/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:865 +0xc0
created by testing.(*T).Run
	/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:916 +0x35a

goroutine 20 [syscall, 1 minutes]:
os/signal.signal_recv(0x0)
	/usr/local/Cellar/go/1.12.1/libexec/src/runtime/sigqueue.go:139 +0x9f
os/signal.loop()
	/usr/local/Cellar/go/1.12.1/libexec/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
	/usr/local/Cellar/go/1.12.1/libexec/src/os/signal/signal_unix.go:29 +0x41

goroutine 22 [chan receive]:
testing.tRunner.func1(0xc0006aa800)
	/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:841 +0x203
testing.tRunner(0xc0006aa800, 0x2274440)
	/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:869 +0xca
created by testing.(*T).Run
	/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:916 +0x35a

goroutine 23 [chan receive, 1 minutes]:
testing.runTests.func1.1(0xc0006aa700)
	/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:1162 +0x3b
created by testing.runTests.func1
	/usr/local/Cellar/go/1.12.1/libexec/src/testing/testing.go:1162 +0xac

goroutine 14 [IO wait]:
internal/poll.runtime_pollWait(0x3e51d08, 0x72, 0xffffffffffffffff)
	/usr/local/Cellar/go/1.12.1/libexec/src/runtime/netpoll.go:182 +0x56
internal/poll.(*pollDesc).wait(0xc000514d18, 0x72, 0xb00, 0xbf5, 0xffffffffffffffff)
	/usr/local/Cellar/go/1.12.1/libexec/src/internal/poll/fd_poll_runtime.go:87 +0x9b
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/Cellar/go/1.12.1/libexec/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc000514d00, 0xc0005cc000, 0xbf5, 0xbf5, 0x0, 0x0, 0x0)
	/usr/local/Cellar/go/1.12.1/libexec/src/internal/poll/fd_unix.go:169 +0x1f2
net.(*netFD).Read(0xc000514d00, 0xc0005cc000, 0xbf5, 0xbf5, 0x203000, 0x0, 0xbe8)
	/usr/local/Cellar/go/1.12.1/libexec/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc0004c4000, 0xc0005cc000, 0xbf5, 0xbf5, 0x0, 0x0, 0x0)
	/usr/local/Cellar/go/1.12.1/libexec/src/net/net.go:177 +0x69
crypto/tls.(*atLeastReader).Read(0xc000119480, 0xc0005cc000, 0xbf5, 0xbf5, 0x301, 0x244e4e0, 0xc0006a09e0)
	/usr/local/Cellar/go/1.12.1/libexec/src/crypto/tls/conn.go:761 +0x60
bytes.(*Buffer).ReadFrom(0xc00033c258, 0x244e300, 0xc000119480, 0x100bea5, 0x2034580, 0x21881e0)
	/usr/local/Cellar/go/1.12.1/libexec/src/bytes/buffer.go:207 +0xbd
crypto/tls.(*Conn).readFromUntil(0xc00033c000, 0x37b31a0, 0xc0004c4000, 0x5, 0xc0004c4000, 0x2e4)
	/usr/local/Cellar/go/1.12.1/libexec/src/crypto/tls/conn.go:783 +0xf8
crypto/tls.(*Conn).readRecordOrCCS(0xc00033c000, 0x2276b00, 0xc00033c138, 0xc0006a0d58)
	/usr/local/Cellar/go/1.12.1/libexec/src/crypto/tls/conn.go:590 +0x125
crypto/tls.(*Conn).readRecord(...)
	/usr/local/Cellar/go/1.12.1/libexec/src/crypto/tls/conn.go:558
crypto/tls.(*Conn).Read(0xc00033c000, 0xc0005ef000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/usr/local/Cellar/go/1.12.1/libexec/src/crypto/tls/conn.go:1236 +0x137
bufio.(*Reader).Read(0xc00052aa20, 0xc0005f0038, 0x9, 0x9, 0x1008354, 0xc00052a420, 0xc0006a0d58)
	/usr/local/Cellar/go/1.12.1/libexec/src/bufio/bufio.go:223 +0x23e
io.ReadAtLeast(0x244e1a0, 0xc00052aa20, 0xc0005f0038, 0x9, 0x9, 0x9, 0x244e4e0, 0xc0007516e0, 0xc0000be030)
	/usr/local/Cellar/go/1.12.1/libexec/src/io/io.go:310 +0x88
io.ReadFull(...)
	/usr/local/Cellar/go/1.12.1/libexec/src/io/io.go:329
knative.dev/eventing/vendor/golang.org/x/net/http2.readFrameHeader(0xc0005f0038, 0x9, 0x9, 0x244e1a0, 0xc00052aa20, 0x0, 0x0, 0x0, 0x0)
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/vendor/golang.org/x/net/http2/frame.go:237 +0x88
knative.dev/eventing/vendor/golang.org/x/net/http2.(*Framer).ReadFrame(0xc0005f0000, 0xc0006d60f0, 0x0, 0x0, 0x0)
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/vendor/golang.org/x/net/http2/frame.go:492 +0xa1
knative.dev/eventing/vendor/golang.org/x/net/http2.(*clientConnReadLoop).run(0xc0006a0fb8, 0x22757c8, 0xc0005bd7b8)
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/vendor/golang.org/x/net/http2/transport.go:1484 +0x8d
knative.dev/eventing/vendor/golang.org/x/net/http2.(*ClientConn).readLoop(0xc000348480)
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/vendor/golang.org/x/net/http2/transport.go:1412 +0x76
created by knative.dev/eventing/vendor/golang.org/x/net/http2.(*Transport).newClientConn
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/vendor/golang.org/x/net/http2/transport.go:596 +0x629

goroutine 61 [select]:
knative.dev/eventing/vendor/k8s.io/apimachinery/pkg/util/wait.poller.func1.1(0xc0000e9500, 0x3b9aca00, 0x37e11d6000, 0xc0000e94a0)
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:374 +0x153
created by knative.dev/eventing/vendor/k8s.io/apimachinery/pkg/util/wait.poller.func1
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:357 +0x8c

goroutine 24 [chan receive]:
knative.dev/eventing/test/common.Setup.func1(0xc0000e8fc0, 0xc0002be600)
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/test/common/test_runner.go:98 +0x38
created by knative.dev/eventing/test/common.Setup
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/test/common/test_runner.go:97 +0x2a8
FAIL	knative.dev/eventing/test/conformance	60.070s
(base) aslom@m eventing (conformance-test-channel-headers-2) $
aslom added the kind/bug label Sep 16, 2019
@aslom (Member, Author) commented Sep 16, 2019

If I suspect which test failed, I can figure out the file and get the line, but log files would show what was actually happening and what the input was:

knative.dev/eventing/test/conformance/helpers.SingleEventHelperForChannelTestHelper.func1(0xc0002be600, 0x21e6500, 0xf)
	/Users/aslom/Documents/awsm/go/src/knative.dev/eventing/test/conformance/helpers/channel_header_single_event_helper.go:65 +0x3a1
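		// per the stack trace above, the test blocks here waiting for resources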
		if err := client.WaitForAllTestResourcesReady(); err != nil {
			st.Fatalf("Failed to get all test resources ready: %v", err)
		}

@aslom (Member, Author) commented Sep 16, 2019

As @Fredy-Z suggested, it is related to parallel execution and timeout behavior in go test: golang/go#24929

The issue has been open since April and no fix has been merged into Go, so no fix is available :(

The workaround seems to be to "replace all t.Logf calls with either fmt.Printf or a custom logger" - could we do that, and switch back once the problem is fixed in Go?
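A minimal sketch of that workaround (the test and the log message below are hypothetical, not an actual conformance test from this repo):

package conformance

import (
	"fmt"
	"testing"
)

func TestLoggingWorkaroundExample(t *testing.T) {
	// Buffered by the testing framework: lost if the test binary
	// panics on timeout.
	t.Logf("Creating channel")

	// Written immediately to stdout: survives the timeout panic.
	fmt.Printf("%s: Creating channel\n", t.Name())
}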

@aslom (Member, Author) commented Sep 16, 2019

What would be the best way to tackle the issue - add some flag to replace the test logger with a new logger? Could it be done without modifying every test that uses t *testing.T and calls t.Logf?
Is that what this comment golang/go#24929 (comment) suggests?
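One possible shape for such a flag, as a hedged sketch (the -streamlogs flag, the Logger interface, and the For helper are hypothetical names, not an existing knative API; tests would accept Logger where they currently take *testing.T):

package logging

import (
	"flag"
	"fmt"
	"os"
	"testing"
)

var streamLogs = flag.Bool("streamlogs", false,
	"write test logs directly to stdout instead of buffering them via t.Logf")

// Logger is the subset of *testing.T that the tests actually use.
type Logger interface {
	Logf(format string, args ...interface{})
}

type stdoutLogger struct{ name string }

func (s stdoutLogger) Logf(format string, args ...interface{}) {
	fmt.Fprintf(os.Stdout, "%s: %s\n", s.name, fmt.Sprintf(format, args...))
}

// For picks the implementation based on the flag; *testing.T already
// satisfies Logger, so the buffered behavior stays the default.
func For(t *testing.T) Logger {
	if *streamLogs {
		return stdoutLogger{name: t.Name()}
	}
	return t
}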

@vagababov (Contributor) commented

/cc @mattmoor

Why not just run the tests with infinite timeout?

@aslom (Member, Author) commented Sep 16, 2019

Then it may never finish: the test may be blocked on something that failed earlier, so WaitForAllTestResourcesReady may never succeed. And if you have no logs, you do not know what failed before :(

vaikas added the priority/important-soon label Oct 3, 2019
vaikas added this to the v0.11.0 milestone Oct 3, 2019
@vaikas (Contributor) commented Oct 3, 2019

Yeah, so we probably don't want tests to hang forever, or did I misunderstand something @vagababov ?

@vagababov (Contributor) commented

I think we already run the tests with -1 timeout on Prow

@chizhg (Member) commented Oct 7, 2019

I think we already run the tests with -1 timeout on Prow

Actually we have a timeout set for all the integration tests running on Prow, see https://github.com/knative/serving/blob/7d87f88c65981e1905429a02bb1942fe4b96d9f7/test/e2e-tests.sh#L51

@adrcunha (Contributor) commented Oct 7, 2019

Prow jobs have a 2h timeout. The running job is simply killed, so the log output can be partial. But you can easily check that by using a 3h timeout with the above-mentioned test.

@coryrc (Contributor) commented Oct 7, 2019

/assign @coryrc I'm working on a solution for better logging

@chizhg (Member) commented Oct 7, 2019

/assign @coryrc
Additional text is not allowed for this command 🤷‍♂

@aslom (Member, Author) commented Oct 8, 2019

I think it is more important to have test logs than to run tests in parallel. Maybe we should disable parallel test runs until the golang issue is fixed? @coryrc @vaikas-google @chizhg ?

@aslom (Member, Author) commented Oct 8, 2019

If test logs are lost then how do you diagnose what caused timeouts?

@coryrc (Contributor) commented Nov 26, 2019

Sorry, I haven't yet figured out how to use github so I missed your reply.

go test supposedly has a -p option you can set to 1; will that solve the problem for now?
(Why I can't find that in the help... I don't know...)

@chizhg (Member) commented Nov 26, 2019

Sorry, I haven't yet figured out how to use github so I missed your reply.

go test supposedly has a -p option you can set to 1; will that solve the problem for now?
(Why I can't find that in the help... I don't know...)

I think you meant -parallel, which can be checked by running go help testflag.

IMO the core issue here is that the test logs are lost after go test times out, which does not have much to do with parallel test runs.
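For reference, the repro command from above with parallelism disabled would be (untested here):

 go test -v -timeout 60s -parallel 1 -tags e2e ./test/conformance -run ^TestMustPassTracingHeaders$ --kubeconfig $KUBECONFIG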

@aslom (Member, Author) commented Nov 26, 2019

@chizhg @coryrc I did not use the parallel option, but I would not expect it to change log buffering to streaming? golang/go#24929 (comment)

@grantr (Contributor) commented Jan 14, 2020

@aslom does #907 fix this?

grantr added the proposal/0.13 label Jan 14, 2020
grantr removed this from the v0.11.0 milestone Jan 14, 2020
@aslom (Member, Author) commented Jan 20, 2020

@grantr which #907? https://github.com/knative/eventing/pull/907/files (GitHub links to it)?

paulrossman added this to the v0.13.0 milestone Jan 24, 2020
@coryrc (Contributor) commented Feb 23, 2020

Two solutions (not mutually exclusive) are available:

  1. Use gotestsum instead of go-junit-reporter
  2. Open a separate log file (rough sketch below): https://github.com/knative/pkg/blob/a2e3b66654c1bcd82df07f8b4c45b19bad7ab21a/test/logging/logging.go#L154
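A rough sketch of solution 2, assuming a tee-style writer (the NewFileBackedLogger name is hypothetical; the knative/pkg implementation linked above may differ in detail):

package logging

import (
	"io"
	"log"
	"os"
)

// NewFileBackedLogger writes every line both to stdout and to a
// separate file, so the file keeps a complete record even when
// go test's buffered output is lost to a timeout panic.
func NewFileBackedLogger(path string) (*log.Logger, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
	if err != nil {
		return nil, err
	}
	return log.New(io.MultiWriter(os.Stdout, f), "", log.LstdFlags), nil
}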

@coryrc (Contributor) commented Feb 23, 2020

@aslom @grantr was referring to knative/pkg#907 ; that is my "solution #2" in the previous comment.

paulrossman modified the milestones: v0.13.0, v0.14.0 Mar 3, 2020
akashrv removed the proposal/0.13 label Apr 14, 2020
akashrv modified the milestones: v0.14.0, v0.15.0 Apr 14, 2020
grantr modified the milestones: v0.15.0, v0.16.0 May 27, 2020
@coryrc (Contributor) commented May 28, 2020

/unassign
The solution remains as I stated and updated in knative/pkg#907; it just needs to be done.

grantr modified the milestones: v0.16.0, Backlog Jul 8, 2020
@lberk (Member) commented Jul 13, 2020

@aslom I think this has been fixed -- would you be willing to reproduce/confirm this and close if it's no longer applicable? thanks!

lberk added the priority/awaiting-more-evidence label and removed the priority/important-soon label Jul 13, 2020
@github-actions (bot) commented

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions bot added the lifecycle/stale label Oct 12, 2020
@lberk (Member) commented Oct 15, 2020

@aslom going to close this for now as it seems to be fixed, please feel free to reopen if needed, thanks!

lberk closed this as completed Oct 15, 2020
matzew pushed a commit to matzew/eventing that referenced this issue Sep 28, 2022
…ve#1894)

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>