upgrade "files" input with watchdog #271

Open
DpoBoceka opened this issue Sep 6, 2019 · 15 comments
Labels
enhancement help wanted inputs Any tasks or issues relating specifically to inputs

Comments

@DpoBoceka
Contributor

It would be nice to be able to use Benthos instead of Filebeat or rsyslog for simple log shipping, which would let it cover more use cases. Currently, however, Benthos's "files" input reads each path just once, so in order to ship new logs we have to restart the instance.
I also wonder whether it keeps any metadata recording where Benthos stopped reading, so that it could resume from there after a restart.

Jeffail added the enhancement and inputs labels on Sep 6, 2019
@Jeffail
Collaborator

Jeffail commented Sep 6, 2019

Hey @DpoBoceka, I'm not opposed to adding the ability to watch and track input files. However, it's a fairly large task, so I'm not likely to take this on myself any time soon.

@DpoBoceka
Contributor Author

I'll just leave this here in case someone is interested:
https://github.com/radovskyb/watcher
With that library we could implement Input.Connect() and Read() the bytes of a file from the watcher's channel, as we do now. I would like to try that out later.
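
A rough sketch of that shape, using only the standard library. The events channel stands in for whatever channel a watcher library would expose, and fileInput, Connect, Read and readOne are hypothetical names for illustration, not the actual Benthos input interface:

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// fileInput is a hypothetical input, not the Benthos interface. Paths arrive
// on events, as a watcher library might deliver them.
type fileInput struct {
	events chan string
}

// Connect prepares the channel. A real implementation would start the
// watcher here and forward its notifications into events.
func (f *fileInput) Connect() error {
	f.events = make(chan string, 16)
	return nil
}

// Read blocks until a path is announced, then returns that file's bytes.
func (f *fileInput) Read() ([]byte, error) {
	path, ok := <-f.events
	if !ok {
		return nil, errors.New("input closed")
	}
	return os.ReadFile(path)
}

// readOne writes content to a temporary file, announces the path on the
// channel the way a watcher callback would, and reads it back via Read.
func readOne(content string) (string, error) {
	dir, err := os.MkdirTemp("", "watch-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "app.log")
	if err := os.WriteFile(path, []byte(content), 0o644); err != nil {
		return "", err
	}

	in := &fileInput{}
	if err := in.Connect(); err != nil {
		return "", err
	}
	in.events <- path // stand-in for a "file written" event

	data, err := in.Read()
	return string(data), err
}

func main() {
	got, err := readOne("hello from a watched file\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(got)
}
```

This reads whole files per event, which matches what the "files" input does today; tracking offsets within a file would need more state.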

@DpoBoceka
Contributor Author

Any advice before I get to it?

@Jeffail
Collaborator

Jeffail commented Feb 1, 2020

So I think this behaviour should be added to the file input rather than files because files specifically consumes each discrete file as a payload instead of line by line.

I would propose the following additions:

  • Allow consuming more than one file with the file input. We need to preserve backwards compatibility here, so path still needs to allow a string value, but we can either add another field paths which is an array, or allow path to be either a string or an array of strings.
  • Allow wildcard paths (optional for now, we can do it later)
  • Add a field cache which allows users to specify a cache resource to store metadata about when and where we last read from each file being consumed.
  • When cache is specified, for each file path being consumed we store the consumed position in the cache using the path as the key (maybe hashed). It might be worth storing this in a structured way so that we can add more context later (JSON format?) We should also flush these offsets in a separate goroutine in intervals.
  • On start up, if a cache is specified, for each file we query the cache to see if there's a pre-existing position to consume from. If there is not, or if the position is greater than the file's current size (meaning it's been rotated), then we consume from the beginning.
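
As an illustration of the first two bullets, here is a minimal sketch of a path field that accepts either a string or an array of strings, followed by wildcard expansion. It uses JSON only because that is in the Go standard library (Benthos configs are YAML); pathList, parsePaths and expand are hypothetical names, not existing Benthos code:

```go
package main

import (
	"encoding/json"
	"fmt"
	"path/filepath"
)

// pathList accepts either "a.log" or ["a.log", "b.log"] (hypothetical type).
type pathList []string

// UnmarshalJSON tries a single string first, then an array of strings.
func (p *pathList) UnmarshalJSON(data []byte) error {
	var single string
	if err := json.Unmarshal(data, &single); err == nil {
		*p = pathList{single}
		return nil
	}
	var many []string
	if err := json.Unmarshal(data, &many); err != nil {
		return fmt.Errorf("path must be a string or an array of strings: %w", err)
	}
	*p = many
	return nil
}

// parsePaths extracts the path field from a raw config document.
func parsePaths(raw string) ([]string, error) {
	var cfg struct {
		Path pathList `json:"path"`
	}
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return nil, err
	}
	return cfg.Path, nil
}

// expand resolves wildcard patterns into the file paths that currently match.
func expand(patterns []string) ([]string, error) {
	var out []string
	for _, pattern := range patterns {
		matches, err := filepath.Glob(pattern)
		if err != nil {
			return nil, err
		}
		out = append(out, matches...)
	}
	return out, nil
}

func main() {
	patterns, err := parsePaths(`{"path": ["/var/log/*.log", "/tmp/app.log"]}`)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	files, err := expand(patterns)
	if err != nil {
		fmt.Println("glob error:", err)
		return
	}
	fmt.Println("patterns:", patterns)
	fmt.Println("matched files:", files)
}
```

Note that filepath.Glob only reports files that exist at the time of the call, so a watcher would need to re-expand the patterns periodically to pick up new files.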

Allowing users to specify their own cache resource not only means they can store this metadata however they like but it also gives them control over things like TTLs. It probably makes sense to eventually flesh out the file cache type to support TTLs itself as it's the most likely candidate for this purpose.
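
The offset bookkeeping described above might look roughly like the following sketch. The cache interface and memCache here are simplified stand-ins for a real cache resource, and the position struct and its JSON field name are assumptions; the periodic flush goroutine is left out:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// position is the structured value stored per file. Using JSON leaves room
// to add more context later. The field name is an assumption.
type position struct {
	Offset int64 `json:"offset"`
}

// cache is a simplified stand-in for a user-specified cache resource.
type cache interface {
	Get(key string) ([]byte, bool)
	Set(key string, value []byte)
}

// memCache is an in-memory cache used only for this sketch.
type memCache map[string][]byte

func (m memCache) Get(k string) ([]byte, bool) { v, ok := m[k]; return v, ok }
func (m memCache) Set(k string, v []byte)      { m[k] = v }

// savePosition stores the consumed offset using the file path as the key.
func savePosition(c cache, path string, offset int64) error {
	raw, err := json.Marshal(position{Offset: offset})
	if err != nil {
		return err
	}
	c.Set(path, raw)
	return nil
}

// startOffset returns the cached offset if one exists and still fits within
// the file. It returns 0 when there is no entry, the entry cannot be decoded,
// or the offset exceeds the current size (which suggests rotation).
func startOffset(c cache, path string, size int64) int64 {
	raw, ok := c.Get(path)
	if !ok {
		return 0
	}
	var p position
	if err := json.Unmarshal(raw, &p); err != nil {
		return 0
	}
	if p.Offset < 0 || p.Offset > size {
		return 0
	}
	return p.Offset
}

func main() {
	c := memCache{}
	path := "/var/log/app.log"

	fmt.Println("no entry yet:", startOffset(c, path, 100))
	if err := savePosition(c, path, 40); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("after saving 40, size 100:", startOffset(c, path, 100))
	fmt.Println("after saving 40, size 10:", startOffset(c, path, 10))
}
```

A size check alone cannot tell a rotated file that has already grown past the old offset from the original file, so storing extra context in the JSON value may be worthwhile.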

@miko

miko commented Oct 20, 2020

I wish "file" input could support tail mode (with truncation/move detection, as in https://github.com/hpcloud/tail) and "super asterisk" as in https://github.com/influxdata/telegraf/tree/master/plugins/inputs/tail

Use case: reading syslog-generated log files (rotated and/or created based on current time)
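
For reference, truncation and move detection can be approximated with the standard library alone. The sketch below is not how hpcloud/tail or telegraf implement it; classify, simulate and the change constants are hypothetical names. It compares the file that was being read against whatever is now at the same path:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

type change int

const (
	unchanged change = iota
	grown
	truncated
	replaced
)

// classify compares the file we were reading (prev, consumed up to offset)
// with what is now found at the same path (cur).
func classify(prev, cur os.FileInfo, offset int64) change {
	switch {
	case !os.SameFile(prev, cur):
		return replaced // the path now points at a different file
	case cur.Size() < offset:
		return truncated
	case cur.Size() > offset:
		return grown
	default:
		return unchanged
	}
}

// simulate appends to, truncates, and then rotates a temporary file,
// classifying the state of the path after each step.
func simulate() ([]change, error) {
	dir, err := os.MkdirTemp("", "tail-sketch")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "app.log")

	if err := os.WriteFile(path, []byte("line one\n"), 0o644); err != nil {
		return nil, err
	}
	prev, err := os.Stat(path)
	if err != nil {
		return nil, err
	}
	offset := prev.Size() // pretend everything so far has been consumed

	steps := []func() error{
		// Growth: append to the same file.
		func() error {
			f, err := os.OpenFile(path, os.O_APPEND|os.O_WRONLY, 0o644)
			if err != nil {
				return err
			}
			defer f.Close()
			_, err = f.WriteString("line two\n")
			return err
		},
		// Truncation: same file, now shorter than our offset.
		func() error { return os.Truncate(path, 0) },
		// Rotation: move the old file aside and create a new one.
		func() error {
			if err := os.Rename(path, path+".1"); err != nil {
				return err
			}
			return os.WriteFile(path, []byte("fresh\n"), 0o644)
		},
	}

	var seen []change
	for _, step := range steps {
		if err := step(); err != nil {
			return nil, err
		}
		cur, err := os.Stat(path)
		if err != nil {
			return nil, err
		}
		seen = append(seen, classify(prev, cur, offset))
	}
	return seen, nil
}

func main() {
	seen, err := simulate()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(seen)
}
```

On a replaced or truncated result a tailer would typically reopen the path and start again from offset 0.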

@abh

abh commented Aug 8, 2021

@Jeffail Since the file plugin has been deprecated, should this feature be in a new tail-file plugin or be added as a feature to files after all?

@Jeffail
Collaborator

Jeffail commented Aug 8, 2021

Hey @abh, it's actually the files input that has been deprecated in favour of file. The reason is that the file input got a new codec field along with support for multiple paths via the new paths field, so it supports everything that the files input did (and more).

However, I think it might be difficult to map over all the different codec options to a watcher because they expect to consume an io.Reader, whereas a file watcher will want to chop the file byte stream into discrete lines (or follow a custom delimiter), so I think it might be sensible to go with a separate implementation for now.
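
To make the difference concrete, here is a minimal sketch of the chopping a watcher would need: bytes arrive in arbitrary chunks as the file grows, and only complete records are released. The splitter type and its methods are hypothetical, not part of Benthos:

```go
package main

import (
	"bytes"
	"fmt"
)

// splitter buffers bytes read from a growing file and releases only complete
// records, keeping any trailing partial record until its delimiter arrives.
type splitter struct {
	delim []byte
	buf   []byte
}

// newSplitter falls back to a newline when no custom delimiter is given.
func newSplitter(delim string) *splitter {
	if delim == "" {
		delim = "\n"
	}
	return &splitter{delim: []byte(delim)}
}

// feed appends a freshly read chunk and returns every record completed by it.
func (s *splitter) feed(chunk []byte) []string {
	s.buf = append(s.buf, chunk...)
	var out []string
	for {
		i := bytes.Index(s.buf, s.delim)
		if i < 0 {
			return out
		}
		out = append(out, string(s.buf[:i]))
		s.buf = s.buf[i+len(s.delim):]
	}
}

func main() {
	s := newSplitter("\n")
	// The second line is split across two reads, as can happen when the
	// writer has not finished the line at the time of the first read.
	fmt.Printf("%q\n", s.feed([]byte("one\ntw")))
	fmt.Printf("%q\n", s.feed([]byte("o\nthree\n")))
}
```

Unlike a codec reading from an io.Reader until EOF, this treats running out of bytes as "wait for more" rather than as the end of the input.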

Maybe a good path would be to create a new input marked as experimental, iterate on it a few times, and if we can eventually find a way to introduce the codecs from the normal file input then we can combine them, otherwise they'll remain separate.

Is this something you're considering working on? If so let me know if I can help or provide any guidance, it would be awesome to finally get it done.

@mihaitodor
Collaborator

Looks like https://github.com/influxdata/tail is a maintained version of https://github.com/hpcloud/tail

@Jeffail
Collaborator

Jeffail commented Jul 31, 2022

There's also https://github.com/nxadm/tail which looks a bit more active.

@mihaitodor
Collaborator

mihaitodor commented Jul 31, 2022

Just had a quick look in there and it doesn’t look like that much code, TBH. Might be worth maintaining that logic directly in Benthos.

Later edit: This is definitely not something we want in Benthos: https://github.com/nxadm/tail/blob/master/winfile/winfile.go I wonder if there's a separate library for it...

@gedw99

gedw99 commented Apr 1, 2023

Also need this.

I already started to use https://github.com/nxadm/tail and it's been good.

@terryherron

For consistency consider following the SFTP "watcher" pattern.
https://www.benthos.dev/docs/components/inputs/sftp

Thanks for an excellent project.

@gedw99

gedw99 commented May 22, 2023

For consistency consider following the SFTP "watcher" pattern. https://www.benthos.dev/docs/components/inputs/sftp

Thanks for an excellent project.

Had a look. It's using polling. Is that your point? I think polling is a good base to start from. We can also add debounce.
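
A minimal sketch of polling with debounce, assuming the poller compares file sizes between polls. The debouncer type, simulatePolls, and the 100ms/250ms timings are made up for illustration and are not taken from the SFTP input:

```go
package main

import (
	"fmt"
	"time"
)

// debouncer reports a change only after the polled size has stopped changing
// for at least the quiet period, so a burst of writes yields one signal.
type debouncer struct {
	quiet    time.Duration
	lastSize int64
	changed  time.Time // when the size was last seen to change
	pending  bool
}

// observe is called on every poll with the current size. It returns true
// once per burst, after the size has been stable for the quiet period.
func (d *debouncer) observe(size int64, now time.Time) bool {
	if size != d.lastSize {
		d.lastSize = size
		d.changed = now
		d.pending = true
		return false
	}
	if d.pending && now.Sub(d.changed) >= d.quiet {
		d.pending = false
		return true
	}
	return false
}

// simulatePolls feeds a series of polled sizes, one every 100ms, through a
// debouncer with a 250ms quiet period and returns the indexes of the polls
// at which it fired.
func simulatePolls(sizes []int64) []int {
	d := &debouncer{quiet: 250 * time.Millisecond}
	start := time.Unix(0, 0)
	var fired []int
	for i, size := range sizes {
		now := start.Add(time.Duration(i) * 100 * time.Millisecond)
		if d.observe(size, now) {
			fired = append(fired, i)
		}
	}
	return fired
}

func main() {
	// Three quick writes followed by silence.
	fmt.Println(simulatePolls([]int64{10, 20, 30, 30, 30, 30, 30}))
}
```

In a real poller the sizes would come from os.Stat calls driven by a time.Ticker; they are passed in here so the behaviour can be checked without touching the filesystem.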

@gedw99

gedw99 commented May 22, 2023

This could be used as a base: https://github.com/loov/watchrun/tree/master

It's using polling and also high-resolution timers.

@fearfate
Contributor

fearfate commented Nov 7, 2024

Is this feature still in progress?
