Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should ExecForEach sanitize or escape arguments differently? #184

Open
telemachus opened this issue Aug 21, 2023 · 9 comments
Open

Should ExecForEach sanitize or escape arguments differently? #184

telemachus opened this issue Aug 21, 2023 · 9 comments

Comments

@telemachus
Copy link

telemachus commented Aug 21, 2023

After a recent conversation about script on Lobste.rs, I'm concerned about how ExecForEach handles arguments (especially filenames?) with tricky characters (e.g., spaces, single quotes, and double quotes).

The current example for ExecForEach is ListFiles("*").ExecForEach("touch {{.}}").Wait(). I'll take that as a baseline, and start with the following minimal script:

package main

import "github.com/bitfield/script"

func main() {
	script.ListFiles("*").ExecForEach("touch {{.}}").Stdout()
}

Imagine that lives in a folder with the following files:

$ ls -l
total 24
-rw-r--r--  1 telemachus  wheel     0 Aug 21 07:14 Bobby" Tables"
-rw-r--r--  1 telemachus  wheel     0 Aug 21 07:14 Bobby' Tables'
-rw-r--r--  1 telemachus  wheel   210 Aug 21 07:17 go.mod
-rw-r--r--  1 telemachus  wheel  3221 Aug 21 07:17 go.sum
-rw-r--r--  1 telemachus  wheel     0 Aug 21 07:19 hello world
-rw-r--r--  1 telemachus  wheel   125 Aug 21 07:16 main.go

Here's the result of running the script:

$ ls -l
total 24
-rw-r--r--  1 telemachus  wheel     0 Aug 21 09:32 Bobby Tables
-rw-r--r--  1 telemachus  wheel     0 Aug 21 07:14 Bobby" Tables"
-rw-r--r--  1 telemachus  wheel     0 Aug 21 07:14 Bobby' Tables'
-rw-r--r--  1 telemachus  wheel   210 Aug 21 09:32 go.mod
-rw-r--r--  1 telemachus  wheel  3221 Aug 21 09:32 go.sum
-rw-r--r--  1 telemachus  wheel     0 Aug 21 09:32 hello
-rw-r--r--  1 telemachus  wheel     0 Aug 21 07:19 hello world
-rw-r--r--  1 telemachus  wheel   124 Aug 21 09:32 main.go
-rw-r--r--  1 telemachus  wheel     0 Aug 21 09:32 world

I would expect (hope?) that script would sanitize the filenames, so that touch ran once with each filename (including the ones with tricky characters).

Instead, here are some things that go wrong (I think).

  1. touch seems to receive the argument Bobby Tables twice. In a way, this is a good result: a single name with a space! On the other hand, all quotes seem to be stripped away from the two original filenames, one of which has single quotes and one of which has double quotes. If you change the code to script.ListFiles("*").ExecForEach("ls {{.}}").Stdout(), you'll see ls: Bobby Tables: No such file or directory (without quotes of any kind) twice in the output. These results suggest that something script does (the templating, perhaps?) removes quotations from filenames.
  2. ExecForEach seems to split the filename hello world into two arguments so that the program runs touch hello and touch world.

Normally, in actual shell scripting, you could fix problems like these with proper quoting.

for file in *
do
	touch "$file"
done

But I can't see any way to do the equivalent using script. Ideally, script itself would handle this for the user. (E.g., to protect users who don't know about such dangers from themselves.)

I noodled around with this a bit last night using https://bitbucket.org/creachadair/shell and https://github.com/mvdan/sh, but nothing I did helped. I'd be interested in working on this, but at the moment I can't see a way forward.

@bitfield
Copy link
Owner

Thanks for putting so much thought into this, @telemachus—it's most helpful.

I think I understand what's going on here now. Let's take the spaces issue first. Something like this:

script.Echo("my file").ExecForEach("touch {{.}}").Wait()

doesn't work as expected, because we end up with two files: my and file. Well, this wouldn't work in the shell either, as we've discussed, so the solution would be to quote the filename part of the command like this:

script.Echo("my file").ExecForEach("touch \"{{.}}\"").Wait()

That takes care of the spaces issue, but what about files with quotes in their names?

script.Echo("my \"quoted\" file").ExecForEach("touch \"{{.}}\"").Wait()

This creates a file named my quoted file, which isn't quite right: it should be literally my "quoted" file. The Go template parser is eating the quotes. Again, the solution is to escape the quotes before they get interpolated:

script.Echo("my \\\"quoted\\\" file").ExecForEach("touch \"{{.}}\"").Wait()

But when the list of filenames comes from some arbitrary source, what can we do? This must be a problem whenever filenames are interpolated into a Go template, mustn't it?

Honestly, I'll confess this to you, but keep it to yourself: I don't much care for Go templates. I try not to use them if at all possible. However, if we can figure out the exact right way to rewrite arbitrary data in script pipes such that it expands to the correct values in templates, I'd certainly consider adding it to the package.

So, to coin a phrase, PRs are welcome!

@earthboundkid
Copy link
Contributor

earthboundkid commented Aug 21, 2023

This problem has the same shape as the classic problem of HTML escaping: how do you make sure 1 < 2 turns into exactly 1 &lt; 2 and not 1 &amp;lt; 2 etc. The thorough solution would be to do what the Go html/template engine does: have a separate type for "safe" strings and turn raw strings into safe strings before final output. That seems like too much effort though. A hacky but easy solution is to add a function to the template that looks like this:

type SafeString string

func quote(s any) SafeString {
  switch v := s.(type) {
  case SafeString:
     return v
  case string:
    return SafeString(shell.Quote(v))
  default:
   panic("bad type!")
 }
}

Then you use it on the client side like .ExecForEach(`touch {{.|quote}}`.

@gedw99
Copy link
Contributor

gedw99 commented Aug 22, 2023

You mean this one ?

https://github.com/google/safehtml

@bitfield
Copy link
Owner

It sounds like we have a plan!

@bitfield
Copy link
Owner

is the plan for someone to write up a PR with (something like?) Carl's quote helper?

That certainly seems like it would be a good value-add, doesn't it? You've constructed the failing test case, @carlmjohnson has pointed out what the solution should look like, and @gedw99 has identified that the necessary code already exists. Between us, I think we've got this!

If so, should the docs recommend that people always use that when they feed arguments to external commands? (I think that means that people should always use the helper when they use ExecForEach, but I am not 100% sure about that.)

Yes, at first I thought we should simply always quote-sanitise input data to the template, but that's not quite good enough, because we don't know that the data contains filenames, for example. Therefore, it needs to be up to the user to call this function in their template if they're interpolating filenames.

Adding this to the documentation should also appease the wrath of @matklad:

What I do find objectionable is not mentioning this in the docs for a function!

@matklad
Copy link

matklad commented Aug 22, 2023

Some gopherology by an ex-crustacean:

It feels like there should be a way to configure default escaping rule for Template, because that's what you'll just need for more or less any use-case. And, like, there is html escaping, it should work somehow?

And, look, indeed, Templae exports the underlying parse tree:

https://pkg.go.dev/text/template#Template

type Template struct {
	*parse.Tree
	// contains filtered or unexported fields
}

So I guess that's the way! You could walk the parse tree manually, applying your custom escaping rules.

Now, how do I actually use that?

Template is the representation of a parsed template. The *parse.Tree field is exported only for use by html/template and should be treated as unexported by all other clients.

Oh...

:-)

@earthboundkid
Copy link
Contributor

In the current version of Go, if you put a package inside a directory called internal, it becomes unimportable by external packages. The Go team has said the parse tree for templates would have been in internal if internal had existed when it was created, but they don't want to move it now. Cf. golang/go#25357

@winks
Copy link

winks commented Aug 22, 2023

Two things, and please just ignore me if I'm going overboard with the terseness here, but in this use case of "replacing shell" instead of "writing a rather large program" I'm left wondering why {{.}} (no matter how default to what templating it may be) feels so verbose, and isn't just {} or %s. I mean, this is not supposed to be verbatim Go, you're putting the input "strings-as-in-cli-args" through your own functions.

Second thing, yes find ... -print0 and xargs -0 is kinda clunky, but see above, if you're wrapping this up in the lib, wouldn't that be a possibility to implicitly use something like that?

@bitfield
Copy link
Owner

@winks, I agree that Go template syntax isn't easy to love. But my thinking is that it's better to use something standard, perfect or not, than to invent my own way of doing this.

Second thing, yes find ... -print0 and xargs -0 is kinda clunky, but see above, if you're wrapping this up in the lib, wouldn't that be a possibility to implicitly use something like that?

Can you elaborate on this point a little bit? I'm not sure I understand what you're referring to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants