Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow parsing just the vault name without requiring the secret path #305

Merged
merged 1 commit into from
Oct 29, 2024

Conversation

aryanjassal
Copy link
Contributor

@aryanjassal aryanjassal commented Oct 14, 2024

Description

When specifying a secret path, currently the format needs to be <vaultName>:<secretPath>, otherwise a parser error is thrown. This should not be the case, and instead paths like <vaultName> should also be allowed, which would point to the root of the directory.

To do this, a parser for vaultName will also be implemented, ensuring consistency in vault's allowed characters. This is important as currently, there are no parsers for vault name, but there is a regex validation for vault names when they are being used in the secret commands. This means that inadvertently vaults can be created which cannot be actually used in any commands like matrix.ai.

Issues Fixed

Tasks

  • 1. Create parser for vaultName
  • 2. Use the vaultName parser to parse the vault name consistently in vault and secret commands
  • 3. Make the default parser accept missing secret names
  • 4. Convert each command to use this new parser and substitute '/' if the secret path is undefined
  • 5. Run tests to ensure no regression
  • 6. Add tests for parsers
  • [ ] 7. Split tests for vaults/scanNode.test.ts out of scope for this PR
  • 8. Add tests for every handler to confirm no secret to be an alias for the root directory
    • cat
    • create
    • dir
    • edit
    • env
    • list
    • mkdir
    • remove
    • rename
    • stat
    • write
  • 9. Change vaults list to vaults ls

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@aryanjassal aryanjassal self-assigned this Oct 14, 2024
@CMCDragonkai
Copy link
Member

CMCDragonkai commented Oct 14, 2024 via email

@tegefaulkes
Copy link
Contributor

I think I prefer null in those scenarios? Undefined is usually meant to mean void.

Undefined usually means it wasn't provided. Null would mean it was intentionally not provided. We always use undefined if it's optional.

@CMCDragonkai
Copy link
Member

I mean in TS, undefined is for void situations. I prefer null as an explicit Maybe type.

@CMCDragonkai
Copy link
Member

It depends, you need to review other places in the code where this has occurred.

@CMCDragonkai
Copy link
Member

Look at the other parsers.

@aryanjassal aryanjassal marked this pull request as ready for review October 18, 2024 03:04
@aryanjassal
Copy link
Contributor Author

While working on this PR, I actually discovered a bug in secrets write, where it doesn't properly handle writing to directories. I have made an appropriate bug report for that.

https://github.com/MatrixAI/Polykey-CLI/issues/311

There could have been other issues like this which might have slipped past testing and reviewing. I will go through all the currently implemented commands and ensure that the tests properly test everything so similar bugs can't catch us by surprise.

src/utils/parsers.ts Outdated Show resolved Hide resolved
src/utils/parsers.ts Outdated Show resolved Hide resolved
src/utils/parsers.ts Outdated Show resolved Hide resolved
src/utils/parsers.ts Outdated Show resolved Hide resolved
@aryanjassal
Copy link
Contributor Author

I just realised a potential issue that would crop up in the future. Lets say that I have a vault called test, and I have a local directory called test. If we add local filesystem support, then by running pk secrets rm test --recursive, it would be ambiguous what is being referred to, the local directory test, or all the vault contents under test (basically being an alias to test:/).

This same issue applies to basically all secrets commands which are expected to operate on the user's local file system. This needs to be addressed before we go about merging this PR.

@CMCDragonkai

@aryanjassal
Copy link
Contributor Author

I have observed an interesting behaviour with secrets cat. If we use secrets write to write a secret without a trailing newline (like how UNIX dictates it's standard should be), then the last line written using secrets write will fail to render.

[aryanj@matrix-34xx:~]$ pk secrets write vault:test
some content
more content
i will press ctrl d on the next line
this line will not be shown

[aryanj@matrix-34xx:~]$ pk secrets cat vault:test
some content
more content
i will press ctrl d on the next line

[aryanj@matrix-34xx:~]$ pk secrets cat vault:test > tempfile

[aryanj@matrix-34xx:~]$ cat tempfile
some content
more content
i will press ctrl d on the next line
this line will not be shown%

Terminal emulators add a % to indicate that a file wasn't ended properly (no newline at end of file). Node's process.stdout actually doesn't interact with the terminal emulator's standard in and out. Rather, it writes directly to /dev/stdout (or something else. I'm not exactly sure on what it does, but it interacts more closely with IO than commands running on terminal emulators do). If a command ends without a newline on a terminal emulator, it knows that a new line needs to be added at the end to display the prompt and other things elegantly, so it outputs a % to let the users know that the command exited without a proper newline. This is strictly visual and doesn't impact the output or anything else, and is basically standard behaviour across all terminal emulators.

This is not supported by Node's process.stdout as it directly interacts with the standard in/out streams. As such, the terminal emulator doesn't know when the application finishes it's streams. As such, when that is the case, then the last line gets overwritten by the prompt. In my case, the prompt starts with a newline character to separate the output of the previous command better. This clears the previous line's contents and overwrites it with a blank newline.

As a workaround, currently we are manually writing a \n character to stderr so this issue doesn't happen, and doesn't break piping by providing incorrect output as compared to regular cat.

This still needs to be changed to be more intelligent, and detect if the last character was a newline, and do this only if it wasn't.

Another interesting discovery was made in regards to secrets write. When we press <Ctrl-D> on the keyboard, if characters have already been written on that line of stdin, then it would actually break to allow writing into a newline without actually inserting a newline. Then, when we press <Ctrl-D> again, that would signal the end of stream and close the stream. So, in many cases, we would need to press <Ctrl-D> twice in order to actually close the stream. This is not a bug with my code, but rather the behaviour of terminal emulators themselves. However, I'm not 100% certain about this, so more testing is actually needed here.

We should also consider writing a message to stderr informing the user that the message will now be written to the file, as writing to the file takes about a second, and the users won't be sure if the program needs another <Ctrl-D> or is just writing the file. In the future, when stdin will be directly streamed to the handler (currently it's a unary handler. this is being tracked in MatrixAI/Polykey#822), this would become largely irrelevant, but would be good feedback to have right now.

@aryanjassal
Copy link
Contributor Author

aryanjassal commented Oct 18, 2024

This PR will take longer to finish, so the relevant part required for #289 has been broken off and migrated to it's own PR, which will be completed much sooner than this one.

#312

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Oct 18, 2024

I just realised a potential issue that would crop up in the future. Lets say that I have a vault called test, and I have a local directory called test. If we add local filesystem support, then by running pk secrets rm test --recursive, it would be ambiguous what is being referred to, the local directory test, or all the vault contents under test (basically being an alias to test:/).

This same issue applies to basically all secrets commands which are expected to operate on the user's local file system. This needs to be addressed before we go about merging this PR.

@CMCDragonkai

I think firstly there are only some commands that operate on both local and vault namespaces.

That's the mv and cp commands, all other commands only operate in vault namespace only. Therefore test, test: test:/ all refer to the vault, vault and vault root directory path respectively.

So in the case of mv and cp you do have the situation of cp test test or mv test test where there's ambiguity of moving the test vault to the test local path or the test local path to the test vault.

The solutions:

  1. Syntax separation: vault: and vault:/ always mean the vault, I think though I dislike vault: and should not be parsed as vault path, but should be a syntax error (that is vault is ok, and vault:/ pr vault:abc is ok but not vault)
  2. Assume that the parameter is a vault path, unless it is explicit that it is not. That is secrets cp test test should be assumed that test and test are both vault paths, and therefore this would an error, cannot copy vault into vault.

Do note that you don't currently have PCC nor OCC with vault operations via RPC, until you've developed the stream-lifecycle representing an RPC conversation where the stream lifetime means 1 PCC lock. OCC requires MVCC and that's not possible without snapshots of the vault state which we cannot do at the moment.


Please ensure that:

pk secrets cp test test # this is an error
pk secrets cp -r testA testB # this means copying testA into testB, so `testA:/testB`
pk secrets mv test ./test # means local path
pk secrets mv test test/test # means moving test into test/test

Note that mv of a vault itself is equivalent to cp -r because you are not allowed to delete the vault entirely. That has to be a vaults * command and not a secrets command.

There's actually alot of constraints here. @tegefaulkes you should be working with @aryanjassal to ensure that all scenarios are being fast checked here and clearly in one big list.

I want to have a clear list here in the spec.

@CMCDragonkai
Copy link
Member

Terminal emulators add a % to indicate that a file wasn't ended properly (no newline at end of file). Node's process.stdout actually doesn't interact with the terminal emulator's standard in and out. Rather, it writes directly to /dev/stdout (or something else. I'm not exactly sure on what it does, but it interacts more closely with IO than commands running on terminal emulators do). If a command ends without a newline on a terminal emulator, it knows that a new line needs to be added at the end to display the prompt and other things elegantly, so it outputs a % to let the users know that the command exited without a proper newline. This is strictly visual and doesn't impact the output or anything else, and is basically standard behaviour across all terminal emulators.

The % isn't necessarily true, it depends on the terminal emulator.

@CMCDragonkai
Copy link
Member

As a workaround, currently we are manually writing a \n character to stderr so this issue doesn't happen, and doesn't break piping by providing incorrect output as compared to regular cat.

This is a bad idea.

@CMCDragonkai
Copy link
Member

Another interesting discovery was made in regards to secrets write. When we press <Ctrl-D> on the keyboard, if characters have already been written on that line of stdin, then it would actually break to allow writing into a newline without actually inserting a newline. Then, when we press <Ctrl-D> again, that would signal the end of stream and close the stream. So, in many cases, we would need to press <Ctrl-D> twice in order to actually close the stream. This is not a bug with my code, but rather the behaviour of terminal emulators themselves. However, I'm not 100% certain about this, so more testing is actually needed here.

I've never seen this with other unix commands, you should check how you're taking in STDIN here.

@CMCDragonkai
Copy link
Member

I have observed an interesting behaviour with secrets cat. If we use secrets write to write a secret without a trailing newline (like how UNIX dictates it's standard should be), then the last line written using secrets write will fail to render.

This is definitely a bug and not something terminal emulators fail at doing.

The reason for this is due to line buffering. You just need to flush the buffer explicitly at the end when the \n doesn't exist.

@CMCDragonkai
Copy link
Member

I have observed an interesting behaviour with secrets cat. If we use secrets write to write a secret without a trailing newline (like how UNIX dictates it's standard should be), then the last line written using secrets write will fail to render.

This is definitely a bug and not something terminal emulators fail at doing.

The reason for this is due to line buffering. You just need to flush the buffer explicitly at the end when the \n doesn't exist.

In fact you SHOULD ALWAYS be flushing the buffer explicitly at the end of every CLI command.

@CMCDragonkai
Copy link
Member

Assume that the parameter is a vault path, unless it is explicit that it is not.

This a key constraint. You need to fast check this well.

The point is abc is just going to be interpreted as a vault path ALWAYS.

But abc/ is not, it's going to be a local path.

Let's get some variants:

Here’s the updated table based on the clarified rules that vault paths must follow the :/ pattern, with local paths distinguished by their explicit indicators (e.g., ./, /, etc.):

Path Interpretation Rationale
abc Vault path Simple string, assumed to be a vault path (abc).
abc:/ Vault root Explicitly refers to the root of the vault abc.
abc:/abc Vault path Refers to subpath abc within the vault abc.
abc:/subdir/file Vault path Refers to a file within a subdirectory in the vault abc.
abc:/xyz Vault path Refers to xyz subpath in the vault abc.
abc:/xyz/ Vault path Refers to the xyz/ directory inside vault abc.
./abc Local path Prefixed with ./, making it explicitly local.
abc/ Local path Ends with /, indicating it's a local directory.
/abc Local path Absolute path starting with /, clearly a local path.
abc:xyz Local path : allowed in Unix filenames, no :/ pattern detected.
./abc/xyz Local path Prefixed with ./, clearly local with subdirectory xyz.
/xyz/file Local path Absolute path, explicitly local, referring to a file.

Key Rules Recap:

  1. Vault Paths:

    • Any path with :/ is a vault path.
    • A simple path like abc is also treated as a vault path.
  2. Local Paths:

    • Paths prefixed with ./, /, or ending with / are explicitly local.
    • Paths with colons (:) but without the :/ format are treated as local, given that Unix allows colons in filenames.

This table and rule set should now provide a clear, unambiguous interpretation of paths in the context of vault and local operations.

@aryanjassal
Copy link
Contributor Author

Here’s the updated table based on the clarified rules that vault paths must follow the :/ pattern, with local paths distinguished by their explicit indicators (e.g., ./, /, etc.):

This would mean that vault names cannot have a / symbol, as otherwise we wouldn't be able to differentiate between the local file path or a vault name.

Also, what about abc:.xyz? That is technically an allowed unix path, but also an allowed vault path. How should names like this be captured?

Should we allow vault names to start with .? .abc is a local path, but .abc:/xyz is a valid vault path.

@CMCDragonkai
Copy link
Member

Here’s the updated table based on the clarified rules that vault paths must follow the :/ pattern, with local paths distinguished by their explicit indicators (e.g., ./, /, etc.):

This would mean that vault names cannot have a / symbol, as otherwise we wouldn't be able to differentiate between the local file path or a vault name.

Also, what about abc:.xyz? That is technically an allowed unix path, but also an allowed vault path. How should names like this be captured?

Should we allow vault names to start with .? .abc is a local path, but .abc:/xyz is a valid vault path.

Yes / is a reserved symbol anyway.

Under the rules then it will need to be /.abc to always start the resulting vault path.

Well we have to make some rules on a vault name - it would be more constrained than any path. At least 2 symbols are not allowed here: : and /.

We may also want to disallow non-printable characters and control characters but utf-8 symbols are ok.

Use a regex rule - ask chatgpt to help generate and use fastcheck and regex101 to check.

@aryanjassal
Copy link
Contributor Author

We may also want to disallow non-printable characters and control characters but utf-8 symbols are ok.

Previously, and in this PR for now, only ASCII characters are allowed as the vault name. The previous regex followed this: [a-zA-Z0-9_-].

const secretPathRegex = /^([\w-]+)(?::([^\0\\=]+))?$/;

Right now, this PR is following [ -~^:], allowing all ASCII printable characters from 0x20 (space) to 0x7e (~) except colon (:).

I will work on stricter rules and parsing for vault names in this PR. #312 will see a simple change to additionally allow dots in vault names, as that is required urgently. More extensive changes will be made in this PR.

@aryanjassal
Copy link
Contributor Author

aryanjassal commented Oct 25, 2024

Another interesting discovery was made in regards to secrets write. When we press <Ctrl-D> on the keyboard, if characters have already been written on that line of stdin, then it would actually break to allow writing into a newline without actually inserting a newline. Then, when we press <Ctrl-D> again, that would signal the end of stream and close the stream. So, in many cases, we would need to press <Ctrl-D> twice in order to actually close the stream. This is not a bug with my code, but rather the behaviour of terminal emulators themselves. However, I'm not 100% certain about this, so more testing is actually needed here.

I've never seen this with other unix commands, you should check how you're taking in STDIN here.

https://unix.stackexchange.com/a/701319
https://askubuntu.com/a/118762

It does appear to be consistent across unix commands. This behaviour is also seen on unix's cat, where pressing <Ctrl-D> sends the line into stdin, and pressing <Ctrl-D> again on the same line will close stdin. This means pressing <Ctrl-D> twice on the same line is required if a newline is not desired at the end of the output. Otherwise, add a newline, and press <Ctrl-D> once to close stdin. This is exactly how secrets write behaves.

@CMCDragonkai

@aryanjassal
Copy link
Contributor Author

Key Rules Recap:

  1. Vault Paths:

    • Any path with :/ is a vault path.
    • A simple path like abc is also treated as a vault path.
  2. Local Paths:

    • Paths prefixed with ./, /, or ending with / are explicitly local.
    • Paths with colons (:) but without the :/ format are treated as local, given that Unix allows colons in filenames.

This table and rule set should now provide a clear, unambiguous interpretation of paths in the context of vault and local operations.

The rules for local paths aren't relevant for this PR, as its scope is vault paths. I will make a new issue for implementation of local paths, and input all this information there.

@aryanjassal
Copy link
Contributor Author

aryanjassal commented Oct 25, 2024

The reason for this is due to line buffering. You just need to flush the buffer explicitly at the end when the \n doesn't exist.

In fact you SHOULD ALWAYS be flushing the buffer explicitly at the end of every CLI command.

I tried to add a process.stdout.end(), which is designed to flush the stream, but to no avail. The only thing which worked was adding the newline at the end in stderr.

I'll make a separate issue for this, as I don't think that this can be done in this PR.

#316

Copy link

linear bot commented Oct 25, 2024

@aryanjassal
Copy link
Contributor Author

As ENG-431 (#310) is a small fix, I will address that in this PR too.

Copy link
Contributor Author

@aryanjassal aryanjassal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is more or less complete. I've put down some of my thoughts in my review, which would be useful for reference later.

src/secrets/CommandRemove.ts Show resolved Hide resolved
src/secrets/CommandRename.ts Show resolved Hide resolved
tests/secrets/cat.test.ts Show resolved Hide resolved
expect(result.exitCode).toBe(0);
expect(result.stdout).toBe('MySecret1\nMySecret2\nMySecret3\n');
},
globalThis.defaultTimeout * 2,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I ran the tests, I saw no need for additional timeout, so I removed this line. This applies to all secrets subcommand tests, as most of them completed within 3 seconds, with longer ones taking up to 5, which is well under the 20 second timeout limit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually we preserve the globalThis.defaultTimeout because that's the default.

tests/secrets/mkdir.test.ts Outdated Show resolved Hide resolved
Comment on lines +332 to +350
const vaultNameArb = fc.stringOf(
fc.char().filter((c) => binParsers.vaultNameRegex.test(c)),
{ minLength: 1, maxLength: 100 },
);
const singleSecretPathArb = fc.stringOf(
fc.char().filter((c) => binParsers.secretPathRegex.test(c)),
{ minLength: 1, maxLength: 25 },
);
const secretPathArb = fc
.array(singleSecretPathArb, { minLength: 1, maxLength: 5 })
.map((segments) => path.join(...segments));
const valueFirstCharArb = fc.char().filter((c) => /^[a-zA-Z_]$/.test(c));
const valueRestCharArb = fc.stringOf(
fc.char().filter((c) => /^[\w]$/.test(c)),
{ minLength: 1, maxLength: 100 },
);
const valueDataArb = fc
.tuple(valueFirstCharArb, valueRestCharArb)
.map((components) => components.join(''));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably isn't the best way to generate testing data, but this is the only way I found to ensure the tested data will be valid at all times. Instead of using the actual regex, I might need to use some other arb rule, but it would become pretty complex pretty fast.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regex is fine. What's wrong with regex?

chore: vault commands are now using vaultNameParser

chore: writing fastcheck tests for parsers

chore: jestified outputFormatter tests

feat: updated tests for every secret command to allow undefined secret path

fix: consistently using command instead of [...command] in secret tests

chore: stream consumption must be inside retryAuthentication

chore: tests consistently use command

fix: lint

fix: mkdir

fix: variable name
@aryanjassal
Copy link
Contributor Author

All the tasks for this PR have been done. Brian and I have sanity checked the commands and ensured that they all work as expected. I will merge this PR and make a new one to address all the issues that cropped up during the development of this PR.

@aryanjassal aryanjassal merged commit 6ea2cc3 into staging Oct 29, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants