-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: go/scanner: stop adding source file directory to relative file name in //line comment #69689
Comments
Related Issues and Documentation
(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.) |
I've created a test project I compared results against the linters compiled with non-modified Go. I downloaded the linters from Homebrew repository.
|
Change https://go.dev/cl/618276 mentions this issue: |
@griesemer @findleyr @ianlancetaylor Could you please take a look on the CL 618276? |
Could you please take a look on this proposal, if would be very grateful if we can include this fix in go1.24 release. |
Sorry you haven't gotten much traction on this issue. I suspect it's because there's no easy solution, and this has been a problem for a long time. It's likely that we won't reach a satisfactory resolution for 1.24. I wasn't familiar with the context of this issue, but it strikes me that go/scanner should really not be in the business of transforming relative paths to absolute paths. I agree it would be more correct to change that behavior, though we may not be able to do so. Can you clarify exactly how you are encountering the absolute paths? In which artifacts are they written? You say "we can't bake absolute paths inside build artifacts", but as far as I can tell from the issue, the paths are only modified inside the token.Files in memory of a process using go/scanner. I suspect that you're talking about compiler output, but it's worth being precise here. It may be that we can alter the behavior of go/scanner for the compiler, and leave it unmodified inside go/parser.ParseFile, which is how most tooling would consume its output. |
I agree that go/scanner should not be transforming paths, and that a godebug seems like a reasonable way to make this sound but potentially risky change (though each godebug parameter at least doubles the configuration space for testing).
I'm confused; the compiler doesn't use go/scanner. |
Indeed you're right. I thought that there was a use case somewhere. The question remains: which artifacts hold the absolute paths? |
The source file contains |
@adonovan I understand that this is a problem for token.File (above, I said "as far as I can tell from the issue, the paths are only modified inside the token.Files in memory of a process using go/scanner"). What I don't understand is the exact mechanism that causes a problem in Bazel. The original report says In "build systems like Buck2 or Bazel the build actions may be executed on different hosts, so we can't bake absolute paths inside build artifacts". What are the artifacts? The cgo files themselves contain relative paths. |
Oh, I took that to mean that there are some go/token-based tools that are used during a Bazel build that read source files (including those generated by cgo) and include the file/line/column information in their output, causing them to inadvertently depend on the process's working directory, breaking determinism. (Technically, hermetism: such programs are deterministic on any given machine, but incorporate information about the build worker machine in their output.) This was a pretty common problem with Blaze and Forge until we tracked down and fixed every nondeterministic and non-hermetic program (by running them twice and failing if the results were not consistent). |
Sure, but I wonder what those tools are, and whether they can be updated to write relative paths. Perhaps they can't, if they have some deep integration with go/parser (and go/parser doesn't expose the way it calls go/scanner), but it's worth understanding. It is indeed inconvenient and confusing that go/scanner translates paths in this way, but the bug is arguably in the tool that generates the artifacts with absolute paths, or the system that consumes those absolute paths in an inconsistent context. From go/scanner's perspective, inside a running process, the transformation is valid. Before advocating for a solution, I want to better understand the problem. |
I'm sure they can, and I think this proposal is suggesting that they should so that we can remove this workaround from the scanner. The GODEBUG is an acknowledgement that there is risk and cost.
I think we are all in agreement on this.
I don't think that's true: given that the parser API takes as parameters both the content of the file and its apparent file name, the scanner has no justification for interacting with the file system. |
Sorry, I don't understand this comment. I believe the proposal is suggesting that tools relying on absolute paths should update so that the path translation in go/scanner can be removed. Specifically, consider this quote: "The cgocall analyser in golang.org/x/tools is trying to access the original source file by fset.Position(raw.Pos()).Filename, but fails when relative filepaths used because the filename is invalid." That is saying the cgocall analyzer fails when relative paths are used in the token.File. It is not saying that the cgocall analyser is failing in the distributed bazel build. I'm asking what is failing in the distributed bazel build.
How is the scanner interacting with the file system? IIUC it is simply translating paths from relative to absolute, based on the directory of the filepath being parsed. There is no I/O involved. |
Sorry, I now realize that all this time, thanks to this comment:
I was under the misapprehension that the scanner was using the process's current directory to absolutize relative file names in line directives, when in fact it is lexically joining the parent directory of the file containing the line directive (or the purported filename provided in the call to ParseFile). The compiler documentation doesn't give any guidance on the interpretation of relative file names in line directives, and one can imagine a number of possibilities. Given that the scanner behavior should not depend on os.Getwd, and that the working directory of the tool that wrote the //line directive is unknowable, the only interpretation that makes sense to me is the one that is implemented (though possibly without the filepath.Clean, which can give the wrong answer in the presence of symbolic links). Unless the line directives are lexically relativized to the source file's nominal directory, the apparent file name reported by token.File.Name would be unrelativized, making it implicitly relative to the process's working directory. (I had been nodding my head in agreement with all your comments all this time not realizing I was saying something opposite to what I meant.) |
(Apologies if this is obvious to others. I'm catching up.) |
The The compiler works fine with both relative and absolute paths, that's not a problem at all. The compiler doesn't use The problem is tools like |
Yes |
Thanks, I didn't know about that.
Yes, this makes sense to me |
I understand this is quite a tricky issue, than you all for digging into it |
Aha, this is interesting, @podtserkovskiy.
Can you say more about this? If the line directives of the cgo files are relative, why is there an error when they are translated to absolute paths? IIUC, this translation happens inside of a call to |
Is it possible that the issue here is that Bazel uses a build sandbox, so when analyzing a file containing a //line directive, the directive refers to a file that is no longer accessible because it is not a direct input to the current build step? If so, then I think the real problem is cgocall's questionable assumption that it can open the file referred to by //line. I would suggest using FileSet.PositionFor(pos, false) to ignore the //line directive, but the cgocall logic added in https://go.dev/cl/147317 is quite explicit about the need to parse the "raw" cgo file:
Rather too clever indeed. Perhaps the solution is for cgocall to fail gracefully if it can't open the raw cgo file. |
The issue is that the relative path form I've just published a repo with a repro on https://github.com/podtserkovskiy/issue-69689 |
@podtserkovskiy, could you try applying this patch to see if it is a satisfactory solution? diff --git a/go/analysis/passes/cgocall/cgocall.go b/go/analysis/passes/cgocall/cgocall.go
index 613583a1a..3b01cb84f 100644
--- a/go/analysis/passes/cgocall/cgocall.go
+++ b/go/analysis/passes/cgocall/cgocall.go
@@ -7,12 +7,14 @@
package cgocall
import (
+ "errors"
"fmt"
"go/ast"
"go/format"
"go/parser"
"go/token"
"go/types"
+ "io/fs"
"log"
"os"
"strconv"
@@ -47,6 +49,9 @@ func run(pass *analysis.Pass) (interface{}, error) {
cgofiles, info, err := typeCheckCgoSourceFiles(pass.Fset, pass.Pkg, pass.Files, pass.TypesInfo, pass.TypesSizes)
if err != nil {
+ if err == errNoSource {
+ return nil, nil
+ }
return nil, err
}
for _, f := range cgofiles {
@@ -55,6 +60,8 @@ func run(pass *analysis.Pass) (interface{}, error) {
return nil, nil
}
+var errNoSource = errors.New("cgo source unavailable")
+
func checkCgo(fset *token.FileSet, f *ast.File, info *types.Info, reportf func(token.Pos, string, ...interface{})) {
ast.Inspect(f, func(n ast.Node) bool {
call, ok := n.(*ast.CallExpr)
@@ -179,9 +186,18 @@ func typeCheckCgoSourceFiles(fset *token.FileSet, pkg *types.Package, files []*a
for _, raw := range files {
// If f is a cgo-generated file, Position reports
// the original file, honoring //line directives.
+ //
+ // This operation is of dubious validity because it accesses
+ // files that may not be accessible when the analysis driver
+ // runs in a build sandbox as used by Bazel.
+ // So, if we can't read the source file, silently give up.
filename := fset.Position(raw.Pos()).Filename // sic: Pos, not FileStart
- f, err := parser.ParseFile(fset, filename, nil, parser.SkipObjectResolution)
+ f, err := parser.ParseFile(fset, filename+"bonkers", nil, parser.SkipObjectResolution)
if err != nil {
+ if errors.Is(err, new(fs.PathError)) {
+ log.Printf("cgocall analyzer on package %s cannot read raw cgo source file %s (possible build sandbox?); skipping", pkg, filename)
+ return nil, nil, errNoSource
+ }
return nil, nil, fmt.Errorf("can't parse raw cgo file: %v", err)
}
found := false |
@adonovan This helps to silence the errors on this specific analyser, but this is not what I'd like to achieve. The problem is a bit more broad then The root cause is caused by the assumption of
The Then The full example is here https://github.com/podtserkovskiy/issue-69689 The CL 618276 is definitely fixes those problems for us, so linters on CGo files work the same way as they work on normal Go files. During my experiments I didn't find any particular example when CL 618276 break something. Do you have any specific examples when this CL 618276 can break existing code? I understand, the decision how to resolve relative paths is of course debatable, but the natural resolving relative to CWD seems to be consistent AFAICT across other go standard tools and less magical then the current one. What do you think about that? |
Do folks have thoughts on how to scale up this experiment to avoid #26671 repeating? |
@timothy-king it's still not clear to me that this is the correct fix.
I don't think this is clear. It seems to me that if the line directive says "dir/foo.go", I would expect that to be relative to the current file, because the file itself is not associated with any CWD. I think the line directives are just wrong. Why not just run |
@findleyr I did a quick experiment on github.com/podtserkovskiy/issue-69689
However the compiler produces errors with inconsistent paths when does typecheck if CGo files has type-errors.
i.e. the file-paths of files without But with
The issue is that the compiler and |
We need to clarify the way tools interpret relative names in Presumably @griesemer had a similar goal in CL 100235 as I can see from this comment
But this part of CL 100235 has been reverted to the old behaviour by CL 127658. We have at least two options how to standardise this behaviour:
Which one sounds better to you? Are there more options? |
The problem with option 1 is that it assumes the process's working directory is the same as what was used by -trimpath or the tool that generated the cgo file. That may be true for the compiler because of the strict environment in which it is run by go build. For example, the compiler's working directory is always GOROOT/src even when compiling, say, The only directory that can be reliably used as as a base is the parent directory of the source file itself; that is, option 2. In short, I think the problem is in GOROOT/src/cmd, not in go/scanner or x/tools. |
I have filed a new proposal #70478 to change the compiler, and suggest that we close this one. |
Proposal Details
The current behaviour
The
go/scanner.Scanner.updateLineInfo
prepends name of*.cgo1.go
file directory to the relative path of the original CGo file taken from//line
comment.For example:
Let's parse it then with
parser.ParseFile(fset, "cgo_objdir/test1.cgo1.go", nil, parser.ParseComments)
and checkfset
value.The issue is that
fset
will containgo/token.File
withinfos[0].lineInfo.Filename = "cgo_objdir/src/test1.go"
.The path
"cgo_objdir/src/test1.go"
inlineInfo.Filename
is invalid.There was an attempt to fix this behaviour, but it was rollbacked in CL 127658 due to Issue 26671
When this becomes a problem?
In build systems like Buck2 or Bazel the build actions may be executed on different hosts, so we can't bake absolute paths inside build artifacts.
We have a way to prevent this by adding
-trimpath $(pwd)
to build commands.However the
go/scanner
will compute invalid filepath in this case as shown in the above.A specific example
The
cgocall
analyser ingolang.org/x/tools
is trying to access the original source file byfset.Position(raw.Pos()).Filename
, but fails when relative filepaths used because the filename is invalid.This doesn't allow seamless integration of Buck2/Bazel with code analysis tools.
Possible solutions
Option 1: Bold
Rollback CL 127658, but this probably can break some existing linters or other programs like it was in Issue 26671.
Option 2: Safe
Rollback CL 127658 and add a
GODEBUG
constant to minimise potential damage to existing programs.goscannerprefixlinedir = "1"
- old behaviourgoscannerprefixlinedir != "1"
- new behaviourI'm not sure what the default behaviour of should be for go1.24.
I'm happy with any option as we can set
GODEBUG
env-var before running code analysers.EDIT: updated
GODEBUG
variable nameThe text was updated successfully, but these errors were encountered: