Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial Unison support #361

Merged
merged 13 commits into from
Apr 26, 2024
Merged

Conversation

zetashift
Copy link
Contributor

Let me know if anything is missing!

Copy link

cla-bot bot commented Mar 22, 2024

We require contributors to sign our Contributor License Agreement, and we don't have @zetashift on file. You can sign our CLA at https://zed.dev/cla. Once you've signed, post a comment here that says '@cla-bot check'.

@zetashift
Copy link
Contributor Author

@cla-bot check

Copy link

cla-bot bot commented Mar 22, 2024

We require contributors to sign our Contributor License Agreement, and we don't have @zetashift on file. You can sign our CLA at https://zed.dev/cla. Once you've signed, post a comment here that says '@cla-bot check'.

Copy link

cla-bot bot commented Mar 22, 2024

The cla-bot has been summoned, and re-checked this pull request!

@zetashift
Copy link
Contributor Author

@cla-bot check

@cla-bot cla-bot bot added the cla-signed label Mar 22, 2024
Copy link

cla-bot bot commented Mar 22, 2024

The cla-bot has been summoned, and re-checked this pull request!

@maxdeviant maxdeviant changed the title add initial unison support Add initial unison support Mar 22, 2024
@maxdeviant
Copy link
Member

maxdeviant commented Mar 22, 2024

@zetashift Can we drop the -zed suffixes from the extension ID and name?

Since these are all Zed extensions, there isn't a need to specify.

https://github.com/zetashift/unison-zed/blob/92ebe2290a432d4312c98e5e1b42b8d7237bb3a9/extension.toml#L1-L2

It should be:

id = "unison"
name = "Unison"

It looks like the extension.toml is also missing a repository field and a schema_version field (which should be set to 1).

@maxdeviant maxdeviant changed the title Add initial unison support Add initial Unison support Mar 22, 2024
@zetashift
Copy link
Contributor Author

@maxdeviant I've updated the extension.toml: https://github.com/zetashift/unison-zed/blob/main/extension.toml
Do I have to do anything else to update the git submodule?

@maxdeviant
Copy link
Member

@maxdeviant I've updated the extension.toml: https://github.com/zetashift/unison-zed/blob/main/extension.toml Do I have to do anything else to update the git submodule?

You need to bump the submodule commit to the new one.

Typically this is done by going into the submodule directory (extensions/unison, in this case), and then checking out main / master, and then you should see a commit change in the extensions repo that needs to be committed.

@zetashift
Copy link
Contributor Author

@maxdeviant I've updated the extension.toml: https://github.com/zetashift/unison-zed/blob/main/extension.toml Do I have to do anything else to update the git submodule?

You need to bump the submodule commit to the new one.

Typically this is done by going into the submodule directory (extensions/unison, in this case), and then checking out main / master, and then you should see a commit change in the extensions repo that needs to be committed.

Ah thank you! I also saw that I named the path wrong, e.g. it should be unison and not unison-zed, though that makes me wonder what would happen if another unison extensions pops up?

@maxdeviant
Copy link
Member

@maxdeviant I've updated the extension.toml: https://github.com/zetashift/unison-zed/blob/main/extension.toml Do I have to do anything else to update the git submodule?

You need to bump the submodule commit to the new one.
Typically this is done by going into the submodule directory (extensions/unison, in this case), and then checking out main / master, and then you should see a commit change in the extensions repo that needs to be committed.

Ah thank you! I also saw that I named the path wrong, e.g. it should be unison and not unison-zed, though that makes me wonder what would happen if another unison extensions pops up?

Could you say more about this? I'm not sure I follow.

I would think any improvements to the Unison extension should go towards your extension, as opposed to creating new ones.

@zetashift
Copy link
Contributor Author

zetashift commented Mar 25, 2024

I would think any improvements to the Unison extension should go towards your extension, as opposed to creating new ones.

Ah this was exactly what my question was about. I'll probably share that along then.

https://github.com/zed-industries/extensions/actions/runs/8392887233/job/23063430084?pr=361#step:9:19
 I'm not sure what this error is about, is my extension configured wrong for wasm building?

@maxdeviant
Copy link
Member

https://github.com/zed-industries/extensions/actions/runs/8392887233/job/23063430084?pr=361#step:9:19
 I'm not sure what this error is about, is my extension configured wrong for wasm building?

It's failing due to this error:

Error: Failed to instantiate wasm module: invalid import 'strcat'

We don't expose strcat to Wasm builds of Tree-sitter grammars. We're going to add strncat, and then the grammar will need to be updated to use that instead.

@maxbrunsfeld
Copy link
Contributor

We opened an issue on the Unison grammar repo: kylegoetz/tree-sitter-unison#57.

@zetashift
Copy link
Contributor Author

We're going to add strncat

Is there an issue or something to see when this lands in tree-sitter, and when Zed uses that specific version? Else I might forget this PR :P

@maxdeviant
Copy link
Member

We're going to add strncat

Is there an issue or something to see when this lands in tree-sitter, and when Zed uses that specific version? Else I might forget this PR :P

Tree-sitter already supports strncat on the latest master: tree-sitter/tree-sitter@0fc92c9

Right now we're blocked on the upstream grammar resolve this issue: kylegoetz/tree-sitter-unison#57 (or if you want to fork it to make the necessary adjustments).

@zetashift
Copy link
Contributor Author

zetashift commented Apr 12, 2024

Tree-sitter already supports strncat on the latest master: tree-sitter/tree-sitter@0fc92c9

Does Zed( or the extensions part) use the master or is it pinned to a specific version?

Right now we're blocked on the upstream grammar resolve this issue

I have a PR up kylegoetz/tree-sitter-unison#58 ! Let's see if it gets merged anytime soon :)

@maxdeviant
Copy link
Member

Tree-sitter already supports strncat on the latest master: tree-sitter/tree-sitter@0fc92c9

Does Zed( or the extensions part) use the master or is it pinned to a specific version?

We patch our tree-sitter crate to a specific SHA:

https://github.com/zed-industries/zed/blob/cd5ddfe34bdda1626d9702cf2087b093acfa744c/Cargo.toml#L391-L392

And right now we're pinned to tree-sitter/tree-sitter@7f21c3b, so Zed already supports Wasm extensions using strncat.

@zetashift
Copy link
Contributor Author

[2024-04-16T20:54:38Z INFO  extension::extension_builder] finished compiling extension /home/runner/work/extensions/extensions/extensions/unison
Error: Failed to instantiate wasm module: invalid import 'fwrite'

Is this WASM error from the parser or something external? I grepped tree-sitter-unison and it doesn't seem to use fwrite?

@maxbrunsfeld
Copy link
Contributor

Maybe it calls fprintf or similar? If so, maybe that can be wrapped in an ifdef, disabling it for wasm builds? or remove it completely?

@zetashift
Copy link
Contributor Author

zetashift commented Apr 18, 2024

Maybe it calls fprintf or similar? If so, maybe that can be wrapped in an ifdef, disabling it for wasm builds? or remove it completely?

Seems it does: https://github.com/search?q=repo%3Akylegoetz%2Ftree-sitter-unison%20fprintf&type=code
They seem to be enabled only if it's DEBUG and log_level is enabled.

@kylegoetz
Copy link

@maxbrunsfeld could you or zetashift check this PR? kylegoetz/tree-sitter-unison#68

I'm conditionally excluding the fprintf stuff if __EMSCRIPTEN__ is defined in the preprocessor. Based on Emscripten docs, all WASM builds using Emscripten define it.

@maxdeviant
Copy link
Member

maxdeviant commented Apr 18, 2024

@maxbrunsfeld could you or zetashift check this PR? kylegoetz/tree-sitter-unison#68

I'm conditionally excluding the fprintf stuff if __EMSCRIPTEN__ is defined in the preprocessor. Based on Emscripten docs, all WASM builds using Emscripten define it.

We've used the __wasm32__ directive in other instances: 6cdh/tree-sitter-racket#8

Emscripten defines it as well, I believe, but it's not Emscripten-specific.

We don't use Emscripten when compiling grammars for Zed AFAIK (unless it's somehow used under the hood). We're using Clang 17: https://github.com/zed-industries/zed/blob/a202499c9a61bbb7ec2498b196c94237223372a0/crates/extension/src/extension_builder.rs#L209-L219

@kylegoetz
Copy link

@maxdeviant thanks, i've updated that branch to use the directive and we'll see if it works for @zetashift and then i'll merge that branch into main

@zetashift
Copy link
Contributor Author

@kylegoetz this indeed fixes the fwrite error, but now the following error pops up for me locally:

extensions on  add-unison [!?]
❯ ./zed-extension  --scratch-dir ./scratch --source-dir extensions/unison --output-dir output
Error: Failed to instantiate wasm module: invalid import 'atof'

@kylegoetz
Copy link

@maxbrunsfeld or @maxdeviant , any ideas for why this is happening, and if teh list of what is available is overly strict? If atof is not available to parse a string to a double, is strtod? Is there a definitive list of what Zed makes available to TSGs? I don't want to make poor @zetashift play telephone whack-a-mole with me as I fix one of these and they re-build and get another invalid import, and we repeat this a bunch.

@maxdeviant
Copy link
Member

@maxbrunsfeld or @maxdeviant , any ideas for why this is happening, and if teh list of what is available is overly strict? If atof is not available to parse a string to a double, is strtod? Is there a definitive list of what Zed makes available to TSGs? I don't want to make poor @zetashift play telephone whack-a-mole with me as I fix one of these and they re-build and get another invalid import, and we repeat this a bunch.

Here is the list of stdlib symbols that we expose to Tree-sitter Wasm builds: https://github.com/tree-sitter/tree-sitter/blob/4cd23ff6b0b74f35996cfec873c3ca6b0bfac2f7/lib/src/wasm/stdlib-symbols.txt

We can expose more, if needed, but we've been conservative with adding them since each one can increase the base size of each Wasm bundle.

@kylegoetz
Copy link

kylegoetz commented Apr 20, 2024

@maxdeviant

Am I able to use assert.h? How about inttypes.h? string.h? wctype.h? math.h? ctype.h? I'm trying to figure out how substantially I'd need to refactor scanner.c so it can be compiled into Zed. I think I use inttypes.h because it's the only way to reliably ensure cross-platform long size. Otherwise Windows and Linux do not have the same length for long, and then logging the variables isn't possible because the format code for an int64_t for example is different depending on platform. (In fact, I added it recently because the tree-sitter Github action for building and deploying to NPM was failing on Windows, and the fix was to use this lib.

@maxdeviant
Copy link
Member

@maxdeviant

Am I able to use assert.h? How about inttypes.h? string.h? wctype.h? math.h? ctype.h? I'm trying to figure out how substantially I'd need to refactor scanner.c so it can be compiled into Zed. I think I use inttypes.h because it's the only way to reliably ensure cross-platform long size. Otherwise Windows and Linux do not have the same length for long, and then logging the variables isn't possible because the format code for an int64_t for example is different depending on platform. (In fact, I added it recently because the tree-sitter Github action for building and deploying to NPM was failing on Windows, and the fix was to use this lib.

I'll defer to @maxbrunsfeld here, as he's more familiar with Tree-sitter.

@maxbrunsfeld
Copy link
Contributor

@kylegoetz All types should be fine to use, but there are only certain libc functions available when running Tree-sitter parsers via wasm. The core library itself needs to provide the libc functions. If you run tree-sitter build --wasm in your grammar repo, it will tell you if you are using any functions that are unavailable in wasm, and output a list of the functions that are available.

@kylegoetz
Copy link

kylegoetz commented Apr 25, 2024

Okay, the only issue is atof, which I can't just copy-paste into scanner.c because it's platform-dependent code as per the C standard. So I opened a ticket and filed a PR for it in the tree-sitter repo. I don't know how to update wasm-stdlib.h, though.

That is the blocker for this PR, and I can't remove it because I don't know another way of converting string to double to validate that the sequence that's been read can be parsed as a Float in Unison.

@maxbrunsfeld
Copy link
Contributor

@kylegoetz do you need to store the actual float value represented by a string, or do you just need to check if it is a float?

If the latter, I think it would be better to parse the float manually, since any libc function like atof or strtof is going have locale-specific details that may not match the actual syntax of the Unison language. atof is particularly problematic, because it just returns zero on failure AFAICT.

it's platform-dependent code as per the C standard

To be clear, that's a reason not to use atof right? Your unison parser probably shouldn't behave differently on different platforms.

@kylegoetz
Copy link

kylegoetz commented Apr 26, 2024

Thanks for the advice, everyone! One issue was Unison's "REPL" would accept out-of-bounds Floats and pretty print them as Infinity. I've confirmed with Unison devs that Infinity is still of type Float, so there's no issue about needing to determine float or infinity tokens by bounds checking. No more atof.

So I've removed bounds checking from the scanner, which has dramatically simplified numeric parsing!

@zetashift mind checking to see if you can build now? tree-sitter build-wasm works without error for me. Latest commit on master (or you can use tag 1.1.4)

@maxdeviant maxdeviant merged commit 979afd0 into zed-industries:main Apr 26, 2024
2 checks passed
@maxdeviant
Copy link
Member

@zetashift @kylegoetz Thank you both for all your work getting this one published!

@kylegoetz
Copy link

@zetashift @kylegoetz Thank you both for all your work getting this one published!

Thanks for the patience to your and @maxbrunsfeld for patiently explaining some stuff to me.

@zetashift
Copy link
Contributor Author

Thanks for getting this through the line! It's @kylegoetz who did the hard part! 🥇

cabrinha pushed a commit to cabrinha/extensions that referenced this pull request Aug 9, 2024
Let me know if anything is missing!

---------

Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
adorabilis pushed a commit to adorabilis/zed-extensions that referenced this pull request Aug 17, 2024
Let me know if anything is missing!

---------

Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants