Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option for cache management #319

Merged
merged 4 commits into from
Dec 8, 2024
Merged

Conversation

KornelH
Copy link
Contributor

@KornelH KornelH commented Nov 21, 2024

Add a configuration setting to change the storage mode for large cache tables (mainly used for code navigation). Available modes:

  • memory (default, original): use ETS tables
  • compressed memory: use ETS tables with compressed flag turned on.
  • file: use DETS tables saved into temporary files.

Why is this configuration option needed?

In shared environment, where many instances of vscode_erlang run in the same time on large code bases, noticeable amount of memory can be consumed by this extension that eventually can exhaust all the memory.

Notes for compressed memory:

Some simple tests, done in the above explained scenario, showed memory consumption dropped by 50% when compressed memory is configured.

Notes for file:

As multiple extension instances can run in the same time on the same host, even in the same workspace, we cannot use static filenames, nor workspace specific filenames to create DETS tables. The only option is to use unique filenames, like temporary files. Of course, it has a drawback, temporary files must be deleted on exit otherwise those will consume a lot of disk space after a while. If the extension is shut down normally, it's not a problem, the extension deletes it's own cache.

However, if some extension is terminated brutally, the cache files left on the disk. Hence, an automatic mechanism is hooked to the points when configuration is received from Visual Studio Code, and to normal shutdown, to look for cache directories created by yet not running extension, and delete them.

Reusing old cache files is theoretically possible if only one extension instance run in a workspace, VSCode is closed and reopened, then the new extension instance could continue where the previous one stopped. But, as multiple extension instances can run in the same workspace, it can easily go wrong. Therefore it is simpler to use instance specific DETS files, just like instance specific ETS tables.

Kornel Horvath added 4 commits November 21, 2024 21:17
Extract an explicit function expression to a variable to fix broken
syntax highlight. For example add an `-spec` line or `-define` after
function `index_of` and those will not be syntax highlighted.
Erlang/OTP `supervisor` documentation says: [1]

> It is also allowed to set it to `infinity`, if the child process is a
> worker.
>
> Warning
>
> Be careful when setting the shutdown time to `infinity` when the child
> process is a worker. Because, in this situation, the termination of
> the supervision tree depends on the child process, it must be
> implemented in a safe way and its cleanup procedure must always
> return.

Therefore, the shutdown time of `gen_lsp_server` is changed from
`infinity` to default value, that is `5000` milliseconds for worker
processes.

Additionally, a sketch of the application supervision tree is added to
module header of `vscode_lsp_app` for easier understanding.

The `-behaviour` attributes are added to the corresponding modules, also
for easier understanding.

[1] https://www.erlang.org/doc/man/supervisor.html
Add a configuration setting to change the storage mode for large cache
tables (mainly used for code navigation). Available modes:

- `memory` (default, original): use ETS tables
- `compressed memory`: use ETS tables with `compressed` flag turned on.
- `file`: use DETS tables saved into temporary files.

Why is this configuration option needed?

In shared environment, where many instances of `vscode_erlang` run in
the same time on large code bases, noticeable amount of memory can be
consumed by this extension that eventually can exhaust all the memory.

Notes for `compressed memory`:

Some simple tests, done in the above explained scenario, showed memory
consumption dropped by 50% when `compressed memory` is configured.

Notes for `file`:

As multiple extension instances can run in the same time on the same
host, even in the same workspace, we cannot use static filenames, nor
workspace specific filenames to create DETS tables. The only option is
to use unique filenames, like temporary files. Of course, it has a
drawback, temporary files must be deleted on exit otherwise those will
consume a lot of disk space after a while. If the extension is shut down
normally, it's not a problem, the extension deletes it's own cache.

However, if some extension is terminated brutally, the cache files left
on the disk. Hence, an automatic mechanism is hooked to the points when
configuration is received from Visual Studio Code, and to normal
shutdown, to look for cache directories created by yet not running
extension, and delete them.

Reusing old cache files is theoretically possible if only one extension
instance run in a workspace, VSCode is closed and reopened, then the
new extension instance could continue where the previous one stopped.
But, as multiple extension instances can run in the same workspace, it
can easily go wrong. Therefore it is simpler to use instance specific
DETS files, just like instance specific ETS tables.
@KornelH
Copy link
Contributor Author

KornelH commented Nov 21, 2024

This is a continuation of PR #313 that is canceled and evolved into this PR.

@KornelH
Copy link
Contributor Author

KornelH commented Nov 21, 2024

There is a yet unsolved issue in this PR, namely the deletion of an old cache directory created by an old extension instance with old PID, and there is a non-Erlang process during the check using the same PID. It could be filtered out, but it doesn't really cause a noticeable problem, as next or next-next time it will be deleted. Yet, it's not nice.

@KornelH
Copy link
Contributor Author

KornelH commented Nov 22, 2024

I list some measurements here for reference. The test project contains 2209 Erlang modules, approximately 110 MB.

memory

This is the baseline as this is the legacy and default method.

Total memory: 2.34 GB, ETS memory: 2.26 GB

ETS tables:

Table name Size Memory
syntax_tree 2209 1.04 GB
dodged_syntax_tree 2209 795.19 MB
references 273806 454.70 MB
documents_inlayhints 2209 2.60 MB
document_comments 1 3.40 KB

compressed memory

Total memory: 944.86 MB, ETS memory: 872.12 MB

ETS tables:

Table name Size Memory
references 273806 454.70 MB
syntax_tree 2209 233.35 MB
dodged_syntax_tree 2209 182.93 MB
documents_inlayhints 2209 2.61 MB
document_comments 1 3.42 KB

Note: ETS table references is not compressed, but the rest are. If references is compressed then it's size shrinks to 450 MB. The gain is negligible.

file

Total memory: 535.23 MB, ETS memory: 453.16 MB, DETS disk usage: 716.1 MB

ETS tables:

Table name Size Memory
references 273806 454.70 MB

DETS tables:

Table name Size Disk space
syntax_tree 2209 379.1 MB
dodged_syntax_tree 2209 336.7 MB
documents_inlayhints 2209 316.0 KB
document_comments 1 24.0 KB

@pgourlain
Copy link
Owner

Hi @KornelH ,

Sorry for the delay.
Nice Job !

You choose to store file in temporary file, but what do think about using storage locations provided by VsCode ?

On Windows: C:\Users<user>\AppData\Roaming\Code\User\globalStorage<extension-id>
On macOS: ~/Library/Application Support/Code/User/globalStorage/
On Linux: ~/.config/Code/User/globalStorage/

documentation : https://code.visualstudio.com/api/extension-capabilities/common-capabilities#data-storage

  • storageUri
  • globalStorageUri

@KornelH
Copy link
Contributor Author

KornelH commented Dec 7, 2024

Good idea. I haven't checked it yet but this is what I'm thinking about:

  1. As globalStorageUri is shared among all extension instances I don't think it's a good place to store instance specific cache files.
  2. What happens if multiple VSCode + extension instances are opened for the same workspace. Is storageUri unique per instance?
  3. If VSCode + extension is shut down, is the storageUri are cleaned automatically?
  4. In corporate environment users home area are often located on a network drive (e.g. NFS in *nix) that is not favorable to store cache files on due to quota and speed limitations. I'm not sure if Windows syncs AppData\Roaming between hosts when the user logs in to another host but based on it's name it looks a possible scenario to me.

I shall test 2) and if it's not unique I think simply using sotorageUri is not adequate. Maybe by combining it with e.g. extension instance unique ID (PID?) could be OK.

My other big concern is 4). Storing config on NFS is good, but large cache files is another story. This is why I wanted to use the temp directory as that is always resides on local disk and usually has no or high quota limit and fast.

@pgourlain pgourlain merged commit b2d4e83 into pgourlain:master Dec 8, 2024
1 check passed
@KornelH
Copy link
Contributor Author

KornelH commented Dec 9, 2024

I meant to discuss the alternatives ... but I'm fine with the current version too, hence I did it. Let's see if there will be any feedback for it.

@pgourlain
Copy link
Owner

For me 2 things

  1. your explanations are clear
  2. by default configuration is like before.

I agree, let's see.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants