An attempt at an open-source version of the Logseq Sync service, intended for individual, self-hosted use.
It's vaguely functional (see What Works? below), but decidedly pre-alpha software. Definitely don't try to point a real, populated Logseq client at it; I have no idea what will happen.
Right now, the repo contains (in `cmd/server`) a mostly implemented version of the Logseq API, including credentialed blob uploads, signed blob downloads, and a SQLite database for persistence. Most of the API surface is at least somewhat implemented.
Currently, running any of this requires a modified version of the Logseq codebase (here) and of the `@logseq/rsapi` package (here).
On that note, many thanks to the Logseq Team for open-sourcing `rsapi` recently. It made this project significantly easier to work with.
With a modified Logseq, you can use the local server to:
- Create a graph
- Upload (passphrase-encrypted) encryption keys
- Get temporary AWS credentials to upload your encrypted files to your private S3 bucket
- Upload your encrypted files
And that's basically the full end-to-end flow! The big remaining things are:
- Implement the WebSockets protocol
  - There's some documentation for it
- Figure out how/when to increment the transaction (`tx`) counter
There's some documentation for the API in `docs/API.md`. This is the area where I could benefit the most from more information and help; see Contributing below.
The real Logseq Sync API gets temporary S3 credentials and uploads files directly to S3. I haven't looked closely enough to see if we can swap this out for something S3-compatible like s3proxy or MinIO; see #2 for a bit more discussion.
Currently, `amazonaws.com` is hardcoded in the client, so that'll be part of a larger discussion on how to make all of this configurable in the long run.
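As a rough illustration of what "configurable" could look like, the Go sketch below builds an object URL from an optional endpoint override and falls back to `amazonaws.com` when none is set. It is not the actual client code: the `SYNC_S3_ENDPOINT` variable, the `bucketURL` helper, and the bucket and key names are hypothetical. It also assumes that S3-compatible stores such as MinIO are addressed path-style.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultS3Host is the host the README says is currently hardcoded in the client.
const defaultS3Host = "amazonaws.com"

// bucketURL returns the URL for an object. With no endpoint override it uses
// AWS virtual-hosted-style addressing; with an override (for example a local
// S3-compatible server) it uses path-style addressing.
func bucketURL(endpoint, region, bucket, key string) string {
	if endpoint == "" {
		return fmt.Sprintf("https://%s.s3.%s.%s/%s", bucket, region, defaultS3Host, key)
	}
	return fmt.Sprintf("%s/%s/%s", strings.TrimRight(endpoint, "/"), bucket, key)
}

func main() {
	// SYNC_S3_ENDPOINT is a hypothetical setting, e.g. "http://localhost:9000".
	endpoint := os.Getenv("SYNC_S3_ENDPOINT")
	fmt.Println(bucketURL(endpoint, "us-east-1", "my-graph-bucket", "graph/pages/index.md"))
}
```

A single override like this would not be enough on its own, since the temporary credentials would also have to come from a compatible STS-style endpoint.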
Being able to connect to a self-hosted sync server requires some changes to Logseq as well, namely to specify where your sync server can be accessed. Those changes are in a rough, non-functional state here: https://github.com/logseq/logseq/compare/master...bcspragu:logseq:brandon/settings-hack
The self-hosted sync backend has rudimentary support for persistence in a SQLite database. We use sqlc to do Go codegen for SQL queries, and Atlas to manage generating diffs.
The process for changing the database schema looks like:
- Update `db/sqlite/schema.sql` with your desired changes
- Run `./scripts/add_migration.sh <name of migration>` to generate the relevant migration
- Run `./scripts/apply_migrations.sh` to apply the migrations to your SQLite database
With this workflow, the `db/sqlite/migrations/` directory is more or less unused by both `sqlc` and the actual server program. The reason it's structured this way is to keep a more reviewable audit log of the changes to a database, which a single `schema.sql` doesn't give you.
If you're interested in contributing, thanks! I sincerely appreciate it. There are a few main avenues for contributions:
The main blocker right now is getting buy-in from the Logseq team, as I don't want to do the work to add self-hosting settings to the Logseq codebase if they won't be accepted upstream. I've raised the question on the Logseq forums, as well as in a GitHub Discussion on the Logseq repo, but have received no official response.
One area where I would love help is specifying the official API more accurately. My API docs are based on a dataset of one: my own account. So there are areas that are underspecified, unknown, or where I just don't understand the flow. Any help there would be great!
Specifically, I'd like to understand:
- The details of the WebSocket protocol (doc started here), and
- How and when to update the transaction counter, `tx`, in the API
I believe there's a bug (filed upstream, initially here) in the `s3-presign` crate used by Logseq's `rsapi` component, which handles the actual sync protocol bits (encryption, key generation, S3 upload, etc.).
The bug causes flaky uploads with self-hosted, AWS-backed (i.e. S3 + STS) servers, but I haven't had the time to investigate the exact root cause. The source code for the `s3-presign` crate is available here, though the GitHub repo itself doesn't appear to be public.