The original goal was to replicate uoj-judger/run_program in Go using libseccomp. As the project grew, it also adopted newer technologies, including Linux namespaces and cgroups.
The idea of rootfs and interval CPU usage checking comes from syzoj/judge-v3 and the pooled pre-forked container comes from vijos/jd4.
If you are looking for a sandbox implementation exposed through a REST / gRPC API, please check go-judge.
Notice: Only works on Linux, since ptrace, unshare and cgroup are available only on Linux.
- install latest go compiler from golang/download
- install libseccomp library: (for Ubuntu)
apt install libseccomp-dev
- build & install:
go install github.com/criyle/go-sandbox/...
- Restricted computing resource by POSIX rlimit: Time & Memory (Stack) & Output
- Restricted syscall access (by libseccomp & ptrace)
- Restricted file access (read & write & access & exec). Evaluated by UOJ FileSet
Improvements:
- Precise resource limits (s -> ms, mb -> kb)
- More architectures (arm32, arm64)
- Allow multiple traced programs in different threads
- Allow pipes as input / output files
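The rlimit-based limits and the finer units (ms, kb) listed above interact in one non-obvious way: `RLIMIT_CPU` is expressed in whole seconds, so a millisecond limit has to be rounded up and the exact limit checked separately. The following is a minimal sketch of that conversion. The `limits` and `rlimit` types and the choice of `RLIMIT_DATA` for memory are assumptions made for illustration, not the package's actual API.

```go
package main

import "fmt"

// limits is a hypothetical description of one run, using the finer units
// mentioned above (milliseconds and kilobytes).
type limits struct {
	timeMs   uint64
	memoryKB uint64
	stackKB  uint64
	outputKB uint64
}

// rlimit is a simplified stand-in for one setrlimit(2) entry.
type rlimit struct {
	resource string
	value    uint64
}

// cpuSeconds rounds a millisecond limit up to whole seconds, because
// RLIMIT_CPU is expressed in seconds.
func cpuSeconds(ms uint64) uint64 {
	return (ms + 999) / 1000
}

// toRlimits converts the limits into rlimit values. Memory, stack and
// output limits are expressed in bytes.
func toRlimits(l limits) []rlimit {
	return []rlimit{
		{"RLIMIT_CPU", cpuSeconds(l.timeMs)},
		{"RLIMIT_DATA", l.memoryKB * 1024}, // assumption: data segment limit
		{"RLIMIT_STACK", l.stackKB * 1024},
		{"RLIMIT_FSIZE", l.outputKB * 1024},
	}
}

func main() {
	l := limits{timeMs: 1500, memoryKB: 262144, stackKB: 8192, outputKB: 65536}
	for _, r := range toRlimits(l) {
		fmt.Printf("%s = %d\n", r.resource, r.value)
	}
}
```

Because the kernel only enforces the rounded-up value, a caller that wants millisecond precision still has to compare the measured CPU time against the exact limit after the run.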
Default file access syscall check:
- check file read / write: open, openat
- check file read: readlink, readlinkat
- check file write: unlink, unlinkat, chmod, rename
- check file access: stat, lstat, access, faccessat
- check file exec: execve, execveat
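The grouping above can be read as a lookup from syscall name to the kind of file check to perform, followed by a path lookup in the matching file set. A minimal sketch of that idea follows. The `accessKind` and `fileSet` types are hypothetical, and treating entries that end in `/` as directory prefixes is an assumption for illustration rather than the behaviour of the real filehandler package.

```go
package main

import (
	"fmt"
	"strings"
)

// accessKind is a hypothetical label for the kind of check a syscall needs.
type accessKind int

const (
	kindNone      accessKind = iota // syscall is not a file access syscall
	kindReadWrite                   // open, openat: depends on the open flags
	kindRead
	kindWrite
	kindStat
	kindExec
)

// syscallKinds mirrors the default file access syscall list.
var syscallKinds = map[string]accessKind{
	"open": kindReadWrite, "openat": kindReadWrite,
	"readlink": kindRead, "readlinkat": kindRead,
	"unlink": kindWrite, "unlinkat": kindWrite, "chmod": kindWrite, "rename": kindWrite,
	"stat": kindStat, "lstat": kindStat, "access": kindStat, "faccessat": kindStat,
	"execve": kindExec, "execveat": kindExec,
}

// classify returns the check required for a syscall name.
func classify(name string) accessKind {
	return syscallKinds[name]
}

// fileSet is a hypothetical set of allowed paths. An entry ending in "/"
// is assumed to allow everything below that directory.
type fileSet map[string]bool

// allows reports whether path is permitted by the set.
func (s fileSet) allows(path string) bool {
	if s[path] {
		return true
	}
	for entry := range s {
		if strings.HasSuffix(entry, "/") && strings.HasPrefix(path, entry) {
			return true
		}
	}
	return false
}

func main() {
	readable := fileSet{"/usr/lib/": true, "/etc/hostname": true}
	for _, p := range []string{"/usr/lib/libc.so.6", "/etc/passwd"} {
		fmt.Printf("readlink %s: kind=%d allowed=%v\n", p, classify("readlink"), readable.allows(p))
	}
}
```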
- Unshare & bind mount rootfs based on hostfs (eliminated ptrace)
- Use Linux Control Groups to limit & acct CPU & memory (eliminated wait4.rusage)
- Container tech with execveat memfd, sethostname, setdomainname
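As an illustration of cgroup-based accounting, the sketch below extracts the total CPU time from the contents of a cgroup v2 `cpu.stat` file, whose `usage_usec` line reports CPU usage in microseconds. It parses a string rather than reading from a real cgroup directory, and `parseUsageUsec` is a hypothetical helper, not a function from the cgroup package.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseUsageUsec returns the usage_usec value (total CPU time in
// microseconds) from the contents of a cgroup v2 cpu.stat file.
func parseUsageUsec(cpuStat string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(cpuStat))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 && fields[0] == "usage_usec" {
			return strconv.ParseUint(fields[1], 10, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, errors.New("usage_usec not found")
}

func main() {
	// Sample contents; a real caller would read <cgroup dir>/cpu.stat.
	sample := "usage_usec 1500\nuser_usec 1000\nsystem_usec 500\n"
	usec, err := parseUsageUsec(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("cpu usage (usec):", usec)
}
```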
- Normal (no error)
- Program Error
- Resource Limit Exceeded
- Time
- Memory
- Output
- Unauthorized Access
- Disallowed Syscall
- Runtime Error
- Signalled
- SIGXCPU / SIGKILL are treated as TimeLimitExceeded (by rlimit or caller kill)
- SIGXFSZ is treated as OutputLimitExceeded (by rlimit)
- SIGSYS is treated as Disallowed Syscall (by seccomp)
- Potential runtime errors include SIGSEGV (segmentation fault)
- Nonzero Exit Status
- Signalled
- Resource Limit Exceeded
- Program Runner Error
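The signal handling described in the status list can be sketched as a small mapping from the terminating signal to a result status. The `Status` type and its constant names below are hypothetical stand-ins for illustration; only the signal-to-status rules come from the list above.

```go
package main

import (
	"fmt"
	"syscall"
)

// Status is a hypothetical result status used only in this sketch.
type Status int

const (
	StatusSignalled Status = iota
	StatusTimeLimitExceeded
	StatusOutputLimitExceeded
	StatusDisallowedSyscall
)

// String returns a readable name for the status.
func (s Status) String() string {
	switch s {
	case StatusTimeLimitExceeded:
		return "Time Limit Exceeded"
	case StatusOutputLimitExceeded:
		return "Output Limit Exceeded"
	case StatusDisallowedSyscall:
		return "Disallowed Syscall"
	default:
		return "Signalled"
	}
}

// statusFromSignal maps the signal that terminated the program to a status.
func statusFromSignal(sig syscall.Signal) Status {
	switch sig {
	case syscall.SIGXCPU, syscall.SIGKILL:
		// CPU rlimit reached, or the caller killed the process on timeout.
		return StatusTimeLimitExceeded
	case syscall.SIGXFSZ:
		// File size rlimit reached.
		return StatusOutputLimitExceeded
	case syscall.SIGSYS:
		// seccomp rejected a syscall.
		return StatusDisallowedSyscall
	default:
		// Any other signal, such as SIGSEGV, is a plain runtime error.
		return StatusSignalled
	}
}

func main() {
	for _, sig := range []syscall.Signal{syscall.SIGXCPU, syscall.SIGXFSZ, syscall.SIGSYS, syscall.SIGSEGV} {
		fmt.Printf("%v -> %v\n", sig, statusFromSignal(sig))
	}
}
```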
type Result struct {
	Status            // result status
	ExitStatus int    // exit status (signal number if signalled)
	Error      string // potential detailed error message (for program runner error)

	Time   time.Duration // used user CPU time (underlying type int64 in ns)
	Memory Size          // used user memory (underlying type uint64 in bytes)

	// metrics for the program runner
	SetUpTime   time.Duration
	RunningTime time.Duration
}
A configured Runner runs the program. The Context is used to cancel the run (for example, to enforce the time limit) and must not be nil.
type Runner interface {
	Run(context.Context) <-chan runner.Result
}
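A minimal sketch of how a caller might drive such a runner with a time limit follows. The `Result` type here is a reduced local copy, and `sleepRunner` and `runWithLimit` are hypothetical helpers that only simulate a program with a timer; none of them are part of the package.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Result is a reduced, local stand-in for runner.Result.
type Result struct {
	Status string
	Time   time.Duration
}

// Runner mirrors the shape of the interface described above.
type Runner interface {
	Run(context.Context) <-chan Result
}

// sleepRunner is a hypothetical runner that simulates a program which
// needs workMs milliseconds to finish.
type sleepRunner struct {
	workMs int
}

// Run reports "Normal" if the simulated work finishes first, or
// "Time Limit Exceeded" if the context is cancelled first.
func (r sleepRunner) Run(ctx context.Context) <-chan Result {
	ch := make(chan Result, 1)
	go func() {
		start := time.Now()
		select {
		case <-time.After(time.Duration(r.workMs) * time.Millisecond):
			ch <- Result{Status: "Normal", Time: time.Since(start)}
		case <-ctx.Done():
			ch <- Result{Status: "Time Limit Exceeded", Time: time.Since(start)}
		}
	}()
	return ch
}

// runWithLimit runs r and cancels it after limitMs milliseconds.
func runWithLimit(r Runner, limitMs int) Result {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(limitMs)*time.Millisecond)
	defer cancel()
	return <-r.Run(ctx)
}

func main() {
	fmt.Println(runWithLimit(sleepRunner{workMs: 0}, 1000).Status)
	fmt.Println(runWithLimit(sleepRunner{workMs: 5000}, 10).Status)
}
```

Returning a channel lets the caller wait on the result together with other events in a `select`, while the context carries the cancellation signal in the other direction.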
- Pre-fork container to run programs inside
- Unix socket to pass fd inside / outside
Container / Host Communication Protocol (single thread):
- ping (alive check):
- reply: pong
- conf (set configuration):
- reply: pong
- open (open files in given mode inside container):
- send: []OpenCmd
- reply: "success", file fds / "error"
- delete (unlink file / rmdir dir inside container):
- send: path
- reply: "finished" / "error"
- reset (clean up container for later use (clear workdir / tmp)):
- send:
- reply: "success"
- execve: (execute file inside container):
- send: argv, env, rLimits, fds
- reply:
- success: "success", pid
- failed: "failed"
- send (success): "init_finished" (as cmd)
- reply: "finished" / send: "kill" (as cmd)
- send: "kill" (as cmd) / reply: "finished"
- reply:
Any socket-related error causes the container to exit (along with all processes inside the container).
The restricted container environment is accessed through the RPC interface defined by the protocol above.
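The request / reply shape of the simpler commands (ping, reset, delete) can be sketched as a single-threaded loop. This sketch makes several assumptions: it uses `net.Pipe` instead of a unix socket, `encoding/gob` as the wire format, passes no file descriptors, and the `cmd` and `reply` types are hypothetical. It is meant only to show the command loop and the rule that any connection error ends it.

```go
package main

import (
	"encoding/gob"
	"fmt"
	"net"
)

// cmd and reply are hypothetical message types for this sketch.
type cmd struct {
	Name string
	Path string
}

type reply struct {
	Msg string
}

// serve plays the container side: it answers one command at a time and
// returns (exits) on any connection error.
func serve(conn net.Conn) {
	defer conn.Close()
	dec, enc := gob.NewDecoder(conn), gob.NewEncoder(conn)
	for {
		var c cmd
		if err := dec.Decode(&c); err != nil {
			return
		}
		var r reply
		switch c.Name {
		case "ping":
			r.Msg = "pong"
		case "reset":
			r.Msg = "success"
		case "delete":
			r.Msg = "finished"
		default:
			r.Msg = "error"
		}
		if err := enc.Encode(r); err != nil {
			return
		}
	}
}

// call sends one command from the host side and waits for its reply.
func call(enc *gob.Encoder, dec *gob.Decoder, c cmd) (string, error) {
	if err := enc.Encode(c); err != nil {
		return "", err
	}
	var r reply
	if err := dec.Decode(&r); err != nil {
		return "", err
	}
	return r.Msg, nil
}

// roundTrip starts a container loop and sends the named commands in order.
func roundTrip(names ...string) []string {
	host, cont := net.Pipe()
	go serve(cont)
	defer host.Close()
	enc, dec := gob.NewEncoder(host), gob.NewDecoder(host)
	var replies []string
	for _, name := range names {
		msg, err := call(enc, dec, cmd{Name: name})
		if err != nil {
			return append(replies, "socket error: "+err.Error())
		}
		replies = append(replies, msg)
	}
	return replies
}

func main() {
	fmt.Println(roundTrip("ping", "reset", "delete"))
}
```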
Provides:
- File access
- Open: create / access files
- Delete: remove file
- Management
- Ping: alive check
- Reset: remove temporary files
- Destroy: destroy the container environment
- Run program
- Execve: execute program with given parameters
type Environment interface {
	Ping() error
	Open([]OpenCmd) ([]*os.File, error)
	Delete(p string) error
	Reset() error
	Execve(context.Context, ExecveParam) <-chan runner.Result
	Destroy() error
}
- seccomp: provides seccomp type definition
- libseccomp: provides utility functions that wrap libseccomp
- forkexec: fork-exec provides mount, unshare, ptrace, seccomp, capset before exec
- memfd: reads a regular file and creates a sealed memfd for its contents
- unixsocket: send / recv oob msg from a unix socket
- cgroup: creates cgroup directories and collects resource usage / limits
- mount: provides utility functions that wrap the mount syscall
- rlimit: provides utility functions for the rlimit syscall
- pipe: provides wrapper to collect all written content through pipe
- cmd/runprog/config: defines arch & language specified trace condition for ptrace runner from UOJ
- container: creates pre-forked container to run programs inside
- runner: interface to run program
- ptrace: wrapper to call forkexec and ptracer
- filehandler: an example implementation of UOJ file set
- unshare: wrapper to call forkexec and unshared namespaces
- ptracer: ptrace tracer and provides syscall trap filter context
- runprog: safely run program by unshare / ptrace / pre-forked containers
- config/config.go: all configs toward running specs (similar to UOJ)
- 5.19: memory.peak in cgroup v2
- 4.15: cgroup v2
- 4.14: SECCOMP_RET_KILL_PROCESS
- 4.6: CLONE_NEWCGROUP
- 3.19: execveat()
- 3.17: seccomp, memfd_create
- 3.10: CentOS 7
- 3.8: CLONE_NEWUSER without CAP_SYS_ADMIN, CAP_SETUID, CAP_SETGID
- 3.5: prctl(PR_SET_NO_NEW_PRIVS)
- 2.6.36: prlimit64
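One way to use the version list above is to compare the running kernel release (for example, the contents of `/proc/sys/kernel/osrelease`) against the version a feature requires. The sketch below works on a release string passed in by the caller; `parseKernel` and `atLeast` are hypothetical helpers, not part of the package.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseKernel extracts major and minor from a release string such as
// "5.4.0-42-generic" or "6.1-rc3".
func parseKernel(release string) (major, minor int, err error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	major, err = strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, err
	}
	// The minor part may carry a suffix such as "-rc3"; keep leading digits.
	m := parts[1]
	end := 0
	for end < len(m) && m[end] >= '0' && m[end] <= '9' {
		end++
	}
	minor, err = strconv.Atoi(m[:end])
	if err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// atLeast reports whether release is at least major.minor. Releases that
// cannot be parsed are treated as not supported.
func atLeast(release string, major, minor int) bool {
	ma, mi, err := parseKernel(release)
	if err != nil {
		return false
	}
	return ma > major || (ma == major && mi >= minor)
}

func main() {
	release := "3.10.0-1160.el7.x86_64" // sample value, e.g. CentOS 7
	features := []struct {
		name         string
		major, minor int
	}{
		{"memory.peak in cgroup v2", 5, 19},
		{"cgroup v2", 4, 15},
		{"execveat()", 3, 19},
		{"seccomp, memfd_create", 3, 17},
		{"CLONE_NEWUSER", 3, 8},
	}
	for _, f := range features {
		fmt.Printf("%-26s available: %v\n", f.name, atLeast(release, f.major, f.minor))
	}
}
```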
$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/pkg/forkexec
BenchmarkSimpleFork-4 12409 996096 ns/op
BenchmarkUnsharePid-4 10000 1065168 ns/op
BenchmarkUnshareUser-4 10000 1061770 ns/op
BenchmarkUnshareUts-4 10000 1056558 ns/op
BenchmarkUnshareCgroup-4 10000 1049446 ns/op
BenchmarkUnshareIpc-4 709 16114052 ns/op
BenchmarkUnshareMount-4 745 16207754 ns/op
BenchmarkUnshareNet-4 3643 3492924 ns/op
BenchmarkFastUnshareMountPivot-4 612 20967318 ns/op
BenchmarkUnshareAll-4 837 14047995 ns/op
BenchmarkUnshareMountPivot-4 488 24198331 ns/op
PASS
ok github.com/criyle/go-sandbox/pkg/forkexec 147.186s
$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/container
BenchmarkContainer-4 5907 2062070 ns/op
PASS
ok github.com/criyle/go-sandbox/container 21.763s
$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/pkg/cgroup
BenchmarkCgroup-4 50283 245094 ns/op
PASS
ok github.com/criyle/go-sandbox/pkg/cgroup 14.744s
Blocking:
$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/pkg/unixsocket
cpu: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
BenchmarkBaseline-8 12170148 1048 ns/op
BenchmarkGoroutine-8 2658846 4910 ns/op
BenchmarkChannel-8 8454133 1431 ns/op
BenchmarkChannelBuffed-8 8767264 1357 ns/op
BenchmarkChannelBuffed4-8 9670935 1230 ns/op
BenchmarkEmptyGoroutine-8 34927512 342.8 ns/op
PASS
ok github.com/criyle/go-sandbox/pkg/unixsocket 83.669s
Non-blocking:
$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/pkg/unixsocket
cpu: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
BenchmarkBaseline-8 11609772 1001 ns/op
BenchmarkGoroutine-8 2470767 4788 ns/op
BenchmarkChannel-8 8488646 1427 ns/op
BenchmarkChannelBuffed-8 8876050 1345 ns/op
BenchmarkChannelBuffed4-8 9813187 1212 ns/op
BenchmarkEmptyGoroutine-8 34852828 342.2 ns/op
PASS
ok github.com/criyle/go-sandbox/pkg/unixsocket 81.679s