The system call API to Magenta is well documented in the following pages:
- System Calls - "man page"-like documentation, one page per system call.
- Concepts - Overview of concepts, including about Handles and Rights.
The system call API is not stable and not intended for direct use by developers. The kernel only allows system calls from distinguished addresses in a shared library, and the shared library forms the stable API that developers should use. In the future, the mapping of system call numbers may change. In fact, in the future, these numbers may be randomized between different processes running on the same machine!
System call stubs in the shared library are generated at compile time.
System call numbers are assigned linearly in a dense packing. At
the current time, the same specification is used for both Aarch64
and x86_64, so both architectures share the same system call numbers.
The stubs are generated by the sysgen tool from the
system/public/magenta/syscalls.sysgen specification, with a command
such as:
./build-magenta-pc-x86-64/tools/sysgen \
-kernel-code ./build-magenta-pc-x86-64/gen/include/magenta/syscall-invocation-cases.inc \
-trace ./build-magenta-pc-x86-64/gen/include/magenta/syscall-ktrace-info.inc \
-category ./build-magenta-pc-x86-64/gen/include/magenta/syscall-category.inc \
-kernel-header ./build-magenta-pc-x86-64/gen/include/magenta/syscall-definitions.h \
-arm-asm ./build-magenta-pc-x86-64/gen/include/magenta/syscalls-arm64.S \
-x86-asm ./build-magenta-pc-x86-64/gen/include/magenta/syscalls-x86-64.S \
-vdso-header ./build-magenta-pc-x86-64/gen/include/magenta/syscall-vdso-definitions.h \
-vdso-wrappers ./build-magenta-pc-x86-64/gen/include/magenta/syscall-vdso-wrappers.inc \
-numbers ./build-magenta-pc-x86-64/gen/include/magenta/mx-syscall-numbers.h \
-user-header ./build-magenta-pc-x86-64/gen/include/magenta/syscalls/definitions.h \
-rust ./build-magenta-pc-x86-64/gen/include/magenta/syscalls/definitions.rs \
system/public/magenta/syscalls.sysgen
You can view the resulting assembly by using objdump -d on the
generated files. Here is a small sample from syscalls-x86-64.S.o
and syscalls-arm64.S.o:
00000000000001bd <SYSCALL_mx_socket_read>:
1bd: 41 52 push %r10
1bf: 41 53 push %r11
1c1: 49 89 ca mov %rcx,%r10
1c4: b8 16 00 00 00 mov $0x16,%eax
1c9: 0f 05 syscall
00000000000001cb <CODE_SYSRET_mx_socket_read_VIA_mx_socket_read>:
1cb: 41 5b pop %r11
1cd: 41 5a pop %r10
1cf: c3 retq
0000000000000108 <SYSCALL_mx_socket_read>:
108: d28002d0 mov x16, #0x16 // #22
10c: d4000001 svc #0x0
0000000000000110 <CODE_SYSRET_mx_socket_read_VIA_mx_socket_read>:
110: d65f03c0 ret
At the current time (subject to change!), x86_64 passes the
system call number in %eax and passes arguments in
%rdi, %rsi, %rdx, %r10, %r8, %r9, %r12, %r13.
Aarch64 passes the system call number in x16 and arguments
in x0 .. x7.
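The convention above can be restated as a small lookup table. This is only an illustrative sketch: the names kX86ArgRegs, kArm64ArgRegs, Arch, and arg_register are hypothetical and do not appear in the Magenta sources. Note that the fourth x86_64 argument travels in %r10 rather than %rcx, because the syscall instruction itself overwrites %rcx (and %r11); this matches the mov %rcx,%r10 in the stub shown earlier.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical tables restating the register convention described in the text.
constexpr std::array<std::string_view, 8> kX86ArgRegs = {
    "%rdi", "%rsi", "%rdx", "%r10", "%r8", "%r9", "%r12", "%r13"};
constexpr std::array<std::string_view, 8> kArm64ArgRegs = {
    "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7"};

// Registers that carry the system call number.
constexpr std::string_view kX86NumberReg = "%eax";
constexpr std::string_view kArm64NumberReg = "x16";

enum class Arch { X86_64, Aarch64 };

// Returns the register carrying argument `index` (0-based), or an empty
// view if the index is beyond the eight register slots.
constexpr std::string_view arg_register(Arch arch, std::size_t index) {
    if (index >= 8) {
        return {};
    }
    return arch == Arch::X86_64 ? kX86ArgRegs[index] : kArm64ArgRegs[index];
}
```

For example, arg_register(Arch::X86_64, 3) yields "%r10", the register the stub fills from %rcx before executing syscall.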
System calls are usually invoked through libmagenta.so, which
is a shared library provided to userland processes as a
VDSO. This library is special: it cannot be unmapped or
mapped over. The kernel only accepts system calls made from
specific addresses within this VDSO.
System calls arrive in the kernel as a fault. The x86_64 handler
is found in ./kernel/arch/x86/64/syscall.S at FUNCTION(x86_syscall).
It dispatches to unknown_syscall for out-of-range numbers, or
jumps through an automatically generated dispatch table. Each
entry comes from the syscall_dispatch macro and calls a generated
wrapper function. Wrappers can be found in the generated
syscall-kernel-wrappers.inc file, and currently look like:
x86_64_syscall_result wrapper_socket_read(mx_handle_t handle, uint32_t options, void* buffer, size_t size, size_t* actual, uint64_t ip) {
return do_syscall(MX_SYS_socket_read, ip, &VDso::ValidSyscallPC::socket_read, [&]() {
return static_cast<uint64_t>(sys_socket_read(handle, options, make_user_ptr(buffer), size, make_user_ptr(actual)));
});
}
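The dispatch step described above (a bounds check followed by an indexed jump through a dense table) can be sketched in plain C++. This is not the kernel's assembly: the two-argument handler signature, the placeholder error value, and the names kDispatchTable, wrapper_add, wrapper_first, and dispatch are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical handler signature; real wrappers take per-syscall arguments.
using SyscallFn = int64_t (*)(uint64_t arg0, uint64_t arg1);

constexpr int64_t kErrBadSyscall = -1;  // placeholder, not a real Magenta status

// Fallback for numbers outside the table.
int64_t unknown_syscall(uint64_t, uint64_t) { return kErrBadSyscall; }

// Two toy wrappers standing in for generated ones.
int64_t wrapper_add(uint64_t a, uint64_t b) { return static_cast<int64_t>(a + b); }
int64_t wrapper_first(uint64_t a, uint64_t) { return static_cast<int64_t>(a); }

// Dense packing: system call number N lives at index N.
constexpr SyscallFn kDispatchTable[] = {wrapper_add, wrapper_first};
constexpr std::size_t kSyscallCount =
    sizeof(kDispatchTable) / sizeof(kDispatchTable[0]);

// Bounds-check the number, then jump through the table.
int64_t dispatch(uint64_t number, uint64_t arg0, uint64_t arg1) {
    if (number >= kSyscallCount) {
        return unknown_syscall(arg0, arg1);
    }
    return kDispatchTable[number](arg0, arg1);
}
```

Because numbers are assigned linearly with no gaps, a single comparison against the table size is enough to reject every invalid number.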
The do_syscall function in kernel/lib/syscalls/syscalls.cpp
takes care of common syscall handling, such as implementing kernel
tracing. It also performs an unusual check -- it calls a function
which verifies the caller's Instruction Pointer. Any calls that
don't originate from the right address in a shared library are
rejected. This prevents developers from directly making system calls
without going through the shared library. (Note: Magenta prevents
processes from unmapping this library and mapping their own code
in its place.)
The check is implemented in generated functions such as
VDso::ValidSyscallPC::socket_read, which are passed in by the
wrapper. These can be found in the generated vdso-valid-sysret.h
header. A typical example is:
static bool socket_read(uintptr_t offset) {
switch (offset) {
case VDSO_CODE_SYSRET_mx_socket_read_VIA_mx_socket_read - VDSO_CODE_START:
return true;
}
return false;
}
This in turn references a generated offset, such as
VDSO_CODE_SYSRET_mx_socket_read_VIA_mx_socket_read, defined in
vdso-code.h:
#define VDSO_CODE_SYSRET_mx_socket_read_VIA_mx_socket_read 0x0000000000006671
If the valid_pc check succeeds, the system call handler passed in
by the wrapper (i.e., sys_socket_read) is finally called.
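The overall flow (validate the caller's PC, then run the handler) can be sketched as follows. This is a simplified model, not the real do_syscall: the way the offset is derived from a VDSO base address, the kVdsoCodeStart value, the error value, and the names do_syscall_sketch and valid_pc_socket_read are assumptions. Only the 0x6671 offset is taken from the #define shown above.

```cpp
#include <cassert>
#include <cstdint>

// 0x6671 comes from the generated #define above; the start value is assumed.
constexpr uintptr_t kVdsoCodeStart = 0x6000;     // hypothetical
constexpr uintptr_t kSysretSocketRead = 0x6671;  // from vdso-code.h sample

constexpr int64_t kErrBadPc = -1;  // placeholder, not a real Magenta status

// Mirrors the shape of the generated validator: accept exactly one offset.
bool valid_pc_socket_read(uintptr_t offset) {
    return offset == kSysretSocketRead - kVdsoCodeStart;
}

// Simplified model: reject the call unless the return address sits at the
// expected offset inside the VDSO code segment, then invoke the handler.
template <typename Handler>
int64_t do_syscall_sketch(uintptr_t vdso_code_base, uintptr_t caller_pc,
                          bool (*valid_pc)(uintptr_t), Handler handler) {
    const uintptr_t offset = caller_pc - vdso_code_base;
    if (!valid_pc(offset)) {
        return kErrBadPc;
    }
    return handler();
}
```

With a VDSO code segment mapped at a hypothetical base of 0x10000, only a caller PC of 0x10671 would pass the check for socket_read; any other address is rejected before the handler runs.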
System call handlers are written in C++ and can be found under
the kernel/lib/syscalls directory. For example, sys_socket_read
is found in kernel/lib/syscalls/syscalls_socket.cpp.
System calls typically make use of a dispatcher object derived
from the Dispatcher class.
System calls use handles to reference kernel objects, and these are already well documented. Handles belong to a single process (or the kernel, while in transit), and can be sent between processes. Multiple handles in multiple processes can reference the same underlying kernel object.
Handle values are obfuscated by the kernel before being revealed
to userland. The mechanism used is subject to change, but currently
makes use of a per-process secret. The secret is set while creating
a ProcessDispatcher object. The handle_rand_ member is set to a
29-bit value generated by a cryptographic random number generator.
The high bit and the two low bits are zeroed. The
map_handle_to_value function maps kernel handle values to
mx_handle_t values used by userland applications by an XOR with
the secret (passed in mixer below):
static mx_handle_t map_handle_to_value(const Handle* handle, mx_handle_t mixer) {
// Ensure that the last bit of the result is not zero, and make sure
// we don't lose any base_value bits or make the result negative
// when shifting.
DEBUG_ASSERT((mixer & ((1<<31) | 0x1)) == 0);
DEBUG_ASSERT((handle->base_value() & 0xc0000000) == 0);
auto handle_id = (handle->base_value() << 1) | 0x1;
return mixer ^ handle_id;
}
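Because the mapping is a shift followed by an XOR, it can be undone with the same secret. The sketch below models this round trip on plain 32-bit unsigned integers rather than the kernel's Handle and mx_handle_t types. The function names make_mixer, to_user_value, and to_base_value are hypothetical, and the inverse shown is an assumption about how the reverse lookup could work rather than a copy of kernel code.

```cpp
#include <cassert>
#include <cstdint>

// Masks a raw random value as the text describes: clear the high bit and
// the two low bits, leaving 29 bits of per-process secret.
constexpr uint32_t make_mixer(uint32_t raw_random) {
    return raw_random & 0x7ffffffcu;
}

// Forward mapping, mirroring map_handle_to_value: shift the base value up
// by one, force the low bit to 1, then XOR with the secret.
constexpr uint32_t to_user_value(uint32_t base_value, uint32_t mixer) {
    const uint32_t handle_id = (base_value << 1) | 0x1u;
    return mixer ^ handle_id;
}

// Assumed inverse: XOR with the same secret, then drop the forced low bit.
constexpr uint32_t to_base_value(uint32_t user_value, uint32_t mixer) {
    return (user_value ^ mixer) >> 1;
}
```

Since the mixer's low bit is zero and the handle id's low bit is one, every resulting value is odd. Since the mixer's high bit is zero and the base value is required to have its top two bits clear, the result also stays non-negative when viewed as a signed 32-bit integer.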