Skip to content

Latest commit

 

History

History
281 lines (239 loc) · 10.3 KB

CONTRIBUTING.org

File metadata and controls

281 lines (239 loc) · 10.3 KB

The easiest way to contribute to the Emacs Rust core is to help port C functions over to Rust. In the process of doing so it will often expose gaps where new data structures or functionality is needed. See the list at the end of this document for examples of functions that still need to be ported.

The following sections describe the basic types used in Rune and how they interact with each other.

Overview of the Object Hierarchy

The type Object is equivalent to LispObject in the C core. It is a tagged pointer that is a superset of all possible concrete types. It is always 64 bits. There are other Sum types that represent a different group of concrete types, such as Number which is all numeric types, or Function which is all callable types. These can be matched into the different concrete types by using the untag method.

The Gc<T> type represents a generic tagged pointer. Object is actually just a type alias for Gc<ObjectType>. All Gc<T> types share the same layout and bit patterns, meaning that any tagged pointer can be cast back to an Object with into() or as_obj().

Generally to go from a sum type (Object, List, Number, Function, etc) to a concrete type, you can either match on it with untag() or convert it to a subtype with try_into(). To convert a concert type back into a sum type, you can use into() if it is already a GC managed type (such as LispString, Cons, etc) or use into_obj(cx) if it is not GC managed.

Creating Objects

We strive for good interop with Rust native types. Many primitive types can be changed to a Lisp type using the IntoObject trait. This trait defines one method into_obj(cx) that converts a type into a lisp tagged pointer. For example "hello".into_obj(cx) will return a Gc<&LispString>, which is a tagged pointer to a Lisp String. Alternatively, there is an add method on Context that will call into_obj and then convert it to a Object. So cx.add("hello") will return Object, which is Gc<ObjectType>. Also a untagged object (e.g. &Cons) can be converted to a tagged pointer with the tag() method (returning Gc<&Cons>).

Lists

Lisp lists can be created using the list! macro. This accepts a variable number of arguments and converts each one into an object. To create just a single Cons, use Cons::new or Cons::new1 (when the cdr is nil).

Symbols

Builtin symbols that are defined in the Rust source are given constant values and definitions. For example to reference the lisp symbol lambda, you would use sym::LAMBDA. These constants can be used in match statements. For example to check if an object is the symbol lambda, you could use matches!(x, ObjectType::Symbol(sym::LAMBDA)). This can also be used to create objects from symbols like sym::LAMBDA.into(). All read symbols are interned. Uninterned symbols can be created with Symbol::new_uninterned.

Nil

nil is an important value and has special support. Since it is a just a regular symbol, you could just check for nil by looking for that symbol with matches!(x, ObjectType::Symbol(sym::NIL)). nil has a special constant for the ObjectType enum, so you could write that as matches!(x, ObjectType::NIL). But you can also just use x.is_nil() or x == NIL. To create a nil object, just use the constant value NIL. Same goes for value t via TRUE.

Ints

Just like in GNU Emacs, ints are represented as a unboxed fixnum. This means that integers can be converted to objects without needing the GC heap, and can be taken directly from an Object without following a pointer. However their range is less than a i64 due to tagging.

Context

The Context type is singleton that is passed through out the call graph and contains GC heap and other state. Creating any GC managed object requires using the Context. This type is normally named cx, and is passed as the last argument to functions that consume it. Calling garbage_collect requires mutable access to the context. This ensures that no object that is created with the Context can survive pass garbage collection (unless it is rooted). There can only be one instance of Context per thread.

Rooting

All objects created with a particular context cannot be accessed past garbage collection. To continue accessing an object, it needs to be rooted. This is done via the root! macro. It either takes an existing object and shadows the name with a rooted version (root!(obj, cx)), or takes an initializer (root!(x, new(Vec), cx) or root!(x, init(Vec::new()), cx)). Rust structures that are rooted have the type Rt<T> (which stands for “Root Traceable”). Objects that are rooted have the type Rto<T> (Root Traceable Object) which is a type alias for Rt<Slot<T>>. The Slot type allows object pointers to be updated during garbage collection.

Environment

The type Env represents the lisp thread environment. Is passed as the second to last argument when Context is also used, and passed as the last argument when Context is not present in a function signature. All Lisp state should be stored in Env.

Defining lisp variables

New lisp variables are created using the defvar! macro, which optionally takes a default value. A coresponding symbol with a uppercase SNAKE_CASE name is also created.

Defining lisp functions

Lisp functions are normal Rust functions that are annotated with the #[defun] proc macro. This macro will create a wrapper that converts Objects into the requested types and also converts the return value back into an Object. This allows functions to move much of their type checking out of the function body for cleaner implementations. For example a function that accepts a string and returns a usize could be written like this:

#[defun]
fn my_fun(x: &str) -> usize {
    ...
}

allocating

If a function needs to allocate new objects, it will need to accept a Context parameter by reference.

#[defun]
fn my_fun(x: &str, cx: &Context) -> usize {
    ...
}

If a function need to access the environment, it will need to accept a Env parameter.

#[defun]
fn my_fun(x: &str, env: &Env, cx: &Context) -> usize {
    ...
}

rooted calls

If a function needs to call garbage_collect or calls a function that does (via the call! macro) it will need to take &mut Context. This means that all arguments need to be rooted as well. This is done by wrapping them in a Rto type.

#[defun]
fn my_fun(x: &Rto<Object>, env: &Rt<Env>, cx: &mut Context) -> Object {
    ...
}

Common errors

cannot borrow `*cx` as immutable

When calling a function that takes a mutable context (&mut Context), Rust will lengthen the mutable borrow for as long as the returned value is accessed. This can be fixed by wrapping the call in the rebind! macro.

let x = rebind!(my_func(x, &mut cx));

cannot borrow `*cx` as mutable because it is also borrowed as immutable

This is usually a sign that you need to root an object.

let x = cx.add("hello");
// root it
root!(x, cx);
mutable_call(&mut cx);
// access the variable again
let x = x.bind(cx);

C functions to port to Rust

casefiddle.c

Functions to manipulate case

  • upcase
  • downcase
  • upcase-initials
  • upcase-region
  • downcase-region
  • capitalize-region
  • upcase-initials-region
  • upcase-word
  • downcase-word
  • capitalize-word

character.c

Functions that operate on characters

  • unibyte-char-to-multibyte
  • multibyte-char-to-unibyte
  • char-width
  • string-width
  • unibyte-string
  • get-byte

fns.c (string)

Functions that operate on strings

  • string-bytes
  • string-distance
  • string-equal
  • compare-strings
  • string-lessp
  • string-version-lessp
  • string-collate-lessp
  • string-collate-equalp
  • string-make-multibyte
  • string-make-unibyte
  • string-as-unibyte
  • string-as-multibyte
  • string-to-unibyte
  • substring
  • clear-string
  • base64-encode-string
  • base64url-encode-string
  • base64-decode-string
  • base64-encode-region
  • base64url-encode-region
  • base64-decode-region

timefns.c

Functions that operate on time formats

  • time-add
  • time-subtract
  • time-less-p
  • time-equal-p
  • float-time
  • format-time-string
  • decode-time
  • encode-time
  • time-convert
  • current-cpu-time
  • current-time-string
  • current-time-zone
  • set-time-zone-rule

dired.c

Functions for working with directories

  • directory-files
  • directory-files-and-attributes
  • file-name-completion
  • file-name-all-completions
  • file-attributes-lessp
  • system-users
  • system-groups

fileio.c

Functions that operate on files

  • make-temp-file-internal
  • make-temp-name
  • substitute-in-file-name
  • copy-file
  • make-directory-internal
  • delete-directory-internal
  • delete-file
  • rename-file
  • add-name-to-file
  • make-symbolic-link
  • file-name-absolute-p
  • file-exists-p
  • file-executable-p
  • file-readable-p
  • file-writable-p
  • access-file
  • file-accessible-directory-p
  • file-regular-p
  • file-selinux-context
  • set-file-selinux-context
  • file-acl
  • set-file-acl
  • file-modes
  • set-file-modes
  • set-default-file-modes
  • default-file-modes
  • set-file-times
  • unix-sync
  • file-newer-than-file-p
  • insert-file-contents
  • write-region
  • verify-visited-file-modtime
  • visited-file-modtime
  • set-visited-file-modtime
  • do-auto-save
  • set-buffer-auto-saved
  • clear-buffer-auto-save-failure
  • recent-auto-save-p
  • next-read-file-uses-dialog-p
  • set-binary-mode
  • file-system-info

editfns.c

Function to edit buffers and manipulate text.

  • byte-to-string
  • pos-bol
  • line-beginning-position
  • pos-eol
  • line-end-position
  • buffer-size
  • point-min
  • point-max
  • gap-position
  • gap-size
  • position-bytes
  • byte-to-position
  • following-char
  • preceding-char
  • bobp
  • eobp
  • bolp
  • eolp
  • char-after
  • char-before
  • user-login-name
  • user-real-login-name
  • user-uid
  • user-real-uid
  • group-name
  • group-gid
  • group-real-gid
  • user-full-name
  • system-name
  • emacs-pid
  • insert
  • insert-and-inherit
  • insert-char
  • insert-byte
  • buffer-substring
  • buffer-string
  • insert-buffer-substring
  • compare-buffer-substrings
  • replace-buffer-contents
  • subst-char-in-region
  • translate-region-internal
  • delete-region
  • delete-and-extract-region
  • internal–unlabel-restriction
  • save-restriction
  • ngettext
  • message
  • message-box
  • message-or-box
  • current-message
  • propertize
  • format
  • format-message
  • char-equal
  • transpose-regions