Lifetimes + Provenance proposals #338

lattner · 2023-06-06T20:04:18Z

Maintainer

Hi all, I put together a draft of the lifetimes proposal, split between the new capabilities in one doc and then a syntax bikeshed proposal in the second. I'd love thoughts and feedback on this thread:

Provenance Tracking and Lifetimes in Mojo
Keyword naming and other topics to discuss

sa- · 2023-06-06T21:18:38Z

Starting with a small idea for the bikeshed (sorry)

The more extreme direction would be to remove let entirely ... it doesn't provide additional performance benefits over var, it only prevents "accidental mutation" of a value.

I would like to add that knowing a value is immutable makes code faster to reason about, reducing the time to comprehend. While it may not be the most valuable information for a compiler, it is quite valuable to the human. This assessment also holds true for the thought of "only preventing accidental mutation".

Further, I liked the idea to rename let to fix, since the new keyword is more intention revealing. It also sidesteps confusion about what const means by implying that the value is fixed at runtime. A person new to programming would find it easier to reason that fixing something means you can't change it rather than letting.

Edit: We could drop var instead, since assigning mutable variables without a keyword is what we do in Python

5 replies

czheo Jun 7, 2023

knowing a value is immutable makes code faster to reason about

Say we remove let and values are immutable by default (like in Rust); would that address this concern? I feel passing immutable values/objects when possible is already considered good practice by many people, given the influence of functional programming. But doing that everywhere may not be compatible with Python's semantics. So maybe immutable by default only within Mojo-specific syntax blocks such as fn / struct.

gryznar Jun 7, 2023

@czheo This will greatly correspond with optional borrowed as currently is!

sa- Jun 9, 2023

I think making things immutable by default would not make Mojo a good member of the Python family, as much as I would like that to be the default as well. An option could be to drop var instead; some people have mentioned that in the discussions.

lattner Jun 11, 2023
Maintainer Author

I know this is super tempting to bikeshed on, but I'd really rather we push concrete keyword names out until we get the overall design settled. Lazy evaluation of this will give us more complete information that will allow us to make a better final call. I added the suggestion to the naming proposal though, thanks!

InfernalAzazel Sep 21, 2023

Perhaps var could be dropped so that a default declaration is mutable, with immutable declarations defined by const.

gryznar · 2023-06-06T21:57:50Z

I also like the idea of being explicit about mutability, for the reasons mentioned above. +1 on fix also, if it can eliminate ambiguity.
From a reviewer's point of view, I have some concerns about this proposal, Keyword naming and other topics to discuss:
a) It has the wrong semantics. It uses camelCase and should use snake_case, as Mojo is intended to be a Python superset.
b) This syntax is also incompatible with Python. It looks not like declaring a reference, but like calling a function ref with argument a.

c) This variable is used only in one place:

so what is the purpose of it?
d) For me, ref is also not the best choice for an immutable reference when there are both mutable and immutable references. ref alone does not imply that I cannot change it. It only gives me a hint that the object passed this way is a proxy to the original one, so changing it will also affect the original if the source object is mutable. In this case borrow makes more sense to me.
This proposal: https://github.com/modularml/mojo/blob/main/proposals/lifetimes-and-provenance.md also has the wrong semantics in some places, as in 2a

3 replies

lattner Jun 11, 2023
Maintainer Author

Sorry about that, I wasn't intending to propose use of camel case, it is just old habits. I adjusted the examples to use snake case local variable names. I also updated the former doStuff example to declare 'a', that was also a mistake: fn do_stuff[a: lifetime](x: ref(a) String): .... Thanks!

gryznar Jun 11, 2023

I am very happy that my suggestions can improve Mojo! So, I've found yet another place:

lattner Jun 11, 2023
Maintainer Author

I'll tweak that thx

strangemonad · 2023-06-06T22:04:31Z

@lattner thanks for kicking off proposals in the open. I know it's a pain but I'm happy it's happening that way.

I love the idea of separating out the core design from the concrete syntax. I'd have a meta-proposal that I think might be powerful. I'd expect this pattern to be pretty frequent. I'd like to systematically provide a framework for that and always split up the concrete syntax from the abstract syntax. By abstract syntax, I'm imagining a definition similar in spirit to what Robert Harper did in Practical Foundations for Programming Languages (PFPL), though it probably doesn't need to be as rigorous as the Abstract Binding Tree syntax he uses throughout the book. I'm guessing the Mojo MLIR dialect is probably a little too specific, and it needs to be something slightly higher level and less formal but not as high-level as Mojo concrete syntax?

That way you could always have concrete syntax discussions in terms of pattern matching how it lowers to abstract syntax.

6 replies

strangemonad Jun 7, 2023

@czheo I don't think I'm trying to propose anything that formal, because, as you mention, then you have yet another thing to design 🙃. I'm just suggesting that more or less what's done here be a standard pattern. I can't think of any case off the top of my head where concrete syntax and the underlying core statics and semantics depend so much on each other that you can't separate them.

lattner Jun 11, 2023
Maintainer Author

Really interesting proposal. I'm not aware of large scale examples of this, but would love to see it explored if you have cycles to develop it. I agree that starting with the base language would be a great place to build out and validate the framework

strangemonad Jun 13, 2023

@lattner I'll see if I can try and grow something alongside proposals like this. I'm assuming traits v0 is probably one of the things that comes after lifetimes?

The main example I can think of in an engineering context would be MIR, but to @czheo's point that came after, so this might be a case of hindsight putting the horse before the cart :) I'm still curious to at least try it out

strangemonad Aug 12, 2023

Just wanted to follow up here. Since posting this I've started a company so I don't foresee having time to explore the abstract syntax work in the near future

lattner Aug 14, 2023
Maintainer Author

No problem, thanks for the update!

mojodojodev · 2023-06-06T23:56:59Z

Lifetimes proposal looks great, borrowed[a] String / ref[a] String with the square brackets looks the most natural, it makes sense to explain that it's being parameterized by the lifetime.

var meaning owned sounds really natural also, here's a suggestion for all the keywords in that vein:

ref[a] - immutable reference
mut[a] - mutable reference
let[a] - immutable owned
var[a] - mutable owned

Having three letters for all of the keywords will allow the user to understand this is related to ownership and mutability

The problem with the proposed removal of let is that code ported from Python to Mojo won't behave the same. Keeping let and var is advantageous in that it says this is a Mojo variable, so you can add all the weird Python dynamic behavior when the keyword is elided.

15 replies

mojodojodev Jun 8, 2023

@czheo very cool didn't realize they were directly related to LLVM nice find.

It might end up you can do it that way and it ends up being a StringRef that'd be cool

lattner Jun 11, 2023
Maintainer Author

Yep that's where it came from. It is directly related to string_view in C++ (the LLVM data structures predate the C++ STL growing all these things). The idea of a "pointer + extent without ownership" is more general than a "reference to a specific owning data structure" because it type erases the concrete storage type. For example, an LLVM StringRef can point into a C array, a std::vector, or one of the zoo of other specialized storage types LLVM has - it can even point to a scalar on the stack.

Per the comments above, I think actually calling this sort of type "ArrayRef" and "StringRef" in Mojo would be super confusing if we have "ref" as a different concept. Python generally uses the word "Slice" for these things, and I think that would be great to use for these.

@mojodojodev your naming ideas are great, I pulled them into the bottom of the doc for sorting when we get more implementation experience. Thank you!

czheo Jun 12, 2023

@lattner Totally agree that calling it "XyzRef" can be confusing, and that's exactly what confused me in the first place. Meanwhile, "view" resonates better with me than "slice", since slicing in Python

- creates copies instead of referencing the underlying data (unlike Go, which references it).
- can have a "step", so the underlying data can be non-contiguous.

So it's more nuanced than string_view/StringRef.
For something more similar, Python has memoryview, which also has "view" in the name.
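
To make the slice-versus-view distinction concrete, here is a minimal Python sketch (the variable names are illustrative and not from the thread). It assumes a bytearray as the backing storage and shows that slicing a built-in sequence yields an independent copy, that a slice can carry a step, and that memoryview refers to the same buffer without copying:

```python
data = bytearray(b"hello world")

# Slicing a built-in sequence copies the selected elements.
copied = data[0:5]
copied[0] = ord("J")          # mutates only the copy

# A slice may carry a step, so the selected elements need not be adjacent.
every_other = data[::2]

# memoryview references the same buffer; no bytes are copied.
view = memoryview(data)[0:5]
view[0] = ord("J")            # writes through to `data`

print(bytes(copied), bytes(every_other), bytes(data))
```

Only the write through `view` changes `data`, which is the behavior a StringRef-like type would need.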

jdlib Jun 15, 2023

Coming back to the keyword proposal of @mojodojodev:

ref - immutable reference
mut - mutable reference
let - immutable owned
var - mutable owned

When Java 10 introduced automatic type inference and the var keyword, they also looked at let, val and other related keywords (https://stackoverflow.com/a/49427377). Given the very different uses of all these keywords in other languages, I fear that the proposed solution will not be very intuitive.

But what about simply adopting Rust, having only let (or var) and adding & for references/not owned and mut for mutability?

jdlib Jun 15, 2023

Also not sure about dropping inout. It also expressed that reassignments of an inout parameter in a function will be propagated to the outside (see the swap example). Will this still be possible?
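
As a point of comparison, plain Python has no equivalent of inout: rebinding a parameter inside a function is invisible to the caller. The sketch below (the function names are made up for illustration) shows why a swap needs either inout-style semantics or an explicit return value:

```python
def swap_rebind(a, b):
    # Rebinds the local names only; the caller's variables are untouched.
    a, b = b, a


def swap_return(a, b):
    # The Python idiom: return the swapped values and let the caller rebind.
    return b, a


x, y = 1, 2
swap_rebind(x, y)
after_rebind = (x, y)      # caller still sees the original order

x, y = swap_return(x, y)
after_return = (x, y)      # caller rebinds, so the values are swapped
```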

gryznar · 2023-06-07T00:04:36Z

The more I think about Keyword naming and other topics to discuss, the clearer it seems to me that the original syntax makes the most sense. Let's consider some examples.

borrowing argument to use it, but not change it:

current:

fn do_something(borrowed x: Int): ... 
# function expects BORROWED x, borrowed here is related to TYPE of argument, not to action

after change:

fn do_something(ref x: Int): ...
# function expects a reference to x, but would it like to mutate x or not? Without diving deeper into the semantics it is not clear

fn do_something(borrow x: Int): ...
# borrow here relates to the action the function would like to perform on the argument,
# but this action is indicated during argument declaration, so it does not make much sense

owning an argument to destroy it
current:

def do_sth(owned p: Pointer): ...
# p here is OWNED by do_sth(), so after do_sth() p will not be available outside

after change:

def do_sth(var p: Pointer): ...
# var is used to assign variable, so here it means, that p will still exist, but will be overwritten?

Owning for me needs special treatment, because during declaration of owned argument we must be very explicit to not confuse others.

1 reply

lattner Jun 11, 2023
Maintainer Author

I don't have strong opinions, but I have some concern about general programmers (i.e., those without Rust experience) and the word "borrow". It is a word that can be explained and has good meaning in the Rust lexicon, but doesn't connote referencing something, and doesn't even appear in the Rust language (they use the & sigil instead). This isn't to say that "borrow" or "borrowed" is bad, but it does have some challenges.

ksandvik · 2023-06-07T00:16:22Z

let removed -- good. No need for more keywords such as fix, variations of out, and the rest. Same for removing owned; less is better. If something needs special treatment from 90% of normal behavior, then it needs a keyword.
ref - ok for me, it's a reference after all. mutref is fine as well. If there's a generic mut, then there's an assumption that it works in all cases, where it might not.

2 replies

gryznar Jun 7, 2023

let | fix. You are forgetting about one crucial thing. Global scope has to be compatible with Python, in which everything is mutable. Some users may want to declare a variable as immutable. How could that be done without additional syntax? Without such a marker, nothing tells the compiler to protect you against unintended mutation.
I propose that let | fix be OPTIONAL in fn | struct, just as borrowed is optional (everything is declared as immutable).
owned - transfer of ownership (for destruction purposes, for example) needs to be treated specially to avoid painful mistakes. Being explicit here counts very much for me.
ref is ok for me, but it has to mean only a reference, without additional assumptions about mutability (it may be mutable or not).
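
For context on the first point, here is a small Python sketch of what "everything is mutable" means at module scope. typing.Final (Python 3.8+) marks a name as constant for static type checkers, but the interpreter itself does not enforce it; the name LIMIT is illustrative:

```python
from typing import Final

LIMIT: Final = 10

# A static type checker such as mypy reports this reassignment,
# but the interpreter executes it without raising an error.
LIMIT = 11

print(LIMIT)
```

So in Python today, protection against unintended mutation of a global comes only from external tooling, not from the language at runtime.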

strangemonad Jun 13, 2023

@gryznar for 1) I think there's a workable solution there, since "global" scope is tied to modules in Python. I think you could have let semantics be the default for Mojo modules and var for Python modules. This would be in line with the "strict fn" proposal you made in #120 (reply in thread) and a similar sketched-out idea I made right after in #120 (comment).

obviously this has to propagate cleanly e.g.

from my_mojo_mod import my_let
from my_py_mod import my_var

It's less clear how the other direction should work. If I distribute a Mojo package as a native package distribution, how can it be imported in Python? Kotlin (exposing back to Java) and Rust's PyO3 (exposing to Python) both use an approach of annotating the symbols that should be re-exposed (@annotations in Kotlin -> Java and [macros] in Rust PyO3).

ksandvik · 2023-06-07T00:40:18Z

I suspect this will be downvoted, but we have another closing character set for declaring lifetime parameters, eh:
fn longest(x: borrowed{life} String,
y: borrowed{life} String) -> borrowed{life} String:

So it's like a lifetime closure. Also, library developers will most likely be the ones to encounter the side effects, so most other developers might seldom need to use {}. Anyway, interested to see if I'm chased out of the shed for introducing {} to the Python world.

1 reply

lattner Jun 11, 2023
Maintainer Author

The problem with curly braces is that they mean "set" or "dictionary" in Python, not closure.
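
A quick Python check of that point, as a minimal sketch with illustrative names:

```python
empty = {}                       # empty braces create a dict
mapping = {"life": 1}            # key: value pairs create a dict
members = {"x", "y"}             # bare elements create a set

kinds = (type(empty).__name__, type(mapping).__name__, type(members).__name__)
print(kinds)
```

Any use of `{life}` after a keyword would therefore compete with an existing literal syntax.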

abhinav-upadhyay · 2023-06-07T04:33:22Z

The lifetime proposal seems well thought out.

On the keyword renaming discussion:

I am not in favour of dropping let altogether and having everything as mutable. There is value in having the ability to create immutable variables. It makes the code easier to reason about, clearly signals the programmer's intention that this variable should not be modified, and also brings all the thread-safety benefits. Also, using let as the keyword for creating immutable variables didn't seem unnatural to me because that's how we do it in Rust. Using fix in place of let reads very weird to me. If we really want to replace let, const sounds way better.
Not a fan of replacing owned with var either but I think I will get used to it.
Other keyword renaming suggestions seem reasonable to me

1 reply

gryznar Jun 7, 2023

I dislike also owned -> var

czheo · 2023-06-07T08:46:49Z

w.r.t. the lifetime of self, the difference is not clear to me between

struct Foo[life: lifetime]:
    var data : Pointer[Int, life]

vs

struct Foo:
    var data : Pointer[Int, Self_lifetime]

2 replies

lattner Jun 11, 2023
Maintainer Author

The proposal probably doesn't explain this well, but the former is the lifetime of something else, not the lifetime of Foo itself.

czheo Jun 12, 2023

@lattner in that case, would it be more straightforward if we mandated the first lifetime parameter as the lifetime of self, consistent with how Python mandates that instance methods have self as the first argument?

struct Foo[Self_lifetime: lifetime]:
    var data : Pointer[Int, Self_lifetime]

mzaks · 2023-06-09T15:22:05Z

Here is a crazy idea:

fn example['1_life](cond: Bool,
                           x: borrowed'1 String,
                           y: borrowed'1 String):
    # Late initialized local borrow with explicit lifetime
    borrowed'1 strRef : String

    if cond:
        strRef = x
    else:
        strRef = y
    print(strRef)

IMHO lifetimes can be described as "colourations" of the reference semantics. As such it is important to see how many colours there are and which references belong to the same colour. Therefore a numeric identification of a lifetime is more useful. That does not mean we should not name the colours though. I propose to have an option to associate a lifetime with a name by postfixing the numeric value with _string. This way users can give meaningful names to lifetimes in the parameter declaration but do not have to repeat them in the rest of the signature, as the numeric prefix is the actual identifier.

Here is a more complex example taken from this blogpost where the author advocates for giving the lifetimes in Rust more meaningful names than just 'a, 'b, 'c:

struct Article:
    title: String
    author: Author

struct Author:
    name: String

struct ArticleProvider:
    articles: Vec[Article]

struct AuthorProvider:
    authors: Vec[Author]

struct AuthorView['1_article, '2_author]:
    author: inout'2 Author
    articles: Vec[inout'1 Article]

fn authors_with_articles['1_article, '2_author](
    article_provider: inout'1 ArticleProvider,
    author_provider: inout'2 AuthorProvider,
) -> Vec[AuthorView['1, '2]]:

With numeric identification of lifetimes it would be sensible to suggest '0 or '0_static to be a default for static lifetime and '1 or '1_self to be a default for a struct self lifetime.

The example from Provenance tracking and Lifetime in Mojo would be adapted as follows:

    @value
    @register_passable("trivial")
    struct MutablePointer[type: AnyType, '1_self]:
        alias pointer_type = __mlir_type[...]
        var address: pointer_type

        fn __init__() -> Self: ...
        fn __init__(address: pointer_type) -> Self: ...

        # Should this be an __init__ to allow implicit conversions?
        @staticmethod
        fn address_of(inout'1 arg: type) -> Self:
            ...

        fn __getitem__(self, offset: Int) -> inout'1 type:
            ...

        @staticmethod
        fn alloc(count: Int) -> Self: ...
        fn free(self): ...

    fn exercise_pointer():
        # Allocated untracked data with static/immortal lifetime.
        let ptr = MutablePointer[Int, '0].alloc(42)
        # Use extended getitem through reference to support setter.
        ptr[4] = 7

        var localInt = 19
        let ptr2 = MutablePointer.address_of(localInt)
        ptr2[0] += 1  # increment localInt

        # ERROR: Cannot mutate localInt while ptr2 lifetime is live
        localInt += 1
        use(ptr2)

I think the more exotic features from Rust listed in Provenance tracking and Lifetime in Mojo, will also feel more intuitive with a numeric identification of lifetimes, where a smaller number outlives the larger one and we can identify equality by something like this:

struct Pair [type: AnyType, '2_first | '3_second]:
  first: inout'2 type
  second: inout'3 type

Where we signify that '2 has a parallel lifetime to '3. This is not really well thought out though.

Motivation behind this rather radical proposal is following:

Generic lifetime annotations are a concept that is not familiar to most potential users of Mojo. Making it look similar to other concepts like generic types and compile-time arguments might cause more confusion, as a user unfamiliar with the concept will not be able to spot anything "unfamiliar" in the function or struct signature. So IMHO the following signature:

fn example[life: lifetime](cond: Bool,
                           x: borrowed[life] String,
                           y: borrowed[life] String):

might suggest to a user who is unfamiliar with the concept that life is a generic type they need to provide, and borrowed[life] will confuse them. They do know that a[b] is normally a getitem kind of thing, so it is confusing because it is on the keyword borrowed, yet familiar, so the WTF moment is very subtle. Whereas when there is special syntax for a very special concept, it directly screams at you, "hey, look me up!". It does not have to be a ' like in Rust, but I think making generic type annotations and generic lifetime annotations look the same is unnecessary.

The second argument for the design I came up with is based on my experience: giving meaningful names to lifetimes is hard and almost nobody does it, which is why most Rust code uses 'a, 'b, .... In Mojo I will probably go with l1, l2, ..., which is not that much better. The lifetime names look like variable names and so generate noise. This is why I suggested to replace the alphanumeric identifiers with just numeric. If we would replace the ' with the @ we could write borrowed@1 and borrowed@2 which mean that following reference was borrowed at lifetime 1 and the other was borrowed at lifetime 2, which also implies that first reference will outlive the second one. This is a terse notation which communicates the intent and context quite well.

3 replies

lattner Jun 11, 2023
Maintainer Author

Huh, really interesting! I added your suggestion to the doc here so we can consider it when getting to the syntax'ing stage, thanks!: https://github.com/modularml/mojo/blob/main/proposals/lifetimes-keyword-renaming.md#more-alternatives-to-consider

strangemonad Jun 13, 2023

@mzaks this definitely jives with me and my understanding of the params RustBelt had to track to formally verify Rust's borrow checker. I think there are likely to be a few more sorts of lifetimes to track:

- single thread / multi-thread, e.g. multiple async coroutines can all share a non-atomic borrow so long as they run on the same thread. Borrows that might span a thread likely need to be implemented as atomic borrows and track some sort of thread-id parameter. You could obviously start with the pessimistic atomic-borrow impl but you'd likely want to recover cases where you don't need the additional memory barrier overhead?
- host / device? I'm imagining a world where Mojo is able to dispatch async coroutine kernels that run on the host and on accelerator devices. A simple CPU memory barrier is no longer enough and you need something like a GPU memory fence + PCIe DMA coordination or the ability to know that you're on a unified memory system.

fabrizio-ferrari Oct 3, 2023

This is why I suggested to replace the alphanumeric identifiers with just numeric. If we would replace the ' with the @ we could write borrowed@1 and borrowed@2 which mean that following reference was borrowed at lifetime 1 and the other was borrowed at lifetime 2, which also implies that first reference will outlive the second one.

+1

I think using @ instead of ' is much better, because the latter looks awfully like an unterminated string.

strangemonad · 2023-06-13T18:15:20Z

Hey @lattner, sorry, I've not had much time to put all my thoughts in a coherent form yet. I wanted to at least put a few things on your radar that made my spidey senses tingle when I read through some of this.

1) Awareness of RustBelt's contributions

First, I want to make sure you're aware of the awesome work that Derek Dreyer (a former Bob Harper / CMU PhD, now at MPI) and team have done as part of RustBelt. He was able to formally verify the correctness and safety of Rust's borrow checker, including unsafe. I think it's at least worth being aware of and noting what information he needed to keep track of when proving the correctness of unsafe code (what the main paper calls Interior Mutability). The approach he used of creating borrow predicates and lifetime tokens (specifically the data he needs to track to reason about those programs) seems like it should also be useful in implementing lifetimes and ref counting (especially across async boundaries).

RustBelt defines the following predicates and the corresponding parameters required (e.g. a concept of a thread id) to reason about the lifetimes in unsafe (code with interior mutability, i.e. pointer manipulation):
- persistent borrows are a sharing proposition that allows modeling the sharing predicate required for the behavior of types with interior mutability (e.g. Cell, RefCell, Mutex, Rc, Arc, ...)
- indexed borrows provide a common implementation for sharing predicates that handle the cases of read-only and persistent borrows.
- non-atomic persistent borrows: a type of persistent borrow that's only allowed on a single thread, e.g. Cell<T>. ^[in section 6.1 RustBelt: Securing the Foundations of the Rust Programming Language (mpi-sws.org)]
- atomic persistent borrows: thread-safe persistent borrows, e.g. Mutex<T> ^[in section 6.2 RustBelt: Securing the Foundations of the Rust Programming Language (mpi-sws.org)]

Derek tends to be friendly and respond to strangers reaching out and asking questions :)

2) concerns about lifetimes as type parameters

I'm not able to fully articulate this one yet but I worry this isn't the right direction to go.

2.1) I think this might prove to be a stumbling block once you try to figure out traits and function overriding and how the sub-typing relationship plays with restrictions or widening of the lifetime. It's worth being aware of some of the issues Rust had to resolve on that front:
- RFC: impl specialization by aturon · Pull Request #1210 · rust-lang/rfcs (github.com)
- Drop check in Rust: Drop Check - The Rustonomicon (rust-lang.org). Drop Check escape hatch: dropck can be bypassed via a trait object method · Issue #26656 · rust-lang/rust (github.com). Drop Check escape test: Long awaited regression test for dropck on trait object method. by pnkfelix · Pull Request #30307 · rust-lang/rust (github.com)

2.2) It's not clear to me how this would work with partial type specialization. e.g. I might want to create a type alias that partially specifies some of the type params but not the lifetime. Does this open us up to needing to handle named parameters here?

2.3) This is maybe the crux of my thoughts (that I don't have a crisp articulation of yet). I think there's a flavor of the expression problem at play here. The lifetime parameter(s) are making a statement about the lhs binding to a value of the type. I think this is very similar to how linear type systems need to track usage counts. QTT (a refinement of McBride's original proposal) and Idris 2 (Edwin Brady) solved this expression problem by keeping these 2 parameters separate and realizing a (usage-count, Tau) pair of parameters at the binding site. (Note I'm not suggesting we expose a linear type system, but that lifetime and arc implementations have similar concerns to implementing linear types, since you have to track usage.)

QTT paper Type-and-Scope Safe Programs and Their Proofs (gallais.github.io)
Idris 2 linear types implementation paper Idris 2: Quantitative Type Theory in Practice (arxiv.org)

0 replies

jordiae · 2023-06-22T18:40:01Z

Perhaps an unpopular opinion, but to me the original syntax with "&" was more intuitive.

1 reply

fabrizio-ferrari Oct 3, 2023

Intuitive it is. But the problem with "&" is when you need to mix a character with a keyword, like this: "&mut"

ammkrn · 2023-07-09T08:47:50Z

The initial document notes relationships/constraints between lifetimes as a future area of interest rather than a present one (à la 'big outlives 'small). Since there was already some discussion in the linked C++ RFC and some input from Rust devs therein, I just wanted to throw out the opinion that Rust's decision to directly present this to users as a subtype/supertype relationship, and to rely on users' intuition about variance and the type system, is responsible for a nontrivial share of the difficulty newcomers have understanding Rust type signatures and actually working with lifetimes.

The Rustonomicon section on variance notes that this is a common source of confusion:

Lifetimes are just regions of code, and regions can be partially ordered with the contains (outlives) relationship. Subtyping on lifetimes is in terms of that relationship: if 'big: 'small ("big contains small" or "big outlives small"), then 'big is a subtype of 'small. This is a large source of confusion, because it seems backwards to many: the bigger region is a subtype of the smaller region. But it makes sense if you consider our Animal example: Cat is an Animal and more, just as 'big is 'small and more.

Put another way, if someone wants a reference that lives for 'small, usually what they actually mean is that they want a reference that lives for at least 'small. They don't actually care if the lifetimes match exactly. So it should be ok for us to forget that something lives for 'big and only remember that it lives for 'small.

When you get up to three lifetimes, for example 'a, 'b, 'c where 'a is the longest lived and 'c is the shortest lived, instead of expressing what the author admits is just a partial order as something like 'a >= 'b >= 'c, Rust demands that users directly write (or read) impl<'c, 'b: 'c, 'a: 'b> .... The C++ RFC seemed to be leaning toward a more direct 'a >= 'b style, which seems much more intuitive even if lifetimes and constraints are considered part of the type system internally.

0 replies

renlite · 2023-09-24T18:05:09Z

Variants for all the keywords (Lifetime):

## 1) ref(=default)
fn add(x: Int, y: Int) -> Int:
    return x + y

fn add(mut x: Int, mut y: Int) -> Int:
    x += 1
    y += 1
    return x + y

fn set_fire(mutown text: String) -> String:
    text += "🔥"
    return text


## 2) own(=default)
fn add(ref x: Int, ref y: Int) -> Int:
    return x + y

fn add(mutref x: Int, mutref y: Int) -> Int:
    x += 1
    y += 1
    return x + y

fn set_fire(mut text: String) -> String:
    text += "🔥"
    return text

## or
fn add(in x: Int, in y: Int) -> Int:
    return x + y

fn add(inout x: Int, inout y: Int) -> Int:
    x += 1
    y += 1
    return x + y

fn set_fire(mut text: String) -> String:
    text += "🔥"
    return text

The last variant (ref as in / inout) seems more readable.

0 replies

david-ragazzi · 2023-10-03T18:26:43Z

I have a scheme in mind that maybe could minimize the keywords and at same time keep their meanings consistent.

It consists of:

Use "ref" as a reference to an object; in its absence, the compiler considers the object the owner (in the same way Rust does).
Use the "var" keyword both to declare that an object has variant content (i.e., is mutable) and to indicate that a function parameter has variant content (i.e., is mutable), and use the "const" keyword to say that an object has constant content.

Below is a sketch of how these keywords could interact:

const a = "abc"  # a: immutable owned
const b = ref a  # b: immutable reference to "a"

var c = "cde"    # c: mutable owned
var d = ref c    # d: mutable reference to "c"
const e = ref c  # e: immutable reference to "c"

Below, as "const" is the default, we don't need to specify it:

def do_something(ref a: String, var ref b: String, c: String, var d: String):
    # a: immutable reference
    # b: mutable reference
    # c: immutable owned
    # d: mutable owned
    ...

In the end, we would have only these 3 keywords: var and const to specify whether an object is mutable or not, and ref to specify that an object is a reference to another object (in its absence, the compiler treats the object as the owner of the content).

6 replies

gryznar Oct 9, 2023

The problem with & for me is that it is not self-descriptive. Python code is easy to understand when you look at it, because it reads like English. Single-character keywords may be familiar to developers from other languages, but to newcomers they may look like magic :)

gryznar Oct 9, 2023

Python did not gain its popularity by being fast. The main reason is its simplicity and ease of understanding. Mojo should introduce new concepts in the same way.

david-ragazzi Oct 10, 2023

@lattner Could you consider adding this proposal as one more alternative in https://github.com/modularml/mojo/blob/main/proposals/lifetimes-keyword-renaming.md, since some users like it too?

gryznar Oct 10, 2023

If that refers to me: I am not convinced about using the same keywords for both variable declarations and argument conventions. Rust uses "let" only to declare variables, so it seems strange to me that the same keywords (var and const) are reused for another purpose (declaring arguments).

gryznar Oct 10, 2023

I also like having an immutable reference as the default, so in this case a separate keyword for taking ownership looks much better. Separating mutable and immutable references would also be a plus, so for me the additional keywords:

own
mutref

would be much better than trying to express everything by mixing var and const.

ksandvik · 2023-10-03T18:29:02Z

ksandvik
Oct 3, 2023

Semantically I like the keywords as each one is a noun, so when speaking about the code it makes sense.

0 replies

nmsmith · 2023-10-08T03:15:53Z

nmsmith
Oct 8, 2023

Update Jan 2024: Since making the below post, I've iterated a lot on the design. I plan to publish a major revision at some point. There are a lot of subtleties that make the design presented below untenable. In particular, storing references inside structs is very tricky, and I no longer believe the naive syntax presented below is appropriate.

@lattner Here is some very late feedback on the lifetimes proposal. I meant to provide this feedback months ago, but I wanted to investigate these ideas more thoroughly. Unfortunately, I haven't had the time to do that yet. So I'll just convey the gist of what I was thinking, in the hope that it might be helpful.

In short, we might be able to express provenance as follows:

fn longest(x: String, y: String) -> x|y:

As well as being more succinct, this syntax might facilitate a more general semantics than Rust, by allowing us to abstract over the mutability of references.

The basic idea

I'm exploring a syntax for lifetimes that is much simpler than the proposed ref[a] syntax. As your proposal mentions, a "lifetime parameter" is really a way to describe the provenance of references that the function "puts" somewhere. (A function can put references into the return register, but it can also put references into mutable arguments.) Crucially, the provenance of a reference can be understood as the variable or struct field that it was produced from. Oftentimes, the source will be decided at runtime. This suggests that at compile time, the provenance of a reference must be handled more generally as the set of all variables and struct fields that it could have been produced from.

Part 1: Returning references

Let's look at an example. For the sake of comparison, here is the syntax of the current proposal:

fn longest[a: Lifetime](x: ref[a] String, y: ref[a] String) -> ref[a] String:

Here is an alternative syntax that seems promising to me:

fn longest(x: String, y: String) -> x|y:

Above, the return type x|y means that the function will either return a reference to the object that x refers to, or a reference to the object that y refers to.

One nice thing about this syntax (apart from being more concise) is that it communicates an important aspect of the function's behaviour. If you didn't otherwise know what the function did (imagine it was called foo), you can learn a lot just by reading the type signature: "this function accepts a string x and a string y, and returns either x or y". You can discern the same thing from the ref[a] syntax, but it's certainly not as straightforward to read!
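For comparison, here is a rough sketch of how these two shapes are written today in Rust, which the thread uses as its reference point. It is not Mojo, and the function names (either, first) are made up for illustration. A single shared lifetime parameter plays the role of x|y, while separate lifetime parameters say that only x can be returned, which the variable-based syntax would presumably write as -> x.

```rust
// Provenance {x, y}: the result may borrow from either argument,
// so both arguments must outlive the returned reference.
fn either<'a>(pick_first: bool, x: &'a str, y: &'a str) -> &'a str {
    if pick_first { x } else { y }
}

// Provenance {x}: the result can only borrow from `x`,
// so `y` may be dropped while the result is still in use.
fn first<'a, 'b>(x: &'a str, _y: &'b str) -> &'a str {
    x
}
```

In both cases the reader has to map the lifetime names back to the arguments to work out where the result can come from; naming the arguments directly in the return type would state that mapping outright.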

Here's another example:

fn foo[A, B, C](x: A, y: B, z: C) -> List[x|y|z]

This signature specifies that the return value will be a list which is populated with references to the objects that x, y, and z refer to. Notice that the | operator is doing double-duty: it's acting as both the union operator (an established Python feature), while also being used to describe provenance. This is nice: it would reduce the amount of syntax that users need to learn.

With the proposed syntax, it would also be possible to return references to non-local variables:

let x, y, z: Int
fn foo(...) -> List[x|y|z]

This means that we wouldn't need a 'static lifetime. Instead, we just explicitly name the static variables that the function might return a reference to.

This also gives us a very cool and dead obvious syntax for methods that return references to self:

struct Point:
    x, y: Float64
    # Note: the return type is the value `self`, not the _type_ `Self`
    fn translate(self, dx: Float64, dy: Float64) -> self:
        self.x += dx
        self.y += dy
        return self
    fn rotate(self, around: Point, degrees: Float64) -> self:
        ...
let p = Point(...).translate(10, 0).rotate(origin, 90)

It would also be possible to return references to public struct fields, and computed properties:

fn foo(p1: Person, p2: Person) -> p1.name|p2.name

To return a reference to a private struct field, I suspect you could just expose it via a computed property, and use that as the return type.

Referring to collections would be slightly more challenging, because the elements of a collection cannot be named by a set of variables, or a set of paths. Indeed, the elements can't be enumerated at compile-time, because there are dynamically-many elements. So instead of referring to collection elements by their name, we might be able to refer to them by the type parameter that defines them:

fn combine(
    list_1: List[String],
    list_2: List[String]
) -> List[list_1.Item|list_2.Item]

Here, I'm assuming that the type parameter associated with the List type has been given the name Item, and that it is accessible to users.

Polymorphism over mutability

One cool benefit of this syntax is that it allows us to abstract over the mutability of the reference being returned.

Consider this function from earlier:

fn longest(x: String, y: String) -> x|y:

Given this signature, it seems reasonable that if the strings used as arguments to a particular invocation of this function were mutable, then the reference returned should also be mutable. Basically, the type signature should be read as "I don't care if x or y are mutable, I'm just going to return one of them to you". In other words, the function is promising that it won't mutate the arguments, but it doesn't require the caller to make that promise.

In contrast, the ref[a] syntax does seem to suggest that the caller isn't allowed to mutate the return value, because the return type is explicitly annotated with ref (as opposed to mutref):

fn longest[a](x: ref[a] String, y: ref[a] String) -> ref[a] String:

This suggests that if the function implementer wanted to give the caller the freedom to mutate the result, they would need to offer a second version of the function:

fn longest[a](x: mutref[a] String, y: mutref[a] String) -> mutref[a] String:

This is a very real nuisance that Rust programmers often encounter. It forces many APIs and traits to be duplicated. (As an example, see Index and IndexMut.)
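A minimal Rust sketch of that duplication, using the longest example from above (the name longest_mut is hypothetical). The two bodies are identical; only the kind of reference in the signature differs:

```rust
// Shared-reference version: the caller gets a read-only result.
fn longest<'a>(x: &'a String, y: &'a String) -> &'a String {
    if x.len() >= y.len() { x } else { y }
}

// Mutable version: same logic, written again only so that the
// caller is allowed to mutate the string that comes back.
fn longest_mut<'a>(x: &'a mut String, y: &'a mut String) -> &'a mut String {
    if x.len() >= y.len() { x } else { y }
}
```

Under the mutability-polymorphic reading of -> x|y, a single definition would presumably cover both of these cases.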

Abstracting over mutability technically doesn't require the syntax I've proposed, but IMO, the syntax makes it apparent that a function does not have the authority to specify the mutability of the references it returns.

Compatibility with access modifiers

It's worth thinking about how this syntax might work alongside access modifiers—on the assumption that Mojo might eventually offer them. Consider a scenario where an opaque data type that has no type parameters (such as a CustomerDatabase) needs to offer a method that returns references to elements (such as Customers) stored within a private collection type (such as a BTree). How can we modify the signature of the following function, to hide db from the public interface?

struct CustomerDatabase:
    private db: BTree[ID, Customer]
    # Oops, 'self.db' is meant to be a private implementation detail
    fn __getitem__(self, id: ID) -> self.db.Value:
        return self.db[id]

What constitutes a suitable solution will depend on Mojo's approach to access control and "privacy", which is not something that has been decided yet. Regardless, this is an important problem to solve. I have some thoughts on how to solve it, but nothing worth sharing yet.

Part 2: Storing references in arguments

So far, I've presented a syntax for describing the provenance of return values. But when a function has mutable arguments, there is also the possibility that references are inserted into those. We need to extend the proposed syntax such that it can describe the provenance of such references.

It might look something like this:

fn insert(s: String, items: mut List[String put s]) -> Bool:
    items.append(s)
    return True

# Or defined as a method on `List`:
struct List[Item]:
    fn insert(self: mut Self[Item put x], x: Item):
        ...

This syntax would combine well with the | operator:

fn insert_one_of(s1: String, s2: String, items: mut List[String put s1|s2])

This aspect of the syntax needs more attention, but I haven't been able to dedicate the time yet. The syntax really needs to be co-designed with the semantics for stored references, which hasn't been finalized. The Mojo team have been putting a lot of work into the semantics, so I'll leave it to them to figure out how this syntax might (or might not) integrate with it.

Summary

At the very least, I think the proposed syntax is really interesting, and worth further investigation. The core question is: is it possible to use variable names, rather than lifetime parameters, to describe the provenance of references? If it were possible, it might greatly simplify the syntax for lifetimes in Mojo, and ultimately make Mojo a simpler and more beautiful language.

1 reply

gryznar Oct 8, 2023

Hmm, with this proposal it is not obvious that a variable takes a reference to a String rather than, for example, being passed by value.

E.g. in Rust it is possible to create a value from a reference. This syntax seems too terse to express that clearly in Mojo. I may be wrong and missing something.

gryznar · 2023-10-14T23:26:44Z

gryznar
Oct 14, 2023

How about something like this?

let -> const (declares immutable and unreassignable value)
foo^ -> own foo (transfers ownership)
borrowed -> ref (immutable reference)
inout -> mutref (mutable reference)
owned -> own (taking ownership)

(ref and mutref could be easily reused for passing value):

var foo = 1
const bar = ref foo
const baz = mutref foo

FAQ:
Q: Why not reuse var / const in function argument declarations?
A:
a) separation IMHO works much better (var and const are responsible only for variable declaration and nothing more, like in other languages).
b) owned -> own expresses the intent (taking ownership) much better than var

Q: Why replace ^ with own?
A: It is only one special character saved. It may confuse newcomers. A keyword is much more expressive. Also: ^ in Python is the bitwise XOR operator

3 replies

fabrizio-ferrari Oct 14, 2023

I like it. By the way, this ^ is bizarre!

sa- Oct 15, 2023

This looks good! The only edit I would suggest is using the word move to replace ^

gryznar Oct 15, 2023

own could also be used as the replacement for owned, whereas move could not. move is also inconsistent with passing by copy, for example, which does not require a keyword.

ErenBalatkan · 2023-12-11T09:05:21Z

ErenBalatkan
Dec 11, 2023

Consider using the following proposal for 4-character keywords:

cref -> Immutable reference
mref -> Mutable reference
cvar -> Immutable owned
mvar -> Mutable owned

This approach ensures conceptual consistency, where all mutables start with m, and all references end with ref. This consistency could be a huge improvement to the time it takes to learn these concepts.

And with this approach move for transferring ownership fits nicely into 4-character convention instead of ^

Regardless of whether this exact naming scheme appeals or not, please consider prioritizing consistent naming conventions. Using abbreviations like mut for mutable references, while clear on mutability, lacks clarity on it being a reference. And given the number of concepts that a language includes, reducing the need to memorize which abbreviation maps to which concept would make it much easier for new developers to learn.

1 reply

dderooy Dec 15, 2023

cref and mref look good

Eprahim-taha · 2023-12-11T10:14:14Z

Eprahim-taha
Dec 11, 2023

^ => move

let a: String = "Hello Chris"
let b: String = move a  # ✅ proposed spelling
# let b: String = a^    # ❌ current spelling of the same transfer

print(a) #Error
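As a point of comparison only (this is Rust, not Mojo, and the function name transfer_demo is hypothetical): Rust marks the same kind of ownership transfer with neither a sigil nor a keyword, and rejects any later use of the source variable at compile time. The offending line is left commented out so that the sketch compiles.

```rust
fn transfer_demo() -> String {
    let a = String::from("Hello Chris");
    // Plain assignment moves ownership from `a` to `b`.
    let b = a;
    // println!("{a}"); // would not compile: `a` was moved (error E0382)
    b
}
```

Whether an explicit marker such as move or ^ is worth requiring at the transfer site is exactly the trade-off being bikeshedded here.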

0 replies

dderooy · 2023-12-13T03:07:11Z

dderooy
Dec 13, 2023

Another bikeshedding opinion:

ref / constref in replacement of mutref / ref

The syntax 'mutref' creates a cacophony in my mind and I think it's because 'mutt' sounds unpleasant. Considering this syntax will be used a lot and this is the time for exploration, ideally we end up with something smoother sounding. I'm hopeful of Mojo not just as a useful and fast language, but also as a beautiful and intuitive extension of Python.

I'd even prefer deltaref over mutref

1 reply

dderooy Dec 17, 2023

Other ideas for mutref:

mref (already mentioned by @ErenBalatkan and I think it's very clean)
altref
varref
adjref
ioref
modref

Again I just don't like the 'mutt' sound 😆

Eprahim-taha · 2023-12-20T18:44:47Z

Eprahim-taha
Dec 20, 2023

Beautiful. We have talked about the syntax for lifetimes in Mojo 🔥, and some of you gave great suggestions while others did not. We have talked about the importance of lifetimes in Mojo 🔥 and their role in taking the language to another level. But let us set all that aside and ask the most important question: when will lifetime support in the Mojo 🔥 programming language be completed? That is what we should focus on. Any additions to the syntax can be discussed later.

0 replies

Brian-M-J · 2023-12-23T03:56:49Z

Brian-M-J
Dec 23, 2023

Another really interesting design decision is OCaml's modes, which decouple lifetimes and types, therefore reducing complexity:

Encoding locality with a mode has some advantages compared to Rust’s type-centric approach. In Rust, reference types are parameterized over specific regions represented by lifetime variables.
On the other hand, lifetime variables are a source of pervasive complexity. When references are inherently polymorphic, essentially all functions become lifetime-polymorphic as well. For example, whenever a reference lacks a lifetime annotation, an implicit lifetime variable appears:

fn print_string(s: &str);
// Is equivalent to...
fn print_string<'a>(s: &'a str);

Since Rust supports first-class functions, the result is that higher-order functions require higher-order polymorphism, for which type inference is undecidable in general.
OCaml’s modes do not affect type inference—they preserve the types of existing code, so users truly don’t need to consider modes they aren’t actively using. In OCaml, type inference, higher-order functions, and garbage collection are all important parts of the development workflow, so we consider the local mode to be a good fit.
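The higher-order case mentioned in the quote can be made concrete with a small Rust sketch (the names apply_twice and trim_both are hypothetical). A function that accepts a callback taking and returning a reference has to state that the callback works for every lifetime, which Rust writes as a higher-ranked bound with for<'a>:

```rust
// `F` must work for *any* lifetime 'a, not one chosen by the caller:
// this is the higher-rank polymorphism the quote refers to.
fn apply_twice<F>(f: F, s: &str) -> &str
where
    F: for<'a> Fn(&'a str) -> &'a str,
{
    f(f(s))
}

// An ordinary function with elided lifetimes satisfies the bound.
fn trim_both(s: &str) -> &str {
    s.trim()
}
```

Calling apply_twice(trim_both, "  hi  ") yields a slice of the original string, with the borrow threaded through both calls.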

I don't know how well this can work in Mojo:

It's mentioned that Rust's lifetime system is more expressive than modes:

This design is more expressive than locality, which only distinguishes values that may escape all regions from those that cannot escape any.

I don't know if modes can work in a language that doesn't use a garbage collector.

There's also Project Verona by Microsoft Research, inspired by Rust, Cyclone and Pony, that aims to answer these questions:

If we design a language without concurrent mutation, can we build scalable memory management?
Can linear regions be used to remove the restrictions of per-object linearity without sacrificing memory management?
Can language level regions be used to support compartmentalisations?

See this talk for more info.

Edit: Also check out Antelang:
Achieving Safe, Aliasable Mutability with Unboxed Types

0 replies

Verdagon · 2024-06-11T22:20:21Z

Verdagon
Jun 11, 2024

Hey, saw this linked in an HN comment and it nerd-sniped me! I know this might be outdated, but here's some thoughts anyway:

It isn’t clear to me how the compiler will remap this though. We’d have to pass in the pointer/reference instead of the struct type. An alternative is to not allow expressing this and require casts. We can start with that model and explore adding this as the basic design comes up.

Something that helped simplify implementation for Vale was to universally add an implicit "self region" generic parameter. In Mojo speak:

struct IntArray[Self_lifetime: Lifetime]:
    var ptr : Pointer[Int, Self_lifetime]

(this is under the hood, user needn't see [Self_lifetime: Lifetime])

It also gave the use-site a convenient place to put extra region/lifetime bounds; the compiler could lower an e.g. &(dyn MyTrait + 'a) to &(dyn MyTrait<'a>) which was nice (not sure of the Mojo equivalent yet).

Biggest downside was that it was a "special" generic parameter, and therefore a design smell... on the other hand, once I took it further and considered having multiple "implicit self region" generic parameters, some truly weird things happened which later solved some holes in Vale's design, ironically.

Might or might not be applicable to Mojo (Vale's region borrowing is rather unusual) but could be worth exploring.

I’m hoping/expecting that the borrow checker will allow mutable references to overlap with other references iff that reference is only loaded and not mutated.

I was exploring this with Jon Goodwin (of Cone) for his language so I agree with your hope/expectation, and there are also a few other mechanisms that allow more aliasing+mutability along those lines. Happy to explain more if anyone's curious.

Random thought re: these "transfer functions": when I implemented Vale's logic that mapped callee lifetimes to caller lifetimes, I felt some eerie echoes with how linear algebra uses matrices to transform vectors from one coordinate space into another. If anyone's good at graphics and compilers, I'd be curious to see how far that echo reaches.

0 replies

IgorDeepakM · 2024-09-03T09:00:24Z

IgorDeepakM
Sep 3, 2024

I really wonder if Rust-like lifetimes are a proper fit for Mojo. Mojo seems to be targeting programmers who use Python for convenience: they want things done as easily as possible and don't want to deal with memory management. What happens when you present a Python programmer with Rust-like lifetimes? I bet they will hiss and spit. People who like Rust will continue to use Rust.

Many languages seem to be jumping on a Rust-like single ownership model today, but that only solves half of the problem. There is still multiple ownership, like classes in Python which you can toss around without thinking about it.

Is there another way, using inference instead, so that the compiler can decide what is borrowed and what is not? If you look at the language Lobster, it uses static analysis to reduce the number of reference counts.

2 replies

nmsmith Sep 3, 2024

I bet they will hiss and spit.

If people who "hiss and spit" want to stay away from Mojo, I'd say that's a good thing for everyone.

ErenBalatkan Sep 3, 2024

I really wonder if Rust-like lifetimes are a proper fit for Mojo. Mojo seems to be targeting programmers who use Python for convenience: they want things done as easily as possible and don't want to deal with memory management. What happens when you present a Python programmer with Rust-like lifetimes? I bet they will hiss and spit. People who like Rust will continue to use Rust.

Many languages seem to be jumping on a Rust-like single ownership model today, but that only solves half of the problem. There is still multiple ownership, like classes in Python which you can toss around without thinking about it.

Is there another way, using inference instead, so that the compiler can decide what is borrowed and what is not? If you look at the language Lobster, it uses static analysis to reduce the number of reference counts.

Mojo's approach, while similar, also differs from Rust's. For starters, Mojo has much better defaults and comes with automatic inference on JIT methods (ones defined with def). I doubt a Python programmer will even notice the difference.
