Way for Python wrapper over C++ class to borrow reference, released on garbage collection? #770
Replies: 3 comments 2 replies
-
After doing some digging in the docs it turns out I should be using With this: Object* cpp_self = (Object*) nb::inst_ptr<Object>(self_handle); The hack seems to work as expected. Really enjoying using nanobind so far by the way ❤️. Looks like a really awesome tool so far. |
Beta Was this translation helpful? Give feedback.
-
What you are doing here seems quite unsafe/dangerous. Did you see the |
Beta Was this translation helpful? Give feedback.
-
The approach I've described above is certainly non-standard (it is a hack after all), nonetheless it seems safer than what I have been able to achieve with nanobind's default options. When applied to situations common in my application nanobind's default options seem to either 1) consistently produce memory leaks, 2) consistently produce segfaults or 3) make copies when I really wanted references. For context, I'm currently porting some code that had been written with numba to C++. The original numba code was tested pretty comprehensively in terms of memory leaks, and I'm hoping to achieve the same behavior with nanobind. Of course, I'd rather use standard nanobind approaches over my hack, if that can be accomplished without compromising, the safety, speed, and functionality of the program (there does already seem to be a considerable improvement in speed... so off to a good start). My hack is a direct reimplementation of numba's memory management approach, where memory is always owned by intrusive reference counted objects allocated by the Numba Runtime (NRT) (i.e. by the compiled C/C++-like parts), and when an object is wrapped in a PyObject, that PyObject just borrows a single intrusive (NRT-side) reference to the allocated object which is released when the PyObject is garbage collected. I was aware of struct MyClass {
public:
MyField &field() { return m_field; }
private:
MyField m_field;
};
nb::class_<MyClass>(m, "MyClass")
.def("field", &MyClass::field, nb::rv_policy::reference_internal); My impression is that Consider the following example, and imagine that
#include <iostream>
#include <string>
#include <atomic>
#include <memory>
// Note: Need to be included in this order
#include <nanobind/nanobind.h>
#include <nanobind/intrusive/counter.h>
#include <nanobind/intrusive/ref.h>
#include <nanobind/intrusive/counter.inl>
#include <nanobind/stl/string.h>
namespace nb = nanobind;
using namespace nb::literals;
using nb::ref;
using nb::intrusive_base;
int total_allocated_objs = 0 ;
class Object : public intrusive_base {
public:
uint64_t get_refcount() {
// Needed to edit intrusive_base to expose this
return intrusive_base::get_refcount();
}
};
struct Dog : public Object{
std::string name;
int age;
Dog(std::string _name, int _age=1) :
Object(), name(_name), age(_age) {
}
static ref<Dog> make(std::string _name, int _age=1){
total_allocated_objs++;
return new Dog(_name, _age);
}
~Dog(){
total_allocated_objs--;
std::cout << "Doggie Died: " << name << " @" << uint64_t(this) << std::endl;
}
};
struct Kennel : public Object {
ref<Dog> dog;
Kennel() : Object() {}
static ref<Kennel> make(std::string _name, int _age=1){
total_allocated_objs += 2;
Kennel* kennel = new Kennel();
kennel->dog = new Dog(_name, _age);
// std::cout << "K dog refcount(2):" << kennel->dog->get_refcount() << std::endl;
return kennel;
}
~Kennel(){
total_allocated_objs--;
std::cout << "Kennel of " << dog->name << " Died: @" << uint64_t(this) << std::endl;
}
};
NB_MODULE(my_ext, m) {
nb::class_<Object>(
m, "Object",
nb::intrusive_ptr<Object>(
[](Object *o, PyObject *po) noexcept { o->set_self_py(po); })
);
nb::intrusive_init(
[](PyObject *o) noexcept {
nb::gil_scoped_acquire guard;
Py_INCREF(o);
},
[](PyObject *o) noexcept {
nb::gil_scoped_acquire guard;
Py_DECREF(o);
});
nb::class_<Dog>(m, "Dog")
.def(nb::new_(&Dog::make), "name"_a="fido", "age"_a=1, nb::rv_policy::reference)
.def_rw("name", &Dog::name)
.def("get_refcount", &Object::get_refcount)
;
nb::class_<Kennel>(m, "Kennel")
.def(nb::new_(&Kennel::make), "name"_a="fido", "age"_a=1, nb::rv_policy::reference)
.def_rw("dog", &Kennel::dog, nb::rv_policy::reference_internal)
.def("get_refcount", &Object::get_refcount)
;
m.def("leaked_objects", [](){return total_allocated_objs;});
} Now we test in Python from dummer_ext import Dog, Kennel, leaked_objects
d = Dog("woofy")
d = None
print("^ 'woofy' Shoulda died.\n")
k = Kennel("scratchy", 50)
# print("kennel:", k.get_refcount())
k = None
print("^ Kennel Shoulda died.")
print("^ 'scratchy' Shoulda died.\n")
k = Kennel("buddy", 64)
# print('k->buddy (1)=2,', k.dog.get_refcount())
# print('k->buddy (2)=2,', k.dog.get_refcount())
# print('k->buddy (3)=2,', k.dog.get_refcount())
# print("kennel:", k.get_refcount())
d = k.dog
print("k.dog is k.dog: ", d is k.dog)
# print('k->buddy (4)=3,', k.dog.get_refcount())
k = None
print("^ Kennel Shoulda died.")
# print('k->buddy (5)=2,', d.get_refcount())
d = None
print("^ 'buddy' Shoulda died.")
print("\n--- End of Program --")
print("Leaked Objects: ", leaked_objects()) Doggie Died: woofy @94112506739936
^ 'woofy' Shoulda died.
Kennel of scratchy Died: @94112506754592
Doggie Died: scratchy @94112506739936
^ Kennel Shoulda died.
^ 'scratchy' Shoulda died.
Kennel of buddy Died: @94112506754592
Doggie Died: buddy @94112506739936
[1] 60022 segmentation fault (core dumped) python3.10 ../python/tests/run_broken_dog.py For me this segfaults on The only approach that passes this simple test without faults or leaks is the version where I hack tp_dealloc to call dec_ref() on garbage collection, and additionally make sure the base Object class holds a pointer to each PyObject when a new one is created (that pointer is null'ed on garbage collection). I'm sure your time is limited. I'm certainly not asking for a solution, perhaps just a discussion. After all, I already have a robust solution---it just happens to work around nanobind instead of with it. Feel free to show me wrong and share a setup that passes this test, although this toy example is really only tests a small fraction of the flexibility that I need. I'm mostly sharing this because it seems odd to me that, as far as I can tell, nanobind doesn't support this very simple pattern of memory management which seems quite common, and affords a lot more flexibility what I've been able to produce with nanobind's current defaults. |
Beta Was this translation helpful? Give feedback.
-
I'm wondering if I'm missing something because it seems that nanobind doesn't implement a very simple pattern of object ownership that I've seen in other Python <-> compiled code interfaces like numba.
I was hoping that I could use nanobind to have objects with intrusive reference counting that would just have their Python wrappers grab one reference (i.e. incr_ref()) when they are created, and release that reference (i.e. dec_ref()) when those Python objects are garbage collected.
It seems like most of the options in nanobind enforce a policy of releasing ownership to Python when they are wrapped. I think
shared_ptr
might be an exception, but I'd really rather avoid usingshared_ptr
(for performance reasons, and general distaste). It's unclear to me why releasing ownership to Python is encouraged as the default in many cases (particularly with intrusive reference counting). In my application, I have situations where objects are kept in containers. Any policy that transferred ownership of an object to Python would cause the following nasty failure mode:Is there any way to achieve the object ownership pattern that I am considering here to avoid this issue?
I have most of a hacky solution worked out that involves:
I'm stuck with trying to implement this workaround because I cannot figure out how to get a pointer to the wrapped C++ object from the PyObject* pointer that is passed into the new tp_dealloc implementation. I'm getting std::bad_alloc when I run:
Anyway, I am wondering if I am missing something essential here or if I really need to resort to a hack like this to achieve what I'm trying to do?
P.S. The documentation for intrusive reference counting is a bit misleading. It seems to imply that one could implement their own Object class and expect everything to work, for instance with
ref<T>
. I'm fairly certain that I really need to implement my owndec_ref()
in my project because there are cases, where objects can be either written to buffers or malloced, in which casedec_ref()
shouldn't always delete on zero (it might need to just decref a pointer to its buffer object instead). I ended up being able to get by by using my own implementation ofref<T>
instead since the builtin one seems to assume that it is getting a subclass of intrusive_base (which the documentation describes as being an optional shortcut, not a requirement).Beta Was this translation helpful? Give feedback.
All reactions