stage_ros crashes #66

Open
corot opened this issue Sep 19, 2016 · 18 comments

@corot
Contributor

corot commented Sep 19, 2016

I experience a variety of crashes while running Stage within the stage_ros node. I put some of them here. I tried to compile the code in debug mode to provide better traces, but then stage_ros crashes at startup in the libGLU.so library, so the stack traces are not very meaningful, sorry. My wild guess is that it's all about memory management: in one crash (I didn't record the backtrace) it mentioned "doubly freed memory", and failures also tend to increase when my PC has been running for a while (and so RAM gets low). I'll try to provide more information, but meanwhile... did anyone experience similar problems?
Thanks!

EXAMPLE CRASHES

This is the most common:

(gdb) bt
#0  0x00007ffff69827b6 in Stg::World::Raytrace(Stg::Ray const&) () from /opt/ros/indigo/lib/libstage.so.4.1.1
#1  0x00007ffff6979c5e in Stg::ModelRanger::Sensor::Update(Stg::ModelRanger*) () from /opt/ros/indigo/lib/libstage.so.4.1.1
#2  0x00007ffff6979e52 in Stg::ModelRanger::Update() () from /opt/ros/indigo/lib/libstage.so.4.1.1
#3  0x00007ffff6969bca in Stg::Model::UpdateWrapper(Stg::Model*, void*) () from /opt/ros/indigo/lib/libstage.so.4.1.1
#4  0x00007ffff69833e8 in Stg::World::ConsumeQueue(unsigned int) () from /opt/ros/indigo/lib/libstage.so.4.1.1
#5  0x00007ffff698346e in Stg::World::update_thread_entry(std::pair<Stg::World*, int>*) () from /opt/ros/indigo/lib/libstage.so.4.1.1
#6  0x00007ffff6bbb184 in start_thread (arg=0x7fffdac38700) at pthread_create.c:312
#7  0x00007ffff5bac37d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

I saw this one sometimes:

Program received signal SIGSEGV, Segmentation fault.
__memmove_ssse3_back () at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:1517
1517    ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S: No such file or directory.
(gdb) bt
#0  __memmove_ssse3_back () at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:1517
#1  0x00007ffff695fa75 in std::vector<Stg::Block*, std::allocator<Stg::Block*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<Stg::Block**, std::vector<Stg::Block*, std::allocator<Stg::Block*> > >, Stg::Block* const&) ()
   from /opt/ros/indigo/lib/libstage.so.4.1.1
#2  0x00007ffff697dc9a in Stg::Cell::AddBlock(Stg::Block*, unsigned int) () from /opt/ros/indigo/lib/libstage.so.4.1.1
#3  0x00007ffff6985327 in Stg::World::MapPoly(std::vector<Stg::point_int_t, std::allocator<Stg::point_int_t> > const&, Stg::Block*, unsigned int) () from /opt/ros/indigo/lib/libstage.so.4.1.1
#4  0x00007ffff695e08d in Stg::Block::Map(unsigned int) () from /opt/ros/indigo/lib/libstage.so.4.1.1
#5  0x00007ffff695ec17 in Stg::BlockGroup::Map(unsigned int) () from /opt/ros/indigo/lib/libstage.so.4.1.1
#6  0x00007ffff696581c in Stg::Model::Map(unsigned int) () from /opt/ros/indigo/lib/libstage.so.4.1.1
#7  0x00007ffff6965878 in Stg::Model::MapWithChildren(unsigned int) () from /opt/ros/indigo/lib/libstage.so.4.1.1
#8  0x00007ffff69753f6 in Stg::ModelPosition::Move() () from /opt/ros/indigo/lib/libstage.so.4.1.1
#9  0x00007ffff69835fd in Stg::World::Update() () from /opt/ros/indigo/lib/libstage.so.4.1.1
#10 0x00007ffff699d550 in Stg::WorldGui::Update() () from /opt/ros/indigo/lib/libstage.so.4.1.1
#11 0x00007ffff66c0f18 in Fl::wait(double) () from /usr/lib/x86_64-linux-gnu/libfltk.so.1.1
#12 0x0000000000465605 in main ()

And this only once:

(gdb) #0  0x00007ffff697dc57 in Stg::Cell::AddBlock(Stg::Block*, unsigned int) () from /opt/ros/indigo/lib/libstage.so.4.1.1
#1  0x00007ffff6985327 in Stg::World::MapPoly(std::vector<Stg::point_int_t, std::allocator<Stg::point_int_t> > const&, Stg::Block*, unsigned int) () from /opt/ros/indigo/lib/libstage.so.4.1.1
#2  0x00007ffff695e08d in Stg::Block::Map(unsigned int) () from /opt/ros/indigo/lib/libstage.so.4.1.1
#3  0x00007ffff695ec17 in Stg::BlockGroup::Map(unsigned int) () from /opt/ros/indigo/lib/libstage.so.4.1.1
#4  0x00007ffff696581c in Stg::Model::Map(unsigned int) () from /opt/ros/indigo/lib/libstage.so.4.1.1
#5  0x00007ffff6965878 in Stg::Model::MapWithChildren(unsigned int) () from /opt/ros/indigo/lib/libstage.so.4.1.1
#6  0x00007ffff696651b in Stg::Model::SetPose(Stg::Pose const&) () from /opt/ros/indigo/lib/libstage.so.4.1.1
#7  0x000000000045fd62 in StageNode::cb_reset_srv(std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&) ()
#8  0x0000000000487138 in boost::_mfi::mf2<bool, StageNode, std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&>::operator()(StageNode*, std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&) const ()
#9  0x00000000004845b3 in bool boost::_bi::list3<boost::_bi::value<StageNode*>, boost::arg<1>, boost::arg<2> >::operator()<bool, boost::_mfi::mf2<bool, StageNode, std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&>, boost::_bi::list2<std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&> >(boost::_bi::type<bool>, boost::_mfi::mf2<bool, StageNode, std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&>&, boost::_bi::list2<std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&>&, long) ()
#10 0x0000000000482698 in bool boost::_bi::bind_t<bool, boost::_mfi::mf2<bool, StageNode, std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&>, boost::_bi::list3<boost::_bi::value<StageNode*>, boost::arg<1>, boost::arg<2> > >::operator()<std_srvs::EmptyRequest_<std::allocator<void> >, std_srvs::EmptyResponse_<std::allocator<void> > >(std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&) ()
#11 0x000000000047f693 in boost::detail::function::function_obj_invoker2<boost::_bi::bind_t<bool, boost::_mfi::mf2<bool, StageNode, std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&>, boost::_bi::list3<boost::_bi::value<StageNode*>, boost::arg<1>, boost::arg<2> > >, bool, std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&>::invoke(boost::detail::function::function_buffer&, std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&) ()
#12 0x0000000000494cf6 in boost::function2<bool, std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&>::operator()(std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&) const ()
#13 0x0000000000494257 in ros::ServiceSpec<std_srvs::EmptyRequest_<std::allocator<void> >, std_srvs::EmptyResponse_<std::allocator<void> > >::call(boost::function<bool (std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&)> const&, ros::ServiceSpecCallParams<std_srvs::EmptyRequest_<std::allocator<void> >, std_srvs::EmptyResponse_<std::allocator<void> > >&) ()
#14 0x0000000000492802 in ros::ServiceCallbackHelperT<ros::ServiceSpec<std_srvs::EmptyRequest_<std::allocator<void> >, std_srvs::EmptyResponse_<std::allocator<void> > > >::call(ros::ServiceCallbackHelperCallParams&) ()
#15 0x00007ffff78cea8a in ros::ServiceCallback::call() () from /opt/ros/indigo/lib/libroscpp.so
#16 0x00007ffff7911107 in ros::CallbackQueue::callOneCB(ros::CallbackQueue::TLS*) () from /opt/ros/indigo/lib/libroscpp.so
#17 0x00007ffff7911c33 in ros::CallbackQueue::callAvailable(ros::WallDuration) () from /opt/ros/indigo/lib/libroscpp.so
#18 0x00007ffff795a1e5 in ros::SingleThreadedSpinner::spin(ros::CallbackQueue*) () from /opt/ros/indigo/lib/libroscpp.so
#19 0x00007ffff7941e0b in ros::spin() () from /opt/ros/indigo/lib/libroscpp.so
#20 0x0000000000494bad in void boost::_bi::list0::operator()<void (*)(), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(), boost::_bi::list0&, int) ()
#21 0x0000000000493e65 in boost::_bi::bind_t<void, void (*)(), boost::_bi::list0>::operator()() ()
#22 0x000000000049259e in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(), boost::_bi::list0> >::run() ()
#23 0x00007ffff6ddca4a in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.54.0
#24 0x00007ffff6bbb184 in start_thread (arg=0x7fffda437700) at pthread_create.c:312
#25 0x00007ffff5bac37d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

@onlytailei

Same problem here. Any feedback?

@corot
Contributor Author

corot commented Aug 21, 2018

Hi, I'm using this fork https://github.com/CodeFinder2/Stage and the crashes are gone, but it's not maintained anymore.
I have also noticed that @rakeshshrestha31 has some fixes in his fork.
It would be really nice if they could open a PR with the fixes upstream.

@rtv
Owner

rtv commented Aug 21, 2018 via email

@rtv
Owner

rtv commented Aug 21, 2018 via email

@rakeshshrestha31

The fixes in my fork actually deal with the problems that arise when the Stage world pointer has to be deleted and reallocated. I needed this for my project (which doesn't use stage_ros). stage_ros does not release the world this way, so I doubt that my changes will affect this particular issue.

@corot
Contributor Author

corot commented Aug 21, 2018

Thanks for letting me know; yes, I also noticed that @rtv's fork is well ahead of upstream. Looking forward to the next release!
Meanwhile, I'll keep this issue open for general information.

@onlytailei

@corot @rtv
Hi, thanks for the information. I tried https://github.com/CodeFinder2/Stage and https://github.com/rtv/stage_ros, but the same error happened again.
Do I need to test a specific commit, or is the master branch fine now?

@corot
Contributor Author

corot commented Aug 23, 2018

Hi, are you sure you are compiling stage_ros against the forked Stage? Catkin will always prefer the version installed in /opt/ros, so you must make it pick up the new version, e.g.:

-find_package(stage REQUIRED)
+# Find stage using pkg-config, so we don't require the ROS package
+find_package(PkgConfig REQUIRED)
+pkg_check_modules(Stage REQUIRED stage>=4.2)

@onlytailei

onlytailei commented Aug 23, 2018

Yes. I removed the Stage in /opt/ros/kinetic/ and compiled against the forked Stage.
It seems that if I run only one Stage instance, it is fine.
But when I run two different Stage worlds in two roscores (on different ports) on one machine, one of them crashes after several hours with the raytrace error above.

@corot
Contributor Author

corot commented Aug 23, 2018 via email

@onlytailei

Nothing changed. It still happens frequently. The longest it has run without crashing is 7 hours.
I will try @rakeshshrestha31's fork as the next step.

@onlytailei

onlytailei commented Aug 29, 2018

It seems that the more frequently I call positionmodel->SetPose(), the more frequently it happens.
But it still occurs no matter which version I use.

@rakeshshrestha31

That's interesting. I've never come across this problem, and I've used my fork (and also upstream) for long-running experiments.
I suggest that you build both the Stage library and stage_ros in DEBUG or RELWITHDEBINFO mode and run the program under gdb (or another debugger). Then, by going to the innermost Stage library frame, we can find which line triggers the crash.
I see that @corot posted his backtraces but not the exact line where the crash happened. I think that could help figure out how to fix the problem.

@onlytailei

onlytailei commented Aug 30, 2018

@rakeshshrestha31 @corot @rtv Thanks for the information.
Here is what I get with everything built in DEBUG mode.
The first one happened with both @rakeshshrestha31's branch and https://github.com/rtv/Stage when I use the reset_positions service, which calls this->positionmodels[r]->SetPose(this->initial_poses[r]) in https://github.com/rtv/stage_ros.
It seems to be all about the region count. What does the count mean here?

Further tests showed that it is fine to just call the reset_positions service repeatedly without sending any motion commands to the robots. But if I let the robots move for several steps, then call "reset_positions", and repeat this whole process, the bug soon appears in the service callback function.

I tested it with https://github.com/rtv/stage_ros and 30 robots (positionmodels) in one Stage world, in both GUI and headless modes.

stageros: /XXX/Stage/libstage/region.hh:80: Stg::Cell* Stg::Region::GetCell(int32_t, int32_t): Assertion `count == 0' failed.
stageros: /XXX/Stage/libstage/region.hh:80: Stg::Cell* Stg::Region::GetCell(int32_t, int32_t): Assertion `count == 0' failed.

Thread 7 "stageros" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe1c3f700 (LWP 19756)]
0x00007ffff5bb3428 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) where
#0  0x00007ffff5bb3428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff5bb502a in __GI_abort () at abort.c:89
#2  0x00007ffff5babbd7 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x7ffff6910bd5 "count == 0", file=file@entry=0x7ffff6910ba0 "/xxxx/Stage/libstage/region.hh", line=line@entry=80, 
    function=function@entry=0x7ffff69112e0 <Stg::Region::GetCell(int, int)::__PRETTY_FUNCTION__> "Stg::Cell* Stg::Region::GetCell(int32_t, int32_t)") at assert.c:92
#3  0x00007ffff5babc82 in __GI___assert_fail (assertion=0x7ffff6910bd5 "count == 0", file=0x7ffff6910ba0 "/xxxx/Stage/libstage/region.hh", line=80, function=0x7ffff69112e0 <Stg::Region::GetCell(int, int)::__PRETTY_FUNCTION__> "Stg::Cell* Stg::Region::GetCell(int32_t, int32_t)")
    at assert.c:101
#4  0x00007ffff68d11ac in Stg::Region::GetCell (this=0x7815d8, x=23, y=17) at /xxxx/Stage/libstage/region.hh:80
#5  0x00007ffff68d0902 in Stg::World::MapPoly (this=0x732a10, pts=..., block=0x32b2a10, layer=0) at /xxxx/Stage/libstage/world.cc:1028
#6  0x00007ffff687966c in Stg::Block::Map (this=0x32b2a10, layer=0) at /xxxx/Stage/libstage/block.cc:174
#7  0x00007ffff687d9a3 in Stg::BlockGroup::Map (this=0x32b19c0, layer=0) at /xxxx/Stage/libstage/blockgroup.cc:120
#8  0x00007ffff688dc65 in Stg::Model::Map (this=0x32b18f0, layer=0) at /xxxx/Stage/libstage/model.cc:811
#9  0x00007ffff688c863 in Stg::Model::MapWithChildren (this=0x32b18f0, layer=0) at /xxxx/Stage/libstage/model.cc:490
#10 0x00007ffff688f504 in Stg::Model::SetPose (this=0x32b18f0, newpose=...) at /xxxx/Stage/libstage/model.cc:1233
#11 0x000000000046d0a1 in StageNode::cb_reset_srv(std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&) ()
#12 0x000000000048edaa in boost::_mfi::mf2<bool, StageNode, std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&>::operator()(StageNode*, std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&) const ()
#13 0x000000000048c263 in bool boost::_bi::list3<boost::_bi::value<StageNode*>, boost::arg<1>, boost::arg<2> >::operator()<bool, boost::_mfi::mf2<bool, StageNode, std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&>, boost::_bi::list2<std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&> >(boost::_bi::type<bool>, boost::_mfi::mf2<bool, StageNode, std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&>&, boost::_bi::list2<std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&>&, long) ()
#14 0x0000000000489bcf in bool boost::_bi::bind_t<bool, boost::_mfi::mf2<bool, StageNode, std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&>, boost::_bi::list3<boost::_bi::value<StageNode*>, boost::arg<1>, boost::arg<2> > >::operator()<std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&>(std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&) ()
#15 0x0000000000486f0f in boost::detail::function::function_obj_invoker2<boost::_bi::bind_t<bool, boost::_mfi::mf2<bool, StageNode, std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&>, boost::_bi::list3<boost::_bi::value<StageNode*>, boost::arg<1>, boost::arg<2> > >, bool, std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&>::invoke(boost::detail::function::function_buffer&, std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&) ()
#16 0x00000000004a0063 in boost::function2<bool, std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&>::operator()(std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&) const ()
#17 0x000000000049ebf8 in ros::ServiceSpec<std_srvs::EmptyRequest_<std::allocator<void> >, std_srvs::EmptyResponse_<std::allocator<void> > >::call(boost::function<bool (std_srvs::EmptyRequest_<std::allocator<void> >&, std_srvs::EmptyResponse_<std::allocator<void> >&)> const&, ros::ServiceSpecCallParams<std_srvs::EmptyRequest_<std::allocator<void> >, std_srvs::EmptyResponse_<std::allocator<void> > >&) ()
#18 0x000000000049cf91 in ros::ServiceCallbackHelperT<ros::ServiceSpec<std_srvs::EmptyRequest_<std::allocator<void> >, std_srvs::EmptyResponse_<std::allocator<void> > > >::call(ros::ServiceCallbackHelperCallParams&) ()
#19 0x00007ffff7899bf1 in ros::ServiceCallback::call() () from /opt/ros/kinetic/lib/libroscpp.so
#20 0x00007ffff78ed210 in ros::CallbackQueue::callOneCB(ros::CallbackQueue::TLS*) () from /opt/ros/kinetic/lib/libroscpp.so
#21 0x00007ffff78ee683 in ros::CallbackQueue::callAvailable(ros::WallDuration) () from /opt/ros/kinetic/lib/libroscpp.so
#22 0x00007ffff794b511 in ros::SingleThreadedSpinner::spin(ros::CallbackQueue*) () from /opt/ros/kinetic/lib/libroscpp.so
#23 0x00007ffff79304cb in ros::spin() () from /opt/ros/kinetic/lib/libroscpp.so
#24 0x000000000049fc07 in void boost::_bi::list0::operator()<void (*)(), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(), boost::_bi::list0&, int) ()
#25 0x000000000049e13e in boost::_bi::bind_t<void, void (*)(), boost::_bi::list0>::operator()() ()
#26 0x000000000049cb2a in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(), boost::_bi::list0> >::run() ()
#27 0x00007ffff6d885d5 in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.58.0
#28 0x00007ffff6b616ba in start_thread (arg=0x7fffe1c3f700) at pthread_create.c:333
#29 0x00007ffff5c8541d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Sometimes it is this one:

#0  __memmove_avx_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:245
#1  0x00007ffff68c9161 in std::__copy_move<true, true, std::random_access_iterator_tag>::__copy_m<Stg::Block*> (__first=0x7fffd80659b0, __last=0x0, __result=0x936e960) at /usr/include/c++/5/bits/stl_algobase.h:384
#2  0x00007ffff68c9032 in std::__copy_move_a<true, Stg::Block**, Stg::Block**> (__first=0x7fffd80659b0, __last=0x0, __result=0x936e960) at /usr/include/c++/5/bits/stl_algobase.h:402
#3  0x00007ffff68c8da4 in std::__copy_move_a2<true, Stg::Block**, Stg::Block**> (__first=0x7fffd80659b0, __last=0x0, __result=0x936e960) at /usr/include/c++/5/bits/stl_algobase.h:440
#4  0x00007ffff68c8acc in std::copy<std::move_iterator<Stg::Block**>, Stg::Block**> (__first=..., __last=..., __result=0x936e960) at /usr/include/c++/5/bits/stl_algobase.h:472
#5  0x00007ffff68c85c5 in std::__uninitialized_copy<true>::__uninit_copy<std::move_iterator<Stg::Block**>, Stg::Block**> (__first=..., __last=..., __result=0x936e960) at /usr/include/c++/5/bits/stl_uninitialized.h:93
#6  0x00007ffff68c7ec1 in std::uninitialized_copy<std::move_iterator<Stg::Block**>, Stg::Block**> (__first=..., __last=..., __result=0x936e960) at /usr/include/c++/5/bits/stl_uninitialized.h:126
#7  0x00007ffff68c6f9e in std::__uninitialized_copy_a<std::move_iterator<Stg::Block**>, Stg::Block**, Stg::Block*> (__first=..., __last=..., __result=0x936e960) at /usr/include/c++/5/bits/stl_uninitialized.h:281
#8  0x00007ffff68d7aa9 in std::vector<Stg::Block*, std::allocator<Stg::Block*> >::_M_allocate_and_copy<std::move_iterator<Stg::Block**> > (this=0x95b8d68, __n=8, __first=..., __last=...) at /usr/include/c++/5/bits/stl_vector.h:1227
#9  0x00007ffff68d5d0f in std::vector<Stg::Block*, std::allocator<Stg::Block*> >::reserve (this=0x95b8d68, __n=8) at /usr/include/c++/5/bits/vector.tcc:75
#10 0x00007ffff68d563b in Stg::Cell::Cell (this=0x95b8d50) at /home/tai/ws/icra_2019/ours/Stage/libstage/region.hh:54
#11 0x00007ffff68e2ade in std::_Construct<Stg::Cell> (__p=0x95b8d50) at /usr/include/c++/5/bits/stl_construct.h:75
#12 0x00007ffff68e17a7 in std::__uninitialized_default_n_1<false>::__uninit_default_n<Stg::Cell*, unsigned long> (__first=0x95b2610, __n=552) at /usr/include/c++/5/bits/stl_uninitialized.h:519
#13 0x00007ffff68deb9e in std::__uninitialized_default_n<Stg::Cell*, unsigned long> (__first=0x95b2610, __n=1024) at /usr/include/c++/5/bits/stl_uninitialized.h:575
#14 0x00007ffff68da97f in std::__uninitialized_default_n_a<Stg::Cell*, unsigned long, Stg::Cell> (__first=0x95b2610, __n=1024) at /usr/include/c++/5/bits/stl_uninitialized.h:637
#15 0x00007ffff68d7b64 in std::vector<Stg::Cell, std::allocator<Stg::Cell> >::_M_default_append (this=0x77b668, __n=1024) at /usr/include/c++/5/bits/vector.tcc:549
#16 0x00007ffff68d5e41 in std::vector<Stg::Cell, std::allocator<Stg::Cell> >::resize (this=0x77b668, __new_size=1024) at /usr/include/c++/5/bits/stl_vector.h:676
#17 0x00007ffff68d56e1 in Stg::Region::GetCell (this=0x77b668, x=20, y=31) at /home/tai/ws/icra_2019/ours/Stage/libstage/region.hh:82
#18 0x00007ffff68d4e72 in Stg::World::MapPoly (this=0x730930, pts=..., block=0x95c29b0, layer=0) at /home/tai/ws/icra_2019/ours/Stage/libstage/world.cc:1054
#19 0x00007ffff687ffe8 in Stg::Block::Map (this=0x95c29b0, layer=0) at /home/tai/ws/icra_2019/ours/Stage/libstage/block.cc:174
#20 0x00007ffff688465d in Stg::BlockGroup::Map (this=0x9597130, layer=0) at /home/tai/ws/icra_2019/ours/Stage/libstage/blockgroup.cc:120
#21 0x00007ffff6893223 in Stg::Model::Map (this=0x9597060, layer=0) at /home/tai/ws/icra_2019/ours/Stage/libstage/model.cc:811
#22 0x00007ffff6891e41 in Stg::Model::MapWithChildren (this=0x9597060, layer=0) at /home/tai/ws/icra_2019/ours/Stage/libstage/model.cc:490
#23 0x00007ffff68ba4b0 in Stg::ModelPosition::Move (this=0x9597060) at /home/tai/ws/icra_2019/ours/Stage/libstage/model_position.cc:552
#24 0x00007ffff68d36b9 in Stg::World::Update (this=0x730930) at /home/tai/ws/icra_2019/ours/Stage/libstage/world.cc:674
#25 0x00007ffff68ff472 in Stg::WorldGui::Update (this=0x730930) at /home/tai/ws/icra_2019/ours/Stage/libstage/worldgui.cc:428
#26 0x00007ffff68ff369 in Stg::WorldGui::UpdateCallback (world=0x730930) at /home/tai/ws/icra_2019/ours/Stage/libstage/worldgui.cc:408
#27 0x00007ffff3251bcd in Fl::wait(double) () from /usr/lib/x86_64-linux-gnu/libfltk.so.1.3
#28 0x00007ffff3251d3d in Fl::wait() () from /usr/lib/x86_64-linux-gnu/libfltk.so.1.3
#29 0x00007ffff68d1c0c in Stg::World::Run () at /home/tai/ws/icra_2019/ours/Stage/libstage/world.cc:228
#30 0x000000000046f6b6 in main (argc=2, argv=0x7fffffffcc68) at /home/tai/catkin_ws/src/icra_2019/stage_ros/src/stageros.cpp:598

and

Thread 1 "stageros" received signal SIGSEGV, Segmentation fault.
0x00007ffff68c5a1c in Stg::Region::AddBlock (this=0x0)
    at /home/tai/ws/icra_2019/ours/Stage/libstage/region.cc:21
21        ++count;
(gdb) where
#0  0x00007ffff68c5a1c in Stg::Region::AddBlock (this=0x0) at /home/tai/ws/icra_2019/ours/Stage/libstage/region.cc:21
#1  0x00007ffff68c5832 in Stg::Cell::AddBlock (this=0x9494120, b=0x94640a0, layer=0) at /home/tai/ws/icra_2019/ours/Stage/libstage/region.cc:310
#2  0x00007ffff68d4ed3 in Stg::World::MapPoly (this=0x730930, pts=..., block=0x94640a0, layer=0) at /home/tai/ws/icra_2019/ours/Stage/libstage/world.cc:1062
#3  0x00007ffff687ffe8 in Stg::Block::Map (this=0x94640a0, layer=0) at /home/tai/ws/icra_2019/ours/Stage/libstage/block.cc:174
#4  0x00007ffff688465d in Stg::BlockGroup::Map (this=0x94e6870, layer=0) at /home/tai/ws/icra_2019/ours/Stage/libstage/blockgroup.cc:120
#5  0x00007ffff6893223 in Stg::Model::Map (this=0x94e67a0, layer=0) at /home/tai/ws/icra_2019/ours/Stage/libstage/model.cc:811
#6  0x00007ffff6891e41 in Stg::Model::MapWithChildren (this=0x94e67a0, layer=0) at /home/tai/ws/icra_2019/ours/Stage/libstage/model.cc:490
#7  0x00007ffff68ba4b0 in Stg::ModelPosition::Move (this=0x94e67a0) at /home/tai/ws/icra_2019/ours/Stage/libstage/model_position.cc:552
#8  0x00007ffff68d36b9 in Stg::World::Update (this=0x730930) at /home/tai/ws/icra_2019/ours/Stage/libstage/world.cc:674
#9  0x00007ffff68ff472 in Stg::WorldGui::Update (this=0x730930) at /home/tai/ws/icra_2019/ours/Stage/libstage/worldgui.cc:428
#10 0x00007ffff68ff369 in Stg::WorldGui::UpdateCallback (world=0x730930) at /home/tai/ws/icra_2019/ours/Stage/libstage/worldgui.cc:408
#11 0x00007ffff3251bcd in Fl::wait(double) () from /usr/lib/x86_64-linux-gnu/libfltk.so.1.3
#12 0x00007ffff3251d3d in Fl::wait() () from /usr/lib/x86_64-linux-gnu/libfltk.so.1.3
#13 0x00007ffff68d1c0c in Stg::World::Run () at /home/tai/ws/icra_2019/ours/Stage/libstage/world.cc:228
#14 0x000000000046f6b6 in main (argc=2, argv=0x7fffffffcc68) at /home/tai/catkin_ws/src/icra_2019/stage_ros/src/stageros.cpp:598

This is the raytrace error when I use rtv/Stage.

stageros: /xxxx/Stage/libstage/world.cc:843: Stg::RaytraceResult Stg::World::Raytrace(const Stg::Ray&): Assertion `block' failed.

Thread 1 "stageros" received signal SIGABRT, Aborted.
0x00007ffff5bb3428 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) where
#0  0x00007ffff5bb3428 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff5bb502a in __GI_abort () at abort.c:89
#2  0x00007ffff5babbd7 in __assert_fail_base (fmt=<optimized out>, 
    assertion=assertion@entry=0x7ffff6910f5c "block", 
    file=file@entry=0x7ffff6910be0 "/xxxx/Stage/libstage/world.cc", line=line@entry=843, 
    function=function@entry=0x7ffff69111e0 <Stg::World::Raytrace(Stg::Ray const&)::__PRETTY_FUNCTION__> "Stg::RaytraceResult Stg::World::Raytrace(const Stg::Ray&)") at assert.c:92
#3  0x00007ffff5babc82 in __GI___assert_fail (
    assertion=0x7ffff6910f5c "block", 
    file=0x7ffff6910be0 "/xxxx/Stage/libstage/world.cc", 
    line=843, 
    function=0x7ffff69111e0 <Stg::World::Raytrace(Stg::Ray const&)::__PRETTY_FUNCTION__> "Stg::RaytraceResult Stg::World::Raytrace(const Stg::Ray&)")
    at assert.c:101
#4  0x00007ffff68cfeb6 in Stg::World::Raytrace (this=0x732930, r=...)
    at /xxxx/Stage/libstage/world.cc:843
#5  0x00007ffff68b8ab8 in Stg::ModelRanger::Sensor::Update (this=0x33cdd30, 
    mod=0x33728e0)
    at /xxxx/Stage/libstage/model_ranger.cc:240
#6  0x00007ffff68b8770 in Stg::ModelRanger::Update (this=0x33728e0)
    at /xxxx/Stage/libstage/model_ranger.cc:206
#7  0x00007ffff6891efa in Stg::Model::UpdateWrapper (mod=0x33728e0)
    at /xxxx/Stage/libstage/stage.hh:2019
#8  0x00007ffff68cee80 in Stg::World::ConsumeQueue (this=0x732930, queue_num=0)
    at /xxxx/Stage/libstage/world.cc:601
#9  0x00007ffff68cf09b in Stg::World::Update (this=0x732930)
    at /xxxx/Stage/libstage/world.cc:635
#10 0x00007ffff68cd611 in Stg::World::UpdateAll ()
    at /xxxx/Stage/libstage/world.cc:221
#11 0x00007ffff68cd577 in Stg::World::Run ()
    at /xxxx/libstage/world.cc:211
#12 0x000000000047025a in main ()

@rakeshshrestha31

rakeshshrestha31 commented Aug 30, 2018

Thanks for the detailed logs. I have also used ROS services to teleport the robot to a specific pose. I haven't seen these issues though.

I see that the AVX optimization seems to have caused a memory access error in one of the traces. Maybe disable this optimization (use the -O0 option instead of -O2 in the CMakeLists.txt file to turn optimization off altogether). Multiple issues related to AVX have been documented for different packages, and I've had such troubles myself. (Side note: the debug mode seems to implicitly use an optimization option, so you may need to set the optimization option for debug mode explicitly. Also try clearing the build files before rebuilding.)

Other failure cases might also stem from the same root cause (AVX optimization). Just a guess, though.

As for the specifics of the stage library (like the region count), I guess @rtv would be able to answer it better.

@onlytailei

Thanks

Further tests showed that it is fine to just call the reset_positions service repeatedly without sending any motion commands to the robots. But if I let the robots move for several steps, then call "reset_positions", and repeat this whole process, the bug soon appears in the service callback function.

I tested it with https://github.com/rtv/stage_ros and 30 robots (positionmodels) in one Stage world, in both GUI and headless modes.

@onlytailei

Further test:

After several actions, I sleep for a few seconds before resetting the positions of the robots.
Problem solved...

@i-am-neet

> Further test,
>
> After several actions, sleep for some seconds before resetting the positions of the robots.
> Problem solved...

Hi, I have the same problem that you raised above.

I simulated 4 robots, published /cmd_vel and called reset_positions from the command line. It crashes at an unpredictable moment (speeding up the simulation makes it crash sooner).

Do you have a solution now?

Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants