Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UDP bind error: Address family not supported by protocol #126

Open
andreucm opened this issue May 16, 2024 · 16 comments · May be fixed by #127
Open

UDP bind error: Address family not supported by protocol #126

andreucm opened this issue May 16, 2024 · 16 comments · May be fixed by #127

Comments

@andreucm
Copy link

andreucm commented May 16, 2024

Hi ,

Context

Host Machine:

  • ubuntu 20.04
  • ROS noetic
  • pf_lidar_ros_driver at main branch, commit 88bebcc
  • IP: 192.168.2.100
  • mask: 255.255.255.0

Device:

  • Type: OMD30M-R2000-B23-V1V1D-HD-1L
  • Address mode: static
  • IP: 192.168.2.3
  • mask: 255.255.255.0
  • Gateway: 192.168.2.1 or 0.0.0.0 (tried with both)
  • FW: 1.62
  • HW: 1.72
  • PFSDP: 1.04

Ping

  • Ping is ok from host machine to the device

ISSUE

From the host machine, I do:

  1. Edit the pf_lidar_ros_driver/src/pf_driver/config/r2000_params.yaml as:
R2000:
  transport: "udp"
  scanner_ip: "192.168.2.3"
  port: "3000" # optional
  device: "R2000"
  start_angle: -1800000
  max_num_points_scan: 0
  packet_type: "C"
  scan_topic: "/scan"
  frame_id: "scanner_link"
  watchdogtimeout: 60000
  watchdog: true
  1. roslaunch pf_driver r2000.launch
  2. and I get:
... logging to /home/beta/.ros/log/67512ddc-1357-11ef-962f-9cb6d0e16137/roslaunch-maupas-237492.log
Checking log directory for disk usage. This may take a while.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://127.0.0.1:42357/

SUMMARY
========

PARAMETERS
 * /r2000_node/R2000/device: R2000
 * /r2000_node/R2000/frame_id: scanner_link
 * /r2000_node/R2000/max_num_points_scan: 0
 * /r2000_node/R2000/packet_type: C
 * /r2000_node/R2000/port: 3000
 * /r2000_node/R2000/scan_topic: /scan
 * /r2000_node/R2000/scanner_ip: 192.168.2.3
 * /r2000_node/R2000/start_angle: -1800000
 * /r2000_node/R2000/transport: udp
 * /r2000_node/R2000/watchdog: True
 * /r2000_node/R2000/watchdogtimeout: 60000
 * /r2000_node/device: R2000
 * /robot_description: <?xml version="1....
 * /rosdistro: noetic
 * /rosversion: 1.16.0

NODES
  /
    r2000_node (pf_driver/ros_main)
    robot_state_publisher (robot_state_publisher/robot_state_publisher)
    rviz (rviz/rviz)

ROS_MASTER_URI=http://127.0.0.1:11311

process[robot_state_publisher-1]: started with pid [237508]
process[r2000_node-2]: started with pid [237509]
process[rviz-3]: started with pid [237510]
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-beta'
[ INFO] [/r2000_node]: Device found: R2000
terminate called after throwing an instance of 'boost::wrapexcept<boost::system::system_error>'
  what():  bind: Address family not supported by protocol
[r2000_node-2] process has died [pid 237509, exit code -6, cmd /home/beta/ros_ws/install/lib/pf_driver/ros_main __name:=r2000_node __log:=/home/beta/.ros/log/67512ddc-1357-11ef-962f-9cb6d0e16137/r2000_node-2.log].
log file: /home/beta/.ros/log/67512ddc-1357-11ef-962f-9cb6d0e16137/r2000_node-2*.log

It seems there is a problem at socket binding, but the provided IP address seems to be correct v4, so I do not understand the message "Address family not supported by protocol".

Any help will be highly appreciated,

Thanks,

Andreu

@ptruka
Copy link
Contributor

ptruka commented May 21, 2024

Hi,

I just tested it on my machine with nearly the same setup:

  • ROS1 noetic on ubuntu 20.04
  • Same driver branch (latest)
  • R2000 UHD (OMD30M…)

The r2000 yaml is also set to the scanners IP, transport protocol is set to udp. I tested 2 scenarios:

  • The port was set to 3000
  • The port was commented out

In both cases the driver started successfully.


I got the following Questions:

  1. Are there multiple network adapters on the Host machine? (If yes, are they in the same subnet?)
  2. Which Lidar model is used?

In order to get a better understanding what is happening on your side, I would add a print statement to find out what is going on:

In the file: src/pf_driver/src/pf/http_helpers/curl_resource.cpp

add the following into line #37
std::cout << "Request URL: " << url_ << std::endl; // Print the request URL

Like this:

void CurlResource::get(Json::Value& json_resp)
{
  std::cout << "Request URL: " << url_ << std::endl;  // Print the request URL
  request_.setOpt(curlpp::options::Url(url_));
  request_.perform();

  Json::Reader reader;
  reader.parse(response_, json_resp);
}

Then the you would need to build the driver again (catkin build). After the driver is launched, we will be able to see all the requests that are sent from the host to the scanner.

@andreucm
Copy link
Author

Hi Patrick,

thanks for the support.

I answer your questions:

  1. About network adapters, I copy the output of ifconfig. You will see that all the adapters are in separate subnets, and that I'm working on docker. Please recall I have correct ping with the lidar at 192.168.2.3
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:d0:a7:1e:66  txqueuelen 0  (Ethernet)

enx44ed57000019: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.2.100  netmask 255.255.255.0  broadcast 192.168.2.255
        inet6 fe80::8c7e:3819:699a:9886  prefixlen 64  scopeid 0x20<link>
        ether 44:ed:57:00:00:19  txqueuelen 1000  (Ethernet)

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)

tun0: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST>  mtu 1500
        inet 10.8.0.14  netmask 255.255.255.255  destination 10.8.0.13
        inet6 fe80::262c:7e33:f1d7:f61c  prefixlen 64  scopeid 0x20<link>
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 100  (UNSPEC)

wlp58s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.71  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::ba5a:e301:fc5d:f3e4  prefixlen 64  scopeid 0x20<link>
        ether 9c:b6:d0:e1:61:37  txqueuelen 1000  (Ethernet)
  1. The lidar model is: OMD30M-R2000-B23-V1V1D-HD-1L (edited the first entry of this issue to also add this info)

  2. Print at line 37 of curl_resource.cpp:
    Nothing appears... According other print statements I added, I think the error appears before in run time, at line:

    socket_ = std::make_unique<udp::socket>(*io_service_, local_endpoint);
    .

Just before of that line, I have a print statement like

std::cout << "address, port: " << address << ", " << port << std::endl;

which results in runtime:

address, port: 192.168.2.3, 3000
terminate called after throwing an instance of 'boost::wrapexcept<boost::system::system_error>'
  what():  bind: Address family not supported by protocol
[r2000_node-2] process has died [pid 33311, exit code -6, cmd /home/beta/ros_ws/install/lib/pf_driver/ros_main __name:=r2000_node __log:=/home/beta/.ros/log/4677694e-1755-11ef-b1b9-9cb6d0e16137/r2000_node-2.log].
log file: /home/beta/.ros/log/4677694e-1755-11ef-b1b9-9cb6d0e16137/r2000_node-2*.log

So the error seems to appear before stablishing any connection, even if the statement

[ INFO] [/r2000_node]: Device found: R2000

appears just before the crash.

@andreucm
Copy link
Author

andreucm commented May 21, 2024

Hi Patrick!

Setting transport to "tcp" works ok ...

Since I guess data transfer through tcp adds some extra traffic. please, let me know why "udp" is the default transport setting and which implications may have to set it to "tcp"

THanks

Andreu

@ptruka
Copy link
Contributor

ptruka commented May 21, 2024

Hi,

I just added your print statement to compare the output.

This is my yaml:

R2000:
  transport: "udp"
  scanner_ip: "192.168.50.153"
  port: "3000"
  device: "R2000"
  start_angle: -1800000
  max_num_points_scan: 0
  packet_type: "C"
  scan_topic: "/scan"
  frame_id: "scanner_link"
  watchdogtimeout: 60000
  watchdog: true

My host IP is 192.168.50.174

Besides your print statement, I also added one for the local_endpoint:

UDPTransport::UDPTransport(std::string address, std::string port) : Transport(address, transport_type::udp)
{
  std::cout << "address, port " << address << ", " << port << std::endl;
  
  io_service_ = std::make_shared<boost::asio::io_service>();
  udp::endpoint local_endpoint = local_endpoint = udp::endpoint(udp::v4(), stoi(port));
  
  std::cout << "Local Endpoint IP, Port: " << local_endpoint.address().to_string() << ", " << local_endpoint.port() << std::endl;
  
  socket_ = std::make_unique<udp::socket>(*io_service_, local_endpoint);
  timer_ = std::make_shared<boost::asio::deadline_timer>(*io_service_.get());
}

The driver is starting, this is the log:


roslaunch pf_driver r2000.launch 
... logging to /home/ubuntu/.ros/log/5b76cdb4-175d-11ef-b750-c52d5417674f/roslaunch-ubuntu-7597.log
Checking log directory for disk usage. This may take a while.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://ubuntu:36695/

SUMMARY
========

PARAMETERS
 * /r2000_node/R2000/device: R2000
 * /r2000_node/R2000/frame_id: scanner_link
 * /r2000_node/R2000/max_num_points_scan: 0
 * /r2000_node/R2000/packet_type: C
 * /r2000_node/R2000/port: 3000
 * /r2000_node/R2000/scan_topic: /scan
 * /r2000_node/R2000/scanner_ip: 192.168.50.153
 * /r2000_node/R2000/start_angle: -1800000
 * /r2000_node/R2000/transport: udp
 * /r2000_node/R2000/watchdog: True
 * /r2000_node/R2000/watchdogtimeout: 60000
 * /r2000_node/device: R2000
 * /robot_description: <?xml version="1....
 * /rosdistro: noetic
 * /rosversion: 1.16.0

NODES
  /
    r2000_node (pf_driver/ros_main)
    robot_state_publisher (robot_state_publisher/robot_state_publisher)
    rviz (rviz/rviz)

auto-starting new master
process[master]: started with pid [7607]
ROS_MASTER_URI=http://localhost:11311

setting /run_id to 5b76cdb4-175d-11ef-b750-c52d5417674f
process[rosout-1]: started with pid [7617]
started core service [/rosout]
process[robot_state_publisher-2]: started with pid [7624]
process[r2000_node-3]: started with pid [7625]
process[rviz-4]: started with pid [7626]
Request URL: http://192.168.50.153/cmd/get_protocol_info
Request URL: http://192.168.50.153/cmd/get_parameter?list=device_family
Request URL: http://192.168.50.153/cmd/get_parameter?list=product
[ INFO] [1716287521.480984789]: Device found: R2000
address, port 192.168.50.153, 3000
Local Endpoint IP, Port: 0.0.0.0, 3000
Request URL: http://192.168.50.153/cmd/request_handle_udp?address=192.168.50.174&packet_type=C&port=3000
Request URL: http://192.168.50.153/cmd/set_scanoutput_config?handle=s39961980&max_num_points_scan=0&packet_type=C&skip_scans=0&start_angle=-1800000&watchdog=on&watchdogtimeout=60000
Request URL: http://192.168.50.153/cmd/get_parameter?list=angular_fov;radial_range_min;radial_range_max;scan_frequency
[ INFO] [1716287521.552085818]: Device state changed to Initialized
Request URL: http://192.168.50.153/cmd/start_scanoutput?handle=s39961980
[ INFO] [1716287521.571733776]: Device state changed to Running

If you could also post the whole log from console, we could compare the what is happening.
Remark: I see 3 "Requested URL" prints before the "address, port" print. This should be the same on your side.

@ptruka
Copy link
Contributor

ptruka commented May 21, 2024

Hi Patrick!

Setting transport to "tcp" works ok ...

Since I guess data transfer through tcp adds some extra traffic. please, let me know why "udp" is the default transport setting and which implications may have to set it to "tcp"

THanks

Andreu

Nice to hear that tcp is working. I would still like to find the issue that you are facing when using udp as protocol.
Of cause the tcp will add a little bit more traffic due to the protocol itself.
The protocol selected depends on the users application. If you want the data as quickly as possible and you are only interested in live data, then udp is the better choice. If the priority is more on the completeness of the data, tcp should be used, due to the completeness checking mechanisms etc. used in the protocol.

@andreucm
Copy link
Author

Hi
yes it is good to have tcp working because we can unblock our development, but as you point, we'd prefer to work with udp, so let's try to fix the thing...

The print statement of the Local Endpoint seems very suspicious:

[ INFO] [/r2000_node]: Device found: R2000
address, port: 192.168.2.3, 3000
Local Endpoint IP, Port: 19:9cbf:759:ed4f:800c:e41c:a57f:0%3180489729, 11801
terminate called after throwing an instance of 'boost::wrapexcept<boost::system::system_error>'
  what():  bind: Address family not supported by protocol
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-beta'
[r2000_node-2] process has died [pid 75951, exit code -6, cmd /home/beta/ros_ws/install/lib/pf_driver/ros_main __name:=r2000_node __log:=/home/beta/.ros/log/4677694e-1755-11ef-b1b9-9cb6d0e16137/r2000_node-2.log].
log file: /home/beta/.ros/log/4677694e-1755-11ef-b1b9-9cb6d0e16137/r2000_node-2*.log

@ptruka
Copy link
Contributor

ptruka commented May 21, 2024

it is really strange that the local enpoint is set to an IPv6 address. could you try the following:

replace this line in udp_transport.cpp:
udp::endpoint local_endpoint = local_endpoint = udp::endpoint(udp::v4(), stoi(port));
with
udp::endpoint local_endpoint = local_endpoint = udp::endpoint(boost::asio::ip::address_v4::any(), stoi(port));

@andreucm
Copy link
Author

Similar output with boost::asio::ip::address_v4::any() :

Request URL: http://192.168.2.3/cmd/get_protocol_info
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-beta'
Request URL: http://192.168.2.3/cmd/get_parameter?list=device_family
Request URL: http://192.168.2.3/cmd/get_parameter?list=product
[ INFO] [/r2000_node]: Device found: R2000
address, port: 192.168.2.3, 3000
Local Endpoint IP, Port: e6:bfb5:cfc5:66ec:803c:bc95:747f:0%892741889, 1682
terminate called after throwing an instance of 'boost::wrapexcept<boost::system::system_error>'
  what():  bind: Address family not supported by protocol
[r2000_node-2] process has died [pid 132670, exit code -6, cmd /home/beta/ros_ws/install/lib/pf_driver/ros_main __name:=r2000_node __log:=/home/beta/.ros/log/4677694e-1755-11ef-b1b9-9cb6d0e16137/r2000_node-2.log].
log file: /home/beta/.ros/log/4677694e-1755-11ef-b1b9-9cb6d0e16137/r2000_node-2*.log

@ptruka
Copy link
Contributor

ptruka commented May 21, 2024

Is it possible that the host machine is preferring IPv6 or using dualstack?

I would check the IPv6 settings and try to disable it if it is not

sysctl net.ipv6.conf.all.disable_ipv6
sysctl net.ipv6.conf.default.disable_ipv6
sysctl net.ipv6.conf.all.forwarding

@andreucm
Copy link
Author

My output:

beta@maupas:~$ $sysctl net.ipv6.conf.all.disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 0
beta@maupas:~$ $sysctl net.ipv6.conf.default.disable_ipv6
net.ipv6.conf.default.disable_ipv6 = 0
beta@maupas:~$ $sysctl net.ipv6.conf.all.disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 0

Seems everything related to ipv6 is set to false

@ptruka
Copy link
Contributor

ptruka commented May 21, 2024

I think this shows that it is enabled. Could you run those commands to disable ipv6? Afterwards I would reconnect the sensor and check the driver again. (To disable it persistently, refer to https://askubuntu.com/questions/1300253/disable-ipv6-on-ubuntu-20-04)

sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.all.forwarding=1

@andreucm
Copy link
Author

andreucm commented May 21, 2024

  1. Setting the three ipv6.conf params to 1
  2. Switch off/on interface
  3. run the pf_driver with the same error
  4. Check after running the pf_driver that ipv6.conf params are still to 1 (just in case)

@ptruka
Copy link
Contributor

ptruka commented May 23, 2024

Thank you for testing those settings.

From what we have seen the issue is coming from the local_endpoint. I will discuss this with @ipa-vsp to track down why you are receiving IPv6 address for your local endpoint.

@andreucm
Copy link
Author

ok!
thanks, let me know if you need some testing from our side.
Meanwhile we are working with tcp transport type.

thanks!

@ptruka
Copy link
Contributor

ptruka commented May 23, 2024

As @ipa-vsp is not available this week I would try this last idea:

UDPTransport::UDPTransport(std::string address, std::string port) : Transport(address, transport_type::udp)
{
  std::cout << "address, port " << address << ", " << port << std::endl;
  
  io_service_ = std::make_shared<boost::asio::io_service>();
  //udp::endpoint local_endpoint = local_endpoint = udp::endpoint(udp::v4(), stoi(port));
  
  boost::asio::ip::address_v4 ipv4_any = boost::asio::ip::address_v4::any();
  // alternatively force to use the 0.0.0.0 address instead of using the any() function:
  // boost::asio::ip::address_v4 ipv4_any(0x00000000); 
  std::cout << "IPv4 any address: " << ipv4_any.to_string() << std::endl;

  udp::endpoint local_endpoint = local_endpoint = udp::endpoint(ipv4_any, stoi(port));
  std::cout << "Local Endpoint IP, Port: " << local_endpoint.address().to_string() << ", " << local_endpoint.port() << std::endl;
  
  socket_ = std::make_unique<udp::socket>(*io_service_, local_endpoint);
  timer_ = std::make_shared<boost::asio::deadline_timer>(*io_service_.get());
}

With the definition of ipv4_any we should receive the 0.0.0.0 address for your local endpoint. I tested the code on my side, which is still working:

[...]
started core service [/rosout]
process[robot_state_publisher-2]: started with pid [4765]
process[r2000_node-3]: started with pid [4766]
process[rviz-4]: started with pid [4770]
Request URL: http://192.168.50.153/cmd/get_protocol_info
Request URL: http://192.168.50.153/cmd/get_parameter?list=device_family
Request URL: http://192.168.50.153/cmd/get_parameter?list=product
[ INFO] [1716461862.107834259]: Device found: R2000
address, port 192.168.50.153, 3000
IPv4 any address: 0.0.0.0
Local Endpoint IP, Port: 0.0.0.0, 3000
Request URL: http://192.168.50.153/cmd/request_handle_udp?address=192.168.50.174&packet_type=C&port=3000
Request URL: http://192.168.50.153/cmd/set_scanoutput_config?handle=s84661416&max_num_points_scan=0&packet_type=C&skip_scans=0&start_angle=-1800000&watchdog=on&watchdogtimeout=60000

If this still does not fix the issue, it would be helpful to provide the full log of the ros console.

Thanks in advance

@andreucm
Copy link
Author

andreucm commented May 23, 2024

Hi!
One bad news and one good news:

  1. The bad is your piece of code does not work. the reported output was similar.

  2. the good is that I've found the bug. The problem is in the line:

udp::endpoint local_endpoint = local_endpoint = udp::endpoint(ipv4_any, stoi(port));

which is something I felt weird ... and effectively it is risky ,since depending on what is executing first, things can be wrong. It is like : int a = a = int(67);

Split it better as:

udp::endpoint local_endpoint;
local_endpoint = udp::endpoint(ipv4_any, stoi(port));

or , second option, in one line:

udp::endpoint local_endpoint(ipv4_any, stoi(port));

and everything works !

You may come back to the initial version and just fix that line.

Please let me know when you fix it to the main branch.

Thanks for the support

carlosjoserg added a commit to beta-robots/pf_lidar_ros_driver that referenced this issue May 29, 2024
@carlosjoserg carlosjoserg linked a pull request May 29, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants