I have encountered the following abort on zeromq 4.1.3 + CentOS7:
#0 0x00007f285f4175d7 in raise () from /lib64/libc.so.6
#1 0x00007f285f418cc8 in abort () from /lib64/libc.so.6
#2 0x00007f286152d519 in zmq::zmq_abort (errmsg_=errmsg_@entry=0x7f285f55d4ff "Bad file descriptor") at src/err.cpp:84
#3 0x00007f286152d317 in zmq::epoll_t::rm_fd (this=0x24ceaa0, handle_=<optimized out>) at src/epoll.cpp:90
#4 0x00007f286152dfc9 in zmq::io_object_t::rm_fd (this=this@entry=0xe45e330, handle_=<optimized out>) at src/io_object.cpp:70
#5 0x00007f2861557a74 in zmq::tcp_connecter_t::process_term (this=0xe45e000, linger_=0) at src/tcp_connecter.cpp:103
#6 0x00007f286152e23c in zmq::io_thread_t::in_event (this=0x24d2410) at src/io_thread.cpp:83
#7 0x00007f286152d15e in zmq::epoll_t::loop (this=0x24ceaa0) at src/epoll.cpp:176
#8 0x00007f2861558696 in thread_routine (arg_=0x24ceb20) at src/thread.cpp:96
#9 0x00007f28619b5df5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f285f4d81ad in clone () from /lib64/libc.so.6
This happens when closing a REQ socket (which never successfully connected) after a recv timeout. It is a race condition that seems to occur only every once in a while. If I can find a way to reliably force it, I will respond with a test case.
I did a little digging into the problem, and it appears that epoll_ctl sets errno to ENOENT when EPOLL_CTL_DEL is called for an fd that isn't in the epoll set. So it would appear that we have some kind of race between adding the fd to the epoll set and removing it when the socket is closed. From a quick look over the code, this shouldn't be possible, since this interaction is protected by a flag. I'm still investigating, and will reply back if I have any breakthroughs.
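To make the errno behavior above concrete, here is a minimal standalone sketch (Linux only, not taken from the zeromq sources) that calls EPOLL_CTL_DEL in two situations: on a valid fd that was never added to the epoll set, and on an fd that was added and then closed. The function names `errno_del_unregistered` and `errno_del_closed` are hypothetical helpers made up for illustration. Note that the abort message in the backtrace is "Bad file descriptor", which is the string for EBADF rather than ENOENT, so the second case may be the closer match to what the trace shows; that is an assumption, not something confirmed in this thread.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: EPOLL_CTL_DEL on a valid fd that was never
 * registered. Returns the resulting errno, 0 if the call unexpectedly
 * succeeded, or -1 if the setup itself failed. */
int errno_del_unregistered(void)
{
    int fds[2];
    int err = 0;
    int ep = epoll_create1(0);
    if (ep < 0)
        return -1;
    if (pipe(fds) != 0) {
        close(ep);
        return -1;
    }
    /* fds[0] is open but was never added to the epoll set. */
    if (epoll_ctl(ep, EPOLL_CTL_DEL, fds[0], NULL) != 0)
        err = errno;
    close(fds[0]);
    close(fds[1]);
    close(ep);
    return err;
}

/* Hypothetical helper: EPOLL_CTL_DEL on an fd that was registered and
 * then closed before the removal. Same return convention as above. */
int errno_del_closed(void)
{
    int fds[2];
    int err = 0;
    struct epoll_event ev = { 0 };
    int ep = epoll_create1(0);
    if (ep < 0)
        return -1;
    if (pipe(fds) != 0) {
        close(ep);
        return -1;
    }
    ev.events = EPOLLIN;
    ev.data.fd = fds[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) != 0) {
        close(fds[0]);
        close(fds[1]);
        close(ep);
        return -1;
    }
    /* Close first, then try to remove: the fd number is no longer valid. */
    close(fds[0]);
    if (epoll_ctl(ep, EPOLL_CTL_DEL, fds[0], NULL) != 0)
        err = errno;
    close(fds[1]);
    close(ep);
    return err;
}
```

Comparing the two return values against ENOENT and EBADF (for example by printing them with `strerror`) shows which failure mode corresponds to which ordering of add, close, and remove.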
So after some additional digging, I found that our application was using CURVE auth, but our version of zeromq was compiled without libsodium. After compiling it correctly, this issue appears to have disappeared.