Assert failed src/epoll.cpp:90 "Bad file descriptor" #1627

Closed
wcs1only opened this issue Oct 28, 2015 · 4 comments

Comments

@wcs1only
Contributor

wcs1only commented Oct 28, 2015

I have encountered the following abort on zeromq 4.1.3 + CentOS7:

#0  0x00007f285f4175d7 in raise () from /lib64/libc.so.6
#1  0x00007f285f418cc8 in abort () from /lib64/libc.so.6
#2  0x00007f286152d519 in zmq::zmq_abort (errmsg_=errmsg_@entry=0x7f285f55d4ff "Bad file descriptor") at src/err.cpp:84
#3  0x00007f286152d317 in zmq::epoll_t::rm_fd (this=0x24ceaa0, handle_=<optimized out>) at src/epoll.cpp:90
#4  0x00007f286152dfc9 in zmq::io_object_t::rm_fd (this=this@entry=0xe45e330, handle_=<optimized out>) at src/io_object.cpp:70
#5  0x00007f2861557a74 in zmq::tcp_connecter_t::process_term (this=0xe45e000, linger_=0) at src/tcp_connecter.cpp:103
#6  0x00007f286152e23c in zmq::io_thread_t::in_event (this=0x24d2410) at src/io_thread.cpp:83
#7  0x00007f286152d15e in zmq::epoll_t::loop (this=0x24ceaa0) at src/epoll.cpp:176
#8  0x00007f2861558696 in thread_routine (arg_=0x24ceb20) at src/thread.cpp:96
#9  0x00007f28619b5df5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f285f4d81ad in clone () from /lib64/libc.so.6

This happens when closing a REQ socket (which never successfully connected) after a recv timeout. It is a race condition that seems to occur only once in a while. If I can find a way to reliably force it, I will respond with a test case.

I did a little digging into the problem, and it appears that epoll_ctl sets errno to ENOENT when EPOLL_CTL_DEL is called for an fd that isn't registered in the epoll set. So it would appear that we have some kind of race between adding the fd to the epoll set and removing it when the socket is closed. From a quick look over the code, this shouldn't be possible, since this interaction is protected by a flag. I'm still investigating, and will reply if I have any breakthroughs.
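
For anyone checking the errno behaviour described above, here is a minimal Linux-only sketch that calls epoll_ctl directly, outside libzmq. The helper names errno_del_unregistered_fd and errno_del_closed_fd are made up for illustration and do not exist in libzmq. The first helper deletes a valid fd that was never added to the epoll set; the second deletes an fd number that has already been closed.

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper (not part of libzmq): EPOLL_CTL_DEL on a valid fd
// that was never registered. Returns the resulting errno, or 0 on success.
int errno_del_unregistered_fd() {
    int ep = epoll_create1(0);
    assert(ep != -1);
    int fds[2];
    int prc = pipe(fds);
    assert(prc == 0);
    (void) prc;

    epoll_event ev{};  // ignored for EPOLL_CTL_DEL, passed for old kernels
    int rc = epoll_ctl(ep, EPOLL_CTL_DEL, fds[0], &ev);
    int err = (rc == -1) ? errno : 0;

    close(fds[0]);
    close(fds[1]);
    close(ep);
    return err;
}

// Hypothetical helper (not part of libzmq): EPOLL_CTL_DEL on an fd number
// that was registered and then closed before the delete.
int errno_del_closed_fd() {
    int ep = epoll_create1(0);
    assert(ep != -1);
    int fds[2];
    int prc = pipe(fds);
    assert(prc == 0);
    (void) prc;

    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = fds[0];
    int arc = epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev);
    assert(arc == 0);
    (void) arc;

    close(fds[0]);  // the fd number is no longer valid after this
    int rc = epoll_ctl(ep, EPOLL_CTL_DEL, fds[0], &ev);
    int err = (rc == -1) ? errno : 0;

    close(fds[1]);
    close(ep);
    return err;
}
```

On Linux the first helper should return ENOENT and the second EBADF. The abort message in the trace, "Bad file descriptor", is the strerror text for EBADF rather than ENOENT, which may suggest the fd had already been closed by the time rm_fd ran. That is only a reading of the error string, not a confirmed diagnosis.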

@wcs1only
Contributor Author

So after some additional digging, I found that our application was using CURVE auth, but our version of zeromq was compiled without libsodium. After compiling it correctly, this issue appears to have disappeared.

@hintjens
Member

This combination should still not crash libzmq. Do you want to help fix it?
We'd just need a minimal test case then.

On Thu, Oct 29, 2015 at 6:53 PM, Charlie Stanley notifications@github.com
wrote:

Closed #1627.

@wcs1only
Contributor Author

Ok, let me see if I can come up with one that triggers the issue.

@bluca
Member

bluca commented Mar 24, 2018

No repro found, closing

@bluca bluca closed this as completed Mar 24, 2018