Assertion failure: !(pollset [i].revents & POLLNVAL) (poll.cpp:157) #2895

Closed
sigiesec opened this issue Jan 25, 2018 · 9 comments
@sigiesec
Member

sigiesec commented Jan 25, 2018

Issue description

An assertion failure occurs in the following line:

zmq_assert (!(pollset [i].revents & POLLNVAL));

The fd member of the offending entry in pollset in that case has the value -1 / retired_fd.

Apparently, this was already the case when poll was called, which caused poll to set the revents of the entry to POLLNVAL.
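
The mechanism described above can be shown in isolation. The following is a minimal sketch for POSIX systems, not the Windows setup from this report: POSIX poll ignores negative descriptors, so the sketch polls a descriptor that has already been closed instead of -1. The function name closed_fd_yields_pollnval is made up for illustration.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h>

// Returns true if poll () flags a descriptor that is no longer open
// with POLLNVAL. Hypothetical helper, for illustration only.
bool closed_fd_yields_pollnval ()
{
    int fds[2];
    if (pipe (fds) != 0)
        return false;

    // Close the read end, then poll the now-stale descriptor number.
    const int stale_fd = fds[0];
    close (fds[0]);

    struct pollfd entry;
    entry.fd = stale_fd;
    entry.events = POLLIN;
    entry.revents = 0;

    // Timeout 0: report the current state without blocking.
    const int rc = poll (&entry, 1, 0);
    close (fds[1]);

    // poll () counts entries with non-zero revents, so rc is 1 here.
    return rc == 1 && (entry.revents & POLLNVAL) != 0;
}
```

An assertion like the one in poll.cpp would fire on such an entry, because the pollset already contained an unusable descriptor before poll was called.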

I found that it is possible to call add_fd with a fd_ parameter of retired_fd/-1, which is not checked, so I assume this has happened somehow, but I do not know via which call chain this happened.
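
A guard at the point of registration would surface the bad call chain earlier. The sketch below assumes a simplified pollset; checked_pollset_t and its members are hypothetical and are not libzmq's poll_t, and fd_t is assumed to be a plain int here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef int fd_t;           // assumption: stands in for libzmq's fd_t
const fd_t retired_fd = -1; // value named in this report

// Hypothetical, simplified pollset that refuses retired descriptors.
class checked_pollset_t
{
  public:
    // Returns false and leaves the set unchanged for a retired descriptor.
    bool add_fd (fd_t fd_)
    {
        if (fd_ == retired_fd)
            return false;
        _fds.push_back (fd_);
        return true;
    }

    std::size_t size () const { return _fds.size (); }

  private:
    std::vector<fd_t> _fds;
};
```

Whether to reject the call, as here, or to assert is a design choice; an assertion at this point would at least show the caller in the stack trace.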

Environment

  • libzmq version (commit hash if unreleased): 4.2.3 built with ZMQ_USE_POLL
  • OS: Windows

Minimal test code / Steps to reproduce the issue

Unfortunately, I have not been able to reproduce this issue. It occurs when networking problems arise, for example when the firewall configuration between peers changes or the local network settings are reconfigured. I don't know yet what exactly triggers the issue, but it has occurred several times on different machines.

What's the actual result? (include assertion message & call stack if applicable)

Assertion failure: !(pollset [i].revents & POLLNVAL) within a zmq::poll_t::worker_routine thread.

What's the expected result?

No assertion failure, operation continues.

@sigiesec
Member Author

sigiesec commented Jan 26, 2018

I investigated where an INVALID_SOCKET might come from and found the following:

@sigiesec
Member Author

A similar situation is here:

handle = poller->add_fd (fd, this);

Maybe fd can never be retired_fd/-1 here for some reason, but then an assertion should be added here.

@sigiesec
Member Author

I tried to create a large number of contexts, with one socket in each context. After about 4000 contexts/sockets, zmq_socket fails with an error code of ENOBUFS, which is outside the documented behaviour of zmq_socket. The same error can probably also occur during make_fdpair. This is most probably caused by the global limit of ephemeral ports being exceeded (https://blogs.technet.microsoft.com/tristank/2008/03/11/maxuserport-what-it-is-what-it-does-when-its-important/). The appropriate reaction would be to retry after some time IMO.
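
A retry of this kind could look roughly like the sketch below. The helper retry_with_backoff and its parameters are hypothetical, and the doubling delay is an assumption rather than anything libzmq does.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: calls try_once_ until it returns true or
// max_attempts_ is reached, doubling the delay between attempts.
// Returns the number of attempts used, or 0 if every attempt failed.
template <typename F>
int retry_with_backoff (F try_once_,
                        int max_attempts_,
                        std::chrono::milliseconds initial_delay_)
{
    std::chrono::milliseconds delay = initial_delay_;
    for (int attempt = 1; attempt <= max_attempts_; ++attempt) {
        if (try_once_ ())
            return attempt;
        // Do not sleep after the final failed attempt.
        if (attempt < max_attempts_) {
            std::this_thread::sleep_for (delay);
            delay *= 2;
        }
    }
    return 0;
}
```

In the scenario above, try_once_ would wrap the failing call and return false only for the transient error, so that other errors are still reported immediately.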

@sigiesec
Member Author

sigiesec commented Jan 31, 2018

With the following code I was able to trigger the problem in zmq::signaler_t:

    std::vector<void *> contexts;
    std::vector<void *> sockets;

    const size_t count = 10000;

    for (size_t i = 0; i < count; ++i) {
        fprintf (stderr, ".");
        contexts.emplace_back (zmq_ctx_new ());

        if (contexts.back () == nullptr) {
            fprintf (
              stderr,
              "\ncontext creation failed after %i contexts, errno = %i\n",
              (int) i, errno);
            break;
        }

        void *bind_socket = zmq_socket (contexts.back (), ZMQ_DEALER);
        void *connect_socket = zmq_socket (contexts.back (), ZMQ_DEALER);
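        if (bind_socket == nullptr || connect_socket == nullptr) {
            // Save errno first: zmq_close may overwrite it.
            const int saved_errno = errno;
            // Close whichever socket was created so that it does not leak.
            if (bind_socket != nullptr)
                zmq_close (bind_socket);
            if (connect_socket != nullptr)
                zmq_close (connect_socket);
            fprintf (stderr,
                     "\nsocket creation failed after %i contexts, errno = %i\n",
                     (int) i, saved_errno);
            continue;
        }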

        int rc = zmq_bind (bind_socket, "inproc://test");
        if (rc != 0) {
            fprintf (stderr,
                     "\nsocket bind failed after %i contexts, errno = %i\n",
                     (int) i, errno);
            zmq_close (bind_socket);
            zmq_close (connect_socket);
            continue;
        }

        rc = zmq_connect (connect_socket, "inproc://test");
        if (rc != 0) {
            fprintf (stderr,
                     "\nsocket connect failed after %i contexts, errno = %i\n",
                     (int) i, errno);
            zmq_close (bind_socket);
            zmq_close (connect_socket);
            continue;
        }

        sockets.emplace_back (bind_socket);
        sockets.emplace_back (connect_socket);
    }

I then get an assertion failure here:

Assertion failed: Socket operation on non-socket (D:\Dev\libzmq\src\signaler.cpp:192)

i.e. here:

wsa_assert (nbytes != SOCKET_ERROR);

and the socket was retired_fd, so make_fdpair must have failed in the constructor.

@sigiesec
Member Author

On at least one attempt, I got a socket error WSAENOBUFS from this call:

rc = connect (*w_, (struct sockaddr *) &addr, sizeof addr);
with the following call stack:

 	libzmq-v141-mt-gd-4_2_4.dll!zmq::signaler_t::make_fdpair(unsigned __int64 * r_, unsigned __int64 * w_) Line 528	C++	Symbols loaded.
 	libzmq-v141-mt-gd-4_2_4.dll!zmq::signaler_t::signaler_t() Line 127	C++	Symbols loaded.
 	libzmq-v141-mt-gd-4_2_4.dll!zmq::mailbox_t::mailbox_t() Line 35	C++	Symbols loaded.
 	libzmq-v141-mt-gd-4_2_4.dll!zmq::reaper_t::reaper_t(zmq::ctx_t * ctx_, unsigned int tid_) Line 41	C++	Symbols loaded.
>	libzmq-v141-mt-gd-4_2_4.dll!zmq::ctx_t::create_socket(int type_) Line 344	C++	Symbols loaded.
 	libzmq-v141-mt-gd-4_2_4.dll!zmq_socket(void * ctx_, int type_) Line 267	C++	Symbols loaded.

This should either fail the whole zmq_socket call, and leave a consistent state, or retry (or both).
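
Failing the whole call while leaving a consistent state could be sketched as follows. All names here (demo_signaler_t, valid, create_signaler) are hypothetical stand-ins, the descriptor value 3 is a placeholder, and the choice of EMFILE as the reported error is an assumption.

```cpp
#include <cassert>
#include <cerrno>
#include <memory>

typedef int fd_t;           // assumption: stands in for libzmq's fd_t
const fd_t retired_fd = -1; // value named in this report

// Hypothetical stand-in for a signaler whose descriptor pair may fail
// to open in the constructor.
class demo_signaler_t
{
  public:
    explicit demo_signaler_t (bool simulate_failure_) :
        _fd (simulate_failure_ ? retired_fd : 3)
    {
    }

    // Lets the caller detect a failed construction.
    bool valid () const { return _fd != retired_fd; }

  private:
    fd_t _fd;
};

// Returns a usable signaler, or a null pointer with errno set. A
// partially initialised object never escapes to the caller.
std::unique_ptr<demo_signaler_t> create_signaler (bool simulate_failure_)
{
    std::unique_ptr<demo_signaler_t> signaler (
      new demo_signaler_t (simulate_failure_));
    if (!signaler->valid ()) {
        errno = EMFILE; // assumption: error code chosen for illustration
        return std::unique_ptr<demo_signaler_t> ();
    }
    return signaler;
}
```

The point of the pattern is that the validity check happens once, at creation, instead of a retired descriptor surfacing later as an assertion in an unrelated thread.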

@Crypto2

Crypto2 commented Jun 9, 2018

I'm also having this same problem with 4.2.3 with ZMQ_USE_POLL

@bluca
Member

bluca commented Jun 9, 2018

It's fixed in 4.2.5

@sigiesec
Member Author

sigiesec commented Jun 9, 2018

@Crypto2 Note however that poller=poll is broken on Windows, and will no longer be offered as an option in master. Either use poller=select with 4.2.5, or poller=epoll with master, support for which was only recently added under Windows.
