Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ASIO one shot lost notifications problem #2897

Merged
merged 1 commit into from
Oct 12, 2018

Conversation

dipinhora
Copy link
Contributor

Prior to this commit, the way the ASIO notifications worked on
linux epoll included an edge case/race condition that allowed
for a loss of ASIO events.

As part of testing wallaroo scalability, we encountered an issue
(WallarooLabs/wally#2547) where some of our TCP actors would
stop reading from the socket for no reason even though there
was data waiting on the socket. After much fighting, @SeanTAllen
inferred that the ASIO subsystem was losing notifications and
the TCP actor was never notified about the data waiting on the
socket.

This commit changes things so that going forward:

  • ASIO one shot subscriptions do not use edge triggered mode, but
    instead use level triggered mode
  • Always tries to resubscribe to both read and write depending on
    the state of the ev->readable and ev->writeable booleans
  • Only send exactly one notification to one shot subscribers
    for either reads or writes, until they reset the ev->readable
    and/or ev->writeable booleans and resubscribe for notifications
    again in case of spurious notifications from epoll due to the
    use of level triggered mode (even though one shot should ensure
    that we only get one notification).

These changes ensure that if there's data on the socket at the
time we (re)subscribe to reads, we will be told immediately by
epoll that there is a read notification.

Similarly, if the socket is writeable at the time we (re)subscribe
to writes, we will be told immediately by epoll that there is a
write notification.

Both of these ensure that the ASIO subsystem will always be told
by epoll when the socket is readable or writeable and avoid any
race conditions due to edge triggering and the state of the socket
at the time the subscription occurs.


This PR also supersedes PR #2830 for the ASIO re-subscription related logic.

Prior to this commit, the way the ASIO notifications worked on
linux epoll included an edge case/race condition that allowed
for a loss of ASIO events.

As part of testing wallaroo scalability, we encountered an issue
(WallarooLabs/wally#2547) where some of our TCP actors would
stop reading from the socket for no reason even though there
was data waiting on the socket. After much fighting, @SeanTAllen
inferred that the ASIO subsystem was losing notifications and
the TCP actor was never notified about the data waiting on the
socket.

This commit changes things so that going forward:

* ASIO one shot subscriptions do *not* use edge triggered mode, but
  instead use level triggered mode
* Always tries to resubscribe to both read and write depending on
  the state of the `ev->readable` and `ev->writeable` booleans
* Only send exactly one notification to one shot subscribers
  for either reads or writes, until they reset the `ev->readable`
  and/or `ev->writeable` booleans and resubscribe for notifications
  again in case of spurious notifications from epoll due to the
  use of level triggered mode (even though one shot should ensure
  that we only get one notification).

These changes ensure that if there's data on the socket at the
time we (re)subscribe to reads, we will be told immediately by
epoll that there is a read notification.

Similarly, if the socket is writeable at the time we (re)subscribe
to writes, we will be told immediately by epoll that there is a
write notification.

Both of these ensure that the ASIO subsystem will always be told
by epoll when the socket is readable or writeable and avoid any
race conditions due to edge triggering and the state of the socket
at the time the subscription occurs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants