Skip to content

"Bad response for 8 -> 17" bug related to use of serial #89

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
microbit-matt-hillsdon opened this issue Apr 28, 2021 · 13 comments
Open
Labels
bug Something isn't working serial

Comments

@microbit-matt-hillsdon
Copy link
Collaborator

microbit-matt-hillsdon commented Apr 28, 2021

  1. Change program to
while True:
    print("Hi")
  1. Connect and Flash
  2. Open Serial (V2 only, it's already open for V3)
  3. Refresh page
  4. Attempt to connect

image

or for V2

image

Only physically reconnecting the device seems to fix it.

If you disconnect (or in V2 close serial) before reloading the page then it works fine.

@microbit-carlos
Copy link
Contributor

@microbit-matt-hillsdon
Copy link
Collaborator Author

Does it replicate in https://armmbed.github.io/dapjs/examples/daplink-serial/web.html ?

If you get the device in the bad state in our editor then that example will fail to connect, but you can't trigger the bad state it in that example. Interestingly the example disconnects before starting serial read (which, based on the flickering USB icon in the tab, probably reconnects repeatedly).

Also confirmed that restarting Chrome doesn't fix the bad state.

@jaustin
Copy link

jaustin commented Apr 28, 2021

And un/replugging the micro:bit but independent of doing something to the browser does?

@microbit-matt-hillsdon
Copy link
Collaborator Author

Confusingly, the error we see here comes from an error handler, though I think it's error handling a quite similar scenario. The original error message is "Bad response for 17 -> 8".

Sequence of events so far as I can tell:

  1. DAPLink.connect sends DAP_SWJ_CLOCK (command 17) but gets a 255 (error) response for command 8 DAP_WRITE_ABORT. It throws an error at the discrepancy.
  2. The error handler for DAPLink connect calls clearAbort
  3. DAPLink.clearAbort sends DAP_WRITE_ABORT but gets a 0 (success) response for command 17. It throws an error at the discrepancy. This is the error we see. Due to this error it doesn't close the transport, which is the next step in the error handler.

@microbit-matt-hillsdon
Copy link
Collaborator Author

With the knock-on error code modified to swallow that error, I've also seen 17 -> 131 as the initial error. 131 is the serial read command. I'm pretty convinced nothing serial-related is happening on the browser end as that's all after the connect.

Using the DAPLink serial sample I can't reproduce it even if I remain connected to daplink throughout. I'll try a combination of stripping out code down and building that example up to try to figure out what makes the difference.

@microbit-matt-hillsdon
Copy link
Collaborator Author

OK, that didn't take long.

It reproduces in the demo if you set the delay passed to startSerialRead to 1 (as per V2 & V3) rather than the default of 100.

If I set the delay to 10 then I can only reproduce it sporadically.

So perhaps it's related to whether a serial read is in progress when the device is disconnected, the odds of which are significantly higher with 1ms delay.

@microbit-matt-hillsdon
Copy link
Collaborator Author

Potentially related discussion about the interval. It's not really clear from issues how happy folks were with the net result of the changes.

ARMmbed/dapjs#51
https://github.com/microbit-foundation/platform-software-issue-tracker/issues/355

We can't really guarantee shutting down serial before the user closes the tab (we can try in beforeunload, and it does sometimes seem to run in time, but not often enough to be worth it).

Ideally we'd find some way to reset whatever's left in a bad state on the device that we could run as part of the initial connection.

@microbit-matt-hillsdon
Copy link
Collaborator Author

microbit-matt-hillsdon commented Apr 29, 2021

It reproduces on V1 but only with 0254:

  • 0249 Does not reproduce
  • 0250 Does not reproduce
  • 0251 Does not reproduce
  • 0252 Does not reproduce
  • 0253 Does not reproduce
  • 0254 Reproduces

Testing was done with the V3 app and 5 cycles of the reproduction.

I've redone 0253 and 0254 with the modified serial sample and significantly more iterations and I'm now convinced that 0254 has introduces this issue.

Serial sample modified code (contents of listen function):

            const transport = new DAPjs.WebUSB(device);
            const target = new DAPjs.DAPLink(transport);

            target.on(DAPjs.DAPLink.EVENT_SERIAL_DATA, data => {
                // Ignore it for now.
            });

            await target.connect();
            await target.setSerialBaudrate(115200);

            resultsEl.innerHTML = `Listening...\n`;

            serialPromise = target.startSerialRead(1);

@microbit-matt-hillsdon
Copy link
Collaborator Author

microbit-matt-hillsdon commented Apr 30, 2021

It looks like this can be worked around with:

            window.addEventListener("beforeunload", () => {
                target.stopSerialRead();
                serialPromise.then(() => console.log("Wow!"));
            });

where serialPromise is the promise returned from startSerialRead (so far tested with a V1 device using 0254).

From quick testing this seems reliable in Chrome though I'm not sure there's any standards-basis to expect it to work.
Not tried any of this in Edge yet.
I'd much rather a workaround at at connection time rather than this (e.g. somehow reset DAPLink's messed up state).

My earlier (more rushed) testing of this didn't seem so positive, but that was in the app not the serial sample. I'll retry in the app.

@microbit-matt-hillsdon
Copy link
Collaborator Author

microbit-matt-hillsdon commented Apr 30, 2021

I can get away with calling "stopSerialInternal" but not "disconnect" (these are V3 app device.ts methods not DAPjs). Throttling the CPU doesn't seem to stop stopSerialInternal working so perhaps it's related to how many times the event loop is allowed to run before/during unload.

Unfortunately there's no event when a user decided to stay on the page so not clear when to restart serial if it was in progress. Some timeout after the page gets focus again? Nasty.

@microbit-matt-hillsdon
Copy link
Collaborator Author

Confirmed the same bug affects MakeCode (though I was using a Python script sending over serial).

microbit-matt-hillsdon added a commit that referenced this issue Apr 30, 2021
- Stop serial for hidden tabs to reduce chance of disconnect during
  serial which triggers bug.
- Best effort to stop serial before unload. No errors yet with this
  on MacOS Chrome but not tested elsewhere. Pretty nasty though!

#89
microbit-matt-hillsdon added a commit that referenced this issue Apr 30, 2021
- Stop serial for hidden tabs to reduce chance of disconnect during
  serial which triggers bug.
- Best effort to stop serial before unload. No errors yet with this
  on MacOS Chrome but not tested elsewhere. Pretty nasty though!

#89
@microbit-matt-hillsdon
Copy link
Collaborator Author

microbit-matt-hillsdon commented Apr 30, 2021

Confirmed the issue also exists on Windows 10 with Edge and Chrome.

This workaround seems to address the issue for me on MacOS and Windows.

There's no real guarantee that it'll stay working but I don't see that there's much choice but to work around it or abandon plans to make more use of serial than in V2. I guess the most likely risk is browsers further restricting access to the beforeunload hook but could also be implementation specific changes relating to how quickly the page closes.

I'll split it out from my other semi-related webusb testing work and create a PR.

microbit-matt-hillsdon added a commit that referenced this issue Apr 30, 2021
- Stop serial for hidden tabs to reduce chance of disconnect during
  serial which triggers bug.
- Best effort to stop serial before unload. No errors yet with this
  on MacOS Chrome but not tested elsewhere. Pretty nasty though!

#89
microbit-matt-hillsdon added a commit that referenced this issue Apr 30, 2021
- Stop serial for hidden tabs to reduce chance of disconnect during
  serial which triggers bug.
- Best effort to stop serial before unload. No errors yet with this
  on MacOS Chrome but not tested elsewhere. Pretty nasty though!

Ensure we wait for the serial polling loop in normal operation too.

#89
microbit-matt-hillsdon added a commit that referenced this issue Apr 30, 2021
- Stop serial for hidden tabs to reduce chance of disconnect during
  serial which triggers bug.
- Best effort to stop serial before unload. No errors yet with this
  on MacOS Chrome but not tested elsewhere. Pretty nasty though!

Ensure we wait for the serial polling loop in normal operation too.

#89
microbit-matt-hillsdon added a commit that referenced this issue May 4, 2021
- Stop serial for hidden tabs to reduce chance of disconnect during
  serial which triggers bug.
- Best effort to stop serial before unload. No errors yet with this
  on MacOS Chrome or Windows 10 Chrome/Edge. Clearly a hack
  though.

Ensure we wait for the serial polling loop in normal operation too.

#89
microbit-matt-hillsdon added a commit that referenced this issue May 4, 2021
- Stop serial for hidden tabs to reduce chance of disconnect during
  serial which triggers bug.
- Best effort to stop serial before unload. No errors yet with this
  on MacOS Chrome or Windows 10 Chrome/Edge. Clearly a hack
  though.

Ensure we wait for the serial polling loop in normal operation too.

#89
microbit-matt-hillsdon added a commit that referenced this issue Aug 25, 2021
Somewhat speculative fix as I can't get the visibilitychange event to trigger during flash anymore. It seems to occur at the end of flash. But given there's async work in the flash it makes sense that it can occur.

I've added logging for the visibility change work.

I think we'd like to drop all this if we could fix the underlying cause of #89.

Fixes #223
@pdg137
Copy link

pdg137 commented Oct 27, 2021

Is the workaround here what causes the microbit program and serial console to constantly reset when switching tabs? This behavior is quite annoying!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working serial
Projects
None yet
Development

No branches or pull requests

4 participants