-
Notifications
You must be signed in to change notification settings - Fork 250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Print processes that ended in "unexpected final state" #3426
Comments
grepping for |
I think we should tell people exactly what to grep for in this case (just the "ERROR" string I think?). But it's not super nice for people in the CI, since they have to download for example a 100 MB log artifact and grep locally.
As far as I can tell, the logs are written to a file, so stdout and stderr should be pointing to different destinations. Have you looked into this before? (I think you mentioned it before?) Is this a bug in shadow, or some weird GitLab thing? |
Here it is: Ah right, we only do it when we detect that I'm not sure whether it's worth fixing the terminal detection logic, since I'm not sure in general what the "right" answer is as to whether the job output is a terminal in this case. Maybe instead of trying to suppress the |
(I think I was remembering the original idea to detect when the two descriptors are writing to the same file description, but that there's no general way to detect that as far as we could tell) |
Actually, I think using |
Trying to detect whether stdout and stderr are terminals are tricky. In particular, in CI environments where job output is being logged but is also displayed in some web-pseudo-terminal it's difficult to say what the right behavior is. It's probably better to have consistent behavior here that doesn't quietly change under you based on what type of device we think stdout and stderr are. Effectively reverts 223aa4a. Fixes shadow#3426.
Trying to detect whether stdout and stderr are terminals are tricky. In particular, in CI environments where job output is being logged but is also displayed in some web-pseudo-terminal it's difficult to say what the right behavior is. It's probably better to have consistent behavior here that doesn't quietly change under you based on what type of device we think stdout and stderr are. Effectively reverts 223aa4a. Fixes shadow#3426.
Trying to detect whether stdout and stderr are terminals are tricky. In particular, in CI environments where job output is being logged but is also displayed in some web-pseudo-terminal it's difficult to say what the right behavior is. It's probably better to have consistent behavior here that doesn't quietly change under you based on what type of device we think stdout and stderr are. Effectively reverts 223aa4a. Progress on shadow#3426.
Trying to detect whether stdout and stderr are terminals and deduplicating based on that can be tricky, e.g. in CI environments.. Instead: * Make the `stderr` version more of a human-consumable message instead of the raw log line, which also makes the duplication easier to understand in the case where stdout and stderr are merged. * Disable this feature in our cmake tests, where stdout and and stderr are merged. New output on a process failure in the unmerged case (where stdout is written to a file and stderr is the terminal): ** Starting Shadow 3.2.0 Error: process 'testnode.test_sleep.1000' exited with status Normal(101); expected end state was exited: 0 but was exited: 101 Error: Failed to run the simulation Error: Error: Caused by: Error: 1 managed processes in unexpected final state ** Shadow did not complete successfully: Failed to run the simulation ** 1 managed processes in unexpected final state ** See the log for details The merged case is a little ugly, but at least it's harder to mistake the duplicated message on stderr for a second distinct log line: ... 00:00:00.016603 [507267:shadow-worker] 00:00:01.000000000 [DEBUG] [testnode:11.0.0.1] [process.rs:1486] [shadow_rs::host::process] process 'testnode.test_sleep.1000' has completed or is otherwise no longer running Error: process 'testnode.test_sleep.1000' exited with status Normal(101); expected end state was exited: 0 but was exited: 101 00:00:00.016660 [507267:shadow-worker] 00:00:01.000000000 [ERROR] [testnode:11.0.0.1] [process.rs:1571] [shadow_rs::host::process] process 'testnode.test_sleep.1000' exited with status Normal(101); expected end state was exited: 0 but was exited: 101 00:00:00.016710 [507267:shadow-worker] 00:00:01.000000000 [TRACE] [testnode:11.0.0.1] [process.rs:750] [shadow_rs::host::process] Not notifying parent of exit: no signal specified ... Progress on #3426.
When a shadow simulation ends and not all processes are in their expected final state, shadow prints:
to stderr, and expects the user to look through possibly hundreds of MBs of log messages to figure out which processes failed, and what state they were actually in.
Shadow should list the processes here (in stderr) and at the end of the log output.
The text was updated successfully, but these errors were encountered: