ghcide hangs during typechecking on a single-core machine or setting -j1 #727

Closed
expipiplus1 opened this issue Nov 17, 2020 · 9 comments
Labels
  • component: ghcide
  • performance (Issues about memory consumption, responsiveness, etc.)
  • type: bug (Something isn't right: doesn't work as intended, documentation is missing/outdated, etc.)

@expipiplus1
Contributor

Copied from #573 as this happens with just the ghcide command.

I am bumping into this on NixOS with HLS 0.6.0 (GHC 8.8.4) and with ghcide 0.5.0 (GHC 8.8.4).

--verbose doesn't say much of interest aside from [DEBUG] Set files of interest to: [(NormalizedFilePath "/home/alice/Foo.hs". OnDisk)] after the "Type checking the files" message.

Here is a reproducer in the form of a nixos test:

import <nixpkgs/nixos/tests/make-test-python.nix> ({ pkgs, ... }: {
  name = "test";
  machine = { ... }: {
    imports = [ <nixpkgs/nixos/tests/common/user-account.nix> ];
    environment.systemPackages =
      [ pkgs.vim pkgs.ghc pkgs.haskellPackages.ghcide ];
  };
  testScript = ''
    # Login
    machine.wait_for_unit("multi-user.target")

    # Create a hie.yaml file and Foo.hs (this happens with and without the yaml file)
    machine.succeed('echo "cradle:\n  direct:\n    arguments: []" > hie.yaml')
    machine.succeed('echo "module Foo where\nfoo = ()" > Foo.hs')

    # Run ghcide
    machine.succeed("ghcide")

    # Record for posterity
    machine.screenshot("ghcide-output")
  '';
})

Either

  • Run with nix-build repro.nix and observe the hang
  • Start the VM to experiment with $(nb tests/vim.nix -A driverInteractive)/bin/nixos-run-vms
    • logging in with alice/foobar

I suspect that it could be something about the minimality of the environment which causes this?

strace tells me that it's hung in FUTEX_WAIT_PRIVATE, but that's not really surprising

@expipiplus1 changed the title from "ghcide hangs during typechecking in a very simple case" to "ghcide hangs during typechecking on a single-core machine" on Nov 17, 2020
@expipiplus1
Contributor Author

If I set virtualisation.cores = 2 then there is no hang, so it looks like this is a problem only on single-core machines.

@pepeiborra
Collaborator

I recently discovered this independently. It happens also with +RTS -N1.

It is caused by the Shake thread pool being bound by the number of capabilities. Ghcide runs a "daemon"-like thread to process Shake actions from a TQueue, and a main thread to do the typechecking. If the thread pool is of size 1, the daemon thread occupies the only slot, so it will be unable to progress and the typechecking thread will be unable to run.

In general a threadpool should never be of size 1, so I would say this needs to be fixed in Shake. /cc @ndmitchell
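
The deadlock described above can be modelled in a few lines. This is only a toy sketch, not ghcide's or Shake's actual code: a QSem stands in for the bounded thread pool, an MVar stands in for the TQueue, and the names withSlot and typecheckFinishes are made up for illustration. It assumes the daemon takes a pool slot and never gives it back, and that the typechecking job also needs a slot before it can hand work to the daemon.

```haskell
import Control.Concurrent (forkIO, killThread)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.QSem (QSem, newQSem, signalQSem, waitQSem)
import Control.Exception (bracket_)
import Control.Monad (forever)
import Data.Maybe (isJust)
import System.Timeout (timeout)

-- | Run an action while holding one slot of the (toy) thread pool.
withSlot :: QSem -> IO a -> IO a
withSlot pool = bracket_ (waitQSem pool) (signalQSem pool)

-- | Report whether a "typechecking" job completes within one second
-- when the pool has the given number of slots.
typecheckFinishes :: Int -> IO Bool
typecheckFinishes poolSize = do
  pool    <- newQSem poolSize
  queue   <- newEmptyMVar :: IO (MVar (IO ()))
  started <- newEmptyMVar
  done    <- newEmptyMVar

  -- Daemon: takes a slot, keeps it forever, and runs whatever is queued.
  daemon <- forkIO $ withSlot pool $ do
    putMVar started ()
    forever $ do
      job <- takeMVar queue
      job
  takeMVar started  -- make sure the daemon holds its slot first

  -- Typechecker: needs its own slot before it can hand work to the daemon.
  checker <- forkIO $ withSlot pool $
    putMVar queue (putMVar done ())

  -- Give up after one second instead of hanging like the real bug.
  finished <- timeout 1000000 (takeMVar done)
  killThread checker
  killThread daemon
  pure (isJust finished)

main :: IO ()
main = do
  typecheckFinishes 1 >>= print  -- pool of one slot
  typecheckFinishes 2 >>= print  -- pool of two slots
```

With one slot the typechecker never gets to run, so the job is expected to time out; with two slots it should complete. The one-second timeout is there only so the demonstration terminates.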

@ndmitchell
Collaborator

shakeThreads controls the number of threads in the Shake thread pool. Threads should never mutually depend on each other in a deadlocking way, since "normal" Shake programs shouldn't be communicating via TQueue. I suggest that you set shakeThreads to 1 more than the number of threads you really want, which will ensure it's at least 2.

@pepeiborra
Collaborator

pepeiborra commented Nov 17, 2020

Cool, so we need to set shakeThreads to if numCapabilities == 1 then 2 else 0 or simply numCapabilities + 1.
Joe, will you test and send a PR?

@ndmitchell
Collaborator

I'd go for numCapabilities + 1 - if you're taking control of shakeThreads and want it at a precise value better to make sure there's no chance Shake infers a different number of threads to you. At one point Shake tried to infer the number of processors you had if it looked like you weren't on the threaded runtime, which would mess up any implicitness.

@expipiplus1
Contributor Author

expipiplus1 commented Nov 18, 2020

Why the + 1, @ndmitchell? (instead of max 2 numCapabilities)

so

shakeThreads = max 2 (fromMaybe numCapabilities (optThreads options))

@ndmitchell
Collaborator

If Ghcide runs a mostly-blocked thread then + 1 is appropriate. If it's a mostly-busy thread then max 2 ... is the right thing to do. The code has changed so much since I was last deeply involved in it that I've no idea which it is!
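
To make the difference between the two policies concrete, here is a small sketch that computes the thread count each one would produce. The function names are hypothetical, and the Maybe Int argument is only a stand-in for the optThreads option mentioned earlier in the thread; nothing here touches Shake itself.

```haskell
import Control.Concurrent (getNumCapabilities)
import Data.Maybe (fromMaybe)

-- | "+ 1" policy: always one extra slot, intended for the case where
-- one thread is mostly blocked and does little real work.
plusOnePolicy :: Int -> Int
plusOnePolicy caps = caps + 1

-- | "max 2" policy: never fewer than two slots, but no extra slot
-- once there are already two or more capabilities.
atLeastTwoPolicy :: Int -> Int
atLeastTwoPolicy caps = max 2 caps

-- | The "max 2" variant with an optional user-requested thread count
-- (hypothetical stand-in for optThreads).
withOverride :: Maybe Int -> Int -> Int
withOverride requested caps = max 2 (fromMaybe caps requested)

main :: IO ()
main = do
  caps <- getNumCapabilities
  putStrLn ("capabilities:  " ++ show caps)
  putStrLn ("+ 1 policy:    " ++ show (plusOnePolicy caps))
  putStrLn ("max 2 policy:  " ++ show (atLeastTwoPolicy caps))
```

Both policies give 2 on a single capability, which is what avoids the hang. They differ only above that: with 4 capabilities, + 1 yields 5 threads while max 2 stays at 4.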

@pepeiborra transferred this issue from haskell/ghcide on Dec 30, 2020
@jneira added the labels component: ghcide, type: bug, and performance on Dec 30, 2020
@jneira
Member

jneira commented Apr 1, 2021

@michaelpj from #1624:

Can we check the number of capabilities and give an error if it's 1?
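
A startup check along those lines might look roughly like the sketch below. It is an assumption about how such a check could be written, not code from ghcide or HLS: checkCapabilities is a hypothetical helper, and the wording of the message is invented. Note that it inspects the RTS capability count, which is what the -N setting controls, rather than the number of physical cores.

```haskell
import Control.Concurrent (getNumCapabilities)

-- | Accept a capability count of two or more; reject anything lower
-- with a message explaining what to change.
checkCapabilities :: Int -> Either String Int
checkCapabilities n
  | n >= 2    = Right n
  | otherwise = Left ("need at least 2 capabilities, got " ++ show n
                      ++ "; try +RTS -N2")

main :: IO ()
main = do
  caps <- getNumCapabilities
  case checkCapabilities caps of
    -- A real server would probably exit here, e.g. via System.Exit.die.
    Left err -> putStrLn ("error: " ++ err)
    Right n  -> putStrLn ("running with " ++ show n ++ " capabilities")
```

Failing fast like this trades a silent hang for an explicit message, at the cost of refusing to run on single-capability setups at all; raising the thread count as discussed above avoids that restriction.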

@pepeiborra mentioned this issue on May 13, 2021
@jneira changed the title from "ghcide hangs during typechecking on a single-core machine" to "ghcide hangs during typechecking on a single-core machine or setting -j1" on Dec 9, 2021
@pepeiborra
Collaborator

I don't think that this problem exists anymore.
