Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: improve rebuild detection #232

Merged
merged 42 commits into from
Sep 22, 2022
Merged

fix: improve rebuild detection #232

merged 42 commits into from
Sep 22, 2022

Conversation

zinserjan
Copy link
Contributor

@zinserjan zinserjan commented Sep 8, 2022

This PR tries to resolve the issues in #231 and improves the algorithm when a project needs to be rebuild.

It's still a WIP and tries to solve the following problems:

  • input files detection / output files detection
  • detection of which projects needs a rebuild

The reason to rebuild a project within the monorepo depends on the following conditions:

  1. changed input files of the project (src files)
  2. expected output files doesn't exist, yet
  3. one of it's dependencies (recursive) needs a rebuild

1. input files of the project changed

The first problem I had was the correct detection of the input files. The main problem for me was the issue, that the default configuration is still in use, even when you set it explicitly via yarn.build in package.json.

That's why I refactored the whole input detection:

  1. no input configuration in package.json -> use default configuration / try to determine it via different mechanisms
    1.1 use default path ./
    1.2 check if default tsconfig.json or custom tsconfig exists and add include to the already existing input path config and use exclude to exclude these files from input cache detection
  2. explicit input configuration via package.json
    2.1 use the provided pattern, files and folders of the package.json input configuration
    2.2. if a custom tsconfig is expliclity configured via yarn.build/tsconfig in package.json, enhance the input config with files from include and use exclude to exclude (like 1.2)

2. expected output files doesn't exist, yet

Since #226 every output file, folder or pattern needs to exist, otherwise a rebuild is necessary. I don't remember what didn't work there, but it was quite a lot and lead to many issues for me.

That's why I had to improve this also:

  1. no output configuration in package.json -> use default configuration / try to determine it via different mechanisms
    1.1 theres no longer any default path, because this doesn't makes any sense
    1.2 check if one of the following fields exist in package.json and add it as output path: bin, directories.bin, files, main
    1.3 check if default tsconfig.json or custom tsconfig exists and add outDir to the already existing output path config
    1.4 fallback: if we didn't find any paths yet, use build as default
  2. explicit output configuration via package.json
    2.1 use the provided pattern, files and folders of the package.json output configuration
    2.2 if a custom tsconfig is expliclity configured via yarn.build/tsconfig in package.json enhance the output config with outDir

3. one of it's dependencies (recursive) needs a rebuild

We have to make sure that the right projects are marked for rebuild, when we need to build something. That's why I modified the build state of yarn.build.json:

  1. Property haveCheckedForRerun is no longer necessary. To be honest I didn't understand the reason for this variable, because the we have to check every project anyway and the local state is already cached in checkIfRunIsRequiredCache.
  2. Property rerun is set to true, whenever a build is open/necessary for this project.
    2.1 successful builds sets rerun to false, failed builds set it to true again
    2.2 when we build only a single project, we need to set rerurn to true for all dependent projects (recursively). Otherwise we can't catch the necessary build in the future, which was necessary due to one of the dependencies.
  3. Whenever we build a project we have to make sure that we determined the timestamps, otherwise we have to build it again the next time

Summary

  • changed input file detection: default config vs. explicit config (see above)
  • changed output detection: default config vs. explicit config (see above)
  • fix reading of tsconfig (support comments, inheritance, exclude affects only input)
  • support custom tsconfig filename
  • fix output existence check (handle files)
  • support glob as input & output
  • remove haveCheckedForRerun from yarn.build.json (no longer necessary)
  • mark every non-related project for this build for rerun, when this project was affected by others and therefore needs a build in the future (see above)
  • using a checksum based on last modified and size for each file instead of the latest last modified timestamp

This seems to work pretty good for my project and for this workspace, too. I've tested this on MacOS and Windows. But of course this needs some intensive testing :)

@zinserjan zinserjan marked this pull request as ready for review September 10, 2022 16:32
@zinserjan
Copy link
Contributor Author

@ojkelly I think I'm done. At least for now.

Regarding the reading of tsconfig. Initially, I tried to use the typescript package for reading tsconfig, but I wasn't able to mark typescript as an optional dependency. Either the build failed or the compiled plugin wasn't able to resolve typescript (maybe due to pnp).
Including typescript into the bundle increased the plugin from around 350KB to ~1MB. That's why I used get-tsconfig as an alternative way to read the tsconfig. The only downside is that the resolved output format differs from typescript. Typescript parses and resolves the tsconfig completely and transforms include and exclude into a single filenames property which contains all files to compile. With get-tsconfig we have to resolve these files ourselves, but that isn't really a problem due to the added support of globs.

@jeffrson can you give this a try? Just replace the existing version of yarn.build with one of the files of this PR in .yarn/plugins/@ojkelly.

@jeffrson
Copy link

@zinserjan So far this looks very promising - great work!
Two points after a couple of tests:

  • previously I had 8 slots for building (corresponding to 8 CPU cores), now there are 16. I'm not sure, why - taking hyperthreading into account? Does not seem faster - I think it's better to keep the number for cores
  • for some reason our sources for typescript are located inside src, but tsconfig.json contains "outDir": "." (I'm not happy with it, but we currently need it like this), that's why I had ""output": "." in yarn.build settings inside package.json. Unfortunately this seems to defeat detection of changes: project is not rebuilt anymore at all. AFAICT, it worked previously (which might be due to not so sophisticated change detection though). Do you think, this could work as well? It would be hard to maintain a list of subfolders of src in package.json

@zinserjan
Copy link
Contributor Author

zinserjan commented Sep 11, 2022

previously I had 8 slots for building (corresponding to 8 CPU cores), now there are 16. I'm not sure, why - taking hyperthreading into account? Does not seem faster - I think it's better to keep the number for cores

Nothing changed here in this PR. That was introduced in #224, but isn't released, yet.

for some reason our sources for typescript are located inside src, but tsconfig.json contains "outDir": "." (I'm not happy with it, but we currently need it like this), that's why I had ""output": "." in yarn.build settings inside package.json. Unfortunately this seems to defeat detection of changes: project is not rebuilt anymore at all. AFAICT, it worked previously (which might be due to not so sophisticated change detection though). Do you think, this could work as well? It would be hard to maintain a list of subfolders of src in package.json

I guess it rebuilts before due to the changed timestamps (the latest generated file triggered the rebuilt). The problem here is that I reduce the input files with the output files, but this is of course a problem when input and output are the same. I don't think that we can handle this case automatically. Ignoring output when it's the same as input means that we cannot cache, using both doesn't rebuild.
I think the best solution is a proper configuration of your input and output. Add the following to your package json:

  "yarn.build": {
    "input": ["src", "!**/*.js", "!**/*.d.ts"]
    "output": ["src/**/*.js", "src/**/*.d.ts"]
  },

Note: add the d.ts patterns only if they exists, otherwise you have no caching, cause every output pattern must exist.

@ojkelly ojkelly self-assigned this Sep 12, 2022
Copy link
Owner

@ojkelly ojkelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR @zinserjan, it's great.

The main problem for me was the issue, that the default configuration is still in use, even when you set it explicitly via yarn.build in package.json.

Yep definitely a bug, thanks for reworking.

Since #226 every output file, folder or pattern needs to exist, otherwise a rebuild is necessary. I don't remember what didn't work there, but it was quite a lot and lead to many issues for me.

Sorry for the work this caused, there's been a tension between nice zero-config defaults, and not being too opinionated such that the tool can be flexible.

  1. no output configuration in package.json -> use default configuration / try to determine it via different mechanisms
  2. explicit output configuration via package.json

Yep I think this approach is good. If you add config, you're opting out of any defaults(?).

I've added some code suggestions around this, as I noticed some packages I have forever rebuilt — as they don't produce any output. The suggestions are used with explicitly setting the input in package.json#yarn.build.

Are you able to test if they still work they way you're expecting too?

Property haveCheckedForRerun is no longer necessary. To be honest I didn't understand the reason for this variable

I think it's legacy cruft mostly, happy to see it gone.


With get-tsconfig we have to resolve these files ourselves, but that isn't really a problem due to the added support of globs.

Sounds good to me.


previously I had 8 slots for building (corresponding to 8 CPU cores), now there are 16. I'm not sure, why - taking hyperthreading into account?

@jeffrson the previous config accidentally set it to 8 irrespective of the actual core count. So on CI boxes with 2 CPU's it was slowing dramatically from the context switching on the CPU trying to compile 8 things at once.

This is worth it's own issue, I think we might need to do something like this https://gist.github.com/brandon93s/a46fb07b0dd589dc34e987c33d775679, to get the physical CPU count, instead of the logical count (which includes each hyperthread as a CPU).

That was introduced in #224, but isn't released, yet.

Yep, we'll get it released with this.


for some reason our sources for typescript are located inside src, but tsconfig.json contains "outDir": "." (I'm not happy with it, but we currently need it like this), that's why I had ""output": "." in yarn.build settings inside package.json.

think the best solution is a proper configuration of your input and output.

Yep this is a bit of a tricky one. Generally the principle we're aiming for follows yarn itself in wanting soundness.

The most sound approach is explicit definition of your input and output files. Previously, this was mostly just achievable with folder based separation, but with this PR the glob approach @zinserjan suggested is good.

Overall @zinserjan this is great, I'll do some more testing with the suggested changes. If you can as well, I think we can get this merged and released this week.

@zinserjan zinserjan requested a review from ojkelly September 12, 2022 20:48
@zinserjan
Copy link
Contributor Author

It looks good to me. My own tests are still working. I have created an example for the 'upload only' case.

Setting null as output skips the output checks and only rebuilds when files changed. Explicit input and output file configuration will no longer be touched from anything else.

Copy link
Owner

@ojkelly ojkelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for this @zinserjan. I've done a final set of testing, and it's looking good.

@ojkelly ojkelly merged commit e884a41 into ojkelly:main Sep 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants