Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eager retry for short retry delays #105

Merged
merged 1 commit into from
Dec 13, 2023
Merged

Conversation

brandur
Copy link
Contributor

@brandur brandur commented Dec 10, 2023

An aspect of River that could currently be considered a bug is that
retry delays below five seconds are effectively ignored. This is because
when a job errors it always transitions to retryable state, and needs
to wait for the scheduler to run to transition it back to available,
and the scheduler run interval is five seconds.

This is in practice a problem because it means that the first retry in a
retry policy isn't actually one second as claimed in docs, etc., but
actually ~5 seconds, and it won't be obvious why.

Here, implement an "eager" retry system such that if we detect that the
retry delay is smaller than the scheduler's run interval, we place the
job immediately into an available state, allowing it to be worked
sooner.

To facilitate this we add a predicate on the "get available" query that
checks for scheduled_at <= now() so that errored jobs with short
retries can be available for a short time without being worked
immediately. Most of the time this should have no effect because this
really only applies to the first retry in any failure (the second is at
16 seconds, which is well above the scheduler's run interval).

@brandur brandur force-pushed the brandur-eager-job-retry branch from 9b0559c to 91ebc57 Compare December 10, 2023 21:20
@brandur brandur requested a review from bgentry December 10, 2023 21:27
Copy link
Contributor

@bgentry bgentry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent improvement :shipit:

@@ -72,6 +72,7 @@ WITH locked_jobs AS (
WHERE
state = 'available'::river_job_state
AND queue = @queue::text
AND scheduled_at <= now()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be totally fine from a query performance perspective because it's part of the compound fetching index and is already being used for sorting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Yeah I figured that the extra predicate is on average going to have very little work to do since the vast majority of rows that aren't ready yet will still be scheduled.

An aspect of River that could currently be considered a bug is that
retry delays below five seconds are effectively ignored. This is because
when a job errors it always transitions to `retryable` state, and needs
to wait for the scheduler to run to transition it back to `available`,
and the scheduler run interval is five seconds.

This is in practice a problem because it means that the first retry in a
retry policy isn't actually one second as claimed in docs, etc., but
actually ~5 seconds, and it won't be obvious why.

Here, implement an "eager" retry system such that if we detect that the
retry delay is smaller than the scheduler's run interval, we place the
job immediately into an `available` state, allowing it to be worked
sooner.

To facilitate this we add a predicate on the "get available" query that
checks for `scheduled_at <= now()` so that errored jobs with short
retries can be `available` for a short time without being worked
immediately. Most of the time this should have no effect because this
really only applies to the first retry in any failure (the second is at
16 seconds, which is well above the scheduler's run interval).
@brandur brandur force-pushed the brandur-eager-job-retry branch from 91ebc57 to 8d39a9e Compare December 13, 2023 02:12
@brandur
Copy link
Contributor Author

brandur commented Dec 13, 2023

Excellent. Thanks! Added a changelog and pulling this in.

@brandur brandur merged commit 1468298 into master Dec 13, 2023
@brandur brandur deleted the brandur-eager-job-retry branch December 13, 2023 02:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants