
[QUESTION] Speed up scoring in ForecastAnomalyModel #2673

Open
mcapuccini opened this issue Feb 14, 2025 · 8 comments · May be fixed by #2709
Labels
question Further information is requested

Comments

@mcapuccini

mcapuccini commented Feb 14, 2025

Speed up scoring in ForecastAnomalyModel

Custom stride
Is there any reason for having a hard-coded stride (i.e. 1, code here) when running historical_forecasts in ForecastingAnomalyModel? If the underlying model has a horizon of 10, I might as well set the stride to 10 to get 10x faster scoring.

@mcapuccini added the question and triage labels Feb 14, 2025
@dennisbader
Collaborator

dennisbader commented Feb 14, 2025

Hi @mcapuccini, yes and no :)

Why Yes: It was the most robust parameter choice with the lowest implementation complexity.
Using stride=1 and last_points_only=True returns a contiguous target series (a single TimeSeries, not a list of forecast TimeSeries). The values represent the last predicted values from the historical forecasts.

That means all values were subject to the same forecast uncertainty. This "consistency" can play a crucial role when fitting the anomaly scorers, since they are trained on the error (or other statistics) between the forecasts and the actuals.

If we use stride=horizon, we must set last_points_only=False. This gives us a list of forecasts which, concatenated along the time axis, form one contiguous target series.
This might, for example, result in a "ramping" pattern in the forecast errors (higher errors the further ahead the prediction) -> inconsistent model uncertainty -> can have a negative effect on scorer fitting & prediction.

So, all in all, we chose to fix it for now, to have robust behavior and avoid some pitfalls (at the cost of efficiency).

Why No: As you say, using stride=horizon can drastically speed up the process, and I assume for many use cases this approach is still valid. We could think of adding support for this. WDYT @madtoinou?
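
For illustration only (not part of the original comment), here is a minimal sketch of the two configurations being compared; model and target_series are assumed placeholders for a trained Darts forecasting model and its target series:

# Current behavior inside the anomaly model (stride fixed to 1): a single
# contiguous TimeSeries made of the last point of each historical forecast.
hf_last_points = model.historical_forecasts(
    series=target_series,
    forecast_horizon=10,
    stride=1,
    last_points_only=True,
    retrain=False,
)

# Discussed alternative: stride equal to the horizon, returning a list of
# non-overlapping forecasts that can later be concatenated along time.
hf_full = model.historical_forecasts(
    series=target_series,
    forecast_horizon=10,
    stride=10,
    last_points_only=False,
    retrain=False,
)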

@madtoinou
Collaborator

Yeah, let's add it to the roadmap. To make our life maybe a bit easier, I would only support stride=horizon and not stride<horizon, to avoid overlapping forecasts and having to arbitrarily select one over the other. Then, we would have to extensively check that everything behaves as expected, especially at the "junctions", and, as you said, that we don't get too much drift along the horizon dimension (which is pretty much the cost of the time gain).

@mcapuccini
Author

I see your point. With a windowed scorer, though, one could match the size of the window to the lookback plus the forecasting horizon. As I understand it, the k-means scorer considers the forecasts in each window individually, hence the accumulated forecasting error due to a horizon > 1 should be the same if the window matches lookback plus horizon. Does this make sense?
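
As an illustrative sketch (not from the thread): a windowed scorer whose window covers the lookback plus the horizon. KMeansScorer with its window and k parameters comes from darts.ad; the numeric values and variable names are assumptions.

from darts.ad import KMeansScorer

lookback = 24   # assumed input chunk length of the underlying forecasting model
horizon = 10    # forecast horizon used for historical_forecasts

# Each scored window then spans one full lookback-plus-forecast segment.
scorer = KMeansScorer(window=lookback + horizon, k=8)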

@dennisbader removed the triage label Feb 15, 2025
@mcapuccini
Author

@dennisbader I am trying out what you suggested but concatenating the historical forecasts is also painfully slow:

from typing import List

from darts import TimeSeries
from tqdm import tqdm


def concatenate_predictions(
    historical_forecasts: List[List[TimeSeries]],
) -> List[TimeSeries]:
    # Figure out total iterations for the progress bar
    total_iterations = sum(len(pred_list) - 1 for pred_list in historical_forecasts)

    concatenated_forecasts = []
    with tqdm(total=total_iterations, desc="Concatenating predictions") as pbar:
        for pred_list in historical_forecasts:
            # Append each forecast to the running series one at a time (slow)
            concatenated = pred_list[0]
            for pred in pred_list[1:]:
                concatenated = concatenated.append(pred)
                pbar.update(1)
            concatenated_forecasts.append(concatenated)

    return concatenated_forecasts

Is there anything I can do to run this faster?

@dennisbader
Collaborator

If your forecasts are contiguous in time then you can use darts.concatenate on the list of forecasts :)
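
As a sketch of that suggestion (the helper name is hypothetical; it assumes the same input structure as concatenate_predictions above), the inner append loop can be replaced by a single darts.concatenate call per forecast list:

from typing import List

from darts import TimeSeries, concatenate


def concatenate_predictions_fast(
    historical_forecasts: List[List[TimeSeries]],
) -> List[TimeSeries]:
    # One concatenate call per list of contiguous forecasts (axis=0 is the time axis).
    return [concatenate(pred_list, axis=0) for pred_list in historical_forecasts]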

@mcapuccini
Author

Cool! I assume that the results from historical_forecasts are contiguous in time. You meant trying something like this, right?

from darts import concatenate

historical_forecasts = model.historical_forecasts(
    series=train_target_series,
    past_covariates=train_past_covariates,
    future_covariates=train_future_covariates,
    forecast_horizon=10,
    stride=10,
    last_points_only=False,
    retrain=False,
    verbose=True,
)
historical_forecasts = [concatenate(ts_list) for ts_list in historical_forecasts]
scorer.fit_from_prediction(train_target_series, historical_forecasts)

@dennisbader
Collaborator

Yes, exactly :) And yes, they will be contiguous with this config since forecast_horizon == stride.
(I hope using concatenate sped things up drastically?) 😉

@mcapuccini
Author

Yes, it did :)
