Getting predicted distributions for y ~ Normal(mean=0, std=x) #329
-
I am hoping to use NGBoost to fit a conditional distribution to input features that should be linearly associated with the scale (standard deviation) of the noise in the observations. To this end, I wrote a test script that generates a simple dataset where the observations are sampled from a normal distribution with mean zero and standard deviation proportional to the only feature. I used:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from ngboost import NGBRegressor
from ngboost.distns import Normal
import scipy.stats as stats

"""
This script tests NGBoost's ability to fit a normal distribution
when the true distribution is a normal distribution with:
- scale (standard deviation) = x
- loc (mean) = 0
"""

N = 5000     # number of data points
X_max = 10   # x ranges from 0 to X_max
q = 0.99     # quantile to plot

x = np.arange(start=0, stop=X_max, step=X_max / N)
scale = x  # scale (standard deviation) as a function of x
# Note: scipy and numpy both use scale = standard deviation.
gaussian_noise = np.random.normal(0, scale, N)

target = gaussian_noise.reshape(-1, 1)
features = scale.reshape(-1, 1)  # the only feature is the true standard deviation (scale)

# Fit NGBRegressor with the specified base learner.
ngb = NGBRegressor(
    Dist=Normal,
    n_estimators=500,         # default is 500
    learning_rate=0.01,       # default is 0.01
    tol=0.0001,               # default is 0.0001
    validation_fraction=0.1,  # default is 0.1
    Base=LinearRegression(),  # comment out for the default base (decision trees)
)
ngb.fit(features, target)
pred_dists = ngb.pred_dist(features)

# Calculate the quantiles using the NGBoost-fitted distribution.
# Note: scipy and numpy both use scale = standard deviation.
ngb_high_q = stats.norm.ppf(q, loc=pred_dists.loc, scale=pred_dists.scale)
ngb_low_q = stats.norm.ppf(1 - q, loc=pred_dists.loc, scale=pred_dists.scale)

# Plot the target as a scatter plot.
plt.scatter(x, target, s=20 / np.sqrt(N), label='samples')
# Add curves for ngb_high_q and ngb_low_q.
plt.plot(x, ngb_high_q, 'g-', label='NGB %.2f Quantile' % q)
plt.plot(x, ngb_low_q, 'g-', label='NGB %.2f Quantile' % (1 - q))
# Add dashed lines for the true quantiles using scipy's ppf.
plt.plot(x, stats.norm.ppf(q, loc=0, scale=scale), 'k--', label='actual')
plt.plot(x, stats.norm.ppf(1 - q, loc=0, scale=scale), 'k--', label='_nolegend_')
# Annotate the data-generating process.
plt.text(0.05, 0.05, r'$y \sim N(\mu=0,\sigma=x)$', transform=plt.gca().transAxes)
plt.legend()
plt.title('NGBoost w/ Base: %s' % str(ngb.Base))
plt.show()
```

Instead of finding that p(y|x) = N(0, std=x), NGBoost finds a standard deviation that is some convex function of x. I've tried numerous changes to the above script to attempt to correct this: changing the feature and/or scale to x^2 or sqrt(x), changing the hyperparameters, turning on/off … Can anyone share their insights into what is happening here?
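For reference, the shape of the fitted scale can be seen directly by plotting the predicted standard deviation against x. A minimal diagnostic sketch, assuming the script above has already been run so that `x`, `scale`, and `pred_dists` are in scope:

```python
import matplotlib.pyplot as plt  # already imported by the script above

# Compare the NGBoost-fitted standard deviation to the true scale.
plt.figure()
plt.plot(x, pred_dists.scale, 'g-', label='NGB fitted scale')
plt.plot(x, scale, 'k--', label='true scale = x')
plt.xlabel('x')
plt.ylabel('standard deviation')
plt.legend()
plt.title('Fitted vs. true scale')
plt.show()
```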
-
wow, that is completely bizarre. If you're boosting with linear base learners, the learned functions will be sums of linear functions and thus themselves linear (i.e., the boosting part shouldn't really do anything except implicit L1 regularization: see the LARS paper and the connection between the lasso and boosting). So, yeah, I don't really understand how this is possible, other than that the output is not sigma(x) but instead sigma^2(x) or something like that.
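One way to test that guess empirically is to check which transformation of the fitted scale is (approximately) linear in x. A rough diagnostic sketch, assuming `x` and `pred_dists` from the original script are in scope:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Compare how well x linearly explains the fitted scale, its square root, and its log.
fitted_scale = np.asarray(pred_dists.scale)
X = x.reshape(-1, 1)

for name, y in [
    ('scale', fitted_scale),                 # linear if sigma(x) = a*x + b
    ('sqrt(scale)', np.sqrt(fitted_scale)),  # linear if sigma(x) is quadratic in x
    ('log(scale)', np.log(fitted_scale)),    # linear if sigma(x) = exp(a*x + b)
]:
    reg = LinearRegression().fit(X, y)
    print('%-12s R^2 = %.4f' % (name, r2_score(y, reg.predict(X))))
```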
-
@alejandroschuler I see now that the class `Normal` is parameterized by log(sigma) (a.k.a. log(scale) / log(std_dev)) and the mean (a.k.a. loc). So when the linear base models are fit, they output distribution `params` that are linear, and these become a linear log(sigma) and a linear mean; in math, ax + b = log(sigma), so sigma = exp(ax + b). The standard deviation ends up exponential in x while the mean has the correct linear dependence.

So here, with a linear base model, `features` would need to contain at least two columns - log(x) and x - to hope to capture y ~ N(0, std=x). When I do this with:

```python
# Need one feature for the linear log(std_dev) dependence,
# because log(std_dev) is the NGBoost Normal distribution's internal scale parameter.
feature0 = np.log(np.maximum(scale, 0.1)).reshape(-1, 1)  # avoid log(0)
# Need another feature for the linear dependence of the location parameter (loc).
feature1 = scale.reshape(-1, 1)
features = np.concatenate((feature0, feature1), axis=1)
```

I suppose decision trees as the base model don't really care about whether the feature is x or log(x). This parameterization of …
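To make the experiment concrete, the refit could look something like the following. This is only a sketch, assuming `features` (the two-column version above), `target`, and `scale` from the earlier script are in scope; it is not the exact code used for the figures discussed here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from ngboost import NGBRegressor
from ngboost.distns import Normal

# Refit with the two-column features so the linear base learner can express
# a linear log(sigma) in log(x) and a linear mean in x.
ngb2 = NGBRegressor(Dist=Normal, Base=LinearRegression())
ngb2.fit(features, target)
pred_dists2 = ngb2.pred_dist(features)

# With log(x) available, the internally linear log(sigma) can recover
# sigma(x) = x (up to the log(0) clipping near x = 0).
print('mean |fitted scale - true scale|:',
      np.abs(pred_dists2.scale - np.maximum(scale, 0.1)).mean())
```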
-
I really enjoyed this investigation, thank you! :)