Stats API

Silicone’s custom statistical operations.

silicone.stats.calc_all_emissions_correlations(emms_df, years, output_dir)

Save csv files of the correlation coefficients and the rank correlation coefficients between emissions at specified times.

This function includes all undivided emissions (i.e. results recorded as Emissions|X) and CO2 emissions split once (i.e. Emissions|CO2|X). It does not include Kyoto gases. It will also save the average absolute value of the coefficients.
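As a rough illustration of the quantities involved, the sketch below computes Pearson and Spearman rank correlations per year and then averages the absolute Pearson coefficients over the years. It uses plain pandas rather than silicone itself; the variable names, the made-up numbers, and the layout (one row per model/scenario pair, one column per emissions variable) are assumptions for illustration and are not taken from the library's implementation.

```python
import pandas as pd

# Hypothetical emissions: one row per model/scenario pair, one column per
# variable. All numbers are invented purely for illustration.
columns = ["Emissions|CO2", "Emissions|CH4", "Emissions|N2O"]
emissions_by_year = {
    2030: pd.DataFrame(
        [[10.0, 1.0, 4.0], [20.0, 2.0, 3.0], [30.0, 3.0, 1.0], [40.0, 4.0, 2.0]],
        columns=columns,
    ),
    2050: pd.DataFrame(
        [[12.0, 2.0, 1.0], [18.0, 1.0, 2.0], [33.0, 4.0, 3.0], [41.0, 3.0, 4.0]],
        columns=columns,
    ),
}

# Pearson correlation between every pair of variables, for each year.
pearson = {year: df.corr(method="pearson") for year, df in emissions_by_year.items()}

# Spearman rank correlation is the Pearson correlation of the ranked data.
spearman = {
    year: df.rank().corr(method="pearson") for year, df in emissions_by_year.items()
}

# Average of the absolute Pearson coefficients over the requested years.
mean_abs_pearson = sum(p.abs() for p in pearson.values()) / len(pearson)
```

Each resulting matrix could then be written out with `DataFrame.to_csv`, which is presumably how the files listed under "Files created" are produced, though that detail is not confirmed here.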

Parameters

emms_df (pyam.IamDataFrame) – The database to search for correlations between named values
output_dir (str) – The folder location to save the files.
years (list[int]) – The years upon which to calculate correlations.

Files created

"variable_counts.csv" – The number of scenario/model pairs where the emissions data occurs.
"gases_correlation_{year}.csv" – The Pearson's correlation between gases emissions in a given year.
"gases_rank_correlation_{year}.csv" – The Spearman's rank correlation between gases in a given year.
"time_av_absolute_correlation_{}_to_{}.csv" – The magnitude of the Pearson's correlation between emissions, averaged over the years requested.
"time_av_absolute_rank_correlation_{}_to_{}.csv" – The magnitude of the Spearman's rank correlation between emissions, averaged over the years requested.
"time_variance_rank_correlation_{}_to_{}.csv" – The variance over time in the rank correlation values above.

silicone.stats.calc_quantiles_of_data(distribution, points_to_quant, smoothing=None, weighting=None, to_quantile=True)

Calculates the quantiles of points_to_quant in the distribution of values described by distribution. Optionally treats points_to_quant as quantiles and returns the values that would lead to them instead.

Parameters

distribution (pd.Series) – The distribution of values.
points_to_quant (pd.Series) – The points to evaluate. If to_quantile is True (default), these are the values to compare against the distribution; if False, these are the quantiles whose corresponding values we want to find.
smoothing (float or string) – By default, no smoothing is done on the distribution. If a value is provided, it is fed into scipy.stats.gaussian_kde() - see full documentation there. In short, if a float is input, we fit a Gaussian kernel density estimator with that width to the points. If a string is used, it must be either “scott” or “silverman”, after those two methods of determining the best kernel bandwidth.
weighting (None or Series) – If a series, must have the same indices as distribution, giving the relative weights of each point.
to_quantile (bool) – If True, we return the quantiles of the data in points_to_quant. If False, we instead treat points_to_quant as the quantiles themselves (they must all lie between 0 and 1) and return the values in distribution that occur at these quantiles.

Returns

An array with one row and a column for each entry in points_to_quant, containing the quantiles of these points in order. Or, if to_quantile is False, containing the values corresponding to the quantiles points_to_quant.

Return type

np.ndarray
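The two directions controlled by to_quantile can be illustrated with a minimal, unweighted and unsmoothed sketch in plain NumPy. This is not silicone's implementation: the quantile convention used here (the fraction of the distribution at or below each point, and `np.quantile` with its default interpolation for the inverse) is an assumption, and the library's handling of ties, extremes, weighting and smoothing may differ.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only.
distribution = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
points = pd.Series([0.0, 3.0, 6.0])

sorted_vals = np.sort(distribution.to_numpy())

# Forward direction (like to_quantile=True): the fraction of the
# distribution that lies at or below each point.
quantiles = np.searchsorted(sorted_vals, points.to_numpy(), side="right") / len(
    sorted_vals
)

# Inverse direction (like to_quantile=False): the inputs are quantiles in
# [0, 1] and we look up the matching values in the distribution.
requested_quantiles = np.array([0.0, 0.5, 1.0])
values = np.quantile(sorted_vals, requested_quantiles)
```

With these inputs, points outside the range of the distribution map to quantiles 0 and 1, and the inverse lookup recovers the minimum, median and maximum.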

silicone.stats.rolling_window_find_quantiles(xs, ys, quantiles, nwindows=11, decay_length_factor=1)

Perform quantile analysis in the y-direction for x-weighted data.

Divides the x-axis into nwindows of equal length and weights data by how close they are to the center of these windows. Then returns the quantiles of this weighted data. Quantiles are defined so that the values returned are always equal to a y-value in the data; there is no interpolation. Extremal points are given their full weighting, meaning the results will not agree with np.quantile under uniform weighting (which effectively gives 0 weight to the min and max values).

The weighting of a point at \(x\) for a window centered at \(x_0\) is:

\[w = \frac{1}{1 + \left (\frac{x - x_0}{l_{window}} \times f_{dl} \right)^2}\]

for \(l_{window}\) the window width (range of values divided by nwindows - 1) and \(f_{dl}\) the decay_length_factor.
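The weighting formula above can be transcribed directly into NumPy. The helper name `window_weights` and the sample data are hypothetical; the sketch follows the formula exactly as written in this section and is not taken from the library's source.

```python
import numpy as np


def window_weights(xs, x0, l_window, decay_length_factor=1.0):
    """Weights w = 1 / (1 + ((x - x0) / l_window * f_dl)^2) for one window.

    Hypothetical helper: a literal transcription of the documented formula.
    """
    xs = np.asarray(xs, dtype=float)
    return 1.0 / (1.0 + ((xs - x0) / l_window * decay_length_factor) ** 2)


# Made-up x values spanning 0 to 10.
xs = np.array([0.0, 1.0, 2.0, 5.0, 10.0])
nwindows = 11

# Window width: range of values divided by (nwindows - 1).
l_window = (xs.max() - xs.min()) / (nwindows - 1)

# Window centres are spread evenly from x_min to x_max.
centres = np.linspace(xs.min(), xs.max(), nwindows)

# Weights of every point relative to the first window centre (x0 = 0).
weights = window_weights(xs, centres[0], l_window)
```

A point sitting exactly on the window centre gets weight 1, and the weight decays smoothly with distance, so every point contributes to every window to some degree.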

Parameters

xs (np.ndarray, pd.Series) – The x co-ordinates to use in the regression.
ys (np.ndarray, pd.Series) – The y co-ordinates to use in the regression.
quantiles (list-like) – The quantiles to calculate in each window.
nwindows (int) – How many points to evaluate between x_max and x_min. Must be > 1.
decay_length_factor (float) – Gives the distance over which the weighting of the values falls to 1/4, relative to half the distance between window centres. Defaults to 1.

Returns

Quantile values at the window centres.

Return type

pd.DataFrame

Raises

AssertionError – xs and ys don’t have the same shape
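A weighted quantile that always returns one of the observed y-values, with no interpolation, could be sketched as follows. The function name `weighted_quantile_no_interp` is hypothetical, and the rule used (pick the first sorted y-value whose normalised cumulative weight reaches the requested quantile) is an assumption; silicone's exact treatment of extremal points may differ from this.

```python
import numpy as np


def weighted_quantile_no_interp(ys, weights, q):
    """Return the observed y-value at weighted quantile q (0 <= q <= 1).

    Hypothetical helper: selects the first sorted y whose normalised
    cumulative weight is at least q, so no interpolation takes place.
    """
    ys = np.asarray(ys, dtype=float)
    weights = np.asarray(weights, dtype=float)
    assert ys.shape == weights.shape, "ys and weights must have the same shape"

    order = np.argsort(ys)
    ys_sorted = ys[order]

    # Cumulative weight, normalised so the final entry equals 1.
    cum = np.cumsum(weights[order])
    cum = cum / cum[-1]

    idx = np.searchsorted(cum, q, side="left")
    idx = min(idx, len(ys_sorted) - 1)  # guard against rounding at q = 1
    return ys_sorted[idx]


# Made-up data: the middle value carries half of the total weight.
ys = [3.0, 1.0, 2.0]
weights = [1.0, 1.0, 2.0]
median = weighted_quantile_no_interp(ys, weights, 0.5)
```

Applying such a function once per window, with the weights for that window, would give one row of quantile values per window centre.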