Model comparison tests

fast_automl.test.corrected_repeated_kfold_cv_test

def fast_automl.test.corrected_repeated_kfold_cv_test(estimators, X, y, repetitions=10, cv=10, scoring=None, n_jobs=None)

Performs pairwise corrected repeated k-fold cross-validation tests to compare the estimators' scores. See Bouckaert and Frank.

Parameters: estimators : list

List of (name, estimator) tuples.

X : array-like of shape (n_samples, n_features)

Features.

y : array-like of shape (n_samples, n_targets)

Targets.

repetitions : int, default=10

Number of cross-validation repetitions.

cv : int, cross-validation generator, or iterable, default=10

Scikit-learn style cv parameter.

scoring : str, callable, list, tuple, or dict, default=None

Scikit-learn style scoring parameter.

n_jobs : int, default=None

Number of jobs to run in parallel.

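Returns: results_df : pd.DataFrame

One row per pair of estimators, with the columns Estimator1, Estimator2, PerformanceDifference, Std, t-stat, and p-value.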

Examples

from fast_automl.test import corrected_repeated_kfold_cv_test

# Note: load_boston was removed in scikit-learn 1.2; this example needs an older release.
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.svm import SVR

X, y = load_boston(return_X_y=True)
corrected_repeated_kfold_cv_test(
    [
        ('rf', RandomForestRegressor()),
        ('ridge', Ridge()),
        ('svm', SVR())
    ],
    X, y, n_jobs=-1
)

Out:

Estimator1 Estimator2  PerformanceDifference       Std     t-stat       p-value
        rf      ridge               0.165030  0.030266   5.452600  3.652601e-07
        rf        svm               0.670975  0.045753  14.665154  1.460994e-26
     ridge        svm               0.505945  0.045031  11.235469  2.258586e-19
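
In the table above, t-stat equals PerformanceDifference divided by Std (for example, 0.165030 / 0.030266 ≈ 5.4526). The following is a minimal sketch of the corrected resampled t-test that Bouckaert and Frank describe for repeated k-fold cross-validation, assuming the per-fold score differences between two estimators have already been collected. The function name `corrected_repeated_kfold_ttest` and its arguments are hypothetical and not part of fast_automl; whether the library computes its statistic in exactly this way is an assumption.

```python
import numpy as np
from scipy import stats


def corrected_repeated_kfold_ttest(diffs, n_train, n_test):
    """Corrected resampled t-test on per-fold score differences (sketch).

    diffs   : 1-D sequence of length repetitions * k; each entry is
              score(estimator 1) - score(estimator 2) on one test fold.
    n_train : number of samples in each training fold.
    n_test  : number of samples in each test fold.
    """
    diffs = np.asarray(diffs, dtype=float)
    n = diffs.size                      # repetitions * k
    mean_diff = diffs.mean()
    var_diff = diffs.var(ddof=1)        # sample variance of the differences

    # The naive standard error uses 1/n. The correction adds n_test/n_train
    # because training sets overlap across folds, so the differences are
    # not independent.
    corrected_std = np.sqrt((1.0 / n + n_test / n_train) * var_diff)

    t_stat = mean_diff / corrected_std
    # Two-sided p-value from a t-distribution with n - 1 degrees of freedom.
    p_value = 2.0 * stats.t.sf(abs(t_stat), df=n - 1)
    return mean_diff, corrected_std, t_stat, p_value


# Synthetic differences standing in for 10 repetitions of 10-fold CV.
rng = np.random.default_rng(0)
diffs = rng.normal(loc=0.1, scale=0.05, size=100)
mean_diff, std, t_stat, p_value = corrected_repeated_kfold_ttest(
    diffs, n_train=450, n_test=50
)
print(mean_diff, std, t_stat, p_value)
```

The corrected standard error is always larger than the naive one, so the test is more conservative than an uncorrected paired t-test on the same differences.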

fast_automl.test.r_by_k_cv_test

def fast_automl.test.r_by_k_cv_test(estimators, X, y, repetitions=5, cv=2, scoring=None, n_jobs=None)

Performs pairwise RxK (usually 5x2) cross-validation tests. See here.

Parameters: estimators : list

List of (name, estimator) tuples.

X : array-like of shape (n_samples, n_features)

Features.

y : array-like of shape (n_samples, n_targets)

Targets.

repetitions : int, default=5

Number of cross-validation repetitions.

cv : int, cross-validation generator, or iterable, default=2

Scikit-learn style cv parameter.

scoring : str, callable, list, tuple, or dict, default=None

Scikit-learn style scoring parameter.

n_jobs : int, default=None

Number of jobs to run in parallel.

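Returns: results_df : pd.DataFrame

One row per pair of estimators, with the columns Estimator1, Estimator2, PerformanceDifference, Std, t-stat, and p-value.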

Examples

from fast_automl.test import r_by_k_cv_test

# Note: load_boston was removed in scikit-learn 1.2; this example needs an older release.
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.svm import SVR

X, y = load_boston(return_X_y=True)
r_by_k_cv_test(
    [
        ('rf', RandomForestRegressor()),
        ('ridge', Ridge()),
        ('svm', SVR())
    ],
    X, y, n_jobs=-1
)

Out:

Estimator1 Estimator2  PerformanceDifference       Std     t-stat   p-value
        rf      ridge               0.143314  0.026026   5.506631  0.002701
        rf        svm               0.659547  0.035824  18.410644  0.000009
     ridge        svm               0.516233  0.021601  23.898480  0.000002
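
For orientation, here is a minimal sketch of the classic 5x2 cross-validation paired t-test as formulated by Dietterich (1998), generalized to r repetitions of 2-fold cross-validation. It assumes the per-fold score differences are already available as an array of shape (r, 2). The function name `r_by_2_cv_ttest` is hypothetical and not part of fast_automl, and the library may use a different variant (for instance, the mean difference as the numerator), so treat this as an illustration of the idea, not as the library's implementation.

```python
import numpy as np
from scipy import stats


def r_by_2_cv_ttest(diffs):
    """Dietterich-style r x 2 cross-validation paired t-test (sketch).

    diffs : array of shape (r, 2); diffs[i, j] is
            score(estimator 1) - score(estimator 2) on fold j of repetition i.
    """
    diffs = np.asarray(diffs, dtype=float)
    r = diffs.shape[0]

    # Per-repetition variance estimate from the two fold differences.
    rep_means = diffs.mean(axis=1, keepdims=True)
    s2 = ((diffs - rep_means) ** 2).sum(axis=1)

    # Pooled standard deviation across the r repetitions.
    pooled_std = np.sqrt(s2.mean())

    # Dietterich's statistic uses the first difference of the first
    # repetition as the numerator and r degrees of freedom.
    t_stat = diffs[0, 0] / pooled_std
    p_value = 2.0 * stats.t.sf(abs(t_stat), df=r)
    return t_stat, p_value


# Illustrative differences for 5 repetitions of 2-fold CV (made-up numbers).
example_diffs = [
    [0.1, 0.3],
    [0.2, 0.2],
    [0.3, 0.1],
    [0.2, 0.4],
    [0.1, 0.1],
]
t_stat_5x2, p_value_5x2 = r_by_2_cv_ttest(example_diffs)
print(t_stat_5x2, p_value_5x2)
```

With only r degrees of freedom (5 in the usual 5x2 setup), the same t-stat yields a much larger p-value than in the repeated k-fold test, which is consistent with the p-values shown in the two output tables.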