Model comparison tests
fast_automl.test.corrected_repeated_kfold_cv_test
def fast_automl.test.corrected_repeated_kfold_cv_test(estimators, X, y, repetitions=10, cv=10, scoring=None, n_jobs=None)
Performs pairwise corrected repeated k-fold cross-validation t-tests, comparing every pair of estimators (see Bouckaert and Frank).
Parameters:
    estimators : list
        List of (name, estimator) tuples.
    X : array-like of shape (n_samples, n_features)
        Features.
    y : array-like of shape (n_samples, n_targets)
        Targets.
    repetitions : int, default=10
        Number of cross-validation repetitions.
    cv : int, cross-validation generator, or iterable, default=10
        Scikit-learn style cv parameter.
    scoring : str, callable, list, tuple, or dict, default=None
        Scikit-learn style scoring parameter.
    n_jobs : int, default=None
        Number of jobs to run in parallel.
Returns:
    results_df : pd.DataFrame
        One row per pair of estimators, with columns Estimator1, Estimator2, PerformanceDifference, Std, t-stat, and p-value.
Examples
from fast_automl.test import corrected_repeated_kfold_cv_test
# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older scikit-learn.
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.svm import SVR

X, y = load_boston(return_X_y=True)
corrected_repeated_kfold_cv_test(
    [
        ('rf', RandomForestRegressor()),
        ('ridge', Ridge()),
        ('svm', SVR())
    ],
    X, y, n_jobs=-1
)
Out:
Estimator1  Estimator2  PerformanceDifference  Std       t-stat     p-value
rf          ridge       0.165030               0.030266  5.452600   3.652601e-07
rf          svm         0.670975               0.045753  14.665154  1.460994e-26
ridge       svm         0.505945               0.045031  11.235469  2.258586e-19
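The statistic behind this test can be sketched as below, assuming it follows the corrected repeated k-fold cv test as described by Bouckaert and Frank: a resampled t-test whose variance term is inflated to account for overlapping training sets. `corrected_repeated_kfold_t` is a hypothetical helper, not part of fast_automl, and it takes the k·r per-fold score differences as input instead of fitting estimators. In the example output above, t-stat equals PerformanceDifference / Std (0.165030 / 0.030266 ≈ 5.4526), which is consistent with Std being the corrected standard error computed here, though that is an inference rather than documented behavior.

```python
import math
import statistics


def corrected_repeated_kfold_t(differences, k, r):
    """Hypothetical sketch of the corrected repeated k-fold cv t-statistic.

    differences: the k * r per-fold score differences (estimator 1 minus
    estimator 2), one for each fold of each repetition.
    Returns (mean difference, corrected standard error, t statistic, df).
    """
    if k < 2:
        raise ValueError("k-fold cross-validation needs k >= 2")
    n = k * r
    if len(differences) != n:
        raise ValueError("expected k * r fold differences")
    mean_diff = statistics.fmean(differences)
    var = statistics.variance(differences)  # sample variance, n - 1 denominator
    # Each test fold holds 1/k of the data, so n_test / n_train = 1 / (k - 1).
    test_train_ratio = 1 / (k - 1)
    # Corrected term (1/n + n_test/n_train) * var replaces the naive var / n,
    # compensating for the dependence between overlapping training sets.
    std_err = math.sqrt((1 / n + test_train_ratio) * var)
    t_stat = mean_diff / std_err
    return mean_diff, std_err, t_stat, n - 1
```

Under this sketch, the defaults (repetitions=10, cv=10) give 100 differences and 99 degrees of freedom. A two-sided p-value can then be obtained with SciPy as `2 * scipy.stats.t.sf(abs(t_stat), df)`.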
fast_automl.test.r_by_k_cv_test
def fast_automl.test.r_by_k_cv_test(estimators, X, y, repetitions=5, cv=2, scoring=None, n_jobs=None)
Performs pairwise RxK (usually 5x2) cross-validation t-tests, i.e. R repetitions of K-fold cross-validation, comparing every pair of estimators.
Parameters:
    estimators : list
        List of (name, estimator) tuples.
    X : array-like of shape (n_samples, n_features)
        Features.
    y : array-like of shape (n_samples, n_targets)
        Targets.
    repetitions : int, default=5
        Number of cross-validation repetitions (R).
    cv : int, cross-validation generator, or iterable, default=2
        Scikit-learn style cv parameter; an int gives the number of folds (K).
    scoring : str, callable, list, tuple, or dict, default=None
        Scikit-learn style scoring parameter.
    n_jobs : int, default=None
        Number of jobs to run in parallel.
Returns:
    results_df : pd.DataFrame
        One row per pair of estimators, with columns Estimator1, Estimator2, PerformanceDifference, Std, t-stat, and p-value.
Examples
from fast_automl.test import r_by_k_cv_test
# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older scikit-learn.
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.svm import SVR

X, y = load_boston(return_X_y=True)
r_by_k_cv_test(
    [
        ('rf', RandomForestRegressor()),
        ('ridge', Ridge()),
        ('svm', SVR())
    ],
    X, y, n_jobs=-1
)
Out:
Estimator1  Estimator2  PerformanceDifference  Std       t-stat     p-value
rf          ridge       0.143314               0.026026  5.506631   0.002701
rf          svm         0.659547               0.035824  18.410644  0.000009
ridge       svm         0.516233               0.021601  23.898480  0.000002
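A sketch of the classic 5x2cv paired t-test of Dietterich, which the "usually 5x2" setting most likely refers to, is shown below. `five_by_two_cv_t` is a hypothetical helper, not fast_automl's API; it takes, for each of R repetitions of 2-fold cross-validation, the pair of per-fold score differences. In the example output above, t-stat again equals PerformanceDifference / Std and the p-values appear consistent with 5 degrees of freedom, but whether fast_automl's numerator is Dietterich's first-fold difference (as in this sketch) or a mean over all folds is an assumption the sketch does not settle.

```python
import math
import statistics


def five_by_two_cv_t(fold_differences):
    """Hypothetical sketch of Dietterich's 5x2cv paired t-statistic.

    fold_differences: one (d1, d2) pair per repetition, where d1 and d2 are
    the score differences (estimator 1 minus estimator 2) on the two folds
    of that repetition's 2-fold split. Dietterich uses 5 repetitions; this
    sketch accepts any number r >= 1.
    Returns (t statistic, degrees of freedom).
    """
    r = len(fold_differences)
    if r == 0 or any(len(pair) != 2 for pair in fold_differences):
        raise ValueError("expected a non-empty list of (d1, d2) pairs")
    # Per-repetition variance estimate: s_i^2 = sum_j (d_ij - mean_i)^2.
    variances = []
    for d1, d2 in fold_differences:
        mean_i = (d1 + d2) / 2
        variances.append((d1 - mean_i) ** 2 + (d2 - mean_i) ** 2)
    # Numerator: difference on the first fold of the first repetition.
    t_stat = fold_differences[0][0] / math.sqrt(statistics.fmean(variances))
    # Averaging s_i^2 over repetitions gives r degrees of freedom.
    return t_stat, r
```

With the defaults (repetitions=5, cv=2) this yields 5 degrees of freedom; a two-sided p-value can be obtained with SciPy as `2 * scipy.stats.t.sf(abs(t_stat), df)`.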