Automated machine learning

fast_automl.automl.make_cv_regressors

def fast_automl.automl.make_cv_regressors()

Returns:

cv_regressors : list

List of default CV regressors.

fast_automl.automl.make_cv_classifiers

def fast_automl.automl.make_cv_classifiers()

Returns:

cv_classifiers : list

List of default CV classifiers.

fast_automl.automl.AutoEstimator

class fast_automl.automl.AutoEstimator(cv_estimators=[], preprocessors=[], ensemble_method='auto', max_ensemble_size=50, n_ensembles=1, n_iter=10, n_jobs=None, verbose=False, cv=None, scoring=None)

Parameters:

cv_estimators : list of CVEstimators, default=[]

If an empty list, a default list of CVEstimators will be created.

preprocessors : list, default=[]

List of preprocessing steps before data is fed to the cv_estimators.

ensemble_method : str, default='auto'

If 'rfe', the ensemble is created using recursive feature elimination. If 'stepwise', the ensemble is created using stepwise addition. If 'auto', the ensemble is the better of the RFE and stepwise ensemble methods.

max_ensemble_size : int, default=50

The maximum number of estimators to consider adding to the ensemble.

n_ensembles : int, default=1

Number of ensembles to create using different CV splits. These ensembles get equal votes in a meta-ensemble.

n_iter : int, default=10

Number of randomized search iterations to run for the CVEstimators.

n_jobs : int or None, default=None

Number of jobs to run in parallel.

verbose : bool, default=False

Controls the verbosity.

cv : int, cross-validation generator, or iterable, default=None

Scikit-learn style cv parameter.

scoring : str, callable, list, tuple, or dict, default=None

Scikit-learn style scoring parameter. By default, a regressor ensemble maximizes R-squared and a classifier ensemble maximizes ROC AUC.
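For reference, the two default metrics can be written out directly. The sketch below follows the standard definitions behind scikit-learn's r2_score and binary roc_auc_score. It is illustrative only. The helper names are hypothetical, and it does not cover multiclass AUC averaging (one-vs-rest or one-vs-one), which a problem like the digits example would need.

```python
def r_squared(y_true, y_pred):
    """R-squared: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def binary_roc_auc(y_true, scores):
    """ROC AUC for 0/1 labels: share of (positive, negative) pairs ranked correctly."""
    pos = [s for label, s in zip(y_true, scores) if label == 1]
    neg = [s for label, s in zip(y_true, scores) if label == 0]
    # A tie between a positive and a negative score counts as half a correct ranking.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0: no better than predicting the mean
print(binary_roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A model that always predicts the mean scores an R-squared of 0. An AUC of 0.5 corresponds to ranking positives and negatives no better than chance.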

Attributes:

best_estimator_ : estimator

Ensemble or meta-ensemble associated with the best CV score.
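A small plain-Python sketch can illustrate the ensemble_method options and the meta-ensemble described above. It rests on several assumptions, none of which is confirmed as fast_automl's internals:

- Each candidate estimator has already produced out-of-fold predictions.
- Ensemble members are combined by an unweighted average.
- A meta-ensemble averages the predictions of its ensembles.

True recursive feature elimination ranks features by a fitted model's coefficients or importances. The 'rfe' stand-in here is therefore a swapped-in, score-based backward elimination. Scoring uses negative mean squared error, which for a fixed target ranks ensembles in the same order as R-squared. All function names are hypothetical.

```python
from statistics import mean


def neg_mse(y_true, y_pred):
    """Negative mean squared error (higher is better); same ranking as R-squared for fixed y."""
    return -sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def blend(names, preds):
    """Assumed combination rule: unweighted average of the members' predictions."""
    return [mean(values) for values in zip(*(preds[n] for n in names))]


def stepwise(preds, y):
    """Stepwise addition: repeatedly add the candidate that most improves the score."""
    chosen, best = [], float("-inf")
    remaining = sorted(preds)
    while remaining:
        scores = {n: neg_mse(y, blend(chosen + [n], preds)) for n in remaining}
        cand = max(remaining, key=scores.get)
        if scores[cand] <= best:  # stop once no addition helps
            break
        chosen.append(cand)
        remaining.remove(cand)
        best = scores[cand]
    return chosen, best


def backward_elimination(preds, y):
    """Stand-in for 'rfe': start from all candidates and drop the least useful one."""
    chosen = sorted(preds)
    best = neg_mse(y, blend(chosen, preds))
    while len(chosen) > 1:
        trials = {n: [m for m in chosen if m != n] for n in chosen}
        scores = {n: neg_mse(y, blend(trial, preds)) for n, trial in trials.items()}
        drop = max(chosen, key=scores.get)
        if scores[drop] < best:  # every removal hurts, so stop
            break
        chosen, best = trials[drop], scores[drop]
    return chosen, best


def auto_ensemble(preds, y):
    """'auto': keep whichever of the two ensembles scores better."""
    return max(stepwise(preds, y), backward_elimination(preds, y), key=lambda r: r[1])


def meta_ensemble(ensemble_predictions):
    """Equal votes: unweighted average across ensembles built on different CV splits."""
    return [mean(values) for values in zip(*ensemble_predictions)]


# Hypothetical out-of-fold predictions from three candidate estimators.
y = [1.0, 2.0, 3.0, 4.0]
preds = {
    "a": [1.1, 1.9, 3.2, 3.8],
    "b": [0.9, 2.1, 2.8, 4.2],
    "c": [4.0, 3.0, 2.0, 1.0],
}
members, score = auto_ensemble(preds, y)
print(sorted(members), score)
```

Candidates a and b are both good but err in opposite directions, so averaging them cancels their errors. Both selection methods therefore settle on {a, b} and leave out c, even though neither a nor b alone is perfect.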

Methods

fit(self, X, y, sample_weight=None)

Fit the model.

Parameters:

X : array-like of shape (n_samples, n_features)

Training data.

y : array-like of shape (n_samples,)

Target values.

sample_weight : array-like of shape (n_samples,), default=None

Individual weights for each sample.

Returns:

self : object

Fitted estimator.
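The n_iter and cv parameters refer to the familiar scikit-learn pattern: candidate hyperparameter settings are sampled at random, and each one is scored by cross-validation. The sketch below shows that pattern in plain Python under stated assumptions:

- kfold_indices, knn_predict, cv_score and randomized_search are hypothetical helpers, not fast_automl functions.
- The 1-D nearest-neighbour model stands in for a real estimator.
- This page does not document how fit actually schedules the search.
- Settings are drawn with replacement for simplicity. Scikit-learn's ParameterSampler samples without replacement when every parameter is given as a list.
- The unshuffled splits mirror scikit-learn's KFold. For classifiers with binary or multiclass targets, scikit-learn's check_cv uses StratifiedKFold instead.

```python
import random


def kfold_indices(n_samples, n_splits=5):
    """Yield (train, test) index lists for unshuffled K-fold splitting."""
    # As in scikit-learn's KFold, the first n_samples % n_splits folds get one extra sample.
    start = 0
    for i in range(n_splits):
        size = n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def knn_predict(x_train, y_train, x, k):
    """Hypothetical toy estimator: average the targets of the k nearest training points."""
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x))[:k]
    return sum(y_train[i] for i in nearest) / len(nearest)


def cv_score(params, X, y, n_splits=5):
    """Mean negative squared error across K folds (higher is better)."""
    fold_scores = []
    for train, test in kfold_indices(len(X), n_splits):
        x_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        errors = [(knn_predict(x_tr, y_tr, X[i], params["k"]) - y[i]) ** 2 for i in test]
        fold_scores.append(-sum(errors) / len(errors))
    return sum(fold_scores) / len(fold_scores)


def randomized_search(space, X, y, n_iter=10, n_splits=5, seed=0):
    """Try n_iter randomly sampled settings and keep the best cross-validated one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Sampling with replacement: the same setting may be drawn more than once.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = cv_score(params, X, y, n_splits)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


X = [i / 10 for i in range(30)]
y = [2 * x + (0.3 if i % 2 else -0.3) for i, x in enumerate(X)]
print(randomized_search({"k": [1, 2, 3, 4, 5]}, X, y, n_iter=10, n_splits=5))
```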

predict(self, X)

Predict class labels (for classifiers) or target values (for regressors) for samples in X.

Parameters:

X : array-like of shape (n_samples, n_features)

Samples.

Returns:

C : array of shape (n_samples,)

Predicted class label or target value for each sample.

predict_proba(self, X)

Probability estimates.

Parameters:

X : array-like of shape (n_samples, n_features)

Samples.

Returns:

T : array-like of shape (n_samples, n_classes)

Probability of each sample belonging to each class, with columns ordered by `self.classes_`.
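In the usual scikit-learn convention, column j of a predict_proba row corresponds to classes_[j]. Taking the most probable column in each row therefore gives a hard label. The sketch below shows that mapping in plain Python. predict_from_proba is a hypothetical helper, and whether AutoClassifier.predict computes its labels exactly this way is an assumption.

```python
def predict_from_proba(proba, classes):
    """Map each row of class probabilities to the label of its largest entry."""
    # Column j of a predict_proba row is assumed to correspond to classes[j].
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in proba]


classes_ = ["cat", "dog", "fish"]
proba = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
]
print(predict_from_proba(proba, classes_))  # ['cat', 'fish']
```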

fast_automl.automl.AutoClassifier

Automatic classifier. Inherits from AutoEstimator.

Examples

from fast_automl.automl import AutoClassifier

from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=True, stratify=y)

clf = AutoClassifier(ensemble_method='stepwise', n_jobs=-1, verbose=True).fit(X_train, y_train)
print('CV score: {:.4f}'.format(cross_val_score(clf.best_estimator_, X_train, y_train).mean()))
print('Test score: {:.4f}'.format(clf.score(X_test, y_test)))

This runs for about 6-7 minutes and typically achieves a test accuracy of 96-99% and ROC AUC above .999.

fast_automl.automl.AutoRegressor

Automatic regressor. Inherits from AutoEstimator.

Examples

from fast_automl.automl import AutoRegressor

from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=True)

reg = AutoRegressor(n_jobs=-1, verbose=True).fit(X_train, y_train)
print('CV score: {:.4f}'.format(cross_val_score(reg.best_estimator_, X_train, y_train).mean()))
print('Test score: {:.4f}'.format(reg.score(X_test, y_test)))

This runs for about 30 seconds and typically achieves a test R-squared of .47-.53.