Random Forest Regressor parameters

Overview:

Random forest regression is an ensemble learning method that constructs multiple decision trees and combines their outputs to improve predictive performance and control overfitting. OtasML provides a range of configuration options to fine-tune random forest models for better performance. Below is a detailed guide to these parameters.

Configurations page:

The Configurations page allows users to adjust various parameters of the random forest regressor model. Here are the details:

N estimators

Default Value: 100
Description: It is a parameter specifies the number of individual decision trees (also known as estimators) that will be used to form the random forest ensemble. Each decision tree in the ensemble is trained on a different random subset of the training data.

Criterion

Default Value: squared_error
Description: It is a parameter specifies the criterion used to measure the quality of a split when constructing individual decision trees within the Random Forest ensemble for regression tasks.
- Options:
  - squared_error (or mse): This criterion uses the Mean Squared Error (MSE) as a measure of the quality of a split. The MSE quantifies the average squared difference between the predicted values and the actual target values for the samples in a node.
  - absolute_error (or mae): This criterion uses the Mean Absolute Error (MAE) as a measure of the quality of a split. The MAE quantifies the average absolute difference between the predicted values and the actual target values for the samples in a node.
  - friedman_mse: This criterion is an improvement of MSE, specifically designed for the Friedman's gradient boosting algorithm. It is not used as frequently as the other two criteria.
  - poisson: This criterion uses the Poisson deviance as a measure of the quality of a split. It is primarily used for regression problems where the target variable follows a Poisson distribution, such as count data.

Max depth

Default Value: None
Description: It is a parameter determines the maximum depth of each individual decision tree in the random forest ensemble.

Min samples split

Default Value: 2
Description: Its a parameter specifies the minimum number of samples required to split an internal node of an individual decision tree within the Random Forest ensemble.

Min samples leaf

Default Value: 1
Description: It is a parameter specifies the minimum number of samples required to be in a leaf node of an individual decision tree within the Random Forest ensemble.

Min weight fraction leaf

Default Value: 0.0
Description: It is a parameter specifies the minimum weighted fraction of the sum total of weights (of all the input samples) required to be present in a leaf node of an individual decision tree within the Random Forest ensemble.

Max features

Default Value: 1
Description: It is a parameter that controls the maximum number of features that are considered when looking for the best split at each node during the construction of individual decision trees within the Random Forest ensemble.

Max leaf nodes

Default Value: None
Description: It is a parameter that controls the maximum number of leaf nodes (terminal nodes) that each individual decision tree in the Random Forest ensemble is allowed to have.

Min impurity decrease

Default Value: 0.0
Description: It is a parameter that controls the minimum amount of impurity decrease that must be achieved to split an internal node during the construction of individual decision trees within the Random Forest ensemble.

Bootstrap

Default Value: True
Description: It is a parameter that controls whether or not bootstrap samples are used when building individual decision trees within the Random Forest ensemble.

OOB score

Default Value: False
Description: OOB score stands for Out-of-Bag score. It is a measure of a Random Forest model's performance on the samples that were not included in the bootstrap samples used for training individual decision trees. OOB score provides a way to estimate the model's performance without the need for a separate validation set.

N jobs

Default Value: None
Description: It is a parameter determines the number of CPU cores or threads to use when fitting (training) the Random Forest model in parallel. It allows you to speed up the training process by utilizing multiple cores or threads on your computer's CPU.
Warning: -1 means using all processors.

Random state

Default Value: None
Description: It is a parameter used to control the randomization applied during the training and fitting of the Random Forest model. It allows you to specify a seed for the random number generator, ensuring that the results are reproducible.

Verbose

Default Value: 0
Description: It is a parameter that controls the level of verbosity or the amount of information that is displayed during the training and fitting of the Random Forest model. It allows you to specify whether you want to see progress updates and information about the training process.

Warm start

Default Value: False
Description: It is a parameter is a boolean flag that allows you to control whether to reuse the existing solution (trees) from a previous call when calling it again. It can be useful when you want to incrementally train a Random Forest ensemble or when you want to add more trees to an existing model.

CCP alpha

Default Value: 0.0
Description: Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed.

Max samples

Default Value: None
Description: It is a parameter, and its purpose is to determine the number of samples drawn from the training data to train each base estimator (individual decision tree) when bootstrap sampling is enabled.
Warning: If bootstrap is True, the number of samples to draw from input to train each base estimator.

Test size

Default Value: 0.2
Description: The test size parameter is used when splitting the dataset into these subsets, and it specifies the portion of the data that will be used for testing.

Train size

Default Value: 0.8
Description: The train size parameter is used when splitting the dataset into these subsets, and it specifies the portion of the data that will be used for model training.

Conclusion

OtasML’s configuration options for random forest regressor models allow users to fine-tune their models to specific datasets and prediction tasks. Adjusting parameters such as the number of estimators, criterion, and maximum depth can significantly impact the model’s performance and behavior. Use this guide to fine-tune your random forest models and achieve superior predictive accuracy with OtasML.

OtasAI

Did You Know?

Random Forest Regressor parameters

Overview:

Configurations page:

Conclusion

Related articles

Suggested articles