The base learning algorithms used by an ensemble can be of different types; for example, the individual base learners could be Bayesian classifiers, decision trees, SVMs, and so on. In other cases, the base learners are the same algorithm trained with different tuning parameters and training sets; in a Random Forest ensemble, for example, each base learner is a decision tree.
I’m going to discuss the five key ensemble techniques (Voting, Stacking, Bagging, Random Forest, and Boosting) and will attempt to represent them using a simple graphic.
Some terminology before we dive into the ensembles:
This uses the sheer power of democracy. To create a Voting Ensemble, train several base learners on the training set. Each base learner then makes a prediction on the test set.
For classification, take a majority vote over the base learners' predictions. For regression, average the base learners' predictions.
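As a minimal sketch of the majority-vote idea using scikit-learn's `VotingClassifier` (the synthetic dataset and the particular choice of three base learners here are illustrative assumptions, not a prescription):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data; any labelled dataset works the same way.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Three base learners of different types, as described above.
voter = VotingClassifier(
    estimators=[
        ("nb", GaussianNB()),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",  # "hard" = majority vote on the predicted class labels
)
voter.fit(X_train, y_train)
accuracy = voter.score(X_test, y_test)
```

For regression, `VotingRegressor` plays the same role, averaging the base learners' predictions instead of voting.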
Creating a Stacked Ensemble:
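The construction can be sketched with scikit-learn's `StackingClassifier`: the base learners' out-of-fold predictions become the training features for a final meta-learner. The dataset and the specific learners below are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The base learners' cross-validated predictions are fed as input
# features to the final (meta) learner, which learns how to combine them.
stack = StackingClassifier(
    estimators=[
        ("nb", GaussianNB()),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
```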
Creating a Bagging Ensemble:
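To make the bootstrap-then-aggregate recipe concrete, here is a hand-rolled sketch (scikit-learn's `BaggingClassifier` does the same thing; the tree count and synthetic data are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train each tree on a bootstrap sample: the same number of rows as the
# training set, drawn WITH replacement, so each tree sees a different mix.
trees = []
for _ in range(25):
    idx = rng.integers(0, len(X_train), size=len(X_train))
    trees.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

# Aggregate: majority vote across the trees' predictions (binary labels).
votes = np.stack([t.predict(X_test) for t in trees])
y_pred = (votes.mean(axis=0) > 0.5).astype(int)
accuracy = (y_pred == y_test).mean()
```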
Creating a Random Forest:
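A sketch with scikit-learn's `RandomForestClassifier`, which adds the forest's defining twist on bagging: each split considers only a random subset of the features. The dataset and hyperparameter values are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic data.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagged decision trees, plus per-split feature subsampling (max_features).
forest = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",  # random subset of features considered at each split
    oob_score=True,       # estimate error from the out-of-bag samples
    random_state=0,
)
forest.fit(X_train, y_train)
accuracy = forest.score(X_test, y_test)
```

The out-of-bag estimate (`forest.oob_score_`) gives a built-in generalization check without a separate validation set, since each tree is evaluated on the rows its bootstrap sample left out.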
The generalization error of a Random Forest depends on two things:
AdaBoost is a popular boosting algorithm and is implemented as below:
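The re-weighting loop can be sketched by hand for binary labels in {-1, +1} (this is the classic discrete AdaBoost scheme with decision stumps; the synthetic data and the 30-round count are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=400, random_state=0)
y = 2 * y01 - 1  # AdaBoost convention: labels in {-1, +1}

n = len(X)
w = np.full(n, 1 / n)  # start with uniform sample weights
stumps, alphas = [], []

for _ in range(30):
    # Fit a weak learner (depth-1 tree, a "stump") on the weighted sample.
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)  # weighted error
    alpha = 0.5 * np.log((1 - err) / err)  # this learner's vote weight
    # Up-weight the misclassified points so the next stump focuses on them.
    w *= np.exp(-alpha * y * pred)
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the alpha-weighted vote of all the stumps.
scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
train_accuracy = (np.sign(scores) == y).mean()
```

In practice, scikit-learn's `AdaBoostClassifier` packages the same loop, with decision stumps as its default weak learner.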