Ensemble methods in machine learning use multiple learning algorithms to obtain a better predictive performance than could be obtained from any of the individual learning algorithms separately. Ensembles aim to combine multiple hypotheses to form one better hypothesis.

We’ll look at two examples of this approach: bagging first, then boosting.

Bagging is short for ‘bootstrap aggregating’. It combines an ‘ensemble’ of individually weak machine learning models and aggregates the predictions we get from each of them into the final prediction. As the name suggests, it consists of two parts: bootstrapping and aggregation.

In the diagram above, you can see that different subsets of the training data are created by sampling with replacement (bootstrapping). A model is then trained on each of these subsets, and the predictions of these models are combined into a single, more accurate result (aggregating).
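As a concrete illustration, here is a minimal sketch of bagging in Python, assuming scikit-learn decision trees as the weak learners and a synthetic dataset; the names used (n_models, X_train and so on) are illustrative rather than taken from the text above.

```python
# A minimal bagging sketch: bootstrap samples + majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_models = 25
models = []

# Bootstrapping: train each model on a sample drawn with replacement.
for _ in range(n_models):
    idx = rng.integers(0, len(X_train), size=len(X_train))
    models.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

# Aggregating: majority vote over the individual predictions.
votes = np.stack([m.predict(X_test) for m in models])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("bagged accuracy:", (y_pred == y_test).mean())
```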

The next section explores boosting.

Boosting, like bagging, seeks to combine multiple weak learners into a stronger one. In other words, it aims to boost their performance. Unlike bagging, the successive data sets it uses are not created at random; each new subset contains the elements that were (or are likely to be) misclassified by previous models.

Boosting starts with a base classifier prepared on the training data. A second classifier is then created to focus on the instances in the training data that the first classifier got wrong. New classifiers continue to be added until a limit on the number of models is reached or the desired accuracy is achieved (Brownlee, 2014).
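The sketch below illustrates this sequential idea, reusing the X_train and y_train arrays (and scikit-learn decision stumps) from the bagging sketch; the doubling of weights on misclassified instances is a simplified illustration of the reweighting principle, not the exact update of any particular published algorithm.

```python
# A simplified boosting-style loop: each new classifier focuses on the
# instances the previous ones got wrong.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

weights = np.full(len(X_train), 1 / len(X_train))
stumps = []

for _ in range(10):  # stop when the limit on the number of models is reached
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X_train, y_train, sample_weight=weights)
    wrong = stump.predict(X_train) != y_train
    # Misclassified examples get larger weights, so the next classifier
    # concentrates on them.
    weights[wrong] *= 2.0
    weights /= weights.sum()
    stumps.append(stump)
```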

Boosting requires a large amount of training data, which may not be practicable in some cases. This restriction can be overcome by using another boosting algorithm known as AdaBoost (adaptive boosting). Boosting algorithms were initially developed for binary classification problems; algorithms such as AdaBoost.M1 and AdaBoost.M2 were later developed for multi-class cases.
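In practice, AdaBoost is available off the shelf; the short sketch below assumes scikit-learn’s AdaBoostClassifier and the train/test split from the bagging sketch above.

```python
# AdaBoost with decision stumps as the default weak learner.
from sklearn.ensemble import AdaBoostClassifier

ada = AdaBoostClassifier(n_estimators=50, random_state=0)
ada.fit(X_train, y_train)
print("AdaBoost accuracy:", ada.score(X_test, y_test))
```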