Ensembles

Q: What are the 3 approaches of ensemble neural networks?

1. Concatenation Ensemble2. Average Ensemble3. Weighted Ensemble

Q: What are the different types of ensemble methods in machine learning?

1. BAGGing, or Bootstrap AGGregating2. Random Forest Models

What are ensembles?

An ensemble is a machine learning model that combines the predictions from two or more models.

In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble consists of only a concrete finite set of alternative models, but typically allows for much more flexible structure to exist among those alternatives.

ensembles — Source: Corporate Finance Institute

‍

What is an ensemble neural network?

The concept of ‘ensembling’ is common in machine learning. Different algorithms are built on this mechanism. Only to cite the most famous, random forests used to combine a variable number of decision trees. In general terms, an ensemble can be considered a learning technique where many models are joined to solve a problem. This is done because an ensemble tends to perform better than singles improving the generalization ability.

The way an ensemble can be carried out doesn’t know any limit. The same is true for the number and the types of models considered. The habit to keep in mind is to choose components with low bias and high variance. This is possible simply choosing models with variegated structure and format.

‍

What are the 3 approaches of ensemble neural networks?

1. Concatenation Ensemble

The concatenation is the most common technique to merge different data sources. A concatenation ensemble receives different inputs, whatever their dimensions are, and concatenates them on a given axis. A drawback, of attaching information side by side, can be the dimensionality explosion. More the networks in the ensemble or greatest are the dense dimensionalities and bigger is the output of the concatenation. This operation can be dispersive, not allowing the final part of the network to learn important information or resulting in overfitting.

2. Average Ensemble

The averaging ensemble can be seen as the opposite of concatenation operation. The pooled outputs of our networks are passed through dense layers with a fixed number of neurons in order to equalize them. In this way, we can compute the average. The result is an output that shares the same dimensionality of each input but that is the average of them. As intuitively understandable, the drawback of this approach can be the loss of information caused by the nature of the average operation.

3. Weighted Ensemble

A weighted ensemble is a special form of average operation, where the tensor outputs are multiplied by a weight and then linearly combined. To each input, coming from our models is assigned a weight. These weights determine the contribution of every single model in the final result. The particularity is that these weights are not fixed and previously imposed but their values can be optimized during the training process. The only constraint is that the weights must sum up to 1, but this is easy to achieve applying a softmax on them. With this type of ensemble, we can also retrieve the contribution of each model simply extracting the ensemble weights.

‍

What are the different types of ensemble methods in machine learning?

1. BAGGing, or Bootstrap AGGregating

BAGGing gets its name because it combines Bootstrapping and Aggregation to form one ensemble model. Given a sample of data, multiple bootstrapped subsamples are pulled. A Decision Tree is formed on each of the bootstrapped subsamples. After each subsample Decision Tree has been formed, an algorithm is used to aggregate over the Decision Trees to form the most efficient predictor.

Given a Dataset, bootstrapped subsamples are pulled. A Decision Tree is formed on each bootstrapped sample. The results of each tree are aggregated to yield the strongest, most accurate predictor.

2. Random Forest Models

Random Forest Models can be thought of as BAGGing, with a slight tweak. When deciding where to split and how to make decisions, BAGGed Decision Trees have the full disposal of features to choose from. Therefore, although the bootstrapped samples may be slightly different, the data is largely going to break off at the same features throughout each model.

In contrary, Random Forest models decide where to split based on a random selection of features. Rather than splitting at similar features at each node throughout, Random Forest models implement a level of differentiation because each tree will split based on different features. This level of differentiation provides a greater ensemble to aggregate over, ergo producing a more accurate predictor. Refer to the image for a better understanding.

Similar to BAGGing, bootstrapped subsamples are pulled from a larger dataset. A decision tree is formed on each subsample. However, the decision tree is split on different features.

‍

What is ensemble learning?

Ensemble learning is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. Ensemble learning is primarily used to improve the classification, prediction, and the function approximation performance of a model, or reduce the likelihood of an unfortunate selection of a poor one.

Other applications of ensemble learning include assigning a confidence to the decision made by the model, selecting optimal features, data fusion, incremental learning, nonstationary learning and error-correcting.

The models that contribute to the ensemble, referred to as ensemble members, may be the same type or different types and may or may not be trained on the same training data.

The predictions made by the ensemble members may be combined using statistics, such as the mode or mean, or by more sophisticated methods that learn how much to trust each member and under what conditions.

‍

What are the advantages of ensemble learning?

We explicitly use ensemble learning to seek better predictive performance, such as lower error on regression or high accuracy for classification. ... there is a way to improve model accuracy that is easier and more powerful than judicious algorithm selection: one can gather models into ensembles.

There are two main reasons to use an ensemble over a single model, and they are related; they are:

Performance

An ensemble can make better predictions and achieve better performance than any single contributing model.

Reducing the variance element of the prediction error improves predictive performance.

We explicitly use ensemble learning to seek better predictive performance, such as lower error on regression or high accuracy for classification.

This is the primary use of ensemble learning methods and the benefit demonstrated through the use of ensembles by the majority of winners of machine learning competitions, such as the Netflix prize and competitions on Kaggle.

Robustness

An ensemble reduces the spread or dispersion of the predictions and model performance.

On a predictive modeling project, typically multiple models or modelling pipelines are evaluated to find the one that performs best.

The algorithm or pipeline is then fit on all available data and used to make predictions on new data.

An average accuracy or error of a model is a summary of the expected performance, when in fact, some models performed better and some models performed worse on different subsets of the data.

The simplest ensemble is to fit the model multiple times on the training datasets and combine the predictions using a summary statistic, such as the mean for regression or the mode for classification. Importantly, each model needs to be slightly different due to the stochastic learning algorithm, difference in the composition of the training dataset, or differences in the model itself.

This will reduce the spread in the predictions made by the model. The mean performance will probably be about the same, although the worst- and best-case performance will be brought closer to the mean performance.

In effect, it smooths out the expected performance of the model.

We can refer to this as the “robustness” in the expected performance of the model and is a minimum benefit of using an ensemble method.

An ensemble may or may not improve modeling performance over any single contributing member, discussed more further, but at minimum, it should reduce the spread in the average performance of the model.