This section is divided into two sub-sections that briefly describe the models that were built and the procedure used to train them. The first sub-section explains the structure of the models, while the second covers the particularities of the training methodology.

2.5.1 Modelling

As previously explained, the main elements of the artificial neural network models used are a CNN layer and an LSTM layer. In addition, an input layer, which feeds the models with the previously constructed vectors, and an output layer were used. A more detailed explanation of the code used to carry out the procedure described in this sub-section can be found in Annex 4 - Modelling.

Since three different observation window sizes were defined for making a prediction, it was necessary to build three different model structures adapted to the dimensions of the corresponding input vectors. These structures can be observed in Figure 17.

The first notable difference between the structures is the output of the input layer, which depends on whether 1, 2 or 3 observations were chosen to build each sample. As can be seen, the size of the input layer output in turn modifies the size of the inputs and outputs of the CNN layer.

As previously mentioned, the variations in the second dimension of the CNN layer outputs are explained by the different sizes of the input vectors. However, the third dimension of this layer's output is the same in all structures, 64, which corresponds to the number of filters, one of the main parameters to consider when configuring these layers. This means that the observations of the 6 input variables were projected into 64 feature maps, allowing the model to better capture the relationships between the variables.
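The effect of the sample size on the CNN layer dimensions can be illustrated with a minimal sketch, assuming a Keras-style implementation; the kernel size and padding below are illustrative assumptions, not values taken from Figure 17:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative only: 6 input variables and 1, 2 or 3 observations per sample.
for n_obs in (1, 2, 3):
    inputs = keras.Input(shape=(n_obs, 6))
    conv = layers.Conv1D(filters=64, kernel_size=2, padding="same")(inputs)
    print(n_obs, conv.shape)  # the third dimension is always 64, the number of filters
```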

Another aspect that was modified in the CNN layer of the structures was the activation function: the default ReLU (Rectified Linear Unit) was replaced with Leaky ReLU because, as explained in OmG (2021), ReLU is a nonlinear activation function that outputs zero for negative inputs, which can cause some neurons to stop learning if many of their inputs are negative, since their gradients will be zero.

Given the above, and since some of the input variables contain a high number of negative observations, as is the case for the returns or for the correlation of some of the series in certain periods, the ReLU activation function did not seem like a good option. Therefore, it was decided to use Leaky ReLU as the activation function, which, as explained in OmG (2021), is a variant that allows a small, non-zero constant gradient for negative inputs. This means that this activation function allows neurons to continue learning from negative inputs.

Figure 18 shows the shape of the ReLU and Leaky ReLU functions, which helps to better understand what was previously explained.
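For reference, the two functions can be written as follows, where the slope of Leaky ReLU for negative inputs is a small positive constant (its exact value is not specified here):

\[
\mathrm{ReLU}(x) = \max(0, x),
\qquad
\mathrm{LeakyReLU}(x) =
\begin{cases}
x & \text{if } x > 0 \\
\alpha x & \text{if } x \le 0
\end{cases}
\]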

The CNN layer in all the structures is connected to an LSTM layer, which in all cases has 64 neurons. The output of this layer is connected to the output layer, which returns a single value.

To conclude the construction of the models, the mean squared error (hereinafter MSE) was chosen as the loss function used to evaluate the model's outputs, together with the SGD (Stochastic Gradient Descent) optimizer with a learning rate (alpha) of 0.0005.
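Putting these elements together, a minimal sketch of one of the structures could look as follows, assuming a Keras implementation; the kernel size, padding, number of observations and the Leaky ReLU slope are illustrative assumptions rather than values taken from Figure 17:

```python
from tensorflow import keras
from tensorflow.keras import layers

n_obs, n_vars = 3, 6  # e.g. the structure using 3 observations of the 6 variables

model = keras.Sequential([
    keras.Input(shape=(n_obs, n_vars)),                         # input layer
    layers.Conv1D(filters=64, kernel_size=2, padding="same"),   # CNN layer with 64 filters
    layers.LeakyReLU(),                                          # Leaky ReLU instead of the default ReLU
    layers.LSTM(64),                                             # LSTM layer with 64 neurons
    layers.Dense(1),                                             # output layer returning a single value
])

model.compile(
    loss="mse",                                                  # mean squared error
    optimizer=keras.optimizers.SGD(learning_rate=0.0005),       # SGD with alpha = 0.0005
)
```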

2.5.2 Training

A more detailed explanation of the code used during the procedure described in this sub-section can be found in Annex 4 - Training.

The training of machine learning algorithms for time series forecasting has its peculiarities compared to how models are trained to solve other types of problems. Therefore, this sub-section briefly covers the training methodology used, the so-called walk-forward validation.

As already mentioned, walk-forward validation is a method used to evaluate machine learning models on time series data. As explained by Brownlee (2019), it provides the most realistic evaluation of such models, since traditional evaluation methods from machine learning, such as k-fold cross-validation or a simple split into training and validation data, do not work for time series because they ignore the temporal components inherent in the problem. Walk-forward validation takes these temporal components into account and provides a more realistic assessment of how the model will perform when used operationally.

When evaluating a model, we are interested in how it performs on data that was not used to train it; in machine learning this is called unseen or out-of-sample data. Commonly, for other types of problems, the data is divided into training, validation and test subsets in order to train and validate the model. With the walk-forward validation methodology, the data is instead divided by time periods and the model is trained and validated consecutively, which allows evaluating how well the model captures the temporal dependence of the data.

Dividing the data by time periods allows us to evaluate how the model would actually have performed had it been applied from the first period onwards, as well as to analyse its behaviour across all periods and observe whether or not its performance improves.

Following what is stated in this sub-section, the models were trained using the corresponding sample sets, passing all the available samples of a given time period before continuing with the next. As a result, a prediction was obtained for each time period considered, with the exception of the first two, which were used to train the model for the first time, as shown in the diagram in Figure 19. A sketch of this loop is given below.
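The following is a minimal sketch of this walk-forward loop, assuming the samples have already been grouped into per-period arrays; the names `periods_X` and `periods_y`, the number of epochs, and the helper function itself are hypothetical, not part of the code in Annex 4:

```python
import numpy as np

def walk_forward_train(model, periods_X, periods_y, initial_periods=2, epochs=50):
    """Train on all samples of each period before moving to the next,
    producing one prediction per period after the initial training window."""
    predictions = []

    # First training pass on the initial periods (here, the first two).
    X0 = np.concatenate(periods_X[:initial_periods])
    y0 = np.concatenate(periods_y[:initial_periods])
    model.fit(X0, y0, epochs=epochs, verbose=0)

    for X_t, y_t in zip(periods_X[initial_periods:], periods_y[initial_periods:]):
        predictions.append(model.predict(X_t, verbose=0))  # predict before seeing the period
        model.fit(X_t, y_t, epochs=epochs, verbose=0)      # then add the period to training
    return predictions
```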