Application of Radial Basis Function Neural Networks for Reference Evapotranspiration Prediction

The present study investigates the potential of Radial Basis Function (RBF) neural networks for the prediction of reference evapotranspiration (ETo). The study utilizes daily climatic data of temperature, relative humidity, sunshine hours, wind speed, and rainfall for five years collected from the Mosul meteorological station in northern Iraq. Thirteen RBF networks, each using a different input combination of climatic variables, have been trained and tested. The network output is compared with estimated daily Penman-Monteith ETo values. To evaluate the performance of the RBF networks, the same cases were re-trained using the well-known feedforward-backpropagation (FFBP) networks. In addition, the effect of including a time index within the inputs of the considered networks is investigated. The study shows that the RBF network emulates the FFBP network in its performance and can be effectively used for ETo prediction. Besides, it is much easier to build and much faster to train. The networks' outputs are very highly correlated with the estimated ETo, especially when all the climatic parameters are included. The results reveal that adding a time index to the inputs greatly improves the ETo prediction in the studied cases.

Accurate estimation of the reference crop evapotranspiration (ETo) is an important part of many studies such as hydrologic water balance, and water resources planning and management. Evapotranspiration can be either measured with a lysimeter or water balance approach, or estimated indirectly from climatic data. Most of the models that have been developed to predict ETo from climatic data are empirical in nature, owing to the difficulty of identifying a nonlinear model structure and estimating the parameters of the complex evapotranspiration process. The Food and Agricultural Organization of the United Nations (FAO) developed a practical procedure to estimate crop water requirements [1], which is widely accepted as a standard, especially in irrigation studies.
The methods used to estimate ETo vary from empirical relationships to complex methods based on physical processes, such as the Penman combination method [2], which links evaporation dynamics with the flux of net radiation and the aerodynamic transport characteristics of a natural surface. Monteith [3] introduced a surface conductance term that accounted for the response of leaf stomata to their hydrologic environment. This modified form of the Penman equation is widely known as the Penman-Monteith (PM) evapotranspiration model. It was ranked as the best method for all climatic conditions in a study [4] that analyzed the performance of 20 different methods against lysimeter-measured ET for 11 stations located in different climatic zones around the world. Moreover, the latest modification of this method was presented by the FAO [5].
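For reference, the source does not reproduce the PM equation itself. The form below is the widely used FAO-56 daily formulation for a grass reference surface; it is assumed here to correspond to the FAO modification mentioned above, and the notation is not taken from the source.

```latex
ET_o = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}
            {\Delta + \gamma\,(1 + 0.34\,u_2)}
```

Here $ET_o$ is in mm/day, $R_n$ is the net radiation and $G$ the soil heat flux (MJ m$^{-2}$ day$^{-1}$), $T$ is the mean daily air temperature at 2 m (°C), $u_2$ is the wind speed at 2 m (m/s), $e_s - e_a$ is the saturation vapour pressure deficit (kPa), $\Delta$ is the slope of the vapour pressure curve (kPa/°C), and $\gamma$ is the psychrometric constant (kPa/°C).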
The Artificial Neural Network (ANN) is a computing paradigm designed to mimic the human brain and nervous system [6]. It is a mathematical structure capable of representing arbitrarily complex nonlinear processes that relate the inputs and outputs of any system. ANN models have been used successfully to model complex nonlinear input/output time series relationships in a wide variety of fields. Given the high degree of empiricism and approximation in the analysis of hydrologic systems, the use of ANNs may be highly suitable [7].
In recent years, ANNs have been successfully applied to the modeling and forecasting of hydrological processes. In the hydrological forecasting context, recent papers have reported that ANNs may offer a promising alternative for rainfall-runoff modeling [6]; streamflow prediction [8]; reservoir inflow forecasting [9]; prediction of water quality parameters [10]; estimation of reference evapotranspiration [11]; and forecasting of reference evapotranspiration [12]. These applications utilize different types of neural networks, but what they have in common is that they are reported to give better results than the conventional models they are compared to.
There are several types of models available for ANN application, but feedforward (FF) networks trained with the back-propagation (BP) algorithm are the most prominent in hydrologic modeling [13]. Fernando and Jayawardena [14] reported that the Radial Basis Function (RBF) network performed better than the FF network trained with the BP algorithm. Park and Sandberg [15] proved that RBF networks with one hidden layer are capable of universal approximation. The application of RBF neural networks to hydrological problems is still rare, but it has recently been receiving more attention because of its advantages over FF networks.
This study aims to: (1) investigate the potential of RBF networks to predict daily ETo values using various combinations of climatic parameters, (2) compare the performance of RBF networks to that of the feedforward-backpropagation networks, and (3) study the effect of introducing a time index as an additional input to the studied networks.
An ANN attempts to mimic, in a very simplified way, human mental and neural structure and functions. It can be characterized as massively parallel interconnections of simple neurons that function as a collective system. The network topology consists of a set of nodes (neurons) connected by links and usually organized in a number of layers. Each node in a layer receives and processes weighted input from previous layers and transmits its output to nodes in the following layer through links. The weighted summation of inputs to a node is converted to an output according to a transfer function. Each link is assigned a weight, which is a numerical estimate of the connection strength. The basic structure of an ANN usually consists of three layers: (1) the input layer, where the data are introduced to the network; (2) the hidden layer(s), where data are processed; and (3) the output layer, where the results of a given input are produced. This type of network, where data flow is in one direction, is known as a feed-forward network [16].
The process of determining ANN weights is called learning or training, and it is similar to the calibration of a mathematical model. The ANNs are trained with a training set of input and known output data. At the beginning of training, the weights are initialized either with a set of random values or based on previous experience. Next, the weights are systematically changed by the learning algorithm such that, for a given input, the difference between the ANN output and the actual output is small. Many learning examples are repeatedly presented to the network, and the process is terminated when this difference is less than a specified value. At this stage, the ANN is considered trained. In the backpropagation algorithm, a set of inputs and outputs is selected from the training set and the network calculates the output based on the inputs. This output is subtracted from the actual output to find the output-layer error. The error is backpropagated through the network, and the weights are suitably adjusted. This process continues for the number of prescribed sweeps or until a prespecified error tolerance is reached. The mean square error over the training samples is the typical objective function to be minimized [17].
The back-propagation algorithm of a multi-layer feed-forward ANN is a gradient descent algorithm that may terminate at a local optimum and that also requires a long training time. These problems are addressed in Radial Basis Function (RBF) networks by incorporating the nonlinearity in the activation functions of the nodes of the hidden layer [18].
A radial basis function (RBF) network as described by Fu [19] is a two-layer network (see Figure 3) whose output units form a linear combination of the basis (kernel) functions computed by the hidden units. The basis functions in the hidden layer produce a localized response to the input. That is, each hidden unit has a localized receptive field. The basis function can be viewed as the activation function in the hidden layer. The most common basis function chosen is a Gaussian function, in which case the activation level $O_j$ of hidden unit $j$ is calculated by

$$O_j = \exp\left[-\frac{(X - W_j)\cdot(X - W_j)}{2\sigma_j^2}\right]$$

where $X$ is the input vector, $W_j$ is the weight vector associated with hidden unit $j$ (i.e., the center of the Gaussian function), and $\sigma_j^2$ is the normalization factor. The outputs of the hidden unit lie between 0 and 1; the closer the input to the center of the Gaussian, the larger the response of the node. Because the node produces an identical output for inputs with equal distance from the center of the Gaussian, it is called a radial basis.
The activation level $O_k$ of an output unit $k$ is determined by

$$O_k = \sum_j w_{kj}\, O_j$$

where $w_{kj}$ is the weight from hidden unit $j$ to output unit $k$, and $O_j$ is the activation level of hidden unit $j$. The output units form a linear combination of the nonlinear basis functions, and thus the overall network performs a nonlinear transformation of the input.

Figure (3): A schematic diagram of RBF network
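The forward pass described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the function names, the two hand-picked centers, the spreads, and the output weights are all hypothetical.

```python
import math

def hidden_activation(x, center, sigma_sq):
    """Gaussian response of one hidden unit: exp(-||x - c||^2 / (2 * sigma^2))."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / (2.0 * sigma_sq))

def rbf_forward(x, centers, sigma_sqs, out_weights):
    """Single-output RBF network: linear combination of the hidden responses."""
    hidden = [hidden_activation(x, c, s) for c, s in zip(centers, sigma_sqs)]
    return sum(w * h for w, h in zip(out_weights, hidden))

# Hypothetical network with two hidden units in a two-dimensional input space.
centers = [[0.0, 0.0], [1.0, 1.0]]
sigma_sqs = [0.5, 0.5]
out_weights = [2.0, -1.0]

y = rbf_forward([0.0, 0.0], centers, sigma_sqs, out_weights)
```

An input that coincides with a center gives that hidden unit its maximum response of 1, and inputs at equal distance from a center give identical responses, which is the radial property noted in the text.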
Learning in the RBF network can be divided into two stages: learning in the hidden layer, followed by learning in the output layer. Typically, learning in the hidden layer is performed using unsupervised methods (i.e., methods that do not depend on teaching patterns) such as the k-means clustering algorithm (clustering is concerned with grouping objects according to their similarity), while learning in the output layer uses supervised methods such as the least mean square (LMS) algorithm. After the initial solution is found by this approach, a supervised learning algorithm (e.g., back-propagation) can be applied to both layers to fine-tune the parameters of the network, since the clustering algorithm does not guarantee an optimal set of parameters for the basis functions.
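The two-stage procedure can be illustrated with a small, self-contained sketch. It assumes a fixed, hand-chosen spread, a plain k-means loop with given starting centers, and per-pattern LMS (delta rule) updates; the toy data, learning rate, and function names are hypothetical and are not the settings used in the study.

```python
import math

def nearest(x, centers):
    """Index of the center closest to pattern x (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
    return dists.index(min(dists))

def k_means(patterns, centers, iterations=20):
    """Stage 1 (unsupervised): move each center to the mean of its patterns."""
    centers = [list(c) for c in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for x in patterns:
            groups[nearest(x, centers)].append(x)
        for j, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers

def gaussian(x, center, sigma_sq):
    """Gaussian basis function response for one hidden unit."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-dist_sq / (2.0 * sigma_sq))

def lms_train(hidden_outputs, targets, rate=0.1, epochs=200):
    """Stage 2 (supervised): LMS updates of the output-layer weights."""
    weights = [0.0] * len(hidden_outputs[0])
    for _ in range(epochs):
        for h, t in zip(hidden_outputs, targets):
            error = t - sum(w * hi for w, hi in zip(weights, h))
            weights = [w + rate * error * hi for w, hi in zip(weights, h)]
    return weights

# Toy one-dimensional data (hypothetical values, not from the study).
patterns = [[0.0], [0.2], [1.0], [1.2]]
targets = [0.0, 0.0, 1.0, 1.0]

centers = k_means(patterns, centers=[[0.0], [1.0]])
sigma_sq = 0.5  # spread fixed by hand for this toy example
hidden = [[gaussian(x, c, sigma_sq) for c in centers] for x in patterns]
weights = lms_train(hidden, targets)
predictions = [sum(w * h for w, h in zip(weights, row)) for row in hidden]
```

The optional fine-tuning of both layers with a supervised algorithm, mentioned in the text, is deliberately left out of this sketch.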
The normalization factor $\sigma_j^2$ represents a measure of the spread of the data in the cluster associated with the hidden unit. It is commonly determined by the average distance between the cluster center and the training instances in that cluster. That is, for hidden unit $j$,

$$\sigma_j^2 = \frac{1}{M_j}\sum_{X \in \text{cluster } j} (X - C_j)^{T}(X - C_j)$$

where $X$ is a training pattern in the cluster, $C_j$ is the center of the cluster associated with hidden unit $j$, and $M_j$ is the number of training instances in that cluster [19].
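As a small numerical illustration of this spread estimate, the sketch below computes the mean squared distance between the patterns of one cluster and their center. The four patterns are hypothetical, and the function name is chosen here for illustration only.

```python
def cluster_spread(patterns, center):
    """Mean squared Euclidean distance between a cluster's patterns and its center."""
    total = 0.0
    for x in patterns:
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return total / len(patterns)

# Hypothetical cluster of four two-dimensional patterns.
cluster = [[0.0, 0.0], [2.0, 0.0], [1.0, 1.0], [1.0, -1.0]]
center = [sum(col) / len(cluster) for col in zip(*cluster)]  # cluster mean
sigma_sq = cluster_spread(cluster, center)
```

In this example every pattern lies at unit distance from the center (1, 0), so the estimated normalization factor is 1.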
The main difference between the RBF network and the feedforward back-propagation network is in their basis function. The radial basis function in the former network covers only small regions, whereas the sigmoid function assumes nonzero values over an infinitely large region of the input space. For some problems, sigmoid basis functions provide better results, but for others, radial basis functions are more advantageous [18].
For the purpose of this study, daily climatic data of minimum and maximum temperature (°C), average relative humidity (%), wind speed (m/s), sunshine hours (hr), and rainfall (mm) for the Mosul meteorological station (latitude 36° 19′, longitude 43° 09′, and elevation of 222.6 m asl) were collected for five years (January 1, 1996 to December 31, 2000).
Daily ETo values were estimated using the Penman-Monteith (PM) method, which is proposed as the sole standard method for the computation of reference evapotranspiration [12]. Because lysimeter-measured values were unavailable, the estimated ETo values are considered as the standard and used for training and validation of the different ANN architectures. The total input data (1826 patterns) are divided into two parts: the data of four years (1460 patterns) are used for training and the remaining one year of data (366 patterns) is used for validation [17].
The neural network training can be made more efficient if certain preprocessing steps are performed on the network inputs and targets. It is often useful, before training, to scale the inputs and targets so that they always fall within a specified range. In the present study, the input and output data have been scaled to lie within the interval from -1 to +1, which is preferable when the tan-sigmoid activation function is used in the network [20]. The standardization equation can be represented as:

$$X_n = 2\,\frac{X - X_{\min}}{X_{\max} - X_{\min}} - 1$$

where $X_n$ is the standardized value of the input $X$, and $X_{\min}$ and $X_{\max}$ are the minimum and maximum input values. After simulation, all the output values are de-standardized by multiplying with the respective standardization factor to get actual ETo values. This step helps the neural network training to be more efficient [21]. Both the FF-BP and RBF networks considered in this study are built and trained using the which is one of the (Release 12) package tools.
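The scaling to the interval [-1, +1] and its inverse can be sketched as follows. This assumes the usual min-max mapping; the inverse shown is simply the algebraic inverse of that mapping, and the function names and sample temperatures are hypothetical.

```python
def scale(x, x_min, x_max):
    """Map x from [x_min, x_max] onto [-1, +1]."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def unscale(x_n, x_min, x_max):
    """Algebraic inverse of scale(): map a value in [-1, +1] back to data units."""
    return (x_n + 1.0) * (x_max - x_min) / 2.0 + x_min

# Hypothetical daily maximum temperatures in degrees Celsius.
temps = [2.0, 10.0, 18.0]
t_min, t_max = min(temps), max(temps)

scaled = [scale(t, t_min, t_max) for t in temps]
restored = [unscale(s, t_min, t_max) for s in scaled]
```

The minimum and maximum should be taken from the training data and reused unchanged for the validation data, so that both sets are mapped with the same factors.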
The aim here is to study the potential of RBF networks for the prediction of daily ETo values using various climatic parameters. Several networks were built utilizing various combinations of these parameters as inputs. The daily ETo estimated using the PM method is considered as the output for all presented networks. These combinations, 13 cases in total, are listed in Table (1) and Table (2). The cases were gathered in groups according to the number and type of input parameters to ease the performance evaluation of the networks. Each climatic parameter in these cases is represented by an input node in the first layer of the neural network model.
The performance evaluation used two statistical measures: the correlation coefficient (R) and the mean square error (MSE). The MSE values are shown for both the training and validation phases. The proper spread values and the maximum number of hidden nodes are estimated by trial and error, as there are no defined guidelines for assigning their values. A large number of trials were carried out using different combinations of spread and maximum number of hidden nodes, and the combinations that gave the best network performance are listed in Table (1).
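Both measures are simple to compute from paired standard and predicted values. The sketch below assumes R is the Pearson correlation coefficient, which the source does not state explicitly; the sample values are hypothetical.

```python
import math

def mean_square_error(observed, predicted):
    """Average of the squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def correlation(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

# Hypothetical ETo values in mm/day; the predictions carry a constant bias.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

mse = mean_square_error(observed, predicted)
r = correlation(observed, predicted)
```

The example shows why the two measures are reported together: a constant bias leaves R at 1 while the MSE is nonzero.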
The first group in the table included three cases, which utilized the maximum temperature (MaxT), the minimum temperature (MinT), and the average temperature of MaxT and MinT. The results revealed no major differences in the performance of the three networks; the network that utilizes both MaxT and MinT performs best of the three, as shown by the validation-phase R and MSE values. In addition, the number of nodes in the hidden layer increases as the number of input parameters increases.
Cases 4, 5, and 6 investigated the effect of adding the sunshine hours (Sunh), wind speed, and average humidity (Avh) to network inputs that already contained MaxT and MinT. The network that contained the wind speed performed better than the networks that contained Avh or Sunh. Additionally, the network that contained the sunshine hours performed better than the network that contained the average humidity, as shown by the validation MSE and R values (MSE of 1.1535, 0.9821, and 0.4096 and R values of 0.9164, 0.9278, and 0.9700 for the networks containing average humidity, sunshine hours, and wind speed, respectively). On the other hand, the wind speed network required more hidden layer nodes to give its best results (31 nodes) compared to the 10 and 6 nodes needed for the networks containing Sunh or Avh, respectively, which reveals the intricacy of this relation.
The outcome of case 7 shows that it is not advisable to use the wind speed and the sunshine hours without incorporating the temperature data in the network, as is evident from the validation MSE of 1.37, which is the highest among all cases.
The next group consisted of cases 8, 9, and 10, which have MaxT, MinT, and Avh in common. Case 8 showed that including the sunshine hours enhanced the network performance (validation MSE of 0.95 compared to 1.1535 for case 6, a decrease of 17.64%), while including the wind speed instead of the sunshine hours (case 9) resulted in a validation MSE decrease of 66.38% relative to case 6. However, the results showed that it is preferable to use both the wind speed and the sunshine hours in the same network (i.e., case 10), whose performance was superior to the previous networks, as shown in Figure (4a) (validation MSE decreased by 86.3% and validation R increased by 8% relative to case 6). On the other hand, the number of hidden layer nodes in case 10 was the highest among the previous cases (excluding case 9), which is evidence of the complexity of this relation compared to the other cases.
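The percentage changes quoted in this section follow the usual relative-difference formula. As a check, using only the two validation MSE values stated in the text for cases 6 and 8:

```latex
\text{decrease} = \frac{MSE_{\text{case 6}} - MSE_{\text{case 8}}}{MSE_{\text{case 6}}} \times 100\%
                = \frac{1.1535 - 0.95}{1.1535} \times 100\% \approx 17.64\%
```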
To study the effect of including the daily rainfall data as an input to the network, three networks out of the 10 studied cases were selected, as shown in Table (2). The performance of networks 11, 12, and 13 was compared to the performance of networks 2, 5, and 10, respectively. Cases 12 and 13 show a slight decrease in validation MSE (2.08% and 0.39%) relative to cases 5 and 10, respectively, which is not a considerable improvement. On the other hand, the performance of case 11 was worse than that of case 2. The results reveal that including the rainfall data has no significant effect on network performance. Therefore, employing the rainfall data in the relations for ETo prediction is not recommended.
To evaluate the performance of the RBF networks, the very same cases considered in Table (1) were modeled using FF networks, and the results are shown in Table (3). Numerous trials were carried out to estimate the number of epochs needed to train the FF networks and obtain the best performance. It was noticed that using 50 epochs for the FF network training tends to result in the best network performance. With the number of epochs fixed, the proper number of hidden nodes in each studied case, i.e., the number that gives the minimum MSE and maximum R, was then estimated. For this task, as described by [7], the training process is usually started with a small number of nodes; the number is then increased gradually until no further improvement in network performance is noticed, at which point the final structure of the network is set.
A comparison of the FF network results (Table 3) with the results of the RBF networks (Table 1) reveals that the performance of the FF and RBF networks is very close in all cases. In addition, the RBF networks needed many more hidden nodes than the FF networks. The FF network shows slightly better performance than the RBF network in both the training and validation periods, but it needs a relatively longer time to tune the training parameters and train the network. The results demonstrate that the RBF networks emulate the conventional FF networks in performance and are much easier to work with.
In order to represent the evapotranspiration data as a time series, an additional input node representing the month number throughout the year has been incorporated in each of the input structures discussed previously, and the networks are re-trained while keeping the spread and maximum number of nodes the same for all cases (Table 4). In all studied cases, there are noticeable improvements in both R and MSE values compared to the results of the cases without a time index. The cases are not equally affected by the additional parameter, but the effect is larger when more input parameters are included. The decrease in validation MSE due to incorporating the time index ranges from 24.84% (case 4) to 83.6% (case 10), and the validation R values increased by between 2.18% (case 4) and 9.65% (case 7). The time index effect is illustrated in Figure (4) for case 10 (Table 4) as an example.
The RBF network is employed here to predict the daily ETo values using various climatic data. The analysis of the results indicates that RBF networks can be used for ETo prediction studies. The RBF networks are shown to emulate the FF-BP networks in performance. In addition, RBF networks have the advantage of being easy to build and much faster to train.
The results showed that using both maximum and minimum temperature data is better than using the average temperature. An obvious improvement is also noticed when the wind speed data are added to network inputs that already contain the maximum and minimum temperature data. This would be helpful for predicting ETo values when fewer climatic parameters are available. However, the best results are obtained when all considered climatic parameters are included as network inputs. In addition, no significant improvement in network performance is noticed when rainfall is incorporated as a network input.
Including a monthly time index in the inputs has a valuable effect, leading to an obvious improvement in all studied cases. Therefore, the use of a time index is highly recommended for future research.
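A monthly time index of the kind described above can be attached to daily records as in the sketch below. It assumes the index is simply the calendar month number (1 to 12) of each day and that the rows are consecutive daily records; the function names and the sample values are hypothetical.

```python
from datetime import date, timedelta

def month_index(day):
    """Time index for one daily record: the calendar month number (1-12)."""
    return day.month

def add_time_index(start, rows):
    """Append the month number to each daily input row, assuming consecutive days."""
    return [
        row + [month_index(start + timedelta(days=i))]
        for i, row in enumerate(rows)
    ]

# Two hypothetical daily records of [MaxT, MinT] that straddle a month boundary.
rows = [[12.1, 3.4], [13.0, 4.1]]
with_index = add_time_index(date(1996, 1, 31), rows)
```

Like the other inputs, the month number would presumably be scaled to the range [-1, +1] before training, although the source does not say so explicitly.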
In the standardization equation, $X_n$ denotes the standardized input values lying in the range [-1, +1], and $X_{\min}$ and $X_{\max}$ are the minimum and maximum input values, respectively.

Figure (4). Time series plots (a and c) and scatter plots (b and d) of the validation phase (case 10) for the standard and predicted ETo values in mm/day: (a and b) without time index, (c and d) with time index.