Khassa Chay Stream Flow Forecasting by the Markov Autoregressive (AR) Model

In this research, the monthly mean stream flow record of Khassa Chay for the period (1941-2001) was analyzed successfully, and the Markov autoregressive (AR) model was fitted to the resulting stochastic component of the series. Suitable tests were then carried out to determine the order of the model, and the model was tested to decide whether its parameters should be constant or periodic. The parameters of the model were estimated and used to generate the monthly flow for 56 years ahead. It was concluded that the Markov AR scheme is adequate to describe the structure of the monthly mean Khassa Chay stream flow, and that the periodic type of this model is more realistic than the constant type because the model parameters are highly sensitive to errors.


Introduction
Forecasting of seasonal stream flow provides many benefits to society by improving the ability to plan for and adapt to changing water supplies. The development of stochastic models of hydrological phenomena plays an important role in water resources engineering, including their use to forecast river flows [3]. The choice of the right model for a given hydrological series is an important aspect of the modeling. Stochastic analysis and modeling of monthly river flows have attracted the attention of many researchers. Many researchers, however, tend to prefer the analysis of daily stream flow rather than the monthly record. Early attempts at modeling hydrologic events were made by Young and Pisano, who proposed a multisite monthly stream flow model using the Matalas algorithm in order to preserve the cross correlation between the stations [9]. Alsuhaily (1985) applied stochastic analysis to the daily Tigris flow series at four stations; the stochastic component was then fitted by different models (AR and ARIMA) for each single site, and the multisite model (Matalas) was also applied successfully to the four stations [1]. P. P. Mujumdar and D. Nageshkumar (1990) investigated ten candidate models of the Auto-Regressive Moving Average (ARMA) family for representing and forecasting monthly and ten-day streamflow in three Indian rivers; the best models for forecasting and for representation of the data were selected using the criteria of Minimum Mean Square Error (MMSE) and Maximum Likelihood (ML), respectively. The selected models were validated for significance of the residual mean, periodicities in the residuals, and correlation in the residuals [8]. Kadri Yurkli, Ahmet Kuruncu, and Fazli Ozturk (2004) presented a methodology for modeling monthly data of the Kelkit stream, whose watershed is located in northern Anatolia and which is formed by the joining of small streams; Box-Jenkins, or ARIMA (autoregressive integrated moving average), models were used to simulate the monthly data.
Diagnostic checks were done for all the models selected from the autocorrelation function (ACF) and partial autocorrelation function (PACF) [10]. A. K. Mishra and V. R. Desai (2005) worked on the Kansabati river basin in India and concluded that, due to the random nature of the contributing factors, the occurrence and severity of droughts can be treated as stochastic. In their study, linear stochastic models, namely ARIMA and multiplicative Seasonal Autoregressive Integrated Moving Average (SARIMA) models, were used to forecast droughts based on the procedure of model development [7]. This research attempts to model the Khassa Chay river, a tributary of the Zghitun river, which flows into the existing Adhaim dam reservoir (the stream divides the city of Kirkuk into two main sectors), using the monthly recorded data for the period (1941-2001), which begins in October. The data were taken from a report on the Khassa Chay dam project hydrological study by the Engineering Consultancy Bureau of Mustansiriya University, College of Engineering, 2006 [4]. The recorded monthly Khassa Chay stream flows for this period were plotted. Figures 1 and 2, which show the monthly discharge in m³/sec during this period, make it obvious that the Khassa Chay river is a seasonal stream: the discharge values for June, July, August, and September are zero or nearly zero in all years, which means that the stream suffers from drought at these times. This characteristic of the Khassa Chay stream requires special treatment in the analysis and modeling of the stream.
This treatment was carried out in this research through many trials in the analysis and by rearranging the steps of the stochastic analysis until the best description of the dependent stochastic component of the stream was reached, and also by using two different trials when finding the model parameters, as described in the following sections.

Khassa Chay Streamflow Data Analysis.
As mentioned in the previous section, the data used in the analysis and modeling process cover the period (1941-2001). The monthly Khassa Chay flow record (in m³/sec) therefore spans 61 years; 56 years of this data were used for the analysis, and the remaining 5 years were used for calibrating the model after building it. The stochastic analysis began by investigating the homogeneity of the monthly discharge of the Khassa Chay stream over the 56 years and the existence of a trend component, using the method suggested by Yevjevich, which depends on the t-test with a suitable level of significance [1]. The data were found to be homogeneous, and no erroneous records were found in the Khassa Chay series. The series was also found to be free of any trend component by parametric and non-parametric tests. The next step in the analysis was to investigate the periodicity of the series, which can be defined as a value that repeats itself year after year at its seasonal position within the year. The physical explanation for the presence of this component in seasonal hydrologic time series is that it is attributed to astronomical events, which are also periodic. This component was investigated by measuring the persistence, defined as the tendency of low flows to follow low flows and high flows to follow high flows; it thus represents the correlation between the flow at time t and the flow at times t+1, t+2, ..., t+k, and is measured by the serial correlation coefficient r_k, where X_me is the mean of the sample of n values of X_t, and k is usually taken from zero up to 2*w, where w represents the number of months [1], [5]. The periodic component of this series was removed by standardization, a non-parametric method. After this operation, the overall mean of the series was -1.3*10^-17 and the overall standard deviation was 0.9918.
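The two operations described above, the serial correlation coefficient and the non-parametric removal of periodicity, can be illustrated with a minimal sketch. It assumes the common textbook estimator of r_k (lagged products of deviations from X_me divided by the sum of squared deviations) and a standardization by monthly mean and monthly population standard deviation; the function names and the synthetic record are hypothetical and are not taken from the study.

```python
import math
import random
from statistics import mean, pstdev

def standardize_by_month(flows, w=12):
    """Remove periodicity by standardizing each seasonal position separately."""
    groups = [flows[t::w] for t in range(w)]   # all values of month 0, month 1, ...
    mu = [mean(g) for g in groups]
    sd = [pstdev(g) for g in groups]
    # A month that never varies (e.g. always dry) has sd = 0; map it to 0.0.
    return [(x - mu[i % w]) / sd[i % w] if sd[i % w] > 0 else 0.0
            for i, x in enumerate(flows)]

def serial_correlation(x, k):
    """Lag-k serial correlation coefficient r_k (assumed textbook estimator)."""
    n, xm = len(x), mean(x)
    num = sum((x[t] - xm) * (x[t + k] - xm) for t in range(n - k))
    den = sum((v - xm) ** 2 for v in x)
    return num / den

# Hypothetical record: 20 years of a seasonal signal plus noise.
rng = random.Random(0)
flows = [10 + 8 * math.cos(2 * math.pi * i / 12) + rng.gauss(0, 1)
         for i in range(20 * 12)]
z = standardize_by_month(flows)

r12_raw = serial_correlation(flows, 12)   # large: the seasonal cycle repeats
r12_std = serial_correlation(z, 12)       # much smaller after standardization
```

With equal numbers of values per month, the standardized series has an overall mean of zero and a population standard deviation of one by construction, which is the behavior the reported values after this step reflect.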
As recommended by many researchers dealing with stochastic modeling, the analysis requires transforming the series into a normally distributed series; this step was done using the Box-Cox transformation. After this transformation, the overall mean of the Khassa Chay series became -6.6546 and the standard deviation 0.3782. Another step was taken to standardize the series by subtracting the overall mean of the series from the transformed data and dividing by the overall standard deviation. After this step the overall mean of the series was 5.1*10^-14 and the standard deviation was 1. After these steps of the stochastic analysis, the residual part of the series must be uncorrelated and random to reflect the non-deterministic stochastic part of the series. This was checked by calculating the serial correlation coefficients of the series and the critical limits r_c of the serial correlation coefficient, which depend on N, the number of data values, and K, the lag [2], [5]. Table (1) presents these values, which reflect the success of the analysis in making the series nearly uncorrelated and random. The next step was to choose a suitable model to describe the series and then to find the model parameters, as explained in the following section.
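As an illustration of the transformation and the independence check described above, the following sketch assumes the standard one-parameter Box-Cox transform and the usual Anderson form of the critical limits, r_c = (-1 ± z*sqrt(N-K-1))/(N-K), with z = 1.96 for a two-sided 95% band. The value of the Box-Cox parameter used in the study is not given, so `lam` here is purely illustrative.

```python
import math

def box_cox(x, lam):
    """Box-Cox power transform of a single positive value."""
    if x <= 0:
        raise ValueError("Box-Cox needs x > 0; shift the data first")
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def inverse_box_cox(y, lam):
    """Undo box_cox (needed when back-transforming generated values)."""
    return math.exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)

def standardize(series):
    """Subtract the overall mean and divide by the overall standard deviation."""
    n = len(series)
    m = sum(series) / n
    s = math.sqrt(sum((v - m) ** 2 for v in series) / n)
    return [(v - m) / s for v in series]

def anderson_limits(n, k, z=1.96):
    """Lower and upper critical limits for r_k of an independent series.

    z = 1.96 (two-sided 95% normal quantile) is an assumption of this sketch.
    """
    half = z * math.sqrt(n - k - 1)
    return (-1.0 - half) / (n - k), (-1.0 + half) / (n - k)

# 56 years of monthly values, lag one.
lower, upper = anderson_limits(n=56 * 12, k=1)
```

A serial correlation coefficient that falls between `lower` and `upper` is treated as not significantly different from zero at the chosen level.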

Markov Autoregressive (AR) Model.
This model describes the dependence in a hydrologic stochastic series (ε_{p,t}) by assuming that each value is the combined effect of previous values and an independent stochastic component (ξ_{p,t}) that occurs at the same time as (ε_{p,t}). In its general formulation, (a_i, σ) are the parameters of the model. As the effect of previous values on the current value decreases with time in all series, a finite sum is often sufficient to approximate the dependence of the ε_{p,t} values, so that an alternative representation truncates the sum at m terms, where m is the order of the model. In a hydrologic time series the dependence can be well approximated by a first, second, or third order linear autoregressive model, and many tests are available to find the required order. A modified form of the autoregressive model is given by Yevjevich, in which (α_{k,t} and σ_{ξ,t}) are the parameters of the model; both may be periodic or non-periodic, depending on the estimated r_{k,t} values, where r_{k,t} is the lag-k serial correlation coefficient, n is the number of years of data, k is the lag, X_{p,t} is the value at time t, p is the year, and w is the number of seasonal positions in the year. If t = w-k+1, then n is replaced by n-1 and X_{p,t+k} by X_{p+1,k} [1], [5], [3].
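The periodic serial correlation coefficient r_{k,t} described above can be sketched as follows. The sketch assumes r_{k,t} is the ordinary product-moment correlation, taken across years, between the values at seasonal position t and those k positions later, with the partner value drawn from the following year (and one fewer pair) when t + k passes the end of the year. The array layout, the 0-based indexing, and the small data set are hypothetical.

```python
import math

def periodic_lag_correlation(x, t, k):
    """r_{k,t}: correlation between seasonal position t and position t + k.

    x[p][t] is the value in year p at seasonal position t (both 0-based).
    Requires 0 < k <= w. When t + k runs past the end of the year, the
    partner value comes from year p + 1, so only n - 1 pairs are available.
    """
    n, w = len(x), len(x[0])
    s = t + k
    if s < w:
        pairs = [(x[p][t], x[p][s]) for p in range(n)]
    else:
        pairs = [(x[p][t], x[p + 1][s - w]) for p in range(n - 1)]
    a = [u for u, _ in pairs]
    b = [v for _, v in pairs]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in pairs)
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

# Hypothetical data: 4 years, 3 seasonal positions per year.
x = [[1, 2, 5], [2, 4, 3], [4, 8, 9], [3, 6, 1]]
r_within = periodic_lag_correlation(x, t=0, k=1)   # same-year pairs
r_wrap = periodic_lag_correlation(x, t=2, k=1)     # pairs cross into next year
```

Computing this for every seasonal position t gives the sequence of r_{1,t} values whose variation over the year decides between constant and periodic parameters.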
If the r_{k,t} values exhibit periodicity, then α_{k,t} and σ_{ξ,t} should be periodic; otherwise they should be constant. The estimation of the first and second order autoregressive model parameters is described below:

1. Constant Parameters
Equation (6) was used to find the lag one, two, and three month serial correlation coefficients; the values of r_{1,t}, r_{2,t}, and r_{3,t} were calculated for the Khassa Chay stream flow time series after the analysis steps and are shown in Table (2).
To determine a suitable order for this model, a simplified practical method suggested by Yevjevich is used. This method expresses the goodness of fit of the model by determination coefficients (D_i, i = 1, 2, 3, 4, ...), each of which represents the fraction of the total variance of (ε_{p,t}) explained by the ith order terms of the autoregressive equation, while the remaining portion of this variance is explained by the (σ_t ξ_{p,t}) term. The criterion is that if a higher order model explains an additional fraction of the variance smaller than a chosen quantity (∆D), the higher order model is rejected; that is, if the difference (D_j - D_i), with j = i+1, between the variance explained by the jth and ith order models is smaller than (∆D), the model order is taken to be i. (∆D) is usually taken as 0.01, i.e., 1 percent of the total variance of the (ε_{p,t}) series. The results indicated that a first order autoregressive model fits the series, since higher order models do not increase the explained variance by 1 percent or more over that explained by the first order model. A further decision must be made as to whether the model parameters should be constant or periodic over the year. Since these parameters depend on r_{1,t}, they should be periodic if r_{1,t} is found to be periodic. The lag-one serial correlation coefficients were compared with the critical limits, which, as recommended by many researchers, can be taken according to the Anderson test, where N is the number of data values [2]; see Table (2). The lower limit is -0.077 and the upper limit is 0.0602. Measured against these limits, the values of r_{1,t} reflect a periodicity; therefore the model should be considered periodic. The autoregressive model is thus taken to be periodic and of first order.
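The order-selection rule above can be sketched in a few lines. This sketch assumes that D_m is the fraction of variance explained by an AR(m) model fitted through the Yule-Walker equations, computed here with the Durbin-Levinson recursion; for m = 1 this gives D_1 = r_1², and for m = 2 it gives (r_1² + r_2² - 2 r_1² r_2)/(1 - r_1²). The exact expressions used in the study may differ, so the D values computed here need not reproduce the reported ones. The example input is the set of r_1, r_2, r_3 values reported later in the paper for the wet-month series.

```python
def determination_coefficients(r):
    """Explained-variance fractions D_1..D_M from r = [r_1, ..., r_M].

    Assumption: AR(m) fitted by Yule-Walker; Durbin-Levinson recursion.
    """
    rho = [1.0] + list(r)
    phi_prev = []          # coefficients of the AR(m-1) fit
    unexplained = 1.0      # residual variance fraction after each order
    out = []
    for m in range(1, len(rho)):
        num = rho[m] - sum(phi_prev[j] * rho[m - 1 - j] for j in range(m - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(m - 1))
        kappa = num / den  # partial autocorrelation at lag m
        phi_prev = [phi_prev[j] - kappa * phi_prev[m - 2 - j]
                    for j in range(m - 1)] + [kappa]
        unexplained *= 1.0 - kappa ** 2
        out.append(1.0 - unexplained)
    return out

def select_order(d, delta=0.01):
    """Smallest order i for which D_(i+1) - D_i falls below delta."""
    for i in range(len(d) - 1):
        if d[i + 1] - d[i] < delta:
            return i + 1
    return len(d)

d = determination_coefficients([0.1415, 0.0246, 0.0255])
order = select_order(d)   # first order under the 1 percent criterion
```

Because each higher order can only add explained variance under this definition, the rule stops at the first order whose successor gains less than ∆D.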

Model Parameters.
The parameters of the periodic first order autoregressive model were found from equations (12) and are shown in Table (3). Using the model parameters, the independent stochastic part was extracted, and the resulting series was analyzed to determine its properties. The overall mean of this series was 0.0063 and the standard deviation was 1.0535. A series of normally distributed numbers was then generated with the same mean and standard deviation and with a length of 56 years, to represent the monthly independent part of the Khassa Chay stream flow for the future years. This length was selected so that the forecast reaches nearly 50 years beyond 2001, since the last 5 years of the data period (1941-2001) are used for verification or calibration of the model. The generated series was used with the parameters above to find the dependent stochastic component of the monthly Khassa Chay flow series for the future years (1997-2052). This was done by reversing the operation used to extract the independent part of the series. The subsequent steps reversed all the operations that preceded the extraction of the independent part, in the reverse of their original sequence: the resulting series was de-standardized using the same mean and standard deviation used in the standardization, the power transformation was then reversed to return the series to its untransformed scale, and the periodicity was restored using the same monthly means and monthly standard deviations. Many statistical tests were applied to the generated series, using the first 5 years of the generated series as the calibration period.
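The generation step described above can be sketched under a stated assumption about the model form: a periodic first order scheme ε_t = ρ ε_{t-1} + sqrt(1 - ρ²) ξ_t, where ρ is the lag-one coefficient linking the previous seasonal position to the current one and ξ_t is standard normal noise. The parameter values, function names, and seed are hypothetical; the de-standardization and inverse power transformation are omitted for brevity, and only the final restoration of periodicity is shown.

```python
import math
import random

def generate_periodic_ar1(r1, n_years, rng, eps0=0.0):
    """Generate the dependent stochastic component for n_years.

    r1[t] is the lag-one correlation between seasonal position t and t + 1
    (wrapping at the end of the year); the form of the recursion is an
    assumption of this sketch.
    """
    w = len(r1)
    eps, prev = [], eps0
    for i in range(n_years * w):
        rho = r1[(i - 1) % w]            # links previous position to this one
        xi = rng.gauss(0.0, 1.0)         # independent stochastic part
        prev = rho * prev + math.sqrt(1.0 - rho * rho) * xi
        eps.append(prev)
    return eps

def restore_periodicity(z, month_mean, month_sd):
    """Reverse the monthly standardization: x = mean_t + sd_t * z."""
    w = len(month_mean)
    return [month_mean[i % w] + month_sd[i % w] * v for i, v in enumerate(z)]

# Hypothetical parameters: 12 identical monthly coefficients, 56 years ahead.
eps = generate_periodic_ar1([0.3] * 12, n_years=56, rng=random.Random(42))
```

Scaling the noise by sqrt(1 - ρ²) keeps the generated component at unit variance, so that the later de-standardization with the recorded mean and standard deviation returns values on the original scale.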
Table (4) compares the forecasted Khassa Chay stream flow with the previously recorded data in terms of the mean, standard deviation, coefficient of skewness, and coefficient of kurtosis. The results of the statistical tests on the generated series and the recorded data for the 5 calibration years were as follows: the t-value for the mean was -0.4866 and the t-value for the standard deviation was -0.0356, against a critical limit of 2.306 at the 95% confidence level; the F-value was 1.125, against a critical value of 6.39.
In addition to these tests, the same t and F tests were applied to the monthly statistical parameters (monthly mean, monthly standard deviation, monthly coefficient of skewness, and monthly coefficient of kurtosis). Most of these tests indicated the success of fitting the Markov autoregressive model to the Khassa Chay stream flow series; see Table (8).
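The comparison of generated and recorded values over the calibration years can be illustrated with a short sketch. It assumes a pooled two-sample t statistic for the means and a variance-ratio F statistic, which is consistent with the quoted critical values (2.306 corresponds to 8 degrees of freedom and 6.39 to 4 and 4 degrees of freedom, i.e. two samples of 5 values), although the exact test forms used in the study are not stated. The sample values are hypothetical.

```python
import math
from statistics import mean, variance

T_CRITICAL = 2.306   # critical t quoted in the text
F_CRITICAL = 6.39    # critical F quoted in the text

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (assumed test form)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1.0 / na + 1.0 / nb))

def f_ratio(a, b):
    """Ratio of the larger sample variance to the smaller one."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)

# Hypothetical annual values for the 5 calibration years.
recorded = [1.0, 2.0, 3.0, 4.0, 5.0]
generated = [2.0, 3.0, 4.0, 5.0, 6.0]

t_value = pooled_t(recorded, generated)
f_value = f_ratio(recorded, generated)
means_agree = abs(t_value) < T_CRITICAL
variances_agree = f_value < F_CRITICAL
```

A statistic inside its critical limit means the difference between the generated and recorded samples is not significant at the chosen level.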

Separating the Khassa Chay Stream Series (The 2nd Approach):
Because of the sensitivity of the Khassa Chay stream to drought periods, a second trial was made to model the stream with the same Markov AR model, this time after separating the Khassa Chay stream data into wet periods and dry periods. The same analysis was applied to the wet months of each year (October, November, December, January, February, March, April, and May) for the period of 56 years. The same stochastic analysis steps showed that the wet-month flow series is composed of a periodic component in addition to the non-deterministic stochastic part. The periodic part was removed in the same (non-parametric) manner, and the non-deterministic stochastic part (the dependent component) was analyzed after normalization and standardization. The resulting dependent stochastic residual was analyzed to find the Markov model parameters. The results, obtained from the same equations used above to find the order of the model, were: r_1 = 0.1415, r_2 = 0.0246, r_3 = 0.0255, D_1 = 0.0200, D_2 = 0.0139, D_3 = 0.0205, ∆D = 0.0100, D_2 - D_1 = -0.0061, D_3 - D_1 = 4.8759*10^-4, D_3 - D_2 = 0.0066. According to these results the model was first order, and according to the r_{1,t} results it was periodic; see Table (5). The parameters of the model are shown in Table (6). The same forecasting steps were carried out to forecast the Khassa Chay stream flow for the wet months only. As in the previous trial, many statistical tests were done to check the wet-months model, with the following results: t-value for the mean = 0.6733, t-value for the standard deviation = 0.1293, and F-value = 1.3419, which are good indications of the success of the model. The tests were also applied to the monthly parameter values and led to the same conclusion; see Table (8). Figure (4) is another indication of the conformity of the forecasted and recorded series.
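The separation into wet and dry months can be sketched as follows, assuming the record is stored as one flat list of monthly values in October-based years, so that October through May occupy the first 8 positions of each year and June through September the last 4. The function name and the placeholder data are hypothetical.

```python
WET_MONTHS = 8   # October..May: first 8 positions of an October-based year

def split_wet_dry(flows, w=12, wet=WET_MONTHS):
    """Split a monthly series into wet-month and dry-month subseries."""
    wet_series, dry_series = [], []
    for i, q in enumerate(flows):
        (wet_series if i % w < wet else dry_series).append(q)
    return wet_series, dry_series

# Placeholder record: two years, values equal to their own index.
flows = list(range(24))
wet_flows, dry_flows = split_wet_dry(flows)
```

The wet-month subseries then has 8 seasonal positions per year, so the same analysis steps apply with w = 8 in place of w = 12.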

Conclusions.
Several conclusions were reached in this study; some of them are listed below:
1. The treatment of seasonal streams that suffer from drought periods requires a special arrangement of the stochastic analysis steps; these were repeated many times for the Khassa Chay stream flow until the most successful analysis was reached.
2. The Markov AR scheme was found to be adequate to describe the structure of the stochastic component of the monthly mean Khassa Chay stream flow, and it was a suitable model to describe the Khassa Chay stream flow behavior.
3. The periodic type of this model is more realistic than the constant type because the model parameters are highly sensitive to errors in the estimates of the serial correlation coefficients, which was also clear from the lag-one serial correlation coefficient values for both models.
4. Dealing with the wet months separately can be more adequate. This is clear from Figures (3, 4), which show better conformity for the wet-months model than for the comprehensive model, although the monthly mean values were described better by the comprehensive model than by the wet-months model; see Table (8).