Learning rate for the back propagation algorithm based on a modified secant equation

The classical back propagation (CBP) method is the simplest algorithm for training feed-forward neural networks. It uses the steepest descent direction with a fixed learning rate $\eta$ to minimize the error function $E$; since $\eta$ is fixed at every iteration, the CBP algorithm converges slowly. In this paper we suggest a new formula for computing the learning rate $\eta_k$, based on a modified secant equation, to accelerate the convergence of the CBP algorithm. Simulation results are presented and compared with those of the CBP algorithm and the BB methods.


1. Introduction
Neural networks are composed of simple elements operating in parallel.
These elements are inspired by biological nervous systems. As in nature, the network function is determined largely by the connections between elements.
We can train a neural network to perform a particular function by adjusting the values of the connections (weights) between elements. Commonly, neural networks are adjusted, or trained, so that a particular input leads to a specific target output. The network is adjusted, based on a comparison of the output and the target, until the network output matches the target. Typically, many such input/target pairs are used in this supervised learning to train a network. Batch training of a network proceeds by making weight and bias changes based on an entire set (batch) of input vectors [6].
The batch training of a Multi-layer Feed-forward Neural network (MFFN) can be formulated as a non-linear unconstrained minimization problem [8,9], namely

$$\min_{w \in \mathbb{R}^n} E(w), \qquad (1)$$

where E is the batch error measure, defined as the sum of squared-difference error functions over the entire training set,

$$E(w) = \sum_{P} \sum_{j} \big( o_{j,P} - t_{j,P} \big)^2, \qquad (2)$$

in which $(o_{j,P} - t_{j,P})^2$ is the squared difference between the actual output of the j-th output-layer neuron for pattern P and the target output value, and the scalar P is an index over the input/target pairs. The general purpose of the training is to search for an optimal set of connection weights such that the error of the network output is minimized. The most popular training algorithm is the Classical Batch Back Propagation (CBP) algorithm introduced by Rumelhart, Hinton and Williams [12], which updates the weights along the steepest descent direction,

$$w_{k+1} = w_k - \eta \, g_k, \qquad (3)$$

where $g_k = \nabla E(w_k)$ and $\eta$ is a fixed learning rate. Although the CBP algorithm is simple to implement, it converges slowly [5].
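To make the CBP update concrete, the following is a minimal sketch of one batch epoch for a one-hidden-layer sigmoid network; the architecture, the sigmoid activations, and all names (`cbp_epoch`, `W1`, `W2`, etc.) are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbp_epoch(W1, b1, W2, b2, X, T, eta=0.1):
    """One epoch of classical batch back propagation with fixed learning rate eta."""
    # Forward pass: one row of X per training pattern
    H = sigmoid(X @ W1 + b1)            # hidden-layer activations
    O = sigmoid(H @ W2 + b2)            # network outputs
    # Backward pass: gradients of E = sum_P sum_j (o_jP - t_jP)^2
    dO = 2.0 * (O - T) * O * (1.0 - O)  # delta at the output layer
    dH = (dO @ W2.T) * H * (1.0 - H)    # delta at the hidden layer
    # Batch update: weights change only after the whole pattern set is presented
    W2 -= eta * (H.T @ dO); b2 -= eta * dO.sum(axis=0)
    W1 -= eta * (X.T @ dH); b1 -= eta * dH.sum(axis=0)
    return np.sum((O - T) ** 2)         # batch error E at the old weights
```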
In order to overcome the drawbacks of the CBP algorithm, many gradient-based training algorithms have been proposed in the literature [1,2,5,7,13].

2. Some Modifications of the CBP Algorithm
A surprising result was given by Barzilai and Borwein [3], who derived a formula for the learning rate $\eta_k$ that leads to superlinear convergence. The main idea of the BB method is to use information from the previous iteration to decide the step size (learning rate) of the current iteration. The iteration (3) is viewed as

$$w_{k+1} = w_k - D_k g_k, \qquad D_k = \eta_k I. \qquad (4)$$

In order to force the matrix $D_k$ to have a quasi-Newton property, $\eta_k$ is chosen to minimize

$$\| s_{k-1} - D_k y_{k-1} \| \qquad (5)$$

or

$$\| D_k^{-1} s_{k-1} - y_{k-1} \|, \qquad (6)$$

where $s_{k-1} = w_k - w_{k-1}$ and $y_{k-1} = g_k - g_{k-1}$, which yields

$$\eta_k = \frac{s_{k-1}^T s_{k-1}}{s_{k-1}^T y_{k-1}} \qquad (7)$$

and

$$\eta_k = \frac{s_{k-1}^T y_{k-1}}{y_{k-1}^T y_{k-1}}, \qquad (8)$$

respectively. Note that we abbreviate the method defined in equation (3) with the learning rate defined in equations (7) and (8) as the BB1 and BB2 methods, respectively.
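In code, both rates reduce to a few inner products over the flattened weight and gradient vectors; a minimal sketch (function and variable names are illustrative):

```python
import numpy as np

def bb_learning_rates(w_prev, w_curr, g_prev, g_curr):
    """Barzilai-Borwein step sizes computed from the previous iteration."""
    s = w_curr - w_prev          # s_{k-1}: most recent weight change
    y = g_curr - g_prev          # y_{k-1}: most recent gradient change
    eta_bb1 = (s @ s) / (s @ y)  # equation (7): the BB1 rate
    eta_bb2 = (s @ y) / (y @ y)  # equation (8): the BB2 rate
    return eta_bb1, eta_bb2
```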
An alternative approach is based on the work of Plagianakos et al. [11].
Following this approach, equation (3) is reformulated into a scheme whose learning rate is derived from the eigenvalues of the Hessian of the error function (equation (9)). A well-known difficulty with this approach is that computing the eigenvalues, or even estimating them, is not a simple task; hence the scheme defined in equation (9) is not practical.
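Equation (9) itself is not reproduced here, but the cost objection can be illustrated: even a rough power-iteration estimate of a single Hessian eigenvalue requires one extra gradient evaluation per step. A sketch, assuming a hypothetical `grad(w)` function that returns the flattened gradient of E:

```python
import numpy as np

def largest_hessian_eigenvalue(grad, w, iters=20, eps=1e-6, seed=0):
    """Estimate the largest Hessian eigenvalue of E at w by power iteration.

    Uses finite-difference Hessian-vector products
    H v ~ (grad(w + eps*v) - grad(w)) / eps, so every power step costs an
    extra gradient evaluation, which is the expense the text refers to."""
    g0 = grad(w)
    v = np.random.default_rng(seed).standard_normal(w.size)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        Hv = (grad(w + eps * v) - g0) / eps  # Hessian-vector product
        lam = v @ Hv                         # Rayleigh quotient estimate
        v = Hv / (np.linalg.norm(Hv) + 1e-12)
    return lam
```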

3. A New Efficient Monotone Learning Rate
Due to the remarkable theoretical properties and the striking numerical performance of the BB1 and BB2 methods, they have inspired a great deal of research on gradient methods [4]. We believe that the main drawback of the BB methods is their non-monotone behaviour [14]. In this paper we consider a learning rate $\eta_k$ derived from a modified secant equation (equation (12)). To compute the value of $\eta_k$ in equation (12), we minimize a quadratic model of the error function. At this point we summarize the new training algorithm, which we abbreviate as MSBP.
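Since equation (12) and the quadratic being minimized are not reproduced here, the sketch below instead uses the widely cited modified secant vector, which augments $y_{k-1}$ with function-value information, combined with a BB1-style rate; the formula is an assumption for illustration, not necessarily the paper's equation (12).

```python
import numpy as np

def msbp_learning_rate(w_prev, w_curr, g_prev, g_curr, E_prev, E_curr):
    """A BB1-style learning rate built on a modified secant equation.

    ASSUMPTION: the paper's equation (12) is not shown in the text, so this
    uses the standard modified secant vector, which mixes function-value
    and gradient information into y_{k-1}."""
    s = w_curr - w_prev
    y = g_curr - g_prev
    # theta blends error values and gradients along the step s
    theta = 2.0 * (E_prev - E_curr) + (g_curr + g_prev) @ s
    y_bar = y + (theta / (s @ s)) * s  # modified secant vector
    return (s @ s) / (s @ y_bar)       # BB1-style rate using y_bar
```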

Note:
Below are some observations on which the convergence of the MSBP algorithm depends.

4. Experiments and Results
A computer simulation has been developed to study the performance of the learning algorithms. The simulations have been carried out using MATLAB (7.6). The performance of MSBP has been evaluated and compared with the batch version of the Classical Back Propagation (CBP) algorithm. The algorithms were tested using the same initial weights, initialized by the Nguyen-Widrow method [10], and received the same sequence of input patterns. The weights of the network are updated only after the entire set of patterns to be learned has been presented. If an algorithm fails to converge within the epoch limit, it is considered to have failed to train the FFNN, and its epochs are not included in the statistical analysis of the algorithm. One gradient evaluation and one error function evaluation are necessary at each epoch.
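For reference, a minimal sketch of the Nguyen-Widrow recipe for the input-to-hidden weights (the matrix layout, seed handling, and function name are illustrative):

```python
import numpy as np

def nguyen_widrow_init(n_in, n_hidden, seed=0):
    """Nguyen-Widrow initialization of an input-to-hidden layer [10].

    Standard recipe: draw small random weights, then rescale each hidden
    unit's incoming weight vector to length beta = 0.7 * h^(1/n)."""
    rng = np.random.default_rng(seed)
    beta = 0.7 * n_hidden ** (1.0 / n_in)             # scale factor
    W = rng.uniform(-0.5, 0.5, size=(n_in, n_hidden))
    W *= beta / np.linalg.norm(W, axis=0)             # one column per hidden unit
    b = rng.uniform(-beta, beta, size=n_hidden)       # spread the biases
    return W, b
```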

Problem (1): SPECT Heart Problem
This data set contains data instances derived from cardiac Single Proton Emission Computed Tomography (SPECT) images. From Table (1), we note that the MSBP algorithm is the best algorithm with respect to the number of epochs and the training time.

Problem (2): Continuous Function Approximation
The second test problem we consider is the approximation of a continuous trigonometric function.

Problem (3): XOR Problem
The last problem we consider is the XOR Boolean function problem, which is regarded as a classical benchmark for FFNN training. The XOR function maps two binary inputs to a single binary output.
As is well known, this function is not linearly separable. The network architecture for this binary classification problem consists of one hidden layer. From Table (3), we conclude that the BB1 algorithm is the best algorithm with respect to the number of succeeded simulations, the number of epochs, and the time.
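As a concrete setup, here are the XOR training data and a small network wired to the `cbp_epoch` sketch from the introduction; the 2-2-1 architecture, learning rate, seed, and stopping threshold are assumptions, since the paper's exact choices are not given in the text.

```python
import numpy as np

# XOR truth table: two binary inputs mapped to one binary output
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# A 2-2-1 sigmoid network; architecture and hyperparameters are assumed
rng = np.random.default_rng(1)
W1, b1 = rng.uniform(-0.5, 0.5, (2, 2)), np.zeros(2)
W2, b2 = rng.uniform(-0.5, 0.5, (2, 1)), np.zeros(1)
for epoch in range(20000):
    E = cbp_epoch(W1, b1, W2, b2, X, T, eta=0.5)  # from the CBP sketch above
    if E < 0.04:  # stop once the batch error is small enough
        break
```

With a fixed learning rate this loop may need thousands of epochs, or a restart from different initial weights, which is exactly the slow convergence that motivates the adaptive rates compared in Table (3).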