Scaled Fletcher-Reeves Method for Training Feedforward Neural Networks

The training phase of a Back-Propagation (BP) network is an unconstrained optimization problem. The goal of training is to find an optimal set of connection weights such that the error of the network output is minimized. In this paper we develop the classical Fletcher-Reeves (CFRB) method for nonlinear conjugate gradient into a scaled conjugate gradient method (SFRB) to train feedforward neural networks. Our development is based on the sufficient descent property and the pure conjugacy condition. Comparative results for SFRB, CFRB and standard Back-Propagation (BP) are presented for some test problems.


1. Introduction
The Back-Propagation (BP) algorithm is perhaps the most widely used supervised training algorithm for multi-layered Feed Forward Neural Networks (FFNN) [6, 15].
The BP algorithm learns a predefined set of input-output example pairs by using a two-phase propagate-adapt cycle. After an input pattern has been applied as a stimulus to the first layer of network units, it is propagated through each layer until an output is generated [1]. This output pattern is then compared to the target output, and an error signal is computed for each output unit. The signals are then transmitted backward from the output layer to each unit in the intermediate layer that contributes directly to the output; each unit in the intermediate layer receives only a portion of the total error signal, based roughly on the relative contribution the unit made to the original output. This process repeats layer by layer until each unit in the network has received an error signal that describes its relative contribution to the total error [18].

Mathematically, the standard training problem of a neural network reduces to finding a set of weights $w$ that minimizes the error function $E$, defined as the sum of the squared errors in the output [3]:

$$E = \frac{1}{2}\sum_{j}\left(x_j^{L} - T_j\right)^2,$$

where $x^L$ is a function of $w$ (the weight vector) and $b$ (the bias) through the equations of the forward pass, and $T$ is the target. This cost function measures the squared error between the desired and actual output vectors.
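For concreteness, this cost can be written as a one-line function; the following is a minimal sketch (our illustration, not code from the paper):

```python
import numpy as np

def error_function(x_L, T):
    """Sum-of-squared-errors cost E = 1/2 * sum_j (x_j^L - T_j)^2."""
    x_L, T = np.asarray(x_L, float), np.asarray(T, float)
    return 0.5 * np.sum((x_L - T) ** 2)
```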

2. The Standard Backpropagation (SBP) Algorithm
Let us describe the network as follows: $W^l$ is the weight matrix and $b^l$ is the bias vector for each $l = 1, \dots, L$; there are $L + 1$ layers of neurons, and $L$ hidden layers. We would like to change the weights $w$ and biases $b$ so that the actual output $x^L$ becomes closer to the desired output $d$. The Backpropagation algorithm consists of the following steps.
1- Forward pass. The input vector $x^0$ is transformed into the output vector $x^L$ by evaluating

$$x^{l} = f\!\left(W_k^{l}\, x^{l-1} + b_k^{l}\right), \qquad l = 1, \dots, L, \qquad (1)$$

where $k$ is the index of the iteration, usually called an epoch.

2- Error computation. The difference between the desired output $d$ and the actual output $x^L$ is measured by the error

$$E_k = \tfrac{1}{2}\left\|d - x^L\right\|^2. \qquad (2)$$

3- Backward pass. The error signal at the output units is propagated backwards through the entire network, by evaluating

$$\delta^{L} = f'(u^{L})\odot\left(x^{L} - d\right), \qquad \delta^{l} = f'(u^{l})\odot \left(W^{l+1}\right)^{T}\delta^{l+1}, \quad l = L-1, \dots, 1, \qquad (3)$$

where $u^{l} = W^{l} x^{l-1} + b^{l}$ and $\odot$ denotes the elementwise product.

4- Learning updates. The weights and biases are updated using the results of the forward and backward passes: compute the gradient $g_k$ and the learning rate $\alpha_k$, update the weights and biases

$$w_{k+1} = w_k + \alpha_k d_k, \qquad (4)$$

set $k = k + 1$, and go to step (1). Here $k$ is the current iteration (epoch), $w_1$ and $b_1$ are the initial weights and biases, and $\alpha > 0$ is the learning rate (step-size).

We see from step (4) that the SBP algorithm uses the Steepest Descent (SD) search direction, i.e. $d_k = -g_k$ for all $k$, with a fixed step-size (say $\alpha = 0.3$), or learning rate, in order to perform the minimization of the error function $E$. The inefficiency of SD is due to the fact that the minimization directions and learning rates are chosen poorly: if the first step-size does not lead directly to the minimum, SD will zig-zag with many small steps [2, 10]. The backpropagation search direction $d_k$ is therefore usually augmented with a momentum term [10],

$$d_k = -g_k + \beta_k d_{k-1},$$

and this extra term is generally interpreted as a way to avoid oscillations; adding the momentum term is wise when the values $\alpha_k$ and $\beta_k$ are well chosen. One method which chooses these parameters is known as the Conjugate Gradient (CG) method.
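As an illustration of the step (4) update in SD form, the sketch below performs one update with the optional momentum term; `grad_E` is a hypothetical callable returning the gradient of $E$, and the momentum coefficient 0.9 is an assumed typical value, not one taken from the paper:

```python
import numpy as np

def sbp_update(w, grad_E, d_prev=None, alpha=0.3, beta=0.9):
    """One SBP update w_{k+1} = w_k + alpha * d_k with d_k = -g_k,
    optionally augmented with the momentum term beta * d_{k-1}."""
    g = grad_E(w)                 # gradient of the error E at the current weights
    d = -g                        # steepest-descent direction
    if d_prev is not None:
        d = d + beta * d_prev     # momentum term, added to damp oscillations
    return w + alpha * d, d       # updated weights and the direction just used
```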
In this work we modify the BP algorithm in two ways: first, instead of using a constant learning rate, we use a line search procedure to compute the learning rate $\alpha_k$ such that the Wolfe conditions (given later) hold; second, we modify the search directions. The plan of this paper is as follows: in Section 3 we present the proposed (SFRBP) training algorithm, and Section 4 contains our numerical examples and results.

3. The Proposed Method
3.1 Conjugate Gradient (CG)
In conjugate gradient methods, the basic idea for determining the search direction $d_k$ in step (4), Eq. (4), is a linear combination of the negative gradient at the current iteration with the previous search direction, namely

$$d_{k+1} = -g_{k+1} + \beta_k d_k, \qquad d_1 = -g_1.$$

In the literature, several choices have been proposed for defining the scalar parameter $\beta_k$, giving rise to distinct conjugate gradient methods [4, 12]. The most famous one was proposed by Fletcher and Reeves (FR) [7], defined as:

$$\beta_k^{FR} = \frac{\left\|g_{k+1}\right\|^2}{\left\|g_k\right\|^2}.$$

The convergence analysis [12, 13] of this method is usually based on mild conditions, which refer to the Lipschitz and boundedness assumptions, and is closely connected with the sufficient descent property

$$g_k^T d_k \le -c\left\|g_k\right\|^2, \qquad c > 0. \qquad (9)$$

Hager and Zhang [8] presented an excellent survey on conjugate gradient methods. As a learning acceptability criterion we will apply the standard Wolfe conditions; that is, we suggest that the learning rate $\alpha_k$ be computed along the search direction $d_k$ by the Wolfe line search conditions:

$$E(w_k + \alpha_k d_k) \le E(w_k) + \rho\,\alpha_k\, g_k^T d_k, \qquad (10)$$

$$\nabla E(w_k + \alpha_k d_k)^T d_k \ge \sigma\, g_k^T d_k, \qquad (11)$$

where $0 < \rho < \sigma < 1$.
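The sketch below (our own illustration; `E` and `grad_E` are hypothetical callables for the error function and its gradient) computes the FR direction and checks the Wolfe conditions (10) and (11) for a trial learning rate:

```python
import numpy as np

def fr_direction(g_new, g_old, d_old):
    """Fletcher-Reeves direction d_{k+1} = -g_{k+1} + beta_FR * d_k,
    with beta_FR = ||g_{k+1}||^2 / ||g_k||^2."""
    beta_fr = np.dot(g_new, g_new) / np.dot(g_old, g_old)
    return -g_new + beta_fr * d_old

def wolfe_conditions_hold(E, grad_E, w, d, alpha, rho=1e-4, sigma=0.9):
    """Standard Wolfe conditions (10) and (11) with 0 < rho < sigma < 1."""
    g_d = np.dot(grad_E(w), d)
    decrease = E(w + alpha * d) <= E(w) + rho * alpha * g_d      # condition (10)
    curvature = np.dot(grad_E(w + alpha * d), d) >= sigma * g_d  # condition (11)
    return decrease and curvature
```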

3.2 The Proposed Scaled FR Training Algorithm (SFRBP)
Multilayer networks typically use sigmoid transfer functions in the hidden layers [9]. These functions are often called "squashing" functions, since they compress an infinite input range into a finite output range. Sigmoid functions are characterized by the fact that their slope must approach zero as the input gets large [18]. This causes a problem when using SD to train a multilayer network with sigmoid functions, since the gradient can have a very small magnitude and therefore cause small changes in the weights and biases, even though the weights and biases are far from their optimal values [17].

In this section we introduce a new scaled Fletcher-Reeves (SFRBP) algorithm to train multilayer feedforward neural networks. The purpose of the SFRBP training algorithm is to eliminate these harmful effects of the magnitudes of the partial derivatives. Now consider the scaled search direction of the form

$$d_{k+1} = -\theta_{k+1}\, g_{k+1} + \beta_k^{FR} d_k,$$

where $\theta_{k+1} > 0$ is a scaling parameter. In conjugate gradient algorithms a search is performed along conjugate directions; that is, the search directions satisfy the pure conjugacy condition $d_{k+1}^T y_k = 0$, where $y_k = g_{k+1} - g_k$. Solving the above equation for $\theta_{k+1}$ gives

$$\theta_{k+1} = \beta_k^{FR}\,\frac{d_k^T y_k}{g_{k+1}^T y_k}, \qquad (17)$$

and the value of the constant $c$ in the sufficient descent condition (9) follows by substituting this direction into (9). Therefore the search direction for the new scaled FR algorithm is

$$d_{k+1} = -\theta_{k+1}\, g_{k+1} + \beta_k^{FR} d_k,$$

where $\theta_{k+1}$ is defined in equation (17).
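Under the reconstruction above, the scaled direction can be computed as in the following sketch (our illustration; the scaling theta is taken from the pure conjugacy condition, equation (17)):

```python
import numpy as np

def scaled_fr_direction(g_new, g_old, d_old):
    """Scaled FR direction d_{k+1} = -theta * g_{k+1} + beta_FR * d_k,
    with theta chosen so that d_{k+1}^T y_k = 0 (pure conjugacy)."""
    y = g_new - g_old
    beta_fr = np.dot(g_new, g_new) / np.dot(g_old, g_old)
    theta = beta_fr * np.dot(d_old, y) / np.dot(g_new, y)  # equation (17)
    return -theta * g_new + beta_fr * d_old
```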

3.3 New Scaled FR Backpropagation (SFRBP) Algorithm
The steps (1), (2) and (3) are the same as in the SBP algorithm, and step (4) changes to the following form.

Step (4):
1. Initialization: use the Nguyen-Widrow method to initialize the weights and biases; set $k = 1$, set the error goal and $\epsilon > 0$, and compute $E(w_k)$ and $g_k = \nabla E(w_k)$.
2. Convergence test: if $E(w_k) \le$ goal or $\|g_k\| \le \epsilon$, then $w_k$ is optimal; else go to step 3.
3. Learning rate computation: compute $\alpha_k$ by a line search procedure such that the Wolfe conditions (10) and (11) are satisfied, and update the weights and biases according to $w_{k+1} = w_k + \alpha_k d_k$. If the Powell restart criterion [13] is satisfied, then set $d_{k+1} = -g_{k+1}$; otherwise compute the new scaled search direction with $\theta_{k+1}$ from equation (17). Set $k = k + 1$ and go to step 1.
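A compact sketch of step (4) as a training loop follows; `wolfe_line_search` is a hypothetical stand-in for the line search procedure, and the restart test uses the standard form of Powell's criterion, $|g_{k+1}^T g_k| \ge 0.2\,\|g_{k+1}\|^2$, which we assume here:

```python
import numpy as np

def sfrbp_train(w, E, grad_E, wolfe_line_search, goal=1e-3, eps=1e-6, max_epochs=2000):
    """Outer loop of step (4) of the SFRBP algorithm (a sketch)."""
    g = grad_E(w)
    d = -g                                            # first direction: steepest descent
    for k in range(max_epochs):
        if E(w) <= goal or np.linalg.norm(g) <= eps:  # convergence test
            break
        alpha = wolfe_line_search(E, grad_E, w, d)    # satisfies (10) and (11)
        w = w + alpha * d                             # update weights and biases
        g_new = grad_E(w)
        if abs(np.dot(g_new, g)) >= 0.2 * np.dot(g_new, g_new):
            d = -g_new                                # Powell restart
        else:
            y = g_new - g
            beta_fr = np.dot(g_new, g_new) / np.dot(g, g)
            theta = beta_fr * np.dot(d, y) / np.dot(g_new, y)  # equation (17)
            d = -theta * g_new + beta_fr * d          # scaled FR direction
        g = g_new
    return w
```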

4. Experimental Results
In this section a computer simulation has been developed to study the performance of the learning algorithms; the simulation has been carried out using MATLAB. The following table lists the algorithms that are tested and the acronyms we use to identify them. Toolbox default values for the heuristic parameters of the above algorithms are used unless stated otherwise. The algorithms were tested using the same initial weights, initialized by the Nguyen-Widrow method [11], and received the same sequence of input patterns. The weights of the network are updated only after the entire set of patterns to be learned has been presented.
For each of the test problems, a table summarizing the performance of the algorithms for the simulations that reached a solution is presented. The reported parameters are: min, the minimum number of epochs, listed in the first column; max, the maximum number of epochs, listed in the second column; mean, the mean number of epochs, listed in the third column; Tav, the average total time, in the fourth column; and finally succ, in the last column, the number of simulations (out of 100 trials) that succeeded within the error-function-evaluation limit. If an algorithm fails to converge within the above limit, it is considered to have failed to train the FFNN, and its epochs are not included in the statistical analysis of the algorithms. One gradient evaluation and one error-function evaluation are necessary at each epoch.
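For concreteness, the reported statistics could be gathered as in this sketch (the function and variable names are ours, not the paper's):

```python
import numpy as np

def summarize(epochs_of_converged_runs, times_of_converged_runs, trials=100):
    """min/max/mean epochs, average total time, and successes out of `trials`."""
    e = np.asarray(epochs_of_converged_runs)
    return {"min": int(e.min()), "max": int(e.max()), "mean": float(e.mean()),
            "Tav": float(np.mean(times_of_converged_runs)),
            "succ": len(e)}              # converged runs out of `trials`
```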

Problem 1 (XOR problem).
The selected architecture of the FFNN is 2-3-1, with a logsig transfer function in the hidden layer and a purelin transfer function in the output layer; the same error goal has been used for all of the tested algorithms, including the new scaled conjugate gradient algorithm (SFRBP).
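A minimal sketch of one forward pass through this 2-3-1 architecture on the XOR patterns (random initialization is shown here only for illustration; the paper initializes with the Nguyen-Widrow method):

```python
import numpy as np

def logsig(z):
    return 1.0 / (1.0 + np.exp(-z))                # sigmoid 'squashing' function

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((3, 2)), np.zeros(3)  # hidden layer: 3 logsig neurons
W2, b2 = rng.standard_normal((1, 3)), np.zeros(1)  # output layer: 1 purelin neuron

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])     # XOR input patterns
d = np.array([[0], [1], [1], [0]])                 # XOR targets

hidden = logsig(X @ W1.T + b1)     # logsig transfer in the hidden layer
output = hidden @ W2.T + b2        # purelin (linear) transfer in the output layer
error = 0.5 * np.sum((d - output) ** 2)            # error function E
```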
