Friday, December 16, 2016

[ NNF For Java ] Other Neural Network Types (Ch7)

Preface 
• Understanding the Elman Neural Network
• Understanding the Jordan Neural Network
• The ART1 Neural Network
• Evolving with NEAT

We have primarily looked at feedforward neural networks so far in this book. Not all connections in a neural network need to be forward. It is also possible to create recurrent connections. This chapter will introduce neural networks that are allowed to form recurrent connections. Though it is not a recurrent neural network, we will also look at the ART1 neural network. This network type is interesting because it does not have a distinct learning phase like most other neural networks. The ART1 neural network learns as it recognizes patterns. In this way it is always learning, much like the human brain. 

This chapter will begin by looking at Elman and Jordan neural networks. These networks are often called simple recurrent neural networks (SRN). 

The Elman Neural Network 
Elman and Jordan neural networks are recurrent neural networks that have additional layers and function very similarly to the feedforward networks in previous chapters. They use training techniques similar to those of feedforward neural networks as well. Figure 7.1 shows an Elman neural network. The Elman neural network uses context neurons, labeled C1 and C2. The context neurons allow feedback. Feedback is when the output from a previous iteration is used as the input for successive iterations. Notice that the context neurons are fed from hidden neuron output. There are no weights on these connections. They are simply an output conduit from hidden neurons to context neurons. The context neurons remember this output and then feed it back to the hidden neurons on the next iteration. Therefore, the context layer is always feeding the hidden layer its own output from the previous iteration. 


The connection from the context layer to the hidden layer is weighted. This synapse will learn as the network is trained. Context layers allow a neural network to recognize context. To see how important context is to a neural network, consider how the previous networks were trained. The order of the training set elements did not really matter. The training set could be jumbled in any way needed and the network would still train in the same manner. With an Elman or a Jordan neural network, the order becomes very important. The training set element presented previously is still affecting the neural network. This becomes very important for predictive neural networks and makes Elman neural networks very useful for temporal neural networks. 
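The feedback loop described above can be illustrated with a small forward pass. This is only a sketch in plain Java, not Encog code: the class name ElmanStepSketch, the weight values, and the single output neuron are assumptions chosen for illustration.

```java
/** Minimal Elman-style forward step; all names and weights here are illustrative. */
class ElmanStepSketch {
    final double[][] inputToHidden;   // [hidden][input]
    final double[][] contextToHidden; // [hidden][context], trainable in a real network
    final double[] hiddenToOutput;    // [hidden], single output neuron
    double[] context;                 // copy of the previous hidden activations

    ElmanStepSketch(double[][] inputToHidden, double[][] contextToHidden,
                    double[] hiddenToOutput) {
        this.inputToHidden = inputToHidden;
        this.contextToHidden = contextToHidden;
        this.hiddenToOutput = hiddenToOutput;
        this.context = new double[inputToHidden.length]; // starts at zero
    }

    /** Computes one output and then stores the hidden activations as context. */
    double step(double[] input) {
        int hiddenCount = inputToHidden.length;
        double[] hidden = new double[hiddenCount];
        for (int h = 0; h < hiddenCount; h++) {
            double sum = 0.0;
            for (int i = 0; i < input.length; i++) {
                sum += inputToHidden[h][i] * input[i];
            }
            for (int c = 0; c < context.length; c++) {
                sum += contextToHidden[h][c] * context[c]; // weighted feedback
            }
            hidden[h] = Math.tanh(sum);
        }
        double out = 0.0;
        for (int h = 0; h < hiddenCount; h++) {
            out += hiddenToOutput[h] * hidden[h];
        }
        context = hidden.clone(); // unweighted copy: hidden -> context
        return Math.tanh(out);
    }

    public static void main(String[] args) {
        ElmanStepSketch net = new ElmanStepSketch(
                new double[][] {{0.5}, {-0.3}},
                new double[][] {{0.8, -0.2}, {0.1, 0.4}},
                new double[] {1.0, -1.0});
        // The same input produces different outputs because the context changed.
        System.out.println(net.step(new double[] {1.0}));
        System.out.println(net.step(new double[] {1.0}));
    }
}
```

Presenting the same input twice gives two different outputs, which is why the order of the training elements matters for this kind of network.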

Chapter 8 will delve more into temporal neural networks. Temporal networks attempt to see trends in data and predict future data values. Feedforward networks can also be used for prediction, but the input neurons are structured differently. This chapter will focus on how neurons are structured for simple recurrent neural networks. 

Dr. Jeffrey Elman created the Elman neural network. Dr. Elman used an XOR pattern to test his neural network. However, he did not use a typical XOR pattern like we’ve seen in previous chapters. He used an XOR pattern collapsed to just one input neuron. Consider the following XOR truth table. 


Now, collapse this to a string of numbers. To do this simply read the numbers left-to-right, line-by-line. This produces the following: 
1.0 , 0.0 , 1.0 , 0.0 , 0.0 , 0.0 , 0.0 , 1.0 , 1.0 , 1.0 , 1.0 , 0.0

We will create a neural network that accepts one number from the above list and should predict the next number. This same data will be used with a Jordan neural network later in this chapter. Sample input to this neural network would be as follows: 
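Such input and ideal pairs can be derived mechanically from the sequence. The following plain Java sketch pairs each number with the one that follows it and counts how often each input value is followed by 0 or by 1. The class and method names are hypothetical, and this is not necessarily how the ElmanXOR example builds its training set.

```java
/** Builds "predict the next number" pairs from the collapsed XOR sequence. */
class TemporalXorSketch {
    // The collapsed XOR sequence quoted in the text.
    static final double[] SEQUENCE =
            {1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0};

    /** Returns {input, ideal} pairs where the ideal is simply the next number. */
    static double[][] nextValuePairs(double[] sequence) {
        double[][] pairs = new double[sequence.length - 1][];
        for (int i = 0; i < sequence.length - 1; i++) {
            pairs[i] = new double[] {sequence[i], sequence[i + 1]};
        }
        return pairs;
    }

    /** Counts how often the given input value is followed by the given next value. */
    static int count(double[][] pairs, double input, double next) {
        int n = 0;
        for (double[] p : pairs) {
            if (p[0] == input && p[1] == next) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        double[][] pairs = nextValuePairs(SEQUENCE);
        for (double[] p : pairs) {
            System.out.println("input=" + p[0] + " ideal=" + p[1]);
        }
        System.out.println("0 -> 0: " + count(pairs, 0.0, 0.0)
                + ", 0 -> 1: " + count(pairs, 0.0, 1.0));
        System.out.println("1 -> 0: " + count(pairs, 1.0, 0.0)
                + ", 1 -> 1: " + count(pairs, 1.0, 1.0));
    }
}
```

Both input values are followed by 0 in some pairs and by 1 in others, so a network that sees only the current number has no consistent target to learn.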



It would be impossible to train a typical feedforward neural network for this. The training information would be contradictory. Sometimes an input of 0 results in a 1; other times it results in a 0. An input of 1 has similar issues. The neural network needs context; it should look at what comes before. We will review an example that uses an Elman and a feedforward network to attempt to predict the output. An example of the Elman neural network can be found at ElmanXOR. When run, this program produces the following output: 
...
Training Elman, Epoch #3690 Error:0.27448061278629504
Training Elman, Epoch #3691 Error:0.2744620123332587
Training Elman, Epoch #3692 Error:0.2744434778358156
...
Training Feedforward, Epoch #245 Error:0.5000011288463052
Training Feedforward, Epoch #246 Error:0.5000011286004022
Best error rate with Elman Network: 0.23241543139461884
Best error rate with Feedforward Network: 0.5000011286004022

Elman should be able to get into the 30% range,
feedforward should not go below 50%.
The recurrent Elment net can learn better in this case.
If your results are not as good, try rerunning, or perhaps training longer.

As you can see, the program attempts to train both a feedforward and an Elman neural network with the temporal XOR data. The feedforward neural network does not learn the data well, but the Elman network learns better. In this case, the feedforward neural network gets to 50% and the Elman neural network gets to 23%. The context layer helps considerably. (This program uses random weights to initialize the neural network. If the first run does not produce good results, try rerunning. A better set of starting weights can help.)

Creating an Elman Neural Network 
Calling the createElmanNetwork method creates the Elman neural network in this example. This method is shown here. 
    import org.encog.engine.network.activation.ActivationTANH;
    import org.encog.ml.MLMethod;
    import org.encog.neural.pattern.ElmanPattern;

    static MLMethod createElmanNetwork() {
        // construct an Elman type network: 1 input, 6 hidden, 1 output neuron
        ElmanPattern pattern = new ElmanPattern();
        pattern.setActivationFunction(new ActivationTANH());
        pattern.setInputNeurons(1);
        pattern.addHiddenLayer(6);
        pattern.setOutputNeurons(1);
        return pattern.generate();
    }
As you can see from the above code, the ElmanPattern is used to actually create the Elman neural network. This provides a quick way to construct an Elman neural network. 

Training an Elman Neural Network 
Elman neural networks tend to be particularly susceptible to local minima. A local minimum is a point where training stagnates. Visualize the weight matrix and thresholds as a landscape with mountains and valleys. To get to the lowest error, you want to find the lowest valley. Sometimes training finds a low valley and searches near this valley for a lower spot. It may fail to find an even lower valley several miles away. 
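The valley analogy can be reproduced on a one-dimensional curve. The sketch below is plain Java and has nothing to do with Encog: the function f, the learning rate, and the starting points are arbitrary choices made for illustration. Gradient descent started on the right side settles in a shallow valley, while a start on the left side reaches the deeper one.

```java
/** Gradient descent on a curve with two valleys, to illustrate a local minimum. */
class LocalMinimumSketch {
    /** Illustrative error surface with a shallow valley (x > 0) and a deep one (x < 0). */
    static double f(double x) {
        return x * x * x * x - 3 * x * x + x;
    }

    /** Derivative of f. */
    static double gradient(double x) {
        return 4 * x * x * x - 6 * x + 1;
    }

    /** Plain gradient descent: repeatedly step against the slope. */
    static double descend(double start, double learningRate, int steps) {
        double x = start;
        for (int i = 0; i < steps; i++) {
            x -= learningRate * gradient(x);
        }
        return x;
    }

    public static void main(String[] args) {
        double right = descend(1.5, 0.01, 2000);
        double left = descend(-1.5, 0.01, 2000);
        System.out.println("start  1.5 -> x=" + right + " f=" + f(right));
        System.out.println("start -1.5 -> x=" + left + " f=" + f(left));
    }
}
```

The run that starts at 1.5 never crosses the hill between the two valleys, which is the kind of stagnation that training strategies try to work around.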

This example’s training uses several training strategies to help avoid this situation. The training code for this example is shown below. The same training routine is used for both the feedforward and Elman networks and uses backpropagation with a very small learning rate. However, adding a few training strategies helps greatly. The trainNetwork method is used to train the neural network. This method is shown here. 
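    import org.encog.ml.MLMethod;
    import org.encog.ml.data.MLDataSet;
    import org.encog.ml.train.MLTrain;
    import org.encog.ml.train.strategy.Greedy;
    import org.encog.ml.train.strategy.HybridStrategy;
    import org.encog.ml.train.strategy.StopTrainingStrategy;
    import org.encog.neural.networks.BasicNetwork;
    import org.encog.neural.networks.training.TrainingSetScore;
    import org.encog.neural.networks.training.anneal.NeuralSimulatedAnnealing;
    import org.encog.neural.networks.training.propagation.Propagation;
    import org.encog.neural.networks.training.propagation.back.Backpropagation;

    public static double trainNetwork(final String what,
            final MLMethod network, final MLDataSet trainingSet) {
        // train the neural network
        // (1) alternative trainer (simulated annealing) and main trainer (backpropagation)
        TrainingSetScore score = new TrainingSetScore(trainingSet);
        final MLTrain trainAlt = new NeuralSimulatedAnnealing(
                (BasicNetwork) network, score, 10.0, 2.0, 100);

        final MLTrain trainMain = new Backpropagation(
                (BasicNetwork) network, trainingSet, 0.000001, 0.0);

        // (2) single-threaded training and a stop strategy
        ((Propagation) trainMain).setThreadCount(1);
        final StopTrainingStrategy stop = new StopTrainingStrategy();

        // (3) attach the strategies to the main trainer
        trainMain.addStrategy(new Greedy());
        trainMain.addStrategy(new HybridStrategy(trainAlt));
        trainMain.addStrategy(stop);

        // (4) iterate until the stop strategy signals stagnation
        int epoch = 0;
        while (!stop.shouldStop()) {
            trainMain.iteration();
            System.out.println("Training " + what + ", Epoch #" + epoch
                    + " Error:" + trainMain.getError());
            epoch++;
        }
        return trainMain.getError();
    }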
One of the strategies employed by this program is a HybridStrategy. This allows an alternative training technique to be used if the main training technique stagnates. (1). Simulated annealing serves as the alternative training technique. As you can see, it uses a training set-based scoring object. For more information about simulated annealing, refer to Chapter 6, “More Supervised Training.” The primary training technique is backpropagation; (2). Training is limited to a single thread, and a StopTrainingStrategy tells us when to stop training. The StopTrainingStrategy will stop the training when the error rate stagnates. By default, stagnation is defined as less than a 0.00001% improvement over 100 iterations; (3). The strategies are added to the main training technique. Besides the hybrid and stop strategies, we also make use of a greedy strategy. This strategy will only allow iterations that improve the error rate of the neural network; (4). The loop continues until the stop strategy indicates that it is time to stop. 
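To make the stop and greedy ideas concrete, here is a plain Java sketch of both. The classes are hypothetical stand-ins and not Encog's StopTrainingStrategy or Greedy; in particular, the improvement threshold is treated as an absolute difference, which is an assumption.

```java
/** Hypothetical helpers that mimic two of the strategies described above; not Encog classes. */
class TrainingStrategySketch {

    /** Signals a stop once the error fails to improve enough for several iterations in a row. */
    static final class StagnationStop {
        private final double minImprovement; // assumed absolute threshold
        private final int tolerateCycles;    // consecutive weak iterations allowed
        private double bestError = Double.MAX_VALUE;
        private int badCycles = 0;

        StagnationStop(double minImprovement, int tolerateCycles) {
            this.minImprovement = minImprovement;
            this.tolerateCycles = tolerateCycles;
        }

        /** Records the error of the latest iteration. */
        void update(double error) {
            if (bestError - error < minImprovement) {
                badCycles++;
            } else {
                badCycles = 0;
            }
            bestError = Math.min(bestError, error);
        }

        boolean shouldStop() {
            return badCycles >= tolerateCycles;
        }
    }

    /** Greedy acceptance: keep the new weights only if they lowered the error. */
    static double[] greedyAccept(double[] oldWeights, double oldError,
                                 double[] newWeights, double newError) {
        return newError < oldError ? newWeights : oldWeights;
    }

    public static void main(String[] args) {
        StagnationStop stop = new StagnationStop(1e-7, 100);
        double error = 1.0;
        int epoch = 0;
        while (!stop.shouldStop()) {
            error = Math.max(0.25, error * 0.99); // stand-in for a training iteration
            stop.update(error);
            epoch++;
        }
        System.out.println("Stopped after " + epoch + " epochs at error " + error);
    }
}
```

The loop in main has the same shape as the training loop above: iterate, report the error to the stop object, and leave once the error has been flat for long enough.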

The Jordan Neural Network 
Encog also contains a pattern for a Jordan neural network. The Jordan neural network is very similar to the Elman neural network. Figure 7.2 shows a Jordan neural network. 

As you can see, a context neuron is used and is labeled C1, similar to the Elman network. However, the output from the output layer is fed back to the context layer, rather than the hidden layer. This small change in the architecture can make the Jordan neural network better for certain temporal prediction tasks. 
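The difference from the Elman network can be seen in a small forward step where the context neuron stores the previous output instead of the hidden activations. This is a plain Java sketch with made-up weights and a hypothetical class name, not the network that JordanPattern generates.

```java
/** Minimal Jordan-style forward step with one input, one output, and one context neuron. */
class JordanStepSketch {
    final double[] inputToHidden;   // weight from the input neuron to each hidden neuron
    final double[] contextToHidden; // weight from the context neuron to each hidden neuron
    final double[] hiddenToOutput;  // weight from each hidden neuron to the output neuron
    double context = 0.0;           // holds the previous output

    JordanStepSketch(double[] inputToHidden, double[] contextToHidden,
                     double[] hiddenToOutput) {
        this.inputToHidden = inputToHidden;
        this.contextToHidden = contextToHidden;
        this.hiddenToOutput = hiddenToOutput;
    }

    /** Computes the output and then copies it into the context neuron. */
    double step(double input) {
        double sum = 0.0;
        for (int h = 0; h < inputToHidden.length; h++) {
            double hidden = Math.tanh(
                    inputToHidden[h] * input + contextToHidden[h] * context);
            sum += hiddenToOutput[h] * hidden;
        }
        double output = Math.tanh(sum);
        context = output; // feedback comes from the output layer
        return output;
    }

    public static void main(String[] args) {
        JordanStepSketch net = new JordanStepSketch(
                new double[] {0.5, -0.3},
                new double[] {0.8, 0.1},
                new double[] {1.0, -1.0});
        System.out.println(net.step(1.0));
        System.out.println(net.step(1.0));
    }
}
```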

The Jordan neural network has the same number of context neurons as it does output neurons. This is because the context neurons are fed from the output neurons. The XOR operator has only one output neuron. This leaves you with a single context neuron when using the Jordan neural network for XOR. Jordan networks work better with a larger number of output neurons. To construct a Jordan neural network, the JordanPattern should be used. The following code demonstrates this. 
    import org.encog.engine.network.activation.ActivationTANH;
    import org.encog.neural.networks.BasicNetwork;
    import org.encog.neural.pattern.JordanPattern;

    static BasicNetwork createJordanNetwork() {
        // construct a Jordan type network: 1 input, 6 hidden, 1 output neuron
        JordanPattern pattern = new JordanPattern();
        pattern.setActivationFunction(new ActivationTANH());
        pattern.setInputNeurons(1);
        pattern.addHiddenLayer(6);
        pattern.setOutputNeurons(1);
        return (BasicNetwork) pattern.generate();
    }
The above code would create a Jordan neural network similar to Figure 7.2. 

Encog includes an example XOR network that uses the Jordan neural network. This example is included mainly for completeness, so that Elman and Jordan can be compared on the XOR operator. As previously mentioned, Jordan tends to do better when there is a larger number of output neurons. The Encog XOR example for Jordan will not be able to train to a very low error rate and does not perform noticeably better than a feedforward neural network. The Jordan example can be found at JordanXOR.

The ART1 Neural Network 
The ART1 neural network is a type of Adaptive Resonance Theory (ART) neural network. ART1, developed by Stephen Grossberg and Gail Carpenter, supports only bipolar input. The ART1 neural network is trained as it is used and is used for classification. New patterns are presented to the ART1 network and are classified into either new or existing classes. Once the maximum number of classes has been used, the network will report that it is out of classes. 

An ART1 network appears as a simple two-layer neural network. However, unlike a feedforward neural network, there are weights in both directions between the input and output layers. The input neurons are used to present patterns to the ART1 network. ART1 uses bipolar numbers, so each input neuron is either on or off. A value of one represents on, and a value of negative one represents off. The output neurons define the groups that the ART1 neural network will recognize. Each output neuron represents one group. 
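The bipolar convention is easy to show in code. The sketch below is plain Java with hypothetical names; it assumes patterns are written as strings in which the letter O marks an "on" neuron, and it maps on to 1.0 and off to -1.0.

```java
import java.util.Arrays;

/** Converts an on/off pattern into bipolar values (1.0 for on, -1.0 for off). */
class BipolarSketch {
    /** Treats the letter 'O' as on and every other character as off. */
    static boolean[] parse(String pattern) {
        boolean[] bits = new boolean[pattern.length()];
        for (int i = 0; i < bits.length; i++) {
            bits[i] = pattern.charAt(i) == 'O';
        }
        return bits;
    }

    /** Maps true to 1.0 and false to -1.0. */
    static double[] toBipolar(boolean[] bits) {
        double[] values = new double[bits.length];
        for (int i = 0; i < bits.length; i++) {
            values[i] = bits[i] ? 1.0 : -1.0;
        }
        return values;
    }

    public static void main(String[] args) {
        // prints [1.0, -1.0, 1.0, -1.0, -1.0]
        System.out.println(Arrays.toString(toBipolar(parse("O O  "))));
    }
}
```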

Using the ART1 Neural Network 
We will now see how to actually make use of an ART1 network. The example presented here will create a network that is given a series of patterns to learn to recognize. This example can be found at NeuralART1. This example constructs an ART1 network. This network will be presented new patterns to recognize and learn. If a new pattern is similar to a previous pattern, then the new pattern is identified as belonging to the same group as the original pattern. If the pattern is not similar to a previous pattern, then a new group is created. If there is already one group per output neuron, then the neural network reports that it can learn no more patterns. The output from this example can be seen here. 


The above output shows that the neural network is presented with patterns. The number to the right indicates in which group the ART1 network placed the pattern. Some patterns are grouped with previous patterns while other patterns form new groups. Once all of the output neurons have been assigned to a group, the neural network can learn no more patterns. Once this happens, the network reports that all classes have been exhausted. 
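The grouping behavior described above can be imitated with a much-simplified matching rule. The sketch below is plain Java and is not the ART1 algorithm that Encog implements: it takes the first stored class whose overlap with the input meets a vigilance threshold, whereas a real ART1 network ranks candidates through its weights. The class name, the vigilance value, and the patterns are all assumptions.

```java
/** Simplified vigilance-based grouping that learns while it classifies; not full ART1. */
class VigilanceClusterSketch {
    private final boolean[][] prototypes; // one stored pattern per output neuron
    private final double vigilance;       // required match ratio, between 0 and 1
    private int used = 0;                 // number of classes assigned so far

    VigilanceClusterSketch(int maxClasses, double vigilance) {
        this.prototypes = new boolean[maxClasses][];
        this.vigilance = vigilance;
    }

    /** Returns the class index for the pattern, or -1 when all classes are exhausted. */
    int classify(boolean[] input) {
        int inputOnes = 0;
        for (boolean bit : input) {
            if (bit) {
                inputOnes++;
            }
        }
        for (int c = 0; c < used; c++) {
            int overlap = 0;
            for (int i = 0; i < input.length; i++) {
                if (input[i] && prototypes[c][i]) {
                    overlap++;
                }
            }
            if (inputOnes > 0 && (double) overlap / inputOnes >= vigilance) {
                // Learn while recognizing: keep only the bits shared with the input.
                for (int i = 0; i < input.length; i++) {
                    prototypes[c][i] &= input[i];
                }
                return c;
            }
        }
        if (used == prototypes.length) {
            return -1; // every output neuron already represents a class
        }
        prototypes[used] = input.clone();
        return used++;
    }

    /** Treats the letter 'O' as on and every other character as off. */
    static boolean[] parse(String pattern) {
        boolean[] bits = new boolean[pattern.length()];
        for (int i = 0; i < bits.length; i++) {
            bits[i] = pattern.charAt(i) == 'O';
        }
        return bits;
    }

    public static void main(String[] args) {
        VigilanceClusterSketch net = new VigilanceClusterSketch(2, 0.9);
        for (String p : new String[] {"OO   ", "OO   ", "   OO", "O O O"}) {
            int winner = net.classify(parse(p));
            System.out.println(p + " - "
                    + (winner >= 0 ? winner : "new Input and all Classes exhausted"));
        }
    }
}
```

With two classes available, the repeated pattern joins its existing group, the third pattern opens a second group, and the fourth finds no match and no free class.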

First, an ART1 neural network must be created. This can be done with the following code. 
    ART1 logic = new ART1(INPUT_NEURONS, OUTPUT_NEURONS);
This creates a new ART1 network with the specified number of input neurons and output neurons. Here we create a neural network with 5 input neurons and 10 output neurons. This neural network will be capable of clustering input into 10 clusters. Because the input patterns are stored as string arrays, they must be converted to a boolean array that can be presented to the neural network. Because the ART1 network is bipolar, it only accepts Boolean values. The following code converts each of the pattern strings into an array of Boolean values. 
    public void setupInput() {
        // one boolean row per pattern string; 'O' marks an "on" neuron
        this.input = new boolean[PATTERN.length][INPUT_NEURONS];
        for (int n = 0; n < PATTERN.length; n++) {
            for (int i = 0; i < INPUT_NEURONS; i++) {
                this.input[n][i] = (PATTERN[n].charAt(i) == 'O');
            }
        }
    }
The patterns are stored in the PATTERN array. The converted patterns will be stored in the boolean input array. Now that a boolean array represents the input patterns, we can present each pattern to the neural network to be clustered. This is done with the following code, beginning by looping through each of the patterns: 
    import org.encog.ml.data.specific.BiPolarNeuralData;
    import org.encog.neural.art.ART1;

    public void run() {
        this.setupInput();
        ART1 logic = new ART1(INPUT_NEURONS, OUTPUT_NEURONS);

        for (int i = 0; i < PATTERN.length; i++) {
            // (1)
            BiPolarNeuralData in = new BiPolarNeuralData(this.input[i]);
            BiPolarNeuralData out = new BiPolarNeuralData(OUTPUT_NEURONS);

            // (2)
            logic.compute(in, out);

            if (logic.hasWinner()) {  // (3)
                System.out.println(PATTERN[i] + " - " + logic.getWinner());
            } else {  // (4)
                System.out.println(PATTERN[i]
                        + " - new Input and all Classes exhausted");
            }
        }
    }
(1). First, we create a BiPolarNeuralData object that will hold the input pattern. A second object is created to hold the output from the neural network; (2). Using the input, we compute the output; (3). Determine if there is a winning output neuron. If there is, this is the cluster that the input belongs to; (4). If there is no winning neuron, the user is informed that all classes have been used. 

The ART1 is a network that can be used to cluster data on the fly. There is no distinct learning phase; it will cluster data as it is received. 

The NEAT Neural Network 
NeuroEvolution of Augmenting Topologies (NEAT) is a genetic algorithm for evolving the structure and weights of a neural network. NEAT was developed by Ken Stanley at The University of Texas at Austin. NEAT relieves the neural network programmer of the tedious task of figuring out the optimal structure of a neural network’s hidden layer. 

A NEAT neural network has an input and output layer, just like the more common feedforward neural networks. A NEAT network starts with only an input layer and output layer. The rest is evolved as the training progresses. Connections inside of a NEAT neural network can be feedforward, recurrent, or self-connected. All of these connection types will be tried by NEAT as it attempts to evolve a neural network capable of the given task. 

As you can see, the above network has only input and output layers. This is not sufficient to learn XOR. These networks evolve by adding neurons and connections. The network below has evolved to process the XOR operator. 


The above network evolved from the previous network. An additional hidden neuron was added between the first input neuron and the output neuron. Additionally, a recurrent connection was made from the output neuron back to the first hidden neuron. These minor additions allow the neural network to learn the XOR operator. The connections and neurons are not the only things being evolved. The weights between these neurons were evolved as well. 
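The two structural changes described above, splitting a connection with a new neuron and adding a recurrent connection, can be sketched on a tiny genome. This is plain Java with hypothetical class names, not Encog's genome classes, and it leaves out parts of NEAT such as innovation numbers and species.

```java
import java.util.ArrayList;
import java.util.List;

/** Tiny NEAT-style genome: numbered neurons plus a list of weighted connections. */
class NeatGenomeSketch {
    static final class Connection {
        final int from;
        final int to;
        final double weight;
        boolean enabled = true;

        Connection(int from, int to, double weight) {
            this.from = from;
            this.to = to;
            this.weight = weight;
        }
    }

    final List<Connection> connections = new ArrayList<>();
    int nodeCount;

    /** Starts with inputs wired directly to outputs and no hidden neurons. */
    NeatGenomeSketch(int inputs, int outputs, double initialWeight) {
        nodeCount = inputs + outputs;
        for (int i = 0; i < inputs; i++) {
            for (int o = 0; o < outputs; o++) {
                connections.add(new Connection(i, inputs + o, initialWeight));
            }
        }
    }

    /** Add-node mutation: disable a connection and route it through a new neuron. */
    int addNode(int connectionIndex) {
        Connection old = connections.get(connectionIndex);
        old.enabled = false;
        int newNode = nodeCount++;
        connections.add(new Connection(old.from, newNode, 1.0));
        connections.add(new Connection(newNode, old.to, old.weight));
        return newNode;
    }

    /** Add-connection mutation; from and to may form a recurrent link. */
    void addConnection(int from, int to, double weight) {
        connections.add(new Connection(from, to, weight));
    }

    public static void main(String[] args) {
        NeatGenomeSketch genome = new NeatGenomeSketch(2, 1, 0.5); // neurons 0, 1 in; 2 out
        int hidden = genome.addNode(0);        // split input 0 -> output
        genome.addConnection(2, hidden, 0.1);  // recurrent: output -> hidden
        for (Connection c : genome.connections) {
            System.out.println(c.from + " -> " + c.to + " w=" + c.weight
                    + (c.enabled ? "" : " (disabled)"));
        }
    }
}
```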

As shown in Figure 7.4, a NEAT network does not have clearly defined layers like traditional feedforward networks. There is a hidden neuron, but not really a hidden layer. If this were a traditional hidden layer, both input neurons would be connected to the hidden neuron. NEAT is a complex neural network type and training method. Additionally, there is a new version of NEAT, called HyperNEAT. Complete coverage of NEAT is beyond the scope of this book. I will likely release a future book focused on Encog's application of NEAT and HyperNEAT. This section will focus on how to use NEAT as a potential replacement for a feedforward neural network, providing you all of the critical information for using NEAT with Encog. 

Creating an Encog NEAT Population 
This section will show how to use a NEAT network to learn the XOR operator. There is very little difference between the code in this example and that used for a feedforward neural network to learn the XOR operator. One of Encog’s core objectives is to make machine learning methods as interchangeable as possible. You can see this example at XORNEAT:
    package org.encog.examples.neural.neat;

    import org.encog.Encog;
    import org.encog.ml.CalculateScore;
    import org.encog.ml.data.MLDataSet;
    import org.encog.ml.data.basic.BasicMLDataSet;
    import org.encog.ml.ea.train.EvolutionaryAlgorithm;
    import org.encog.neural.neat.NEATNetwork;
    import org.encog.neural.neat.NEATPopulation;
    import org.encog.neural.neat.NEATUtil;
    import org.encog.neural.networks.training.TrainingSetScore;
    import org.encog.util.simple.EncogUtility;

    public class XORNEAT {
        public static double[][] XOR_INPUT = { { 0.0, 0.0 }, { 1.0, 0.0 },
                                               { 0.0, 1.0 }, { 1.0, 1.0 } };

        public static double[][] XOR_IDEAL = { { 0.0 }, { 1.0 }, { 1.0 }, { 0.0 } };

        public static void main(final String[] args) {
            // (1)
            MLDataSet trainingSet = new BasicMLDataSet(XOR_INPUT, XOR_IDEAL);

            // (2) two inputs, one output, population size of 1000
            NEATPopulation pop = new NEATPopulation(2, 1, 1000);
            // not required, but speeds training
            pop.setInitialConnectionDensity(1.0);
            pop.reset();

            // (3)
            CalculateScore score = new TrainingSetScore(trainingSet);

            // (4) train the neural network
            final EvolutionaryAlgorithm train = NEATUtil.constructNEATTrainer(pop, score);

            do {
                train.iteration();
                System.out.println("Epoch #" + train.getIteration() + " Error:"
                        + train.getError() + ", Species:"
                        + pop.getSpecies().size());
            } while (train.getError() > 0.01);

            NEATNetwork network = (NEATNetwork) train.getCODEC().decode(
                    train.getBestGenome());

            // test the neural network
            System.out.println("Neural Network Results:");
            EncogUtility.evaluate(network, trainingSet);

            Encog.getInstance().shutdown();
        }
    }
(1). This example begins by creating an XOR training set to provide the XOR inputs and expected outputs to the neural network. To review the expected inputs and outputs for the XOR operator, refer to Chapter 3; (2). Next a NEAT population is created. Previously, we would create a single neural network to be trained. NEAT requires the creation of an entire population of networks. This population will go through generations producing better neural networks. Only the fit members of the population will be allowed to breed new neural networks. Here the population is created with two input neurons, one output neuron and a population size of 1,000. The larger the population, the better the networks will train. However, larger populations will run slower and consume more memory. 

(3). Earlier we said that only the fit members of the population are allowed to breed to create the next generations. Fitness is measured by a scoring object, here a TrainingSetScore built from the training set; (4). The final required step is to create the evolutionary algorithm, an object that implements the EvolutionaryAlgorithm interface. Here we use the NEATUtil class to construct one for us. 

Now that the population has been created, it must be trained. 

Training an Encog NEAT Neural Network 
Training a NEAT neural network is very similar to training any other neural network in Encog: create a training object and begin looping through iterations. As these iterations progress, the quality of the neural networks in the population should increase. A NEAT neural network is trained through the EvolutionaryAlgorithm interface, which TrainEA implements. Here an EvolutionaryAlgorithm object is created through a factory method of NEATUtil:
    final EvolutionaryAlgorithm train = NEATUtil.constructNEATTrainer(pop, score);
The following call trains the population to a 1% error rate. 
    EncogUtility.trainToError((MLTrain) train, 0.01);
Once the population has been trained, extract the best neural network. 
    NEATNetwork network = (NEATNetwork) train.getCODEC().decode(train.getBestGenome());
With an established neural network, its performance must be tested. Now, display the results from the neural network: 
    // test the neural network
    System.out.println("Neural Network Results:");
    EncogUtility.evaluate(network, trainingSet);

    Encog.getInstance().shutdown();
This will produce the following output. 
Beginning training...
Iteration #1 Error:25.000000% Target Error: 1.000000%
Iteration #2 Error:25.000000% Target Error: 1.000000%
Iteration #3 Error:24.792446% Target Error: 1.000000%
...
Iteration #17 Error:0.410496% Target Error: 1.000000%
Neural Network Results:
Input=0.0000,0.0000, Actual=0.1279, Ideal=0.0000
Input=1.0000,0.0000, Actual=0.9930, Ideal=1.0000
Input=0.0000,1.0000, Actual=0.9987, Ideal=1.0000
Input=1.0000,1.0000, Actual=0.0000, Ideal=0.0000
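As a rough sanity check, the final error can be recomputed from the four rows above. The plain Java sketch below assumes the reported error is a mean squared error over the training set; that assumption is not stated in the output, but the result of about 0.41% is consistent with the error printed for the last iteration.

```java
/** Recomputes the error from the printed Actual and Ideal values, assuming mean squared error. */
class XorErrorCheck {
    static double meanSquaredError(double[] actual, double[] ideal) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - ideal[i];
            sum += diff * diff;
        }
        return sum / actual.length;
    }

    public static void main(String[] args) {
        double[] actual = {0.1279, 0.9930, 0.9987, 0.0000};
        double[] ideal = {0.0, 1.0, 1.0, 0.0};
        System.out.printf("Error: %.4f%%%n", meanSquaredError(actual, ideal) * 100);
    }
}
```

Almost all of the remaining error comes from the first row, where the network outputs 0.1279 instead of 0.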

If you want to process a single record of input data, you can do so as follows: 
    import org.encog.ml.data.MLData;
    import org.encog.ml.data.basic.BasicMLData;

    // feed one input record at a time and read the single output neuron
    MLData outData = network.compute(new BasicMLData(XOR_INPUT[0]));
    System.out.printf("Input=[0, 0] with Output=%.03f\n", outData.getData(0));
    outData = network.compute(new BasicMLData(XOR_INPUT[1]));
    System.out.printf("Input=[1, 0] with Output=%.03f\n", outData.getData(0));
The output looks like this: 
Input=[0, 0] with Output=0.027
Input=[1, 0] with Output=1.000

The above results show that the network has learned the XOR operator. XOR produces an output of 1.0 only when the two inputs differ.
