So far, this book has explored training neural networks only with supervised propagation-based methods. This chapter will look at some nonpropagation training techniques. The neural network in this chapter will be trained without a training set. The training is still supervised in that feedback from the neural network’s output is constantly used to guide training. We simply will not supply training data ahead of time.
Two common techniques for this sort of training are simulated annealing and genetic algorithms. Encog provides built-in support for both, and the example in this chapter can be trained with either; both algorithms are discussed later in this chapter. The example presents the classic “Lunar Lander” game, which has been implemented many times and is almost as old as computers themselves.
The idea behind most variants of the Lunar Lander game is very similar, and the example program works as follows: the lunar lander spacecraft begins to fall. As it falls, it accelerates. There is a maximum velocity that the lander can reach, called its “terminal velocity.” Thrusters can be applied to the lander to slow its descent; however, there is a limited amount of fuel. Once the fuel is exhausted, the lander simply falls, and nothing more can be done.
This chapter will teach a neural network to pilot the lander in a very simple text-only simulation. The neural network has only one decision available to it: fire the thrusters or do not fire the thrusters. No training data will be created ahead of time, and no assumptions will be made about how the neural network should pilot the craft. If a training set were used, it would have to specify ahead of time what the neural network should do in particular situations. For this example, the neural network will learn everything on its own.
Even though the neural network will learn everything on its own, this is still supervised training, because the neural network is not left totally to its own devices. We provide a way to score the neural network: we give it some goals and then calculate a numeric value that measures how well it achieved them.
These goals are arbitrary and simply reflect what was picked to score the network. The goals are summarized here:
In the next section we will run the Lunar Lander example and observe as it learns to land a spacecraft.
Running the Lunar Lander Example
To run the Lunar Lander game, execute the LunarLander class. This class requires no arguments. Once the program begins, the neural network immediately begins training. It cycles through 50 epochs, or training iterations, before it is done. When training first begins, the score is a negative number: in these early attempts, the untrained neural network hits the moon at high velocity without covering much distance.
The training techniques used in this chapter make extensive use of random numbers, so running this example multiple times may result in entirely different scores. More epochs might have produced a better-trained neural network; however, the program limits training to 50, which usually produces a fairly skilled neural pilot. Once the network is trained, the program runs a simulation with the winning pilot, displaying telemetry at each second.
The neural pilot kept the craft aloft for 911 seconds, so we will not show every telemetry report. However, some of the interesting actions that this neural pilot learned are highlighted. The neural network learned that it was best to just let the craft free-fall for a while.
How the winning network landed:
You can see that 171 seconds in, at 3,396 meters above the ground, the terminal velocity of -40 m/s has been reached. There is no real science behind -40 m/s being the terminal velocity; it was chosen as an arbitrary number. Having a terminal velocity is interesting because the neural networks learn that once it is reached, the craft will not speed up. They use the terminal velocity to save fuel and “break their fall” when they get close to the surface. The freefall at terminal velocity continues for some time. Finally, the thrusters are fired for the first time.
Finally, the craft lands, with a very soft velocity of positive 8.66. You may wonder why the lander lands with a positive velocity of 8.66. This is due to a slight glitch in the program. The “glitch” is left in because it illustrates an important point: when neural networks are allowed to learn, they are totally on their own and will take advantage of anything they can find.
The final positive velocity occurs because the program decides whether to thrust as the last part of a simulation cycle. The program has already determined that the craft’s altitude is below zero and that it has landed. But the neural network “sneaks in” one final thrust, even though the craft has already landed and the thrust does no good. However, the final thrust does increase the score of the neural network.
Recall equation 6.1: for every negative meter per second of velocity at landing, the score is decreased by 1,000. The pilot figured out that the opposite is also true: for every positive meter per second of velocity at landing, the score gains 1,000 points. By exploiting this little quirk in the program, the neural pilot can obtain even higher scores. The neural pilot learned some very interesting things without being fed a pre-devised strategy. The network learned what it wanted to do. Specifically, this pilot decided the following:
The neural pilot in this example was trained using a genetic algorithm. Genetic algorithms and simulated annealing will be discussed later in this chapter. First, we will see how the Lander was simulated and how its score is actually calculated.
Examining the Lunar Lander Simulator
We will now examine how the Lunar Lander example’s physical simulation was created and how the neural network actually pilots the spacecraft. Finally, we will see how the neural network learns to be a better pilot.
Simulating the Lander
First, we need a class that will simulate the “physics” of lunar landing. The term “physics” is used very loosely. The purpose of this example is more about how a neural network adapts to an artificial environment than any sort of realistic physical simulation. All of the physical simulation code is contained in the LanderSimulator class. This class begins by defining some constants that will be important to the simulation.
The simulator sets the values to reasonable starting values in the following constructor:
the altitude will decrease; (3). If thrust is applied during this turn, then decrease the fuel by one and increase the velocity by the THRUST constant; (4). Terminal velocity must be imposed, as the craft cannot fall or ascend faster than the terminal velocity. The following line makes sure that the lander is not ascending faster than the terminal velocity; (5). The following line makes sure that the altitude does not drop below zero. It is important to prevent the simulation of the craft hitting so hard that it goes underground.
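The numbered steps above can be collected into a single turn method. The following is a self-contained plain-Java sketch, not Encog’s actual LanderSimulator; the constants and starting values (gravity, thrust, fuel, altitude) are assumed values for illustration.

```java
// Sketch of a LanderSimulator-style physics turn (illustrative constants).
class LanderSim {
    static final double GRAVITY = 1.62;            // acceleration per second (assumed)
    static final double THRUST = 10;               // velocity gained per burn (assumed)
    static final double TERMINAL_VELOCITY = 40.0;  // maximum speed, up or down

    int fuel = 200;          // assumed starting fuel
    double seconds = 0;
    double altitude = 10000; // assumed drop altitude, in meters
    double velocity = 0;     // negative = falling

    void turn(boolean thrust) {
        seconds++;
        velocity -= GRAVITY;          // gravity accelerates the fall
        altitude += velocity;         // negative velocity decreases altitude
        if (thrust && fuel > 0) {     // step 3: burn one unit of fuel for THRUST
            fuel--;
            velocity += THRUST;
        }
        // step 4: impose terminal velocity in both directions
        velocity = Math.max(-TERMINAL_VELOCITY, Math.min(TERMINAL_VELOCITY, velocity));
        // step 5: the craft cannot go underground
        if (altitude < 0) altitude = 0;
    }

    boolean flying() { return altitude > 0; }
}
```

With no thrust at all, the sketch reproduces the behavior described earlier: the craft accelerates until it is clamped at terminal velocity and falls until the altitude is clamped at zero.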
In addition to the simulation code, the LanderSimulator also provides two utility functions. The first calculates the score and should only be called after the spacecraft lands. This method is shown here.
Calculating the Score
The PilotScore class implements the code necessary for the neural network to fly the spacecraft. This class also calculates the final score after the craft has landed. This class is shown in Listing 6.1.
(1). As you can see from the following line, the PilotScore class implements the CalculateScore interface, which is used by both Encog’s simulated annealing and genetic algorithm implementations to determine how effective a neural network is at solving the given problem. A low score could be either bad or good, depending on the problem; (2). The CalculateScore interface requires two methods. The first method is named calculateNetworkScore. It accepts a neural network and returns a double that represents the score of that network; (3). The second method returns a value that indicates whether the score should be minimized. For this example, we would like to maximize the score, so the shouldMinimize method returns false.
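The shape of that interface can be sketched in plain Java. This is a simplified stand-in, not Encog’s actual CalculateScore: a double[] weight array replaces the neural network argument so the sketch stays self-contained, and the score computation is a placeholder.

```java
// Stand-in for Encog's CalculateScore interface (simplified: a double[]
// weight array replaces the neural network argument).
interface ScoreCalculator {
    double calculateNetworkScore(double[] weights); // higher or lower is better...
    boolean shouldMinimize();                       // ...depending on this flag
}

// A PilotScore-style implementation: score a candidate set of weights.
class SketchPilotScore implements ScoreCalculator {
    @Override
    public double calculateNetworkScore(double[] weights) {
        // A real implementation would fly the simulated lander with a network
        // built from these weights; this placeholder just sums them.
        double score = 0;
        for (double w : weights) score += w;
        return score;
    }

    @Override
    public boolean shouldMinimize() {
        return false; // the Lunar Lander example maximizes its score
    }
}
```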
Flying the Spacecraft
This section shows how the neural network actually flies the spacecraft. The neural network will be fed environmental information such as fuel remaining, altitude and current velocity. The neural network will then output a single value that will indicate if the neural network wishes to thrust. The NeuralPilot class performs this flight.
The NeuralPilot constructor sets up the pilot to fly the spacecraft. The constructor is passed a network to fly the spacecraft, as well as a Boolean that indicates whether telemetry should be printed to the screen.
The neural pilot will have three input neurons and one output neuron. These three input neurons will communicate the following three fields to the neural network.
These three input fields will produce one output field that indicates if the neural pilot would like to fire the thrusters.
To normalize these three fields, define them as three NormalizedField objects. First, set up the fuel.
Next velocity and altitude are set up.
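Under the hood, NormalizedField performs a linear min-max mapping from a raw range into a normalized range. Here is a self-contained plain-Java sketch of that mapping; the specific ranges in the usage note below (fuel 0–200 mapped into -0.9 to 0.9) are assumptions for illustration, not necessarily the example’s actual settings.

```java
// Linear min-max normalization, as performed by Encog's NormalizedField.
class FieldNormalizer {
    final double min, max, normLow, normHigh;

    FieldNormalizer(double min, double max, double normLow, double normHigh) {
        this.min = min; this.max = max;
        this.normLow = normLow; this.normHigh = normHigh;
    }

    // Map x from [min, max] linearly onto [normLow, normHigh].
    double normalize(double x) {
        return (x - min) / (max - min) * (normHigh - normLow) + normLow;
    }
}
```

For instance, a fuel normalizer built as new FieldNormalizer(0, 200, -0.9, 0.9) maps a full tank of 200 to 0.9, an empty tank to -0.9, and 100 to 0.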
For this example, the primary purpose of flying the spacecraft is to receive a score, which the scorePilot method calculates. It simulates a flight from the point that the spacecraft is dropped from the orbiter to the point that it lands.
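The flow of scorePilot can be sketched without Encog: drop the craft, and on every turn ask the pilot (here a plain interface standing in for the neural network) whether to thrust, then return the landing score. The physics constants are the same illustrative assumptions as before, and the score formula is illustrative, with the velocity × 1,000 term taken from the chapter’s discussion of equation 6.1.

```java
// Sketch of a scorePilot-style flight loop (self-contained stand-in).
class FlightScorer {
    // Stand-in for the neural network: decide thrust from the three inputs.
    interface Pilot { boolean thrust(double fuel, double velocity, double altitude); }

    static double scorePilot(Pilot pilot) {
        double fuel = 200, velocity = 0, altitude = 10000, seconds = 0;
        while (altitude > 0) {
            seconds++;
            velocity = Math.max(velocity - 1.62, -40);   // gravity + terminal velocity
            altitude += velocity;                        // falling lowers altitude
            if (fuel > 0 && pilot.thrust(fuel, velocity, altitude)) {
                fuel--;                                  // burn fuel for velocity
                velocity = Math.min(velocity + 10, 40);
            }
            if (altitude < 0) altitude = 0;              // landed
        }
        // Illustrative score: reward leftover fuel, time aloft, and a soft
        // landing (velocity * 1000, per the equation 6.1 discussion).
        return (fuel * 10) + seconds + (velocity * 1000);
    }
}
```

Even a crude hand-written pilot that brakes near the surface scores far better than free-falling into the ground, which is exactly the gradient the scoring function gives the training algorithms to climb.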
We will now look at how to train the neural pilot.
Training the Neural Pilot
This example can train the neural pilot using either a genetic algorithm or simulated annealing. Encog treats both techniques very similarly. On one hand, you can simply provide a training set and use simulated annealing or a genetic algorithm just as you would a propagation trainer. We will see an example of this later in the chapter when we apply these two techniques to the XOR problem, which will show how similar they can be to propagation training.
On the other hand, genetic algorithms and simulated annealing can do something that propagation training cannot: they allow you to train without a training set. The training is still supervised, since a scoring class, such as the one developed earlier in this chapter, is used. However, no training data is needed. Rather, the neural network needs feedback on how good a job it is doing. If you can provide this scoring function, simulated annealing or a genetic algorithm can train the neural network. Both methods will be discussed in the coming sections, beginning with genetic algorithms.
What is a Genetic Algorithm
Genetic algorithms attempt to simulate Darwinian evolution to create a better neural network. The neural network is reduced to an array of double variables. This array becomes the genetic sequence. The genetic algorithm begins by creating a population of random neural networks. All neural networks in this population have the same structure, meaning they have the same number of neurons and layers. However, they all have different random weights.
These neural networks are sorted according to their “scores,” which are provided by the scoring method discussed in the last section. In the case of the neural pilot, this score indicates how softly the ship landed. The top neural networks are selected to “breed”; the bottom neural networks “die.” When two networks breed, nature is simulated by splicing their DNA: in this case, slices are taken from each network’s double array and spliced together to create a new offspring neural network. The offspring neural networks take up the places vacated by the dying neural networks.
Some of the offspring will be “mutated.” That is, some of the genetic material will be random and not from either parent. This introduces needed variety into the gene pool and simulates the natural process of mutation. The population is then sorted and the process begins again; each pass constitutes one training iteration. As you can see, there is no need for a training set. All that is needed is an object to score each neural network. Of course, you can still use training sets by simply providing a scoring object that uses a training set to score each network.
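The cycle above (score, sort, breed by splicing double arrays, mutate, replace the bottom) can be shown with a small self-contained genetic algorithm. The fitness function here is a toy stand-in for the pilot’s score, and the population, mutation, and mating parameters are illustrative.

```java
import java.util.Arrays;
import java.util.Random;

// Toy genetic algorithm over double arrays (the "genetic sequence").
class ToyGA {
    static final Random RND = new Random(42);

    // Toy fitness: highest (0) when every gene equals 1.0 — a stand-in
    // for the flight-simulation score.
    static double score(double[] genes) {
        double s = 0;
        for (double g : genes) s -= (g - 1.0) * (g - 1.0);
        return s;
    }

    static double[] breed(double[] mom, double[] dad) {
        double[] child = new double[mom.length];
        int cut = RND.nextInt(mom.length);          // single splice point
        for (int i = 0; i < child.length; i++) {
            child[i] = (i < cut) ? mom[i] : dad[i]; // splice the two arrays
            if (RND.nextDouble() < 0.1)             // 10% mutation rate
                child[i] += 0.3 * RND.nextGaussian();
        }
        return child;
    }

    static double[] evolve(int populationSize, int geneCount, int iterations) {
        double[][] pop = new double[populationSize][geneCount];
        for (double[] ind : pop)                    // random initial population
            for (int i = 0; i < geneCount; i++) ind[i] = RND.nextGaussian();
        for (int iter = 0; iter < iterations; iter++) {
            Arrays.sort(pop, (a, b) -> Double.compare(score(b), score(a)));
            int matingPool = Math.max(1, populationSize / 4); // top 25% breed
            for (int i = populationSize / 2; i < populationSize; i++)
                pop[i] = breed(pop[RND.nextInt(matingPool)],  // bottom half "dies"
                               pop[RND.nextInt(matingPool)]);
        }
        Arrays.sort(pop, (a, b) -> Double.compare(score(b), score(a)));
        return pop[0];                              // best individual found
    }
}
```

Because the top half of each generation is carried over unchanged, the best score never degrades from one iteration to the next.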
Using a Genetic Algorithm
Using the genetic algorithm is very easy; the NeuralGeneticAlgorithm class does the work. Because NeuralGeneticAlgorithm implements the MLTrain interface, once constructed, it is used in the same way as any other Encog training class. The following code creates a new NeuralGeneticAlgorithm to train the neural pilot.
The value of 500 specifies the population size. Larger populations train better, but take more memory and processing time. The 0.1 value mutates 10% of the offspring. The 0.25 value chooses the mating population from the top 25% of the population. Now that the trainer is set up, the neural network is trained just like any other Encog training object. Here we iterate only 50 times, which is usually enough to produce a skilled neural pilot.
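The usage pattern is the same for any Encog trainer: construct it, then call iteration() in a loop. The Trainer interface below is a hypothetical stand-in for Encog’s MLTrain, just to show the shape of that 50-epoch loop.

```java
// Hypothetical stand-in for Encog's MLTrain usage pattern: construct a
// trainer, then call iteration() a fixed number of times.
interface Trainer {
    void iteration();    // one epoch of training
    double getScore();   // current best score
}

class TrainingLoop {
    static double run(Trainer train, int epochs) {
        for (int epoch = 1; epoch <= epochs; epoch++) {
            train.iteration();
            // A real program would report progress here, e.g.
            // "Epoch #" + epoch + " Score: " + train.getScore()
        }
        return train.getScore();
    }
}
```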
What is Simulated Annealing
Simulated annealing can also be used to train the neural pilot. Simulated annealing is similar to a genetic algorithm in that it needs a scoring object. However, it works quite differently internally. Simulated annealing simulates the metallurgical process of annealing, in which a very hot metal is slowly cooled. This slow cooling causes the metal to form a strong, consistent molecular structure, producing metals that are less likely to fracture or shatter.
A similar process can be performed on neural networks. To implement simulated annealing, the neural network is converted to an array of double values, exactly as was done for the genetic algorithm. Randomness is used to simulate the heating and cooling effect: while the neural network is still very “hot,” its weights change rapidly; as the network cools, the changes slow down. Only changes that produce a positive effect on the network’s score are kept.
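The cooling schedule can be shown with a small self-contained sketch: the weight array is perturbed by an amount scaled to the current “temperature,” and only changes that improve the score are kept. The toy score function and temperature values are illustrative stand-ins, not Encog’s NeuralSimulatedAnnealing.

```java
import java.util.Random;

// Toy simulated annealing over a double-array "network" (illustrative).
class ToyAnnealing {
    static final Random RND = new Random(7);

    // Toy stand-in for the pilot score: best (0) when all weights are 1.0.
    static double score(double[] w) {
        double s = 0;
        for (double x : w) s -= (x - 1.0) * (x - 1.0);
        return s;
    }

    static double[] anneal(double[] start, double startTemp, double endTemp, int cycles) {
        double[] best = start.clone();
        double temp = startTemp;
        double cooling = Math.pow(endTemp / startTemp, 1.0 / cycles); // geometric cooling
        for (int i = 0; i < cycles; i++) {
            double[] candidate = best.clone();
            for (int j = 0; j < candidate.length; j++)
                candidate[j] += temp * (RND.nextDouble() * 2 - 1); // hotter = bigger jolts
            if (score(candidate) > score(best))  // keep only improvements
                best = candidate;
            temp *= cooling;                     // the network slowly "cools"
        }
        return best;
    }
}
```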
Using Simulated Annealing
To use simulated annealing to train the neural pilot, pass the argument anneal on the command line when running this example. Switching the example to annealing rather than a genetic algorithm is very simple: both use the same scoring function, so they are interchangeable. The following lines of code apply the simulated annealing algorithm to this example.
Using the Training Set Score Class
Training sets can also be used with genetic algorithms and simulated annealing. Used this way, simulated annealing and genetic algorithms work a little differently than propagation training: there is no need to write a scoring function. You simply use the TrainingSetScore object, which takes the training set and uses it to score the neural network.
Generally, resilient propagation will outperform genetic algorithms and simulated annealing when used in this way. Genetic algorithms and simulated annealing really excel when a scoring method is used instead of a training set. Furthermore, simulated annealing can sometimes push backpropagation out of a local minimum. The Hello World application could easily be modified to use a genetic algorithm or simulated annealing; only a few lines must be added. The following lines create a training set-based genetic algorithm. First, create a TrainingSetScore object.
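The role TrainingSetScore plays can be sketched in plain Java: the “score” is simply the network’s error over a training set, and shouldMinimize returns true. The Net interface and the XOR data below are illustrative stand-ins, not Encog’s actual classes.

```java
// Sketch of a TrainingSetScore-style object: scores a "network" (here a
// function from inputs to an output) by its error on a training set.
class SketchTrainingSetScore {
    interface Net { double compute(double[] input); }

    final double[][] inputs;
    final double[] ideals;

    SketchTrainingSetScore(double[][] inputs, double[] ideals) {
        this.inputs = inputs;
        this.ideals = ideals;
    }

    double calculateNetworkScore(Net net) {
        double sum = 0;
        for (int i = 0; i < inputs.length; i++) {
            double diff = net.compute(inputs[i]) - ideals[i];
            sum += diff * diff;
        }
        return sum / inputs.length;   // mean squared error over the set
    }

    boolean shouldMinimize() {
        return true;   // an error-based score is minimized, not maximized
    }
}
```

Note the contrast with the pilot example: because this score is an error, shouldMinimize is true, whereas the pilot’s landing score was maximized.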