程式扎記

Preface
This chapter will show how to construct feedforward and simple recurrent neural networks with Encog and how to save these neural networks for later use. Both of these neural network types are created using the BasicNetwork and BasicLayer classes. In addition to these two classes, activation functions are also used. The role of activation functions will be discussed as well.

• Constructing a Neural Network
• Activation Functions
• Encog Persistence
• Using the Encog Analyst from Code

Neural networks can take a considerable amount of time to train, so it is important to save them. Encog neural networks can be persisted using Java’s built-in serialization. They can also be written to an EG file, a cross-platform text file. This chapter will introduce both forms of persistence. In the last chapter, the Encog Analyst was used to automatically normalize data. The Encog Analyst can also automatically create neural networks based on CSV data. This chapter will show how to use the Encog Analyst to create neural networks from code.

Constructing a Neural Network
A simple neural network can quickly be created using BasicLayer and BasicNetwork objects. The following code creates several BasicLayer objects with a default hyperbolic tangent activation function.

import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;

BasicNetwork network = new BasicNetwork();
network.addLayer(new BasicLayer(2));  // input layer: 2 neurons
network.addLayer(new BasicLayer(3));  // hidden layer: 3 neurons
network.addLayer(new BasicLayer(1));  // output layer: 1 neuron
network.getStructure().finalizeStructure();
network.reset();  // randomize the weights

This network will have an input layer of two neurons, a hidden layer with three neurons and an output layer with a single neuron. To use an activation function other than the hyperbolic tangent function, use code similar to the following:

import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;

BasicNetwork network = new BasicNetwork();
// BasicLayer(activation function, has bias, neuron count)
// Input layer: no activation function (null), bias neuron, two neurons
network.addLayer(new BasicLayer(null, true, 2));
// Hidden layer: sigmoid activation, bias neuron, three neurons
network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 3));
// Output layer: sigmoid activation, no bias neuron, one neuron
network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 1));
network.getStructure().finalizeStructure();
network.reset();

The sigmoid activation function is passed to the BasicLayer constructors for the hidden and output layers. The true value that was also introduced specifies that the BasicLayer should have a bias neuron. The output layer does not have a bias neuron, and the input layer does not have an activation function. This is because a bias neuron affects the next layer, and an activation function affects data coming from the previous layer.
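The division of labor between bias neurons and activation functions can be sketched in plain Java. This is only an illustration, not Encog's implementation: the class name LayerForwardSketch, the weight layout, and the example weights are all assumptions made for this sketch. The source layer's bias neuron contributes a constant 1.0 to each weighted sum, and the target layer's activation function is applied to that sum.

```java
class LayerForwardSketch {
    /** Sigmoid, used here as the target layer's activation function. */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /**
     * Computes the output of a target layer from a source layer.
     * weights[j] holds the weights into target neuron j: one per source
     * neuron, followed by one extra weight for the source layer's bias
     * neuron when hasBias is true (a layout assumed for this sketch).
     */
    static double[] forward(double[] source, double[][] weights, boolean hasBias) {
        double[] out = new double[weights.length];
        for (int j = 0; j < weights.length; j++) {
            double sum = 0.0;
            for (int i = 0; i < source.length; i++) {
                sum += source[i] * weights[j][i];
            }
            if (hasBias) {
                // the bias neuron always outputs 1.0
                sum += 1.0 * weights[j][source.length];
            }
            // the activation function belongs to the target layer
            out[j] = sigmoid(sum);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] input = {0.0, 0.0};
        double[][] w = {{0.5, -0.5, 2.0}};  // hypothetical weights
        // With zero inputs, only the bias weight reaches the sum: sigmoid(2.0)
        System.out.println(forward(input, w, true)[0]);
    }
}
```

The source layer needs no activation function of its own here, and a bias on the final layer would have nothing to feed, which matches the layer setup described above.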

Unless Encog is being used for something very experimental, always use a bias neuron. Bias neurons allow the activation function to shift off the origin of zero. This allows the neural network to produce a nonzero value even when all of the inputs are zero. The following URL provides a more mathematical justification for the importance of bias neurons: http://www.heatonresearch.com/wiki/Bias
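The shift off the origin is easy to see with a single hyperbolic tangent neuron. The following is a minimal sketch in plain Java; the class name BiasSketch and the weight values are hypothetical and chosen only for illustration.

```java
class BiasSketch {
    /** Output of a single tanh neuron with one input and a bias weight. */
    static double neuron(double input, double weight, double biasWeight) {
        // the bias neuron always outputs 1.0
        return Math.tanh(input * weight + 1.0 * biasWeight);
    }

    public static void main(String[] args) {
        // With a bias weight of 0 (no effective bias), a zero input always
        // produces zero, no matter which input weight training picks.
        System.out.println(neuron(0.0, 5.0, 0.0));
        // With a nonzero bias weight, the same zero input gives tanh(0.8).
        System.out.println(neuron(0.0, 5.0, 0.8));
    }
}
```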

Activation functions are attached to layers and used to scale data output from a layer. Encog applies a layer’s activation function to the data that the layer is about to output. If an activation function is not specified for BasicLayer, the hyperbolic tangent activation is used by default. It is also possible to create context layers. A context layer can be used to create an Elman or Jordan style recurrent neural network (RNN). The following code could be used to create an Elman neural network.

BasicLayer input, hidden;
BasicNetwork network = new BasicNetwork();
network.addLayer(input = new BasicLayer(1));
network.addLayer(hidden = new BasicLayer(2));
network.addLayer(new BasicLayer(1));
// feed the hidden layer's previous output back in as context
input.setContextFedBy(hidden);
network.getStructure().finalizeStructure();
network.reset();

Notice the input.setContextFedBy(hidden) line. It creates a context link that is fed by the hidden layer. On each iteration, the hidden layer receives its own output from the last iteration alongside the new input. This creates an Elman style neural network. Elman and Jordan networks will be introduced in Chapter 7.

The Role of Activation Functions
The last section illustrated how to assign activation functions to layers. Activation functions are used by many neural network architectures to scale the output from layers. Encog provides many different activation functions that can be used to construct neural networks, and the next sections introduce them. All classes that serve as activation functions must implement the ActivationFunction interface.

Activation functions play a very important role in training neural networks. Propagation training, which will be covered in the next chapter, requires that an activation function have a valid derivative. Not all activation functions have valid derivatives. Whether an activation function has a derivative can therefore be an important factor in choosing one.

Encog Activation Functions
The next sections will explain each of the activation functions supported by Encog. There are several factors to consider when choosing an activation function. Firstly, it is important to consider how the type of neural network being used dictates the activation function required. Secondly, consider the necessity of training the neural network using propagation. Propagation training requires an activation function that provides a derivative. Finally, consider the range of numbers to be used. Some activation functions deal with only positive numbers or numbers in a particular range.

ActivationBiPolar
The ActivationBiPolar activation function is used with neural networks that require bipolar values. Bipolar values are either true or false. A true value is represented by a bipolar value of 1; a false value is represented by a bipolar value of -1. The bipolar activation function ensures that any numbers passed to it are either -1 or 1. The ActivationBiPolar function does this with the following code:

if (d[i] > 0) {
  d[i] = 1;
} else {
  d[i] = -1;
}

As shown above, the output from this activation is limited to either -1 or 1. This sort of activation function is used with neural networks that require bipolar output from one layer to the next. There is no derivative function for bipolar, so this activation function cannot be used with propagation training.
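A self-contained version of that fragment, applied to a whole array, might look like the following. This is a sketch in plain Java rather than Encog's own class; the name BipolarSketch and the start/size parameters are assumptions borrowed from the style of the other listings in this chapter.

```java
import java.util.Arrays;

class BipolarSketch {
    /** Forces each value in d[start .. start+size-1] to either 1 or -1. */
    static void activate(double[] d, int start, int size) {
        for (int i = start; i < start + size; i++) {
            // strictly positive values become 1; zero and negatives become -1
            d[i] = d[i] > 0 ? 1.0 : -1.0;
        }
    }

    public static void main(String[] args) {
        double[] values = {0.7, -0.2, 0.0};
        activate(values, 0, values.length);
        System.out.println(Arrays.toString(values));  // prints [1.0, -1.0, -1.0]
    }
}
```

Note that an input of exactly zero maps to -1, because the test is a strict greater-than comparison.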

ActivationCompetitive
The ActivationCompetitive function is used to force only a select group of neurons to win. The winners are the neurons with the highest outputs. The outputs of each of these neurons are held in the array passed to this function. The size of the winning neuron group is definable. The function will first determine the winners. All non-winning neurons will be set to zero. Each winner's output is divided by the sum of the winning outputs, so the winners' values add up to one. This function begins by creating an array that will track whether each neuron has already been selected as one of the winners. The sum of the winning outputs is also tracked.

final boolean[] winners = new boolean[x.length]  
double sumWinners = 0  

Let's check the code snippet of this activation function to know how it works:

// find the desired number of winners
// params[0] holds the maximum number of winners
for (int i = 0; i < this.params[0]; i++)  // (1)
{
    double maxFound = Double.NEGATIVE_INFINITY;
    int winner = -1;
    for (int j = start; j < start + size; j++)  // (2)
    {
        if (!winners[j] && (x[j] > maxFound))  // (3)
        {
            winner = j;
            maxFound = x[j];
        }
    }
    // (4)
    sumWinners += maxFound;
    winners[winner] = true;
}

// adjust weights for winners and non-winners (5)
for (int i = start; i < start + size; i++)
{
    if (winners[i])
    {
        x[i] = x[i] / sumWinners;
    }
    else
    {
        x[i] = 0.0;
    }
}

(1) First, loop maxWinners times to find that many winners.
(2) Each pass must determine one winner. Loop over all of the neuron outputs and find the one with the highest output.
(3) A neuron becomes the current candidate if it has not already won and its output exceeds the highest output found so far in this pass.
(4) Add the winner's output to the running sum and mark the neuron as a winner. Marking it prevents it from being chosen again. The winning outputs will ultimately be divided by this sum.
(5) Now that the correct number of winners is determined, the values must be adjusted for winners and non-winners. The non-winners are all set to zero. Each winner's output is divided by the sum of the winning outputs.

This sort of activation function can be used with competitive learning neural networks such as the self-organizing map. This activation function has no derivative, so it cannot be used with propagation training.
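To try the steps above on concrete numbers, the following plain-Java sketch follows the same logic without depending on Encog. The class name CompetitiveSketch and the explicit maxWinners parameter are assumptions for this example, and the clamp on the winner count is an addition that the listing above does not have.

```java
import java.util.Arrays;

class CompetitiveSketch {
    /** Keeps the maxWinners largest values, scaled by their sum; zeroes the rest. */
    static void activate(double[] x, int start, int size, int maxWinners) {
        boolean[] winners = new boolean[x.length];
        double sumWinners = 0;
        // added guard: never look for more winners than there are neurons
        int count = Math.min(maxWinners, size);

        for (int i = 0; i < count; i++) {
            double maxFound = Double.NEGATIVE_INFINITY;
            int winner = -1;
            // find the largest output that has not already won
            for (int j = start; j < start + size; j++) {
                if (!winners[j] && x[j] > maxFound) {
                    winner = j;
                    maxFound = x[j];
                }
            }
            sumWinners += maxFound;
            winners[winner] = true;
        }

        // winners are divided by the sum of the winning outputs
        for (int i = start; i < start + size; i++) {
            x[i] = winners[i] ? x[i] / sumWinners : 0.0;
        }
    }

    public static void main(String[] args) {
        double[] x = {1.0, 3.0, 2.0, 5.0};
        activate(x, 0, x.length, 2);
        // winners are 5.0 and 3.0, and their sum is 8.0
        System.out.println(Arrays.toString(x));  // prints [0.0, 0.375, 0.0, 0.625]
    }
}
```

With two winners, the outputs 3.0 and 5.0 become 0.375 and 0.625, which add up to one, while the two losing neurons are set to zero.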

ActivationLinear
The ActivationLinear function is really no activation function at all. It simply implements the linear function, shown in Equation 4.1: f(x) = x. The graph of the linear function is a simple line, as seen in Figure 4.1.

The Java implementation for the linear activation function is very simple. It does nothing. The input is returned as it was passed.

public final void activationFunction(final double[] x, final int start, final int size) {}

The linear function is used primarily for specific types of neural networks that have no activation function, such as the self-organizing map. The linear activation function has a constant derivative of one, so it can be used with propagation training. Linear layers are sometimes used by the output layer of a propagation-trained feedforward neural network.

ActivationLOG
The ActivationLOG activation function uses an algorithm based on the log function. The following shows how this activation function is calculated.
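As a minimal sketch in plain Java, the calculation can be written as follows. It assumes the piecewise form log(1 + x) for x >= 0 and -log(1 - x) otherwise, which is how Encog's ActivationLOG is understood to work; check the Encog source before relying on it. The class name LogActivationSketch is hypothetical.

```java
class LogActivationSketch {
    /** Assumed form: log(1 + x) for x >= 0, and -log(1 - x) for x < 0. */
    static double activate(double x) {
        return x >= 0 ? Math.log(1 + x) : -Math.log(1 - x);
    }

    /** Derivative of the assumed form: 1 / (1 + |x|). */
    static double derivative(double x) {
        return x >= 0 ? 1 / (1 + x) : 1 / (1 - x);
    }

    public static void main(String[] args) {
        // The function is symmetric around the origin and keeps growing
        // slowly instead of flattening out at 1 or -1.
        for (double x : new double[] {-100, -1, 0, 1, 100}) {
            System.out.println(x + " -> " + activate(x));
        }
    }
}
```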

This produces a curve similar to the hyperbolic tangent activation function, which will be discussed later in this chapter. The graph for the logarithmic activation function is shown in Figure 4.2.

The logarithmic activation function can be useful to prevent saturation. A hidden node of a neural network is considered saturated when, on a given set of inputs, the output is approximately 1 or -1 in most cases. This can slow training significantly. This makes the logarithmic activation function a possible choice when training is not successful using the hyperbolic tangent activation function.

As illustrated in Figure 4.2, the logarithmic activation function spans both positive and negative numbers. This means it can be used with neural networks where negative number output is desired. Some activation functions, such as the sigmoid activation function, will only produce positive output. The logarithmic activation function does have a derivative, so it can be used with propagation training.

ActivationSigmoid
The ActivationSigmoid activation function should only be used when positive number output is expected because the ActivationSigmoid function will only produce positive output. The equation for the ActivationSigmoid function can be seen in Equation 4.3.

The ActivationSigmoid function will move negative numbers into the positive range. This can be seen in Figure 4.3, which shows the graph of the sigmoid function.

The ActivationSigmoid function is a very common choice for feedforward and simple recurrent neural networks. However, it is imperative that the training data does not expect negative output numbers. If negative numbers are required, the hyperbolic tangent activation function may be a better solution.
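For reference, the standard logistic sigmoid is f(x) = 1 / (1 + e^(-x)), and its derivative can be written in terms of its own output as f(x)(1 - f(x)). The following plain-Java sketch assumes that standard form; the class name SigmoidSketch is hypothetical and this is not Encog's own class.

```java
class SigmoidSketch {
    /** Standard logistic sigmoid: output is always between 0 and 1. */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** Derivative expressed through the function's own output. */
    static double derivative(double x) {
        double s = sigmoid(x);
        return s * (1.0 - s);
    }

    public static void main(String[] args) {
        // Negative inputs are moved into the positive range.
        for (double x : new double[] {-5, -1, 0, 1, 5}) {
            System.out.println(x + " -> " + sigmoid(x));
        }
    }
}
```

An input of zero maps to 0.5, the midpoint of the output range, which is why training data for a sigmoid output layer is usually normalized to a positive range.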

ActivationSoftMax
The ActivationSoftMax activation function will scale all of the input values so that the sum will equal one. The ActivationSoftMax activation function is sometimes used as a hidden layer activation function. The activation function begins by summing the natural exponent of all of the neuron outputs.

double sum = 0;
for (int i = start; i < start + size; i++) {
    x[i] = BoundMath.exp(x[i]);
    sum += x[i];
}

The output from each of the neurons is then scaled according to this sum. This produces outputs that will sum to 1.

for (int i = start; i < start + size; i++) {
    x[i] = x[i] / sum;
}

The ActivationSoftMax is typically used in the output layer of a neural network for classification.
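Putting the two loops together gives a small, self-contained sketch that can be run without Encog. It uses Math.exp in place of Encog's BoundMath.exp, so it does not guard against overflow for very large inputs; the class name SoftMaxSketch is hypothetical.

```java
import java.util.Arrays;

class SoftMaxSketch {
    /** Scales x[start .. start+size-1] in place so the values sum to 1. */
    static void activate(double[] x, int start, int size) {
        double sum = 0;
        // first pass: exponentiate each value and accumulate the total
        for (int i = start; i < start + size; i++) {
            x[i] = Math.exp(x[i]);
            sum += x[i];
        }
        // second pass: divide by the total
        for (int i = start; i < start + size; i++) {
            x[i] = x[i] / sum;
        }
    }

    public static void main(String[] args) {
        double[] outputs = {1.0, 2.0, 3.0};
        activate(outputs, 0, outputs.length);
        // The largest input keeps the largest share.
        System.out.println(Arrays.toString(outputs));
    }
}
```

Because the results are positive and sum to 1, they can be read as class probabilities, which is why this function suits a classification output layer.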

ActivationTANH
The ActivationTANH activation function uses the hyperbolic tangent function. The hyperbolic tangent activation function is probably the most commonly used activation function as it works with both negative and positive numbers. The hyperbolic tangent function is the default activation function for Encog. The equation for the hyperbolic tangent activation function can be seen in Equation 4.4.

The fact that the hyperbolic tangent activation function accepts both positive and negative numbers can be seen in Figure 4.4, which shows the graph of the hyperbolic tangent function.

The hyperbolic tangent function is a very common choice for feedforward and simple recurrent neural networks. The hyperbolic tangent function has a derivative so it can be used with propagation training.
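The hyperbolic tangent is available in Java as Math.tanh, and its derivative is 1 - tanh(x)^2. It is also a rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1, which stretches the sigmoid's range of 0 to 1 out to -1 to 1. The sketch below checks these facts in plain Java; the class name TanhSketch is hypothetical and this is not Encog's ActivationTANH class.

```java
class TanhSketch {
    static double tanh(double x) {
        return Math.tanh(x);
    }

    /** Derivative of tanh, expressed through the function's own output. */
    static double derivative(double x) {
        double t = Math.tanh(x);
        return 1.0 - t * t;
    }

    /** Standard logistic sigmoid, included only for the comparison. */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    public static void main(String[] args) {
        for (double x : new double[] {-2, -0.5, 0, 0.5, 2}) {
            // the two columns agree up to floating-point rounding
            System.out.println(tanh(x) + "  " + (2 * sigmoid(2 * x) - 1));
        }
    }
}
```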

Encog Persistence
It can take considerable time to train a neural network, so it is important to make sure your work is saved once the network has been trained. Encog provides two primary ways to store Encog data objects: file-based persistence and Java’s own serialization.

Java provides its own means to serialize objects, called Java serialization. Java serialization allows many different object types to be written to a stream, such as a disk file. Java serialization for Encog works the same way as with any other Java object. Every important Encog object that should support serialization implements the Serializable interface. Java serialization is a quick way to store an Encog object. However, it has some important limitations. The files created with Java serialization can only be used by Encog for Java; they will be incompatible with Encog for .Net or Encog for Silverlight. Further, Java serialization is directly tied to the underlying objects. As a result, future versions of Encog may not be compatible with your serialized files.

To create universal files that will work with all Encog platforms, consider the Encog EG format. The EG format stores neural networks as flat text files ending in the extension .EG. This chapter will introduce both methods of Encog persistence, beginning with Encog EG persistence. The chapter will end by exploring how a neural network is saved in an Encog persistence file.

Using Encog EG Persistence
Encog EG persistence files are the native file format for Encog and are stored with the extension .EG. The Encog Workbench uses the Encog EG format to process files. This format can be exchanged between different operating systems and Encog platforms, making it the format of choice for an Encog application. This section begins by looking at an XOR example that makes use of Encog’s EG files. Later, this same example will be used for Java serialization.

Encog EG persistence is very easy to use. The EncogDirectoryPersistence class is used to load and save objects from an Encog EG file. The following is a good example of Encog EG persistence.

This example is made up of two primary methods. The first method, trainAndSave, trains a neural network and then saves it to an Encog EG file. The second method, loadAndEvaluate, loads the Encog EG file and evaluates it. This proves that the Encog EG file was saved correctly. The main method simply calls these two in sequence. We will begin by examining the trainAndSave method.

import org.encog.ml.data.MLDataSet  
import org.encog.ml.data.basic.BasicMLDataSet  
import org.encog.ml.train.MLTrain  
import org.encog.neural.networks.BasicNetwork  
import org.encog.neural.networks.layers.BasicLayer  
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation  
import org.encog.persist.EncogDirectoryPersistence  
  
public class XOR{  
    def static FILENAME = "xor_network.eg"  
    def static INPUT = new double[4][2]  
    def static IDEAL = new double[4][1]  
    static{  
        INPUT[0][0] = 0; INPUT[0][1] = 0; IDEAL[0][0] = 0  
        INPUT[1][0] = 0; INPUT[1][1] = 1; IDEAL[1][0] = 1  
        INPUT[2][0] = 1; INPUT[2][1] = 0; IDEAL[2][0] = 1  
        INPUT[3][0] = 1; INPUT[3][1] = 1; IDEAL[3][0] = 0  
    }  
}  
  
  
def void trainAndSave()  
{  
    // (1)  
    printf("Training XOR network to under 1%% error rate.\n")
    BasicNetwork network = new BasicNetwork()  
    network.addLayer(new BasicLayer(2))  
    network.addLayer(new BasicLayer(6))  
    network.addLayer(new BasicLayer(1))  
    network.getStructure().finalizeStructure()  
    network.reset()  
      
    // (2)  
    MLDataSet trainingSet = new BasicMLDataSet(XOR.INPUT, XOR.IDEAL)  
      
    // (3)  
    final MLTrain train = new ResilientPropagation(network, trainingSet)  
      
    // (4)  
    while(true)  
    {  
        train.iteration()  
        if(train.getError() < 0.009) break  
    }  
      
    // (5)  
    double e = network.calculateError(trainingSet)  
    printf("Network trained to error=%.03f\n", e)  
      
    // (6)  
    printf("Saving network\n")  
    EncogDirectoryPersistence.saveObject(new File(XOR.FILENAME), network);  
}  

(1) The method begins by creating a basic neural network to be trained on the XOR operator. It is a simple three-layer feedforward neural network.
(2) A training set is created that contains the inputs and expected outputs for the XOR operator.
(3) The neural network will be trained using resilient propagation (RPROP).
(4) RPROP iterations are performed until the error rate is very small. Training will be covered in the next chapter. For now, training simply provides a known error value, so that the error can be checked again after the network is reloaded.
(5) Once the network has been trained, the final error rate is displayed.
(6) The network is saved to a file using the saveObject method of the EncogDirectoryPersistence class. Only one Encog object is saved per file.

Now that the Encog EG file has been created, load the neural network back from the file to ensure it still performs well using the loadAndEvaluate method.

def void loadAndEvaluate()  
{  
    // (1)  
    printf("Loading network\n")  
    BasicNetwork network = (BasicNetwork)EncogDirectoryPersistence.loadObject(new File(XOR.FILENAME))  
      
    // (2)  
    MLDataSet trainingSet = new BasicMLDataSet(XOR.INPUT, XOR.IDEAL)  
      
    // (3)  
    double e = network.calculateError(trainingSet)  
    printf("Loaded network's error is (should be same as above): %.03f\n", e)  
}  

(1) Load the network that was saved earlier from the Encog EG file.
(2) It is important to evaluate the neural network to prove that it is still trained. To do this, create a training set for the XOR operator.
(3) Calculate the error for the given training data. This error is displayed and should be the same as before the network was saved.

Using Java Serialization
It is also possible to use standard Java serialization with Encog neural networks and training sets. Encog EG persistence is much more flexible than Java serialization. However, there are cases where a neural network can simply be saved to a platform-dependent binary file. This example shows how to use Java serialization with Encog. The example begins by calling the trainAndSave method (the only difference is at step 6).

import org.encog.util.obj.SerializeObject

def void trainAndSave()
{
    // (1)
    printf("Training XOR network to under 1%% error rate.\n")
        ...
    // (6)
    printf("Saving network\n")
    //EncogDirectoryPersistence.saveObject(new File(XOR.FILENAME), network);
    // XOR.SER_OBJ_NAME: a file name constant assumed to be added to the XOR class
    SerializeObject.save(new File(XOR.SER_OBJ_NAME), network)
}

Regular Java serialization code can be used to save the network, or the SerializeObject class can be used. This utility class provides a save method that will write any single serializable object to a binary file. Now that the binary serialization file is created, load the neural network back from the file to see if it still performs well. This is performed by the loadAndEvaluate method (the only difference is at step 1).

def void loadAndEvaluate()  
{  
    // (1)  
    printf("Loading network\n")  
    //BasicNetwork network = (BasicNetwork)EncogDirectoryPersistence.loadObject(new File(XOR.FILENAME))  
    BasicNetwork network = (BasicNetwork)SerializeObject.load(new File(XOR.SER_OBJ_NAME))  
        ...  
}  
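For comparison, the "regular Java serialization code" route mentioned above can be sketched with nothing but the standard library. The TinyModel class below is a hypothetical stand-in for a trained network, not an Encog class, and the save and load helpers only mirror the general shape of a utility like SerializeObject; they are not its implementation.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

class SerializationSketch {
    /** Hypothetical stand-in for a trained network. */
    static class TinyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] weights;

        TinyModel(double[] weights) {
            this.weights = weights;
        }
    }

    /** Writes a single serializable object to a binary file. */
    static void save(File file, Serializable obj) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(obj);
        }
    }

    /** Reads a single object back from a binary file. */
    static Object load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("model", ".ser");
        file.deleteOnExit();
        save(file, new TinyModel(new double[] {0.25, -1.5}));
        TinyModel restored = (TinyModel) load(file);
        System.out.println(Arrays.toString(restored.weights));  // prints [0.25, -1.5]
    }
}
```

The cast on load is where the tie to the underlying class shows up: if the class definition changes in an incompatible way between saving and loading, reading the file back can fail, which is the limitation described earlier for serialized Encog objects.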

Sunday, December 11, 2016

[ NNF For Java ] Constructing Neural Networks in Java (Ch4)
