Wednesday, December 28, 2016

[ NNF For Java ] Using Image Data (Ch9)

Preface
• Processing Images
• Finding the Bounds
• Downsampling
• Using the Image Dataset

Using neural networks to recognize images is a very common task. This chapter will explore how to use images with Encog. By using the same feedforward neural networks as seen in earlier chapters, neural networks can be designed to recognize certain images. Specialized datasets ease the process of getting image data into the neural network.

This chapter will introduce the ImageMLDataSet. This class can accept a list of images that will be loaded and processed into an Encog-friendly form. The ImageMLDataSet is based upon the BasicMLDataSet, which is really just an array of double values for input and ideal data. The ImageMLDataSet simply adds special functions to load images into arrays of doubles.

There are several important issues to consider when loading image data into a neural network. The ImageMLDataSet takes care of two important aspects of this. The first is boundary detection: finding the region of the image that is actually to be recognized. The second is downsampling: images usually arrive at high resolution and must be reduced to a consistent, lower resolution before being fed to the neural network.

Finding the Bounds
An image is a rectangular region that represents the data important to the neural network, and only a part of the image may be useful. Ideally, the actual image the neural network must recognize would occupy the entire physical image, rather than just a portion of the original image. Such is the case in Figure 9.1.

As you can see in the above figure, the letter “X” was drawn over nearly the entire physical image. This image would require minimal, if any, boundary detection. Images will not always be so perfectly created. Consider the image presented in Figure 9.2.


Here the letter “X” is scaled differently than in the previous image and is also off-center. To properly recognize it, we must find the bounds of the second letter “X.” Figure 9.3 shows a bounding box around the letter “X.” Only data inside of the bounding box will be used to recognize the image.


As you can see, the bounds have been detected for the letter “X.” The bounding box signifies that only data inside of that box will be recognized. Now the “X” is in approximately the same orientation as Figure 9.1.

Downsampling an Image
Even with bounding boxes, images may not be consistently sized. The letter “X” in Figure 9.3 is considerably smaller than the one in Figure 9.1. When recognizing the image, we will draw a grid over the image and map each grid cell to an input neuron. To do this, the images must be consistently sized. Further, most images have too high a resolution to be used with a neural network. Downsampling solves both of these problems by reducing the image resolution and scaling all images to a consistent size. To see this in action, consider Figure 9.4. This figure shows the Encog logo at full resolution.


Figure 9.5 shows this same image downsampled.


Do you notice the grid-like pattern? The image has been reduced to 32x32 pixels. These pixels form the input to a neural network: 1,024 input neurons, if the network were to look only at the intensity of each square. Looking only at intensity limits the neural network to seeing in “black and white.” If you would like the neural network to see in color, then it is necessary to provide red, green and blue (RGB) values for each of these pixels. This would mean three input neurons for each pixel, which would push the input neuron count to 3,072.
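
To make the arithmetic explicit, a quick sketch of the input-neuron counts for a 32x32 downsample:
// Input-neuron arithmetic for a 32x32 downsampled image.
final int width = 32, height = 32;
final int intensityInputs = width * height;     // 1,024 ("black and white")
final int rgbInputs = width * height * 3;       // 3,072 (R, G and B per pixel)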

The Encog image dataset provides both boundary detection, as well as RGB and intensity downsampling. In the next section, the Encog image dataset will be introduced.

What to Do With the Output Neurons
The output neurons should represent the groups that these images will fall into. For example, if writing an OCR application, use one output neuron for every character to be recognized. Equilateral encoding is also useful in this respect, as discussed in Chapter 2, “Obtaining Data for Encog.” Supervised training also requires ideal output data for each image. For a simple OCR, there might be 26 output neurons, one for each letter of the alphabet. These ideal outputs train the neural network on what each image actually is. Whether training is supervised or unsupervised, the output neurons will relay how the neural network interpreted each image.

Using the Encog Image Dataset
Before instantiating an ImageMLDataSet object, a downsample object must be created. This object is a tool Encog uses to perform the downsampling. All Encog downsample objects must implement the Downsample interface. Encog currently supports the two downsample classes listed below:
• SimpleIntensityDownsample
• RGBDownsample

The SimpleIntensityDownsample does not take color into consideration. It simply calculates the brightness or darkness of each pixel. The number of input neurons will be the height multiplied by the width, as only one input neuron is needed per pixel. The RGBDownsample is more advanced. It downsamples to the resolution you specify and turns every pixel into three (RGB) inputs; the total number of input values produced is therefore height times width times three. The following code instantiates a SimpleIntensityDownsample object, which will be used to create the training set.
Downsample downsample = new SimpleIntensityDownsample();
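If color should be considered instead, the RGBDownsample class can be substituted; a minimal sketch:
// Alternative: keep color, producing three input values per pixel.
Downsample downsample = new RGBDownsample();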
Now that a downsample object is created, it is time to use an ImageMLDataSet class. It must be instantiated with several parameters. The following code does this.
this.training = new ImageMLDataSet(this.downsample, false, 1, -1);
The parameters 1 and -1 specify the range to which the colors will be normalized, whether for the single intensity value or for the three individual RGB values. The false value means that the dataset should not attempt to detect the edges. If this value were true, Encog would attempt to detect the edges of each image.

The current Encog edge detection is not very advanced. It looks for one consistent color around the sides of an image and attempts to remove as much of that region as it can. More advanced edge detection will likely be built into future versions of Encog. If advanced edge detection is necessary, it is best to trim the images before sending them to the ImageMLDataSet object.

Now that the ImageMLDataSet object has been created, it is time to add some images. To add images to this dataset, an ImageMLData object must be created for each image. The following lines of code add one image from a file.
final MLData ideal = new BasicMLData(this.outputCount);
final int idx = pair.getIdentity();
for (int i = 0; i < this.outputCount; i++) {
    if (i == idx) {
        ideal.setData(i, 1);
    } else {
        ideal.setData(i, -1);
    }
}

final Image img = ImageIO.read(pair.getFile());
final ImageMLData data = new ImageMLData(img);
this.training.add(data, ideal);
The image is loaded from a file using the Java ImageIO class. Any valid Java Image object can be used by the dataset. The ideal output should be specified when using supervised training; with unsupervised training, this parameter can be omitted. Once the ImageMLData object is instantiated, it is added to the dataset. These steps are repeated for every image to be added.

Once all of the images are loaded, they are ready to be downsampled. To downsample the images, call the downsample method.
this.training.downsample(this.downsampleHeight, this.downsampleWidth);
Specify the downsample height and width; all of the images will be downsampled to this size. After calling the downsample method, the training data has been generated and can be used to train a neural network.
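
Putting the pieces together, here is a minimal sketch of the whole sequence, in the same style as the listings above (imports omitted); the file name, outputCount, and the downsample dimensions are hypothetical placeholders:
// Hypothetical end-to-end sketch: build the dataset, add one image, downsample.
Downsample downsample = new SimpleIntensityDownsample();
ImageMLDataSet training = new ImageMLDataSet(downsample, false, 1, -1);

MLData ideal = new BasicMLData(outputCount);    // one-of encoding, filled as shown above
Image img = ImageIO.read(new File("dime.png")); // hypothetical file name
training.add(new ImageMLData(img), ideal);

training.downsample(downsampleHeight, downsampleWidth); // e.g., 32 x 32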

Image Recognition Example
We will now see how to tie all of the Encog image classes together into an example. A generic image recognition program will serve as the example; it could easily become the foundation of a much more complex image recognition program. This example is driven by a script file. Listing 9.1 shows the type of script file that might drive this program.


The syntax used by this script file is very simple. Each line holds a command followed by a colon, which is in turn followed by a comma-separated list of parameters. Each parameter is itself a name-value pair separated by a colon. There are five commands in all: CreateTraining, Input, Network, Train and WhatIs. Each is described below, followed by a sample script.

• The CreateTraining command creates a new training set. To do so, specify the downsample height, width, and type, either RGB or Brightness.
• The Input command inputs a new image for training. Each Input command specifies the image as well as the identity of the image. Multiple images can have the same identity. For example, the script could provide a second image of a dime by giving a second Input command the identity “dime.”
• The Network command creates a new neural network for training and recognition. Two parameters specify the sizes of the first and second hidden layers. If you do not wish to have a second hidden layer, specify zero for the hidden2 parameter.
• The Train command trains the neural network. The mode parameter specifies either console or GUI training. The minutes parameter specifies how many minutes to train the network; it is only used with console training, and for GUI training it should be set to zero. The strategy parameters tell the training algorithm after how many cycles to reset the neural network if the error has not dropped below the specified amount.
• The WhatIs command accepts an image and tries to recognize it. The example prints the identity of the image it thought was most similar.
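
As an illustration of this syntax, the following is a hypothetical script assembled from the command descriptions above; the file names, identities and numeric values are made up for illustration:
CreateTraining: width:16, height:16, type:RGB
Input: image:./coins/dime.png, identity:dime
Input: image:./coins/dollar.png, identity:dollar
Network: hidden1:100, hidden2:0
Train: mode:console, minutes:1, strategyerror:0.25, strategycycles:50
WhatIs: image:./coins/dime.png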

We will now take a look at the image recognition example. This example can be found at ImageNeuralNetwork. Some of the code in the above example deals with parsing the script file and arguments. Because string parsing is not really the focus of this book, we will focus on how each command is carried out and how the neural network is constructed. The next sections discuss how each of these commands is implemented.

Creating the Training Set
The CreateTraining command is implemented by the processCreateTraining method. This method is shown here.
private void processCreateTraining() {
    // (1)
    final String strWidth = getArg("width");
    final String strHeight = getArg("height");
    final String strType = getArg("type");

    // (2)
    this.downsampleHeight = Integer.parseInt(strHeight);
    this.downsampleWidth = Integer.parseInt(strWidth);

    // (3)
    if (strType.equals("RGB")) {
        this.downsample = new RGBDownsample();
    } else {
        this.downsample = new SimpleIntensityDownsample();
    }

    // (4)
    this.training = new ImageMLDataSet(this.downsample, false, 1, -1);
    System.out.println("Training set created");
}
(1). The CreateTraining command takes three parameters. The following lines read these parameters; (2). The width and height parameters are both integers and need to be parsed; (3). We must now create the downsample object. If the mode is RGB, use RGBDownsample; otherwise, use SimpleIntensityDownsample; (4). The ImageMLDataSet can now be created.

Now that the training set is created, we can input images. The next section describes how this is done.

Inputting an Image
The Input command is implemented by the processInput method. This method is shown here.
private void processInput() throws IOException {
    // (1)
    final String image = getArg("image");
    final String identity = getArg("identity");

    // (2)
    final int idx = assignIdentity(identity);
    final File file = new File(image);

    // (3)
    this.imageList.add(new ImagePair(file, idx));

    // (4)
    System.out.println("Added input image:" + image);
}
(1). The Input command takes two parameters. The following lines read these parameters; (2). The identity is a text string that represents what the image is. We track the number of unique identities and assign an increasing number to each. These unique identities will form the neural network’s output layer, with each unique identity assigned its own output neuron. When images are later presented to the neural network, the output neuron with the highest output will represent the image’s identity to the network. The assignIdentity method simply assigns this increasing count and maps the identity strings to their neuron indexes; (3). A File object is created to hold the image. This will be used later to read the image. At this point we do not wish to actually load the individual images; we simply make note of the image by saving an ImagePair object. The ImagePair object links the image to its output neuron index number. The ImagePair class is not built into Encog; rather, it is a structure used by this example to map the images; (4). Finally, we display a message stating that the image has been added.
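
Neither assignIdentity nor ImagePair is listed in the chapter text; the following is a minimal sketch of what they might look like, assuming a pair of HashMaps for the identity registry (the neuron2identity map is also used later by the WhatIs command):
// Hypothetical sketch: map identity strings to output-neuron indexes and back.
private final Map<String, Integer> identity2neuron = new HashMap<>();
private final Map<Integer, String> neuron2identity = new HashMap<>();
private int outputCount;

private int assignIdentity(final String identity) {
    final String key = identity.toLowerCase();
    if (this.identity2neuron.containsKey(key)) {
        return this.identity2neuron.get(key);
    }
    final int idx = this.outputCount++;   // next free output neuron
    this.identity2neuron.put(key, idx);
    this.neuron2identity.put(idx, key);
    return idx;
}

// Hypothetical sketch: link an image file to its output-neuron index.
public class ImagePair {
    private final File file;
    private final int identity;

    public ImagePair(final File file, final int identity) {
        this.file = file;
        this.identity = identity;
    }

    public File getFile() { return this.file; }
    public int getIdentity() { return this.identity; }
}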

Once all the images are added, the number of output neurons is apparent and we can create the actual neural network. Creating the neural network is explained in the next section.

Creating the Network
The Network command is implemented by the processNetwork method. This method is shown here.
private void processNetwork() throws IOException {
    System.out.println("Downsampling images...");
    // (1)
    for (final ImagePair pair : this.imageList) {
        // (2)
        final MLData ideal = new BasicMLData(this.outputCount);
        // (3)
        final int idx = pair.getIdentity();
        for (int i = 0; i < this.outputCount; i++) {
            if (i == idx) {
                ideal.setData(i, 1);
            } else {
                ideal.setData(i, -1);
            }
        }
        // (4)
        final Image img = ImageIO.read(pair.getFile());
        // (5)
        final ImageMLData data = new ImageMLData(img);
        this.training.add(data, ideal);
    }
    // (6)
    final String strHidden1 = getArg("hidden1");
    final String strHidden2 = getArg("hidden2");
    final int hidden1 = Integer.parseInt(strHidden1);
    final int hidden2 = Integer.parseInt(strHidden2);
    // (7)
    this.training.downsample(this.downsampleHeight, this.downsampleWidth);
    // (8)
    this.network = EncogUtility.simpleFeedForward(
            this.training.getInputSize(), hidden1, hidden2,
            this.training.getIdealSize(), true);
    // (9)
    System.out.println("Created network: " + this.network.toString());
}
(1). Begin by downsampling the images. Loop over every ImagePair previously created; (2). Create a new BasicMLData to hold the ideal output for each output neuron; (3). The output neuron that corresponds to the identity of the image currently being trained will be set to 1. All other output neurons will be set to -1; (4). The input data for this training set item will be the downsampled image. First, load the image into a Java Image object; (5). Create an ImageMLData object to hold this image and add it to the training set; (6). There are two parameters provided to the Network command that specify the number of neurons in each of the two hidden layers. If the second hidden layer has no neurons, there is a single hidden layer; (7). We are now ready to downsample all of the images; (8). Finally, the new neural network is created according to the specified parameters. The final true parameter specifies that we would like to use a hyperbolic tangent activation function; (9). Once the network is created, report its completion by printing a message.

Now that the network has been created, it can be trained. Training is handled in the next section.

Training the Network
The Train command is implemented by the processTrain method. This method is shown here.
private void processTrain() throws IOException {
    // (1)
    final String strMode = getArg("mode");
    final String strMinutes = getArg("minutes");
    final String strStrategyError = getArg("strategyerror");
    final String strStrategyCycles = getArg("strategycycles");
    // (2)
    System.out.println("Training Beginning... Output patterns="
            + this.outputCount);
    // (3)
    final double strategyError = Double.parseDouble(strStrategyError);
    final int strategyCycles = Integer.parseInt(strStrategyCycles);

    // (4)
    final ResilientPropagation train =
            new ResilientPropagation(this.network, this.training);
    train.addStrategy(new ResetStrategy(strategyError, strategyCycles));

    // (5)
    if (strMode.equalsIgnoreCase("gui")) {
        TrainingDialog.trainDialog(train, this.network, this.training);
    } else {
        final int minutes = Integer.parseInt(strMinutes);
        EncogUtility.trainConsole(train, this.network, this.training,
                minutes);
    }
    // (6)
    System.out.println("Training Stopped...");
}
(1). The Train command takes four parameters. The following lines read these parameters; (2). Once the parameters are read, display a message stating that training has begun; (3). Parse the two strategy parameters; (4). The neural network is initialized to random weight and threshold values. Sometimes the random set of weights and thresholds will cause the neural network training to stagnate. In this situation, the network is reset to a new set of random values and training begins again. Training is initiated by creating a new ResilientPropagation trainer. RPROP training was covered in Chapter 5, “Propagation Training.”

Encog allows training strategies to be added to handle situations such as this. One particularly useful training strategy is the ResetStrategy, which takes two parameters. The first states the minimum error that the network must achieve before it is automatically reset to new random values. The second specifies the number of cycles that the network is allowed to take to achieve this error rate. If the specified number of cycles is reached and the network is not at the required error rate, the weights and thresholds will be randomized. Encog supports a number of different training strategies. Training strategies enhance whatever training method is in use, allowing minor adjustments as training progresses. Encog supports the following strategies:

• The Greedy strategy only allows a training iteration to save its weight and threshold changes if the error rate improved.
• The HybridStrategy allows a backup training method to be used if the primary training method stagnates. The hybrid strategy was explained in Chapter 7, “Other Neural Network Types.”
• The ResetStrategy resets the network if it stagnates, as described above.
• The SmartLearningRate and SmartMomentum strategies are used with backpropagation training to attempt to automatically adjust the momentum and learning rate.
• The StopTrainingStrategy stops training once the error has reached a certain level.
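
As a sketch of how strategies are stacked on a single trainer, assuming the no-argument constructors that Encog 3.x provides for Greedy and StopTrainingStrategy (the numeric values here are illustrative):
final ResilientPropagation train = new ResilientPropagation(network, training);
// Reset to new random weights if error is still above 0.25 after 50 iterations.
train.addStrategy(new ResetStrategy(0.25, 50));
// Keep an iteration's weight changes only when the error improves.
train.addStrategy(new Greedy());
// Stop once training stagnates (default thresholds assumed).
train.addStrategy(new StopTrainingStrategy());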

(5). If we are training using the GUI, we must use trainDialog; otherwise, we use trainConsole; (6). The program indicates that training has stopped by displaying a message. In console mode, training stops once the specified number of minutes has elapsed; in GUI mode, it stops when the user cancels it from the dialog.

Once the neural network is trained, it is ready to recognize images. This is discussed in the next section.

Recognizing Images
The WhatIs command is implemented by the processWhatIs method. This method is shown here.
public void processWhatIs() throws IOException {
    // (1)
    final String filename = getArg("image");
    // (2)
    final File file = new File(filename);
    final Image img = ImageIO.read(file);
    final ImageMLData input = new ImageMLData(img);
    // (3)
    input.downsample(this.downsample, false, this.downsampleHeight,
            this.downsampleWidth, 1, -1);
    // (4)
    final int winner = this.network.winner(input);
    // (5)
    System.out.println("What is: " + filename + ", it seems to be: "
            + this.neuron2identity.get(winner));
}
(1). The WhatIs command takes one parameter. The following lines read this parameter; (2). The image is then loaded into an ImageMLData object; (3). The image is downsampled to the correct dimensions; (4). The downsampled image is presented to the neural network, which chooses the “winner” neuron. The winning neuron is the neuron with the greatest output for the pattern that was presented. This is simple “one-of” normalization as discussed in Chapter 2. Chapter 2 also introduced equilateral normalization, which could also be used; (5). Finally, we display the pattern recognized by the neural network.
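
For clarity, the winner call is essentially an arg-max over the network’s outputs; a minimal equivalent sketch:
// Equivalent to network.winner(input): compute the outputs, take the largest.
final MLData output = this.network.compute(input);
int winner = 0;
for (int i = 1; i < output.size(); i++) {
    if (output.getData(i) > output.getData(winner)) {
        winner = i;
    }
}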

This example demonstrated a simple script-based image recognition program. This application could easily be used as the starting point for other more advanced image recognition applications. One very useful extension to this application may be the ability to load and save the trained neural network.
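
As a sketch of that extension, assuming Encog 3.x’s EncogDirectoryPersistence class (the file name is hypothetical):
// Save the trained network to an Encog EG file.
EncogDirectoryPersistence.saveObject(new File("image-network.eg"), this.network);

// Later: load it back and resume recognizing images without retraining.
final BasicNetwork loaded = (BasicNetwork)
        EncogDirectoryPersistence.loadObject(new File("image-network.eg"));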

Supplement
NNF For Java Ch8 - Using Temporal Data
