Programming Notes: [ NNF For Java ] Introduction to Neural Networks for Java (Ch1)

Thursday, December 1, 2016



Source From Here 
When to use Neural Networks 
With neural networks defined, it must be determined when or when not to use them. Knowing when not to use something is just as important as knowing how to use it. To understand these objectives, we will identify what sort of problems Encog is adept at solving. A significant goal of this book is to explain how to construct Encog neural networks and when to use them. Neural network programmers must understand which problems are well-suited for neural network solutions and which are not. An effective neural network programmer also knows which neural network structure, if any, is most applicable to a given problem. This section begins by identifying the problems that are not conducive to a neural network solution. 

Problems Not Suited to a Neural Network Solution 
Programs that are easily written as flowcharts are not ideal applications for neural networks. If your program consists of well-defined steps, normal programming techniques will suffice. Another criterion to consider is whether program logic is likely to change. One of the primary features of neural networks is the ability to learn. If the algorithm used to solve your problem is an unchanging business rule, there is no reason to use a neural network. In fact, a neural network might be detrimental to your application if it attempts to find a better solution and begins to diverge from the desired process. Unexpected results will likely occur. 

Finally, neural networks are often not suitable for problems that require a clearly traceable path to solution. A neural network can be very useful for solving the problem for which it was trained, but cannot explain its reasoning. The neural network knows something because it was trained to know it. However, a neural network cannot explain the series of steps followed to derive the answer. 

Problems Suited to a Neural Network 
Although there are many problems for which neural networks are not well suited, there are also many problems for which a neural network solution is quite useful. In addition, neural networks can often solve problems with fewer lines of code than traditional programming algorithms. It is important to understand which problems call for a neural network approach. Neural networks are particularly useful for solving problems that cannot be expressed as a series of steps. This may include recognizing patterns, classification, series prediction and data mining. 

Pattern recognition is perhaps the most common use for neural networks. For this type of problem, the neural network is presented a pattern in the form of an image, a sound or other data. The neural network then attempts to determine if the input data matches a pattern that it has been trained to recognize. The remainder of this textbook will examine many examples of how to use neural networks to recognize patterns. 

Classification is a process that is closely related to pattern recognition. A neural network trained for classification is designed to classify input samples into groups. These groups may be fuzzy and lack clearly-defined boundaries. Alternatively, these groups may have quite rigid boundaries. 

Data Classification 
Classification attempts to determine what class the input data falls into. Classification is usually a supervised training operation, meaning the user provides data and expected results to the neural network. For data classification, the expected result is identification of the data class. Supervised neural networks are always trained with known data. During training, the networks are evaluated on how well they classify known data. The hope is that the neural network, once trained, will be able to classify unknown data as well. 

Fisher’s Iris Dataset is an example of classification. This is a dataset that contains measurements of Iris flowers. It is one of the most famous datasets and is often used to evaluate machine learning methods. The full dataset is available at the following URL (http://www.heatonresearch.com/wiki/Iris_Data_Set). Below is a small sampling from the Iris dataset: 
  "Sepal Length","Sepal Width","Petal Length","Petal Width","Species"
  5.1,3.5,1.4,0.2,"setosa"
  4.9,3.0,1.4,0.2,"setosa"
  4.7,3.2,1.3,0.2,"setosa"
  ...
  7.0,3.2,4.7,1.4,"versicolor"
  6.4,3.2,4.5,1.5,"versicolor"
  6.9,3.1,4.9,1.5,"versicolor"
  ...
  6.3,3.3,6.0,2.5,"virginica"
  5.8,2.7,5.1,1.9,"virginica"
  7.1,3.0,5.9,2.1,"virginica"
For classification, the neural network is instructed that, given the sepal length/width and the petal length/width, the species of the flower can be determined. The species is the class. A class is usually a non-numeric data attribute and as such, membership in the class must be well-defined. For the Iris dataset, there are three different types of Iris. If a neural network is trained on three types of Iris, it cannot be expected to identify a rose. All members of the class must be known at the time of training. 
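Since a class like the Iris species is non-numeric, it must be mapped onto numbers before a network can train on it. One common scheme, covered in the next chapter, is one-of-n encoding: one value per known class. The sketch below is plain Java with hypothetical helper names, not Encog's API; it also shows why a class unseen at training time (such as a rose) cannot be handled:

```java
import java.util.Arrays;
import java.util.List;

public class OneOfN {
    // Hypothetical helper: encode a class label as a one-of-n double array.
    // The label's position in the known-class list receives 1.0; all others 0.0.
    static double[] encode(String label, List<String> knownClasses) {
        double[] out = new double[knownClasses.size()];
        int idx = knownClasses.indexOf(label);
        if (idx < 0) {
            // An unknown label (e.g. "rose") cannot be encoded:
            // all classes must be known at training time.
            throw new IllegalArgumentException("Unknown class: " + label);
        }
        out[idx] = 1.0;
        return out;
    }

    public static void main(String[] args) {
        List<String> species = Arrays.asList("setosa", "versicolor", "virginica");
        System.out.println(Arrays.toString(encode("versicolor", species)));
    }
}
```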

Regression Analysis 
In the last section, we learned how to classify data. Often the desired output is not simply a class, but a number. Consider the calculation of an automobile’s miles per gallon (MPG). Provided data such as the engine size and car weight, the MPG for the specified car may be calculated. Consider the following sample data for five cars: 
  "mpg","cylinders","displacement","horsepower","weight","acceleration","model year","origin","car name"
  18.0,8,307.0,130.0,3504.,12.0,70,1,"chevrolet chevelle malibu"
  15.0,8,350.0,165.0,3693.,11.5,70,1,"buick skylark 320"
  18.0,8,318.0,150.0,3436.,11.0,70,1,"plymouth satellite"
  16.0,8,304.0,150.0,3433.,12.0,70,1,"amc rebel sst"
  17.0,8,302.0,140.0,3449.,10.5,70,1,"ford torino"
  ...
For more information, the entirety of this dataset may be found at: http://www.heatonresearch.com/wiki/MPG_Data_Set 

The idea of regression is to train the neural network with input data about the car. However, using regression, the network will not produce a class. The neural network is expected to provide the miles per gallon that the specified car would likely get. It is also important to note that not every piece of data in the above file will be used. The columns “car name” and “origin” are not used. The name of a car has nothing to do with its fuel efficiency and is therefore excluded. Likewise, the origin does not contribute to this equation. The origin is a numeric value that specifies what geographic region the car was produced in. While some regions do focus on fuel efficiency, this piece of data is far too broad to be useful. 

Clustering 
Another common type of analysis is clustering. Unlike the previous two analysis types, clustering is typically unsupervised. Either of the datasets from the previous two sections could be used for clustering. The difference is that clustering analysis would not require the user to provide the species in the case of the Iris dataset, or the MPG number for the MPG dataset. The clustering algorithm is expected to place the data elements into clusters that correspond to the species or MPG. 

For clustering, the machine learning method simply looks at the data and attempts to place that data into a number of clusters. The number of clusters expected must be defined ahead of time. If the number of clusters changes, the clustering machine learning method will need to be retrained. Clustering is very similar to classification, with its output being a cluster, which is similar to a class. However, clustering differs from regression as it does not provide a number. So if clustering were used with the MPG dataset, the output would need to be a cluster that the car falls into. Perhaps each cluster would specify the varying level of fuel efficiency for the vehicle. Perhaps the clusters would group the cars into clusters that demonstrated some relationship that had not yet been noticed. 
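The core of that assignment step can be sketched without any neural network machinery. The following plain-Java fragment is a hypothetical, k-means-style sketch (not Encog's clustering API): it assigns a sample to the nearest of a fixed set of cluster centers by squared Euclidean distance.

```java
public class ClusterSketch {
    // Assign a sample to the nearest centroid by squared Euclidean distance.
    static int nearestCluster(double[] sample, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0.0;
            for (int i = 0; i < sample.length; i++) {
                double d = sample[i] - centroids[c][i];
                dist += d * d;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Two made-up centroids in (petal length, petal width) space.
        double[][] centroids = { {1.4, 0.2}, {5.5, 2.0} };
        System.out.println(nearestCluster(new double[]{1.3, 0.2}, centroids));
    }
}
```

In a full clustering algorithm, the centroids themselves would be re-estimated from the data over repeated passes; this only shows how a trained clustering model places an item into a cluster.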

Structuring a Neural Network 
Now that the three major problem models for neural networks have been identified, it is time to examine how data is actually presented to the neural network. This section focuses mainly on how the neural network is structured to accept data items and provide output. The following chapter will detail how to normalize the data prior to it being presented to the neural network. 

Neural networks are typically layered with an input and output layer at minimum. There may also be hidden layers. Some neural network types are not broken up into any formal layers beyond the input and output layer. However, the input layer and output layer will always be present and may be incorporated in the same layer. We will now examine the input layer, output layer and hidden layers. 

Understanding the Input Layer 
The input layer is the first layer in a neural network. This layer, like all layers, contains a specific number of neurons. The neurons in a layer all contain similar properties. Typically, the input layer will have one neuron for each attribute that the neural network will use for classification, regression or clustering. 

Consider the previous examples. The Iris dataset has four input neurons. These neurons represent the petal width/length and the sepal width/length. The MPG dataset has more input neurons. The number of input neurons does not always directly correspond to the number of attributes and some attributes will take more than one neuron to encode. This encoding process, called normalization, will be covered in the next chapter. 

The number of neurons determines how a layer’s input is structured. For each input neuron, one double value is stored. For example, the following array could be used as input to a layer that contained five neurons. 
  double[] input = new double[5];
The input to a neural network is always an array of the type double. The size of this array directly corresponds to the number of neurons on the input layer. Encog uses the MLData interface to define classes that hold these arrays. The array above can be easily converted into an MLData object with the following line of code. 
  MLData data = new BasicMLData(input);
The MLData interface defines any “array like” data that may be presented to Encog. Input must always be presented to the neural network inside of a MLData object. The BasicMLData class implements the MLData interface. However, the BasicMLData class is not the only way to provide Encog with data. Other implementations of MLData are used for more specialized types of data. 

The BasicMLData class simply provides a memory-based data holder for the neural network data. Once the neural network processes the input, a MLData-based class will be returned from the neural network’s output layer. The output layer is discussed in the next section. 

Understanding the Output Layer 
The output layer is the final layer in a neural network. This layer provides the output after all previous layers have processed the input. The output from the output layer is formatted very similarly to the data that was provided to the input layer. The neural network outputs an array of doubles. The neural network wraps the output in a class based on the MLData interface. Most of the built-in neural network types return a BasicMLData class as the output. However, future and third-party neural network classes may return different classes based on other implementations of the MLData interface. 

Neural networks are designed to accept input (an array of doubles) and then produce output (also an array of doubles). Determining how to structure the input data and attaching meaning to the output are the two main challenges of adapting a problem to a neural network. The real power of a neural network comes from its pattern recognition capabilities. The neural network should be able to produce the desired output even if the input has been slightly distorted. 

Regression neural networks typically have a single output neuron that provides the numeric value produced by the neural network. Multiple output neurons may exist if the same neural network is supposed to predict two or more numbers for the given inputs. Classification networks produce one or more output neurons, depending on how the output class was encoded. There are several different ways to encode classes; this will be discussed in greater detail in the next chapter. Clustering is set up similarly, with the output neurons identifying which cluster the data belongs to. 
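For classification with one-output-per-class encoding, a common way to attach meaning to the output array is to take the neuron with the highest activation as the winning class. This is a plain-Java sketch of that interpretation step (an assumption about the encoding, not Encog-specific code):

```java
public class ArgMax {
    // Interpret a classification network's output: the index of the neuron
    // with the highest activation is taken as the winning class.
    static int winner(double[] output) {
        int best = 0;
        for (int i = 1; i < output.length; i++) {
            if (output[i] > output[best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical activations for three classes; class 1 wins.
        System.out.println(winner(new double[]{0.1, 0.8, 0.2}));
    }
}
```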


Hidden Layers 
As previously discussed, neural networks contain an input layer and an output layer. Sometimes the input layer and output layer are the same, but they are most often two separate layers. Additionally, other layers may exist between the input and output layers; these are called hidden layers. These hidden layers are simply inserted between the input and output layers. The hidden layers can also take on more complex structures.

The only purpose of the hidden layers is to allow the neural network to better produce the expected output for the given input. Neural network programming involves first defining the input and output layer neuron counts. Once it is determined how to translate the programming problem into the input and output neuron counts, it is time to define the hidden layers. 

The hidden layers are very much a “black box.” The problem is defined in terms of the neuron counts for the hidden and output layers. How the neural network produces the correct output is performed in part by hidden layers. Once the structure of the input and output layers is defined, the hidden layer structure that optimally learns the problem must also be defined. 

The challenge is to avoid creating a hidden structure that is either too complex or too simple. Too complex of a hidden structure will take too long to train. Too simple of a hidden structure will not learn the problem. A good starting point is a single hidden layer with a number of neurons equal to twice the input layer. Depending on this network’s performance, the hidden layer’s number of neurons is either increased or decreased. 

Developers often wonder how many hidden layers to use. Some research has indicated that a second hidden layer is rarely of any value. Encog is an excellent way to perform a trial and error search for the most optimal hidden layer configuration. For more information see the following URL: http://www.heatonresearch.com/wiki/Hidden_Layers 

Some neural networks have no hidden layers, with the input layer directly connected to the output layer. Further, some neural networks have only a single layer in which the single layer is self-connected. These connections permit the network to learn. Contained in these connections, called synapses, are individual weight matrices. These values are changed as the neural network learns. The next chapter delves more into weight matrices. 

Using a Neural Network 
This section will detail how to structure a neural network for a very simple problem: to design a neural network that can function as an XOR operator. Learning the XOR operator is a frequent “first example” when demonstrating the architecture of a new neural network. Just as most new programming languages are first demonstrated with a program that simply displays “Hello World,” the XOR operator is sort of the “Hello World” application for neural networks. 

The XOR Operator and Neural Networks 
The XOR operator is one of the most common Boolean logical operators. The other two are the AND and OR operators. For each of these logical operators, there are four different combinations. All possible combinations for the AND operator are shown below. 
  0 AND 0 = 0
  1 AND 0 = 0
  0 AND 1 = 0
  1 AND 1 = 1
This should be consistent with how you learned the AND operator for computer programming. As its name implies, the AND operator will only return true, or one, when both inputs are true. The OR operator behaves as follows: 
  0 OR 0 = 0
  1 OR 0 = 1
  0 OR 1 = 1
  1 OR 1 = 1
This also should be consistent with how you learned the OR operator for computer programming. For the OR operator to be true, either of the inputs must be true. The “exclusive or” (XOR) operator is less frequently used in computer programming. XOR has the same output as the OR operator, except for the case where both inputs are true. The possible combinations for the XOR operator are shown here. 
  0 XOR 0 = 0
  1 XOR 0 = 1
  0 XOR 1 = 1
  1 XOR 1 = 0
As you can see, the XOR operator only returns true when both inputs differ. The next section explains how to structure the input, output and hidden layers for the XOR operator. 
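The truth table above is easy to verify in plain Java before involving a neural network at all:

```java
public class XorTable {
    // XOR returns 1 only when exactly one of the two 0/1 inputs is 1.
    static int xor(int a, int b) {
        return (a + b) % 2;  // equivalently a ^ b for 0/1 inputs
    }

    public static void main(String[] args) {
        int[][] inputs = { {0, 0}, {1, 0}, {0, 1}, {1, 1} };
        for (int[] in : inputs) {
            System.out.println(in[0] + " XOR " + in[1] + " = " + xor(in[0], in[1]));
        }
    }
}
```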

Structuring a Neural Network for XOR 
There are two inputs to the XOR operator and one output. The input and output layers will be structured accordingly. The input neurons are fed the following double values: 
  0.0, 0.0
  1.0, 0.0
  0.0, 1.0
  1.0, 1.0
These values correspond to the inputs to the XOR operator, shown above. The one output neuron is expected to produce the following double values: 
  0.0
  1.0
  1.0
  0.0
This is one way that the neural network can be structured. This method allows a simple feedforward neural network to learn the XOR operator. The feedforward neural network, also called a perceptron, is one of the first neural network architectures that we will learn. There are other ways that the XOR data could be presented to the neural network. Later in this book, two examples of recurrent neural networks will be explored: the Elman and Jordan styles of neural networks. These methods would treat the XOR data as one long sequence, basically concatenating the truth table for XOR together, resulting in one long XOR sequence, such as: 
  0.0, 0.0, 0.0,
  1.0, 0.0, 1.0,
  0.0, 1.0, 1.0,
  1.0, 1.0, 0.0
The line breaks are only for readability; the neural network treats XOR as a long sequence. By using the data above, the network has a single input neuron and a single output neuron. The input neuron is fed one value from the list above and the output neuron is expected to return the next value. This shows that there are often multiple ways to model the data for a neural network. How the data is modeled will greatly influence the success of a neural network. If one particular model is not working, another should be considered. The next step is to format the XOR data for a feedforward neural network. 

Because the XOR operator has two inputs and one output, the neural network follows suit. Additionally, the neural network has a single hidden layer with two neurons to help process the data. The choice of two neurons in the hidden layer is somewhat arbitrary and is often a matter of trial and error. The XOR problem is simple, and two hidden neurons are sufficient to solve it. A diagram for this network is shown in Figure 1.1. 


There are four different types of neurons in the above network. These are summarized below: 
• Input Neurons: I1, I2
• Output Neuron: O1
• Hidden Neurons: H1, H2
• Bias Neurons: B1, B2

The input, output and hidden neurons were discussed previously. The new neuron type seen in this diagram is the bias neuron. A bias neuron always outputs a value of 1 and never receives input from the previous layer. In a nutshell, bias neurons allow the neural network to learn patterns more effectively. They serve a similar function to the hidden neurons. Without bias neurons, it is very hard for the neural network to output a value of one when the input is zero. This is not so much a problem for XOR data, but it can be for other data sets. To read more about their exact function, visit the following URL: http://www.heatonresearch.com/wiki/Bias 
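The effect is easy to demonstrate numerically. In the sketch below (plain Java, not Encog code; the weights are arbitrary), a sigmoid neuron with an all-zero input is pinned at sigmoid(0) = 0.5 unless a bias weight shifts the weighted sum:

```java
public class BiasDemo {
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Weighted sum of the inputs plus a bias weight; the bias neuron always
    // outputs 1, so it contributes 1 * biasWeight to the sum.
    static double neuron(double[] inputs, double[] weights, double biasWeight) {
        double sum = biasWeight;
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i] * weights[i];
        }
        return sigmoid(sum);
    }

    public static void main(String[] args) {
        double[] zeros = {0.0, 0.0};
        double[] weights = {0.7, -0.3};  // arbitrary example weights
        // Without a bias, a zero input always yields sigmoid(0) = 0.5.
        System.out.println(neuron(zeros, weights, 0.0));
        // A bias weight can push the output toward 1 even for a zero input.
        System.out.println(neuron(zeros, weights, 5.0));
    }
}
```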

Now look at the code used to produce a neural network that solves the XOR operator. The complete code is included with the Encog examples and can be found at the following location: org.encog.examples.neural.xor.XORHelloWorld. The example begins by creating the neural network seen in Figure 1.1. The code needed to create this network is relatively simple: 
  package demo

  import org.encog.Encog;
  import org.encog.engine.network.activation.ActivationSigmoid;
  import org.encog.ml.data.MLData;
  import org.encog.ml.data.MLDataPair;
  import org.encog.ml.data.MLDataSet;
  import org.encog.ml.data.basic.BasicMLDataSet;
  import org.encog.neural.networks.BasicNetwork;
  import org.encog.neural.networks.layers.BasicLayer;
  import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

  BasicNetwork network = new BasicNetwork();
  network.addLayer(new BasicLayer(null, true, 2));                     // null: no activation function; true: has bias neuron; 2: two input neurons
  network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 2));  // hidden layer with two neurons (H1, H2 in Figure 1.1)
  network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 1)); // output layer: one neuron, no bias
  network.getStructure().finalizeStructure();
  network.reset();
In the above code, a BasicNetwork is being created. Three layers are added to this network. The first layer, which becomes the input layer, has two neurons. The hidden layer is added second and has two neurons also. Lastly, the output layer is added and has a single neuron. Finally, the finalizeStructure method is called to inform the network that no more layers are to be added. The call to reset randomizes the weights in the connections between these layers. 

Neural networks always begin with random weight values. A process called training refines these weights to values that will provide the desired output. Because neural networks always start with random values, very different results occur from two runs of the same program. Some random weights provide a better starting point than others. Sometimes random weights will be far enough off that the network will fail to learn. In this case, the weights should be randomized again and the process restarted. 
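Conceptually, this initialization step amounts to filling the weight values with small random numbers. The sketch below illustrates the idea with a uniform range of [-1, 1]; Encog's reset() uses its own randomizers internally, so treat this only as an illustration of the concept:

```java
import java.util.Random;

public class RandomWeights {
    // Fill a weight array with uniform random values in [-1, 1].
    // (Illustrative only; Encog's reset() applies its own randomization.)
    static double[] randomize(int count, long seed) {
        Random rnd = new Random(seed);
        double[] weights = new double[count];
        for (int i = 0; i < count; i++) {
            weights[i] = rnd.nextDouble() * 2.0 - 1.0;
        }
        return weights;
    }

    public static void main(String[] args) {
        // Six weights, as in the 2-2-1 XOR network's input-to-hidden
        // and hidden-to-output connections (bias weights not counted here).
        for (double w : randomize(6, 42L)) {
            System.out.println(w);
        }
    }
}
```

Two runs with different seeds start from different points, which is why training times and results vary from run to run.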

You will also notice the ActivationSigmoid class in the above code. This specifies that the neural network should use the sigmoid activation function. Activation functions will be covered in Chapter 4. Activation functions are only placed on the hidden and output layers; the input layer does not have an activation function. If an activation function were specified for the input layer, it would have no effect. 
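The sigmoid function itself is simple: it squashes any real-valued weighted sum into the range (0, 1), which is one reason it suits the 0/1 outputs of the XOR problem. A minimal plain-Java version:

```java
public class SigmoidDemo {
    // The sigmoid activation function: maps any real x into (0, 1).
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    public static void main(String[] args) {
        System.out.println(sigmoid(0.0));    // exactly 0.5
        System.out.println(sigmoid(10.0));   // close to 1
        System.out.println(sigmoid(-10.0));  // close to 0
    }
}
```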

Each layer also specifies a boolean value. This boolean value specifies whether bias neurons are present on a layer. The output layer, as shown in Figure 1.1, does not have a bias neuron as the input and hidden layers do. This is because a bias neuron is only connected to the next layer. The output layer is the final layer, so there is no need for a bias neuron. If a bias neuron were specified on the output layer, it would have no effect. 

These weights make up the long-term memory of the neural network. Some neural networks also contain context layers which give the neural network a short-term memory as well. The neural network learns by modifying these weight values. This is also true of the Elman and Jordan neural networks. Now that the neural network has been created, it must be trained. Training is the process where the random weights are refined to produce output closer to the desired output. Training is discussed in the next section. 

Training a Neural Network 
To train the neural network, a MLDataSet object is constructed. This object contains the inputs and the expected outputs. To construct this object, two arrays are created. The first array will hold the input values for the XOR operator. The second array will hold the ideal outputs for each of four corresponding input values. These will correspond to the possible values for XOR. To review, the four possible values are as follows: 
  0 XOR 0 = 0
  1 XOR 0 = 1
  0 XOR 1 = 1
  1 XOR 1 = 0
First, construct an array to hold the four input values to the XOR operator using a two dimensional double array. This array is as follows: 
  def XOR_INPUT = [
      [0.0, 0.0],
      [1.0, 0.0],
      [0.0, 1.0],
      [1.0, 1.0]] as double[][]
Likewise, an array must be created for the expected outputs for each of the input values. This array is as follows: 
  def XOR_IDEAL = [
      [0.0],
      [1.0],
      [1.0],
      [0.0]] as double[][]
Even though there is only one output value, a two-dimensional array must still be used to represent the output. If there is more than one output neuron, additional columns are added to the array. Now that the two input arrays are constructed, a MLDataSet object must be created to hold the training set. This object is created as follows: 
  // create training data
  MLDataSet trainingSet = new BasicMLDataSet(XOR_INPUT, XOR_IDEAL);
Now that the training set has been created, the neural network can be trained. Training is the process where the neural network’s weights are adjusted to better produce the expected output. Training will continue for many iterations until the error rate of the network is below an acceptable level. First, a training object must be created. Encog supports many different types of training. 

For this example, Resilient Propagation (RPROP) training is used. RPROP is perhaps the best general-purpose training algorithm supported by Encog. Other training techniques are provided as well, since certain problems are solved better with certain training techniques. The following code constructs a RPROP trainer: 
  // train the neural network
  final ResilientPropagation train = new ResilientPropagation(network, trainingSet);
All training classes implement the MLTrain interface. The RPROP algorithm is implemented by the ResilientPropagation class, which is constructed above. Once the trainer is constructed, the neural network should be trained. Training the neural network involves calling the iteration method on the MLTrain class until the error is below a specific value. The error is the degree to which the neural network output matches the desired output. 
  int epoch = 1;
  while (true) {
      train.iteration();
      println("Epoch #" + epoch + " Error:" + train.getError());
      epoch++;
      if (train.getError() < 0.01) break;
  }
  train.finishTraining();
The above code loops through as many iterations, or epochs, as it takes to get the error rate for the neural network to be below 1%. Once the neural network has been trained, it is ready for use. The next section will explain how to use a neural network. 

Executing a Neural Network 
Making use of the neural network involves calling the compute method on the BasicNetwork class. Here we loop through every training set value and display the output from the neural network: 
  // test the neural network
  println("Neural Network Results:");
  for (MLDataPair pair : trainingSet) {
      final MLData output = network.compute(pair.getInput());
      println(pair.getInput().getData(0) + "," + pair.getInput().getData(1)
              + ", actual=" + output.getData(0) + ",ideal=" + pair.getIdeal().getData(0));
  }
The compute method accepts an MLData class and also returns another MLData object. The returned object contains the output from the neural network, which is displayed to the user. When the program is run, the training results are displayed first. For each epoch, the current error rate is displayed. 
Epoch #1 Error:0.27173924828852564
Epoch #2 Error:0.2560089991807858
Epoch #3 Error:0.25074292754681765
Epoch #4 Error:0.2558958564543226
Epoch #5 Error:0.2514551021544388
Epoch #6 Error:0.25094609288671044
Epoch #7 Error:0.2514666849457718
Epoch #8 Error:0.2505670667125269
...

The error starts at about 25% at epoch 1. By epoch 107, the error has dropped below 1% and training stops. Because the neural network was initialized with random weights, it may take a different number of iterations to train each time the program is run. Additionally, though the final error rate may differ, it should always end below 1%. Finally, the program displays the results for each of the training items as follows: 
Neural Network Results:
0.0,0.0, actual=0.07915414274445526,ideal=0.0
1.0,0.0, actual=0.8619315893915233,ideal=1.0
0.0,1.0, actual=0.9749658247320185,ideal=1.0
1.0,1.0, actual=0.0933274330904267,ideal=0.0

As you can see, the network has not been trained to give exact results. This is normal. Because the network was trained to a 1% error, each of the results will also generally be within 1% of the expected value. Because the neural network is initialized to random values, the final output will differ on each run of the program.
