Preface
The Encog Workbench is a GUI application that enables many different machine learning tasks without writing Java or C# code. The Encog Workbench itself is written in Java, but generates files that can be used with any Encog framework. The Encog Workbench is distributed as a single self-executing JAR file. On most operating systems, including Microsoft Windows, Macintosh and some variants of Linux, the Encog Workbench JAR file is started simply by double-clicking it. To start it from the command line, run java -jar followed by the name of the JAR file. (encog-workbench-3.3.0-release.zip)
Depending on the version of Encog, the JAR file might have a different name. No matter the version, the file will have “encog-workbench” and “executable” somewhere in its name. No other JAR files are necessary for the workbench, as all third-party JAR files are placed inside this JAR.
Structure of the Encog Workbench
Before studying how the Encog Workbench is actually used, we will learn about its structure. The workbench works with a project directory that holds all of the files needed for a project. An Encog Workbench project contains no subdirectories. If a subdirectory is added to an Encog Workbench project, it simply becomes another independent project. There is also no main “project file” inside an Encog Workbench project. Often a readme.txt or readme.html file is placed inside an Encog Workbench project to explain what to do with the project. However, this file is included at the discretion of the project creator.
There are several different file types that might be placed in an Encog Workbench project. These files are organized by their file extension. The extension of a file is how the Encog Workbench knows what to do with that file. The following extensions are recognized by the Encog Workbench: .csv, .eg, .ega, .egb, .gif, .html, .jpg, .png and .txt.
The following sections will discuss the purpose of each file type.
Workbench CSV Files
An acronym for “comma separated values,” CSV files hold tabular data. However, CSV files are not always “comma separated.” This is especially true in parts of the world that use a decimal comma instead of a decimal point. The CSV files used by Encog can be based on a decimal comma. In this case, a semicolon (;) should be used as the field separator. CSV files may also have headers to define what each column of the CSV file means. Column headers are optional, but strongly recommended. Column headers name the attributes and provide consistency across both the CSV files created by Encog and those provided by the user.
A CSV file defines the data used by Encog. Each row in the CSV file defines a training set element and each column defines an attribute. If a particular attribute is not known for a training set element, then the “?” character should be placed in that row/column. Encog deals with missing values in various ways. This is discussed later in this chapter in the Encog Analyst discussion.
A CSV file cannot be used to directly train a neural network, but must first be converted into an EGB file. To convert a CSV file to an EGB file, right-click the CSV file and choose “Export to Training (EGB).” EGB files record which columns are input and which are ideal data, whereas CSV files make no such distinction. Rather, CSV files might represent raw data provided by the user. Additionally, some CSV files are generated by Encog as raw user data is processed.
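The following is a minimal sketch, in plain Java, of how a single row of such a file could be read. It is not Encog's own CSV reader: the class name CsvRowParser, the method parseRow and the choice to represent a missing “?” value as NaN are assumptions made purely for illustration. The example row uses a decimal comma with a semicolon as the field separator.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Encog.
class CsvRowParser {

    /** Parses one CSV line into doubles; a "?" field is stored as NaN (missing). */
    static double[] parseRow(String line, char separator, Locale locale) {
        NumberFormat format = NumberFormat.getInstance(locale);
        // Quote the separator so characters such as '|' are not treated as regex syntax.
        String[] fields = line.split(Pattern.quote(String.valueOf(separator)), -1);
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            String field = fields[i].trim();
            if (field.equals("?")) {
                values[i] = Double.NaN; // unknown attribute
            } else {
                try {
                    values[i] = format.parse(field).doubleValue();
                } catch (ParseException e) {
                    throw new IllegalArgumentException("Not a number: " + field, e);
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Decimal comma, semicolon separator, third attribute unknown.
        double[] row = parseRow("5,1;3,5;?;0,2", ';', Locale.GERMANY);
        System.out.println(Arrays.toString(row));
    }
}
```

For a file that uses decimal points and commas, the same method could be called with ',' as the separator and Locale.US as the locale.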
Workbench EG Files
Encog EG files store a variety of different object types, but are themselves simply text files. All data inside EG files is stored with decimal points and comma separators, regardless of the geographic region in which Encog is running. While CSV files can be formatted according to local number formatting rules, EG files cannot. This is to keep EG files consistent across all Encog platforms.
The following object types are stored in EG files.
The Encog Workbench will display the object type of any EG file that is located in the project directory. An Encog EG file stores only one object. If multiple objects are to be stored, they must be stored in separate EG files.
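The region-independent number formatting described for EG files can be illustrated with a short Java sketch. This is not the actual EG file layout or Encog's serialization code. The class EgNumberFormat and its method are hypothetical, and the four decimal places are an arbitrary choice. The point is only that passing a fixed locale produces decimal points no matter which region the program runs in.

```java
import java.util.Locale;

// Hypothetical illustration, not Encog's EG writer.
class EgNumberFormat {

    /** Joins values with commas, always using a decimal point. */
    static String toEgStyle(double[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            // Locale.ROOT ignores the machine's regional settings.
            sb.append(String.format(Locale.ROOT, "%.4f", values[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toEgStyle(new double[] {0.5, -1.25}));
        // For contrast: a German locale would format 0.5 with a decimal comma.
        System.out.println(String.format(Locale.GERMANY, "%.4f", 0.5));
    }
}
```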
Workbench EGA Files
Encog Analyst script files, or EGA files, hold instructions for the Encog Analyst. These files hold statistical information about the CSV file that they are designed to analyze. EGA files also hold script information that describes how to process raw data. EGA files are executable by the workbench. A full discussion of the EGA file and every possible configuration/script item is beyond the scope of this book. However, a future book will be dedicated to the Encog Analyst. Additional reference information about the Encog Analyst script file can be found here: http://www.heatonresearch.com/wiki/EGA_File
Later in this chapter, we will create an EGA file to analyze the iris dataset.
Workbench EGB Files
Encog binary files, or EGB files, hold training data. As previously discussed, CSV files are typically converted to EGB for Encog. This data is stored in a platform-independent binary format. Because of this, EGB files are read much faster than a CSV file. Additionally, the EGB file internally contains the number of input and ideal columns present in the file. CSV files must be converted to EGB files prior to training. To convert a CSV file to an EGB file, right-click the selected CSV file and choose “Export to Training (EGB).”
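To make the idea of input and ideal columns concrete, here is a small Java sketch that splits one row of numbers into the two parts, given the counts that would be entered when exporting. It does not read or write the real EGB format. The class TrainingPair is hypothetical, and the sketch assumes that the input columns come first and the ideal columns follow.

```java
import java.util.Arrays;

// Hypothetical illustration of one training set element; not the EGB format itself.
class TrainingPair {
    final double[] input;
    final double[] ideal;

    TrainingPair(double[] input, double[] ideal) {
        this.input = input;
        this.ideal = ideal;
    }

    /** Splits a row, assuming input columns come first, followed by ideal columns. */
    static TrainingPair split(double[] row, int inputCount, int idealCount) {
        if (row.length != inputCount + idealCount) {
            throw new IllegalArgumentException(
                "Expected " + (inputCount + idealCount) + " columns but found " + row.length);
        }
        double[] in = Arrays.copyOfRange(row, 0, inputCount);
        double[] out = Arrays.copyOfRange(row, inputCount, inputCount + idealCount);
        return new TrainingPair(in, out);
    }

    public static void main(String[] args) {
        // One XOR row: two inputs and one ideal value.
        TrainingPair pair = split(new double[] {1, 0, 1}, 2, 1);
        System.out.println(Arrays.toString(pair.input) + " -> " + Arrays.toString(pair.ideal));
    }
}
```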
Workbench Image Files
The Encog Workbench does not directly work with image files at this point, but it can display them when they are double-clicked. The Encog Workbench is capable of displaying PNG, JPG and GIF files.
Workbench Text Files
The Encog Workbench does not directly use text files. However, text files are a means of storing instructions for the users of a project. For instance, a readme.txt file can be added to a project and displayed inside the Workbench. The Encog Workbench can display both text and HTML files.
A Simple XOR Example
There are many different ways that the Encog Workbench can be used. The Encog Analyst can be used to create projects that include normalization, training and analysis. However, all of the individual neural network parts can also be manually created and trained. If the data is already normalized, Encog Analyst may not be necessary. In this section we will see how to use the Encog Workbench without the Encog Analyst by creating a simple XOR neural network. The XOR dataset does not require any normalization as it is already in the 0 to 1 range.
Creating a New Project
First create a new project by launching the Encog Workbench. Once the Encog Workbench starts up, the options of creating a new project, opening an existing project or quitting will appear. Choose to create a new project and name it “XOR.” This will create a new empty folder named XOR. You will now see the Encog Workbench as shown in Figure 3.1.
Figure 3.1: The Encog Workbench
This is the basic layout of the Encog Workbench. There are three main areas. The tall rectangle on the left is where all project files are shown. Currently this project has no files. You can also see the log output and status information. The rectangle just above the log output is where documents are opened. The look of the Encog Workbench is very much like that of an IDE and should be familiar to developers.
Generate Training Data
The next step is to obtain training data. There are several ways to do this. First, Encog Workbench supports drag and drop. For instance, CSVs can be dragged from the operating system and dropped into the project as a copy, leaving the original file unchanged. These files will then appear in the project tree. The Encog Workbench comes with a number of built-in training sets. Additionally, it can download external data such as stock prices and even sunspot information. The sunspot information can be used for time-series prediction experiments.
The Encog Workbench also has a built-in XOR training set. To access it, choose Tools->Generate Training Data. This will open the “Create Training Data” dialog. Choose “XOR Training Set” and name it “xor.csv.” Your new CSV file will appear in the project tree. If you double-click the “xor.csv” file, you will see the following training data in Listing 3.1:
Listing 3.1: XOR Training Data
"op1","op2","result"
0,0,0
1,0,1
0,1,1
1,1,0
It is important to note that the file does have headers. This must be specified when the EGB file is generated.
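For readers who want to see where these rows come from, the sketch below builds the same header and four rows in Java using the exclusive-or operator. It is only an illustration of the data. The class XorCsv is hypothetical and is not how the Workbench generates its file.

```java
// Hypothetical illustration; the Workbench generates this file itself.
class XorCsv {

    /** Returns the XOR truth table as CSV text with a header row. */
    static String build() {
        StringBuilder sb = new StringBuilder("\"op1\",\"op2\",\"result\"\n");
        for (int op2 = 0; op2 <= 1; op2++) {
            for (int op1 = 0; op1 <= 1; op1++) {
                // '^' is Java's bitwise exclusive-or: 1 only when the operands differ.
                sb.append(op1).append(',').append(op2).append(',').append(op1 ^ op2).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build());
    }
}
```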
Create a Neural Network
Now that the training data has been created, a neural network should be created to learn the XOR data. To create a neural network, choose “File->New File.” Then choose “Machine Learning Method” and name the neural network “xor.eg.” Choose “Feedforward Neural Network.” This will display the dialog shown in Figure 3.2:
Make sure to fill in the dialog exactly as above. There should be two input neurons, one output neuron and a single hidden layer with two neurons. Set both activation functions to sigmoid. Once the neural network is created, it will appear in the project tree.
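The network described here is small enough to compute by hand. The following Java sketch shows the forward pass of a network with two inputs, two sigmoid hidden neurons, one sigmoid output neuron and a bias weight on each layer. It is not Encog code. The class TinyXorNetwork is hypothetical, and the weights in handPicked() were chosen manually for illustration; a freshly created network starts with random weights, and training would arrive at different values.

```java
// Hypothetical illustration of a 2-2-1 feedforward network; not Encog's implementation.
class TinyXorNetwork {
    // Each hidden row holds {weight from input 1, weight from input 2, bias weight}.
    final double[][] hiddenWeights;
    // {weight from hidden 1, weight from hidden 2, bias weight}.
    final double[] outputWeights;

    TinyXorNetwork(double[][] hiddenWeights, double[] outputWeights) {
        this.hiddenWeights = hiddenWeights;
        this.outputWeights = outputWeights;
    }

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** Forward pass: inputs -> hidden layer -> output. */
    double compute(double in1, double in2) {
        double[] hidden = new double[hiddenWeights.length];
        for (int i = 0; i < hidden.length; i++) {
            double[] w = hiddenWeights[i];
            hidden[i] = sigmoid(w[0] * in1 + w[1] * in2 + w[2]);
        }
        return sigmoid(outputWeights[0] * hidden[0]
                + outputWeights[1] * hidden[1]
                + outputWeights[2]);
    }

    /** Total number of weights, including bias weights. */
    int weightCount() {
        return hiddenWeights.length * 3 + outputWeights.length;
    }

    /** Manually chosen weights: hidden 1 acts like OR, hidden 2 like AND. */
    static TinyXorNetwork handPicked() {
        return new TinyXorNetwork(
            new double[][] {{20, 20, -10}, {20, 20, -30}},
            new double[] {20, -20, -10});
    }

    public static void main(String[] args) {
        TinyXorNetwork net = handPicked();
        for (int a = 0; a <= 1; a++) {
            for (int b = 0; b <= 1; b++) {
                System.out.println(a + " xor " + b + " = " + net.compute(a, b));
            }
        }
    }
}
```

With these weights the output is close to 1 when exactly one input is 1, and close to 0 otherwise.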
Train the Neural Network
It is now time to train the neural network. The neural network that you see currently is untrained. To easily determine if the neural network is untrained, double-click the EG file that contains the neural network. This will show Figure 3.3.
Figure 3.3: Editing the Network
This screen shows some basic stats on the neural network. To see more detail, select the “Visualize” button and choose “Network Structure.” This will show Figure 3.4.
Figure 3.4: Network Structure
The structure view shows the input and output neurons. All of the connections with the hidden layer and bias neurons are also visible. The bias neurons, as well as the hidden layer, help the neural network to learn. With this complete, it is time to actually train the neural network. Begin by closing the visualization and the neural network. There should be no documents open inside of the workbench.
Right-click the “xor.csv” training data. Choose “Export to Training (EGB).” Fill in two input neurons and one output neuron on the dialog that appears. On the next dialog, be sure to specify that there are headers. Once this is complete, an EGB file will be added to the project tree. This will result in three files: an EG file, an EGB file and a CSV file.
To train the neural network, choose “Tools->Train.” This will open a dialog to choose the training set and machine learning method. Because there is only one EG file and one EGB file, this dialog should default to the correct values. Leave the “Load to Memory” checkbox checked. As this is such a small training set, there is no reason not to load it into memory.
There are many different training methods to choose from. For this example, choose “Propagation - Resilient.” Accept all default parameters for this training type. Once this is complete, the training progress tab will appear. Click “Start” to begin training. Training will usually finish in under a second. However, if the training continues for several seconds, the training may need to be reset from the drop list. Because a neural network starts with random weights, training times will vary. On a small neural network such as XOR, the weights can potentially be bad enough that the network never trains. If this is the case, simply reset the network as it trains.
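The core idea behind resilient propagation is that each weight has its own step size, and only the sign of the gradient is used. The Java sketch below applies a simplified variant of that rule (without weight backtracking) to a single value, minimizing a simple quadratic instead of a neural network error. It is not Encog's trainer. The class RpropSketch, the growth and shrink factors of 1.2 and 0.5, the initial step of 0.1 and the step limits are assumptions chosen for illustration.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical, simplified illustration of a resilient-propagation style update.
class RpropSketch {

    /** Minimizes a one-dimensional function given its gradient. */
    static double minimize(DoubleUnaryOperator gradient, double start, int iterations) {
        double w = start;
        double delta = 0.1;     // current step size for this weight (assumed initial value)
        double prevGrad = 0.0;  // gradient from the previous iteration
        for (int i = 0; i < iterations; i++) {
            double grad = gradient.applyAsDouble(w);
            double product = grad * prevGrad;
            if (product > 0) {
                // Same direction as last time: take a larger step.
                delta = Math.min(delta * 1.2, 50.0);
            } else if (product < 0) {
                // Sign flipped, so the minimum was overshot: take a smaller step.
                delta = Math.max(delta * 0.5, 1e-6);
            }
            // Only the sign of the gradient decides the direction.
            w -= Math.signum(grad) * delta;
            prevGrad = grad;
        }
        return w;
    }

    public static void main(String[] args) {
        // f(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
        double w = minimize(x -> 2 * (x - 3), 0.0, 100);
        System.out.println("w after training: " + w);
    }
}
```

In a real network the same kind of rule would be applied to every weight independently, with the gradients supplied by backpropagation.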
Evaluate the Neural Network
There are two ways to evaluate the neural network. The first is to simply calculate the neural network error by choosing “Tools->Evaluate Network.” You will be prompted for the machine learning method and training data to use. This will show you the neural network error when evaluated against the specified training set. For this example, the error will be a percent. The lower the percent, the better. Other machine learning methods may report the error as a number or other value.
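As an illustration of what such an error figure can mean, the Java sketch below computes the mean squared error between a set of network outputs and their ideal values, then expresses it as a percent. The text does not say which error measure the Workbench reports, so the use of mean squared error here is an assumption. The class ErrorCalc and the sample outputs are hypothetical.

```java
// Hypothetical illustration; assumes the error measure is mean squared error.
class ErrorCalc {

    /** Average of the squared differences between actual and ideal values. */
    static double meanSquaredError(double[] actual, double[] ideal) {
        if (actual.length != ideal.length || actual.length == 0) {
            throw new IllegalArgumentException("Arrays must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - ideal[i];
            sum += diff * diff;
        }
        return sum / actual.length;
    }

    public static void main(String[] args) {
        double[] ideal = {0, 1, 1, 0};            // XOR results
        double[] actual = {0.1, 0.9, 0.8, 0.2};   // made-up network outputs
        double error = meanSquaredError(actual, ideal);
        System.out.printf("Error: %.2f%%%n", error * 100);
    }
}
```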
For a more advanced evaluation, choose “Tools->Validation Chart.” This will result in an output similar to Figure 3.5.
Figure 3.5: Validation Chart for XOR
This graphically depicts how closely the neural network’s computation matches the ideal value (validation). As shown in this example, they are extremely close.
Using the Encog Analyst
In the last section we used the Workbench with a simple data set that did not need normalization. In this section we will use the Encog Analyst to work with a more complex data set - the iris data set that has already been demonstrated several times. The normalization procedure has already been explored; this section shows how the Encog Analyst can normalize the data and produce a neural network for it.
The iris dataset is built into the Encog Workbench, so it is easy to create a dataset for it. Create a new Encog Workbench project as described in the previous section. Name this new project “Iris.” To obtain the iris data set, choose “Tools->Generate Training Data.” Choose the “Iris Dataset” and name it “iris.csv.” Right-click the “iris.csv” file and choose “Analyst Wizard.” This will bring up a dialog like Figure 3.6.
Figure 3.6: Encog Analyst Wizard
You can accept most default values. However, the “Target Field” and “CSV File Headers” fields should be changed. Specify “species” as the target and indicate that there are headers. The other two tabs should remain unchanged. Click “OK” and the wizard will generate an EGA file. The wizard also offers options for dealing with missing values. While the iris dataset has no missing values, this is not the case with every dataset. The default action is to discard them. However, you can also choose to average them out.
Double-click this EGA file to see its contents as in Figure 3.7.
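The two strategies mentioned here, discarding and averaging, can be sketched in a few lines of Java. This is not the Encog Analyst's implementation. The class MissingValues is hypothetical, missing values are represented as NaN, and “averaging” is interpreted as replacing a missing value with the mean of the known values in the same column, which is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of two ways to handle missing values (stored as NaN).
class MissingValues {

    /** Keeps only the rows that have no missing values. */
    static List<double[]> discardIncomplete(List<double[]> rows) {
        List<double[]> kept = new ArrayList<>();
        for (double[] row : rows) {
            boolean complete = true;
            for (double v : row) {
                if (Double.isNaN(v)) {
                    complete = false;
                    break;
                }
            }
            if (complete) {
                kept.add(row);
            }
        }
        return kept;
    }

    /** Returns copies of the rows with each missing value replaced by its column mean. */
    static List<double[]> fillWithColumnMean(List<double[]> rows) {
        List<double[]> filled = new ArrayList<>();
        if (rows.isEmpty()) {
            return filled;
        }
        int columns = rows.get(0).length;
        double[] sum = new double[columns];
        int[] count = new int[columns];
        for (double[] row : rows) {
            for (int c = 0; c < columns; c++) {
                if (!Double.isNaN(row[c])) {
                    sum[c] += row[c];
                    count[c]++;
                }
            }
        }
        for (double[] row : rows) {
            double[] copy = row.clone();
            for (int c = 0; c < columns; c++) {
                // A column with no known values is left as NaN.
                if (Double.isNaN(copy[c]) && count[c] > 0) {
                    copy[c] = sum[c] / count[c];
                }
            }
            filled.add(copy);
        }
        return filled;
    }

    public static void main(String[] args) {
        List<double[]> rows = new ArrayList<>();
        rows.add(new double[] {1.0, 2.0});
        rows.add(new double[] {Double.NaN, 4.0});
        rows.add(new double[] {3.0, 6.0});
        System.out.println("Rows kept when discarding: " + discardIncomplete(rows).size());
        System.out.println("Filled value: " + fillWithColumnMean(rows).get(1)[0]);
    }
}
```

Discarding is simple but shrinks the training set, while averaging keeps every row at the cost of introducing estimated values.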
Figure 3.7: Edit an EGA File
From this tab you can execute the EGA file. Click “Execute” and a status dialog will be displayed. From here, click “Start” to begin the process. The entire execution should take under a minute on most computers.
This process will also create a number of files. The complete list of files in this project is:
If you change the EGA script file or use different options for the wizard, you may have different steps. To see how the network performed, open the iris_output.csv file. You will see Listing 3.2.
Listing 3.2: Evaluation of the Iris Data
This illustrates how the neural network attempts to predict what iris species each row belongs to. As you can see, it is correct for all of the rows shown here. These are data items that the neural network was not originally trained with.
Encog Analyst Reports
This section will discuss how the Encog Workbench can also produce several Encog Analyst reports. To produce these reports, open the EGA file as seen in Figure 3.7. Clicking the “Visualize” button gives you several visualization options. Choose either a “Range Report” or “Scatter Plot.” Both of these are discussed in the next sections.
Range Report
The range report shows the range of each attribute. The Encog Analyst uses these ranges to perform normalization. Figure 3.8 shows the beginning of the range report.
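To show why the ranges matter, the Java sketch below finds the low and high values of a column and uses them to map each value into a target range. This is a common form of range normalization and is given only as an illustration. The class RangeNormalizer, the sample column and the target range of -1 to 1 are assumptions, not the Encog Analyst's own code or settings.

```java
// Hypothetical illustration of range (min-max) normalization.
class RangeNormalizer {

    /** Returns {low, high} for a non-empty column of values. */
    static double[] range(double[] column) {
        double low = column[0];
        double high = column[0];
        for (double v : column) {
            low = Math.min(low, v);
            high = Math.max(high, v);
        }
        return new double[] {low, high};
    }

    /** Maps x from [dataLow, dataHigh] to [normLow, normHigh]. */
    static double normalize(double x, double dataLow, double dataHigh,
                            double normLow, double normHigh) {
        return (x - dataLow) / (dataHigh - dataLow) * (normHigh - normLow) + normLow;
    }

    public static void main(String[] args) {
        double[] column = {2.0, 4.0, 6.0};
        double[] r = range(column);
        for (double v : column) {
            System.out.println(v + " -> " + normalize(v, r[0], r[1], -1.0, 1.0));
        }
    }
}
```

The lowest value in the column maps to the low end of the target range, the highest value to the high end, and everything else falls proportionally in between.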
Figure 3.8: Encog Analyst Range Report
This is only the top portion. Additional information is available by scrolling down.
Scatter Plot
It is also possible to display a scatter plot to view the relationship between two or more attributes. When choosing to display a scatter plot, Encog Analyst will prompt you to choose which attributes to relate. If you choose just two, you are shown a regular scatter plot. If you choose all four, you will be shown a multivariate scatter plot as seen in Figure 3.9.
Figure 3.9: Encog Analyst Multivariate Scatter Plot Report
This illustrates how four variables relate. To see how two variables relate, choose two squares on the diagonal. Follow the row of one and the column of the other; the square where they intersect shows the relationship between those two attributes. It is also important to note that the triangle formed above the diagonal is the mirror image (reverse) of the triangle below the diagonal.