Friday, February 10, 2017

[TensorFlow] Tutorials 01 - Simple Linear Model

Source From Here (01_Simple_Linear_Model.ipynb)
Introduction 
This tutorial demonstrates the basic workflow of using TensorFlow with a simple linear model. After loading the so-called MNIST data-set with images of hand-written digits, we define and optimize a simple mathematical model in TensorFlow. The results are then plotted and discussed. 

You should be familiar with basic linear algebra, Python and the Jupyter Notebook editor. It also helps if you have a basic understanding of Machine Learning and classification. (The whole sample code below is c1.py.)

Imports 
>>> import tensorflow as tf
>>> import numpy as np
>>> from sklearn.metrics import confusion_matrix
>>> import matplotlib.pyplot as plt


Load Data 
The MNIST data-set is about 12 MB and will be downloaded automatically if it is not located in the given path. 
>>> from tensorflow.examples.tutorials.mnist import input_data
>>> data = input_data.read_data_sets("data/MINST", one_hot=True)
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting data/MINST/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting data/MINST/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting data/MINST/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/MINST/t10k-labels-idx1-ubyte.gz

The MNIST data-set has now been loaded and consists of 70,000 images and associated labels (i.e. classifications of the images). The data-set is split into 3 mutually exclusive sub-sets. We will only use the training and test-sets in this tutorial. 
>>> print("- Training-set:\t\t{}".format(len(data.train.labels)))
- Training-set: 55000
>>> print("- Test-set:\t\t{}".format(len(data.test.labels)))
- Test-set: 10000
>>> print("- Validation-set:\t{}".format(len(data.validation.labels)))
- Validation-set: 5000

One-Hot Encoding 
The data-set has been loaded with so-called One-Hot encoding. This means the labels have been converted from a single number to a vector whose length equals the number of possible classes. All elements of the vector are zero except for the i'th element, which is one and means the class is i. The One-Hot encoded labels for the first 5 images in the test-set are the rows of data.test.labels[0:5, :].

We also need the classes as single numbers for various comparisons and performance measures, so we convert the One-Hot encoded vectors to a single number by taking the index of the highest element. Note that the word "class" is a keyword in Python, so we need to use the name 'cls' instead. 
>>> data.test.cls = np.array([label.argmax() for label in data.test.labels])
>>> data.test.cls[0:5]
array([7, 2, 1, 0, 4])

The class for the first image is 7, which corresponds to a One-Hot encoded vector where all elements are zero except for the element with index 7. 
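
The conversion in both directions can be sketched with plain NumPy. This is only an illustration and does not use the MNIST loader; the array cls simply repeats the five class numbers printed above.

```python
import numpy as np

# Class numbers of the first five test images, as printed above.
cls = np.array([7, 2, 1, 0, 4])
num_classes = 10

# Row i of the identity matrix is the One-Hot vector for class i.
one_hot = np.eye(num_classes)[cls]

# Taking the index of the largest element recovers the class numbers.
recovered = one_hot.argmax(axis=1)

print(one_hot[0])   # One-Hot vector for class 7
print(recovered)
```

Each row contains exactly one element equal to one, so argmax always finds the class again.
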

Data dimensions 
The data dimensions are used in several places in the source-code below. In computer programming, it is generally best to use variables and constants rather than hard-coding specific numbers every time they are used. This means the numbers only have to be changed in one single place. Ideally these would be inferred from the data that has been read, but here we just write the numbers below: 
>>> img_size = 28 # We know that MNIST images are 28 pixels in each dimension.
>>> img_size_flat = img_size * img_size # Images are stored in one-dimensional arrays of this length.
>>> img_shape = (img_size, img_size)
>>> num_classes = 10 # Number of classes, one class for each of 10 digits.
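
Inferring the same numbers from the data could look roughly like the sketch below. The arrays images and labels are stand-ins filled with zeros; the assumption is that they have the same layout as data.test.images and data.test.labels (flattened square images, One-Hot labels).

```python
import numpy as np

# Stand-in arrays (assumption: same layout as data.test.images
# and data.test.labels, i.e. flattened images and One-Hot labels).
images = np.zeros((5, 784), dtype=np.float32)
labels = np.zeros((5, 10), dtype=np.float32)

img_size_flat = images.shape[1]               # length of one flattened image
img_size = int(round(img_size_flat ** 0.5))   # side length of a square image
assert img_size * img_size == img_size_flat, "images are not square"
img_shape = (img_size, img_size)
num_classes = labels.shape[1]                 # length of a One-Hot label

print(img_size, img_size_flat, img_shape, num_classes)
```
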

Helper-function for plotting images 
Function used to plot 9 images in a 3x3 grid and write the true and predicted classes below each image: 
def plot_images(images, cls_true, cls_pred=None):
    assert len(images) == len(cls_true) == 9
    # Create figure with 3x3 sub-plots.
    fig, axes = plt.subplots(3, 3)
    fig.subplots_adjust(hspace=0.3, wspace=0.3)
    for i, ax in enumerate(axes.flat):
        # Plot image.
        ax.imshow(images[i].reshape(img_shape), cmap='binary')
        # Show the true class, and the predicted class if given.
        if cls_pred is None:
            xlabel = "True: {0}".format(cls_true[i])
        else:
            xlabel = "True: {0}, Pred: {1}".format(cls_true[i], cls_pred[i])
        ax.set_xlabel(xlabel)
        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])
Plot a few images to see if data is correct 
>>> images = data.test.images[0:9] # Get the first 9 images from the test-set.
>>> cls_true = data.test.cls[0:9] # Get the true classes for those images.
>>> plot_images(images=images, cls_true=cls_true) # Plot the images and labels using our helper function above.


TensorFlow Graph 
The entire purpose of TensorFlow is to have a so-called computational graph that can be executed much more efficiently than if the same calculations were performed directly in Python. TensorFlow can be more efficient than NumPy because TensorFlow knows the entire computational graph that must be executed, while NumPy only knows the computation of a single mathematical operation at a time. 

TensorFlow can also automatically calculate the gradients that are needed to optimize the variables of the graph so as to make the model perform better. This is because the graph is a combination of simple mathematical expressions, so the gradient of the entire graph can be calculated using the chain-rule for derivatives. 
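
To make the chain-rule idea concrete, here is a tiny hand-written sketch in plain Python. It is not how TensorFlow is implemented; the one-parameter cost function and the numbers x, y and w are made up for illustration, and the result is checked against a finite difference.

```python
# Cost of a made-up one-parameter model: c(w) = (w * x - y)**2
x, y = 3.0, 6.0

def cost(w):
    return (w * x - y) ** 2

def cost_gradient(w):
    # Chain rule: dc/dw = dc/de * de/dw with e = w * x - y
    e = w * x - y
    dc_de = 2.0 * e
    de_dw = x
    return dc_de * de_dw

w = 0.5
analytic = cost_gradient(w)

# Central finite difference as an independent check.
h = 1e-6
numeric = (cost(w + h) - cost(w - h)) / (2.0 * h)

print(analytic, numeric)
```

The two numbers agree up to rounding, which is the property an automatic gradient calculation relies on when it multiplies the derivatives of the simple expressions in the graph.
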

TensorFlow can also take advantage of multi-core CPUs as well as GPUs - and Google has even built special chips just for TensorFlow which are called TPUs (Tensor Processing Units) and are even faster than GPUs. 

A TensorFlow graph consists of the following parts which will be detailed below: 
* Placeholder variables used to change the input to the graph.
* Model variables that are going to be optimized so as to make the model perform better.
* The model which is essentially just a mathematical function that calculates some output given the input in the placeholder variables and the model variables.
* A cost measure that can be used to guide the optimization of the variables.
* An optimization method which updates the variables of the model.


Placeholder variables 
Placeholder variables serve as the input to the graph that we may change each time we execute the graph. We call this feeding the placeholder variables and it is demonstrated further below. First we define the placeholder variable for the input images. This allows us to change the images that are input to the TensorFlow graph. This is a so-called tensor, which just means that it is a multi-dimensional vector or matrix. The data-type is set to float32 and the shape is set to [None, img_size_flat], where None means that the tensor may hold an arbitrary number of images, with each image being a vector of length img_size_flat.
>>> x = tf.placeholder(tf.float32, [None, img_size_flat])
>>> x
<tf.Tensor 'Placeholder:0' shape=(?, 784) dtype=float32>

Next we have the placeholder variable for the true labels associated with the images that were input in the placeholder variable x. The shape of this placeholder variable is [None, num_classes] which means it may hold an arbitrary number of labels and each label is a vector of length num_classes which is 10 in this case. 
>>> y_true = tf.placeholder(tf.float32, [None, num_classes])
>>> y_true

Finally we have the placeholder variable for the true class of each image in the placeholder variable x. These are integers and the dimensionality of this placeholder variable is set to [None] which means the placeholder variable is a one-dimensional vector of arbitrary length. 
>>> y_true_cls = tf.placeholder(tf.int64, [None])
>>> y_true_cls

Variables to be optimized 
Apart from the placeholder variables that were defined above, which serve to feed input data into the model, there are also some model variables that must be changed by TensorFlow so as to make the model perform better on the training data. 

The first variable that must be optimized is called weights and is defined here as a TensorFlow variable that must be initialized with zeros and whose shape is [img_size_flat, num_classes], so it is a 2-dimensional tensor (or matrix) with img_size_flat rows and num_classes columns. 
>>> weights = tf.Variable(tf.zeros([img_size_flat, num_classes]))
>>> weights

The second variable that must be optimized is called biases and is defined as a 1-dimensional tensor (or vector) of length num_classes.
>>> biases = tf.Variable(tf.zeros([num_classes]))
>>> biases

Model 
This simple mathematical model multiplies the images in the placeholder variable x with the weights and then adds the biases. The result is a matrix of shape [num_images, num_classes] because x has shape [num_images, img_size_flat] and weights has shape [img_size_flat, num_classes], so the multiplication of those two matrices is a matrix with shape [num_images, num_classes], and then the biases vector is added to each row of that matrix. 

Note that the name logits is typical TensorFlow terminology, but other people may call the variable something else. 
>>> logits = tf.matmul(x, weights) + biases
>>> logits

Now logits is a matrix with num_images rows and num_classes columns, where the element of the i'th row and j'th column is an estimate of how likely the i'th input image is to be of the j'th class. 
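
The shape arithmetic can be checked with a NumPy sketch of the same expression. The input x is random and the biases are set to 0..9 purely so that the row-wise addition is visible; neither is taken from the tutorial's data.

```python
import numpy as np

num_images, img_size_flat, num_classes = 4, 784, 10

rng = np.random.default_rng(0)
x = rng.random((num_images, img_size_flat))        # one flattened image per row
weights = np.zeros((img_size_flat, num_classes))   # zeros, as in the model above
biases = np.arange(num_classes, dtype=float)       # made-up values 0..9

# (4, 784) @ (784, 10) -> (4, 10); the biases vector of shape (10,)
# is broadcast, i.e. added to every row of the product.
logits = x @ weights + biases

print(logits.shape)
print(logits[0])
```

Because the weights are zero here, every row of logits equals the biases vector, which shows that the addition is applied row by row.
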

However, these estimates are a bit rough and difficult to interpret because the numbers may be very small or large, so we want to normalize them so that each row of the logits matrix sums to one and each element is limited between zero and one. This is calculated using the so-called softmax function and the result is stored in y_pred.
>>> y_pred = tf.nn.softmax(logits)
>>> y_pred
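
For reference, the softmax normalization can be written in a few lines of NumPy. This is a sketch of the mathematical definition, not of TensorFlow's internal implementation; the logits values are made up, and subtracting the row maximum is a common trick to avoid overflow.

```python
import numpy as np

def softmax(logits):
    # Subtracting the row maximum leaves the result unchanged
    # but avoids overflow in exp for large logits.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up logits: one ordinary row and one with extreme values.
logits = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 0.0, -1000.0]])
y_pred = softmax(logits)

print(y_pred)
print(y_pred.sum(axis=1))   # each row sums to one (up to rounding)
```
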

The predicted class can be calculated from the y_pred matrix by taking the index of the largest element in each row. 
>>> y_pred_cls = tf.argmax(y_pred, dimension=1)
>>> y_pred_cls

Cost-function to be optimized 
To make the model better at classifying the input images, we must somehow change the variables for weights and biases. To do this we first need to know how well the model currently performs by comparing the predicted output of the model y_pred to the desired output y_true.

The cross-entropy is a performance measure used in classification. It is a continuous function that is never negative, and if the predicted output of the model exactly matches the desired output then the cross-entropy equals zero. The goal of optimization is therefore to minimize the cross-entropy so it gets as close to zero as possible by changing the weights and biases of the model. 

TensorFlow has a built-in function for calculating the cross-entropy. Note that it uses the values of the logits because it also calculates the softmax internally. 
>>> cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits,labels=y_true)
>>> cross_entropy

We have now calculated the cross-entropy for each of the image classifications so we have a measure of how well the model performs on each image individually. But in order to use the cross-entropy to guide the optimization of the model's variables we need a single scalar value, so we simply take the average of the cross-entropy for all the image classifications. 
>>> cost = tf.reduce_mean(cross_entropy)
>>> cost
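
The two steps, one cross-entropy value per image followed by the average, can be sketched in NumPy as follows. The helper name softmax_cross_entropy and the small logits and labels arrays are made up for illustration; the formula is the standard definition, not a copy of TensorFlow's code.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # log-softmax computed in a numerically stable way.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # One value per image: minus the log-probability of the true class.
    return -(labels * log_softmax).sum(axis=1)

labels = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])
logits = np.array([[0.0, 0.0, 0.0],     # undecided: all classes equally likely
                   [20.0, 0.0, 0.0]])   # confident and correct

cross_entropy = softmax_cross_entropy(logits, labels)
cost = cross_entropy.mean()

print(cross_entropy)
print(cost)
```

The undecided row gives log(3), while the confident and correct row is very close to zero, which matches the description of the cross-entropy above.
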

Optimization method 
Now that we have a cost measure that must be minimized, we can create an optimizer. In this case it is the basic form of Gradient Descent where the step-size is set to 0.5. Note that optimization is not performed at this point. In fact, nothing is calculated at all; we just add the optimizer-object to the TensorFlow graph for later execution. 
>>> optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(cost)
>>> optimizer
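
What a single Gradient Descent step does to the weights and biases can be sketched by hand in NumPy. The data below is a made-up toy set (4 "images" of 3 pixels, 2 classes), and the gradient formula is the standard one for softmax with mean cross-entropy; TensorFlow derives the gradient automatically rather than using hand-written code like this.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mean_cross_entropy(x, y_true, weights, biases):
    y_pred = softmax(x @ weights + biases)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Tiny made-up data-set: 4 "images" with 3 pixels each, 2 classes.
x = np.array([[1.0, 0.0, 0.0],
              [0.9, 0.1, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.2, 0.8]])
y_true = np.array([[1.0, 0.0],
                   [1.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 1.0]])

weights = np.zeros((3, 2))
biases = np.zeros(2)
learning_rate = 0.5   # same step-size as in the text

cost_before = mean_cross_entropy(x, y_true, weights, biases)

# Gradient of the mean cross-entropy with respect to the logits is
# (y_pred - y_true) / num_images; the chain rule gives the rest.
y_pred = softmax(x @ weights + biases)
grad_logits = (y_pred - y_true) / x.shape[0]
weights -= learning_rate * (x.T @ grad_logits)
biases -= learning_rate * grad_logits.sum(axis=0)

cost_after = mean_cross_entropy(x, y_true, weights, biases)
print(cost_before, cost_after)
```

The cost starts at log(2) and is lower after the step, which is the behaviour the optimizer repeats in every iteration.
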

Performance measures 
We need a few more performance measures to display the progress to the user. This is a vector of booleans indicating whether the predicted class equals the true class of each image. 
>>> correct_prediction = tf.equal(y_pred_cls, y_true_cls)
>>> correct_prediction

This calculates the classification accuracy by first type-casting the vector of booleans to floats, so that False becomes 0 and True becomes 1, and then calculating the average of these numbers. 
>>> accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
>>> accuracy
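
The same two steps look like this in NumPy. The predicted classes are hypothetical; the true classes repeat the five values printed earlier.

```python
import numpy as np

y_pred_cls = np.array([7, 2, 1, 0, 9])   # hypothetical predictions
y_true_cls = np.array([7, 2, 1, 0, 4])   # true classes printed earlier

# Vector of booleans: True where the prediction matches the true class.
correct_prediction = (y_pred_cls == y_true_cls)

# Cast to floats (False -> 0, True -> 1) and take the average.
accuracy = correct_prediction.astype(np.float32).mean()

print(correct_prediction)
print(accuracy)
```

Four of the five hypothetical predictions are correct, so the average of the 0/1 values is 0.8.
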

TensorFlow Run 

Create TensorFlow session 
Once the TensorFlow graph has been created, we have to create a TensorFlow session which is used to execute the graph: 
>>> session = tf.Session()
>>> session

Initialize variables 
The variables for weights and biases must be initialized before we start optimizing them: 
>>> session.run(tf.global_variables_initializer())

Helper-function to perform optimization iterations 
There are 55,000 images in the training-set. It takes a long time to calculate the gradient of the model using all these images. We therefore use Stochastic Gradient Descent, which only uses a small batch of images in each iteration of the optimizer. 
batch_size = 100
Function for performing a number of optimization iterations so as to gradually improve the weights and biases of the model. In each iteration, a new batch of data is selected from the training-set and then TensorFlow executes the optimizer using those training samples. 
def optimize(num_iterations, batch_size=100):
    for i in range(num_iterations):
        # Get a batch of training examples.
        # x_batch now holds a batch of images and
        # y_true_batch are the true labels for those images.
        x_batch, y_true_batch = data.train.next_batch(batch_size)

        # Put the batch into a dict with the proper names
        # for placeholder variables in the TensorFlow graph.
        # Note that the placeholder for y_true_cls is not set
        # because it is not used during training.
        feed_dict_train = {x: x_batch, y_true: y_true_batch}

        # Run the optimizer using this batch of training data.
        # TensorFlow assigns the variables in feed_dict_train
        # to the placeholder variables and then runs the optimizer.
        session.run(optimizer, feed_dict_train)
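
How data.train.next_batch picks its samples is not shown in this tutorial. One common approach, shuffling the data once per pass and then slicing it into batches, can be sketched as below. The helper iterate_minibatches and the fake data are hypothetical and are not part of TensorFlow.

```python
import numpy as np

def iterate_minibatches(images, labels, batch_size, rng):
    # Hypothetical helper: visit the data in a fresh random order
    # and yield consecutive slices of batch_size samples.
    order = rng.permutation(len(images))
    for start in range(0, len(images), batch_size):
        idx = order[start:start + batch_size]
        yield images[idx], labels[idx]

rng = np.random.default_rng(0)
images = np.arange(10 * 4, dtype=np.float32).reshape(10, 4)  # 10 fake images
labels = np.arange(10)                                       # label i belongs to image i

batches = list(iterate_minibatches(images, labels, batch_size=4, rng=rng))
print([len(x_batch) for x_batch, _ in batches])
```

Images and labels are indexed with the same shuffled indices, so each image stays paired with its label; the last batch is smaller when the data-set size is not a multiple of batch_size.
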
Helper-functions to show performance 
Dict with the test-set data to be used as input to the TensorFlow graph. Note that we must use the correct names for the placeholder variables in the TensorFlow graph: 
feed_dict_test = {x: data.test.images, y_true: data.test.labels, y_true_cls: data.test.cls}
Function for printing the classification accuracy on the test-set: 
def print_accuracy(accuracy=accuracy):
    # Use TensorFlow to compute the accuracy.
    acc = session.run(accuracy, feed_dict_test)
    # Print the accuracy.
    print("Accuracy on test-set: {0:.1%}".format(acc))
Function for printing and plotting the confusion matrix using scikit-learn: 
def print_confusion_matrix():
    # Get the true classifications for the test-set
    cls_true = data.test.cls
    # Get the predicted classifications for the test-set
    cls_pred = session.run(y_pred_cls, feed_dict=feed_dict_test)
    # Get the confusion matrix using sklearn.
    cm = confusion_matrix(y_true=cls_true, y_pred=cls_pred)
    # Print the confusion matrix as text.
    print(cm)
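
To see what the confusion matrix contains, it can also be counted by hand with NumPy. The helper confusion_matrix_np and the label arrays are made up for this sketch; rows are the true classes and columns the predicted classes, which is the same layout scikit-learn uses.

```python
import numpy as np

def confusion_matrix_np(cls_true, cls_pred, num_classes):
    # Entry [i, j] counts images of true class i predicted as class j.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (cls_true, cls_pred), 1)
    return cm

cls_true = np.array([0, 0, 1, 1, 2, 2])   # made-up true classes
cls_pred = np.array([0, 1, 1, 1, 2, 0])   # made-up predictions

cm = confusion_matrix_np(cls_true, cls_pred, num_classes=3)
print(cm)
```

Correct classifications end up on the diagonal, so every off-diagonal entry is a particular kind of mistake.
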
Function for plotting examples of images from the test-set that have been mis-classified. 
def plot_example_errors():
    # Use TensorFlow to get a list of boolean values
    # whether each test-image has been correctly classified,
    # and a list for the predicted class of each image.
    correct, cls_pred = session.run([correct_prediction, y_pred_cls],
                                    feed_dict=feed_dict_test)

    # Negate the boolean array.
    incorrect = (correct == False)

    # Get the images from the test-set that have been
    # incorrectly classified.
    images = data.test.images[incorrect]

    # Get the predicted classes for those images.
    cls_pred = cls_pred[incorrect]

    # Get the true classes for those images.
    cls_true = data.test.cls[incorrect]

    # Plot the first 9 images.
    plot_images(images=images[0:9],
                cls_true=cls_true[0:9],
                cls_pred=cls_pred[0:9])
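
The selection of mis-classified images relies on NumPy's boolean indexing, which can be tried on a few made-up values:

```python
import numpy as np

# Made-up results for four images.
correct = np.array([True, False, True, False])
cls_true = np.array([7, 2, 1, 0])
cls_pred = np.array([7, 3, 1, 6])

# Negate the boolean array; gives the same result as (correct == False).
incorrect = ~correct

print(cls_true[incorrect])   # true classes of the mis-classified images
print(cls_pred[incorrect])   # what the model predicted instead
```

Indexing with a boolean array keeps only the positions where the array is True, so the same mask can be applied to the images, the true classes and the predicted classes.
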
Helper-function to plot the model weights 
Function for plotting the weights of the model. 10 images are plotted, one for each digit that the model is trained to recognize. 
def plot_weights():
    # Get the values for the weights from the TensorFlow variable.
    w = session.run(weights)

    # Get the lowest and highest values for the weights.
    # This is used to correct the colour intensity across
    # the images so they can be compared with each other.
    w_min = np.min(w)
    w_max = np.max(w)

    # Create figure with 3x4 sub-plots,
    # where the last 2 sub-plots are unused.
    fig, axes = plt.subplots(3, 4)
    fig.subplots_adjust(hspace=0.3, wspace=0.3)

    for i, ax in enumerate(axes.flat):
        # Only use the weights for the first 10 sub-plots.
        if i < 10:
            # Get the weights for the i'th digit and reshape it.
            # Note that w.shape == (img_size_flat, 10)
            image = w[:, i].reshape(img_shape)

            # Set the label for the sub-plot.
            ax.set_xlabel("Weights: {0}".format(i))

            # Plot the image.
            ax.imshow(image, vmin=w_min, vmax=w_max, cmap='seismic')

        # Remove ticks from each sub-plot.
        ax.set_xticks([])
        ax.set_yticks([])
Performance before any optimization 
The accuracy on the test-set is 9.8%. This is because the model has only been initialized and not optimized at all, so it always predicts that the image shows a zero digit, and it turns out that 9.8% of the images in the test-set happen to be zero digits. 
>>> print_accuracy()
Accuracy on test-set: 9.8%
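
Why an untrained model always answers "zero" can be reproduced with a NumPy sketch: with all weights and biases at zero, every class gets the same probability, and argmax then returns the first index. The input x is random stand-in data. NumPy documents this tie-breaking behaviour for argmax; that TensorFlow behaved the same way here is an assumption suggested by the 9.8% result.

```python
import numpy as np

num_images, img_size_flat, num_classes = 3, 784, 10
x = np.random.default_rng(0).random((num_images, img_size_flat))  # stand-in images

weights = np.zeros((img_size_flat, num_classes))
biases = np.zeros(num_classes)

logits = x @ weights + biases   # all zeros, whatever the input is
y_pred = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

print(y_pred[0])                # the same probability for every class
print(y_pred.argmax(axis=1))    # ties resolve to the first index, class 0
```
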

>>> plot_example_errors()

Performance after 1 optimization iteration 
Already after a single optimization iteration, the model has increased its accuracy on the test-set to 40.7% up from 9.8%. This means that it mis-classifies the images about 6 out of 10 times, as demonstrated on a few examples below. 
>>> optimize(num_iterations=1)
>>> print_accuracy()
Accuracy on test-set: 40.7%
>>> plot_example_errors()

The weights can also be plotted using the plot_weights() helper defined above. Positive weights are red and negative weights are blue. These weights can be intuitively understood as image-filters. For example, the weights used to determine if an image shows a zero-digit have a positive reaction (red) to an image of a circle, and have a negative reaction (blue) to images with content in the centre of the circle. 

Note that the weights mostly look like the digits they're supposed to recognize. This is because only one optimization iteration has been performed so the weights are only trained on 100 images. After training on several thousand images, the weights become more difficult to interpret because they have to recognize many variations of how digits can be written. 


Performance after 10 optimization iterations 
>>> optimize(num_iterations=9) # We have already performed 1 iteration.
>>> print_accuracy()
Accuracy on test-set: 78.2%
>>> plot_example_errors()


>>> plot_weights()

Performance after 1000 optimization iterations 
After 1000 optimization iterations, the model only mis-classifies about one in ten images. As demonstrated below, some of the mis-classifications are justified because the images are very hard to determine with certainty even for humans, while others are quite obvious and should have been classified correctly by a good model. But this simple model cannot reach much better performance and more complex models are therefore needed. 
>>> optimize(num_iterations=990) # We have already performed 10 iterations.
>>> print_accuracy()
Accuracy on test-set: 92.1%
>>> plot_example_errors()

The model has now been trained for 1000 optimization iterations, with each iteration using 100 images from the training-set. Because of the great variety of the images, the weights have now become difficult to interpret and we may doubt whether the model truly understands how digits are composed from lines, or whether the model has just memorized many different variations of pixels. 
>>> plot_weights()

We can also print and plot the so-called confusion matrix which lets us see more details about the mis-classifications. For example, it shows that images actually depicting a 5 have sometimes been mis-classified as all other possible digits, but mostly either 3, 6 or 8. 


We are now done using TensorFlow, so we close the session to release its resources. 
>>> session.close()

Supplement 
TensorFlow Tutorial #02 Convolutional Neural Network (02_Convolutional_Neural_Network.ipynb)
