This time, we will be using PyTorch to train a model on the MNIST handwritten digits. Compared to FastAI, it involves more steps, but it is easier than using plain Python without any 3rd party library. If you are curious, check out first version and second version. I wrote the second version without using any classes, just for fun.

First, we import libraries.

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt

Then we grab the MNIST data with torchvision datasets. We can tell PyTorch how to handle the dataset by passing the following arguments.

  • root: Where to store the data. We are storing it in the data directory.
  • train: Whether to grab the training dataset or the testing dataset. Given True, training_data is the MNIST training dataset; given False, test_data is the MNIST testing dataset.
  • download: Whether to download the data if it is not already in root. We pass True to download the dataset.
  • transform: What to do with the data. We convert our images of handwritten digits into PyTorch tensors so that we can train our model.
training_data = datasets.MNIST(
    root='data',
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.MNIST(
    root='data',
    train=False,
    download=True,
    transform=ToTensor()
)
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw

If a GPU is available, we use it to speed up training. We can think of CPUs as sports cars and GPUs as semi-trailer trucks. Although sports cars are fast, they are not very efficient at carrying big loads. Semi-trailer trucks, on the other hand, can easily haul heavy freight over long distances.

We do not need a GPU here since we are not training on a big dataset.

device = 'cuda' if torch.cuda.is_available() else 'cpu'
device
'cpu'

Now, let's look at a piece of the data to see what it looks like. We squeeze each tensor, which removes any dimension of size 1. This gives us the familiar shape for images (28 x 28).

training_data[0][0].shape
torch.Size([1, 28, 28])
training_data[0][0].squeeze().shape
torch.Size([28, 28])
plt.imshow(training_data[0][0].squeeze(), cmap="gray");

Because training_data is composed of 60,000 tuples of images and labels, we can grab the second value of a tuple to get the label.

training_data[0][1]
5
len(training_data)
60000

This is how we can look at multiple images at once. We can change the size of our figure with figsize, or change the number of columns and rows with cols and rows.

figure = plt.figure(figsize=(8,8))
cols, rows = 4, 2
for i in range(1, cols * rows + 1):
    sample_idx = torch.randint(len(training_data), size=(1,)).item()
    img, label = training_data[sample_idx]
    figure.add_subplot(rows, cols, i)
    plt.title(label)
    plt.axis("off")
    plt.imshow(img.squeeze(), cmap="gray")
plt.show()

After exploring the data and confirming that each image comes with a correct label, we can move on to creating a dataloader. A dataloader divides our data into batches of a given batch_size and hands each batch to our model for training. So our train_dataloader will have 64 images per batch, for a total of 938 batches, while test_dataloader splits the 10,000 test images into 157 batches, as the quick check below shows.

train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)
len(test_dataloader)
157
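
The number of batches is just the dataset size divided by the batch size, rounded up (the last batch may be smaller than 64). We can check the arithmetic:

import math

print(math.ceil(len(training_data) / 64))  # 938 training batches
print(math.ceil(len(test_data) / 64))      # 157 testing batches
print(len(train_dataloader))               # 938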

We can set up our model now. This is a simple model with two linear layers and one ReLU after flattening the input. Flattening turns a 28 x 28 image into a vector of 784 values. The linear layers then extract features from the images: the first linear layer takes the 784 inputs and produces 512 output activations (calculated numbers), and the second takes those 512 activations and produces 10. These 10 activations are used to predict which digit it is (from 0 to 9).

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

We move our model onto the GPU now if there is one.

model = NeuralNetwork().to(device)
model
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=10, bias=True)
  )
)
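
To see the shapes described above, we can push a dummy batch through each stage of the model. This is just a quick check, not part of training:

dummy = torch.rand(64, 1, 28, 28, device=device)  # a fake batch of 64 images
flat = model.flatten(dummy)                        # shape becomes (64, 784)
logits = model.linear_relu_stack(flat)             # shape becomes (64, 10)
print(flat.shape, logits.shape)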

Here are some hyperparameters we set for our model. These can be seen as levers that affect the performance of the model.

  • lr is the learning rate. It controls how fast our model learns.
  • bs is the batch size. This sets how many images are grouped together in one batch. After each batch, the parameters (weights and biases) are updated.
  • epochs sets how many times our model gets to see the whole dataset. The more epochs, the more our model can learn from the dataset. However, too many epochs will overfit our model. This is like a student memorizing all the answers after taking a bunch of practice tests: because the student never learns the material, they won't do well on a new test, and likewise our model won't do well on the testing data.

We also define our loss function here: cross-entropy loss, the standard choice for a multi-class classification problem like digit recognition.
lr = 1e-3
bs = 64
epochs = 4
loss_fn = nn.CrossEntropyLoss()
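
To get a feel for what cross-entropy loss measures, here is a tiny sketch with made-up logits: the loss is small when the largest logit lines up with the label, and large when it does not.

# made-up logits for one image, 10 classes (digits 0-9)
good = torch.tensor([[5., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])  # confident in digit 0
bad  = torch.tensor([[0., 0., 0., 0., 0., 5., 0., 0., 0., 0.]])  # confident in digit 5
label = torch.tensor([0])  # the correct answer is digit 0

print(loss_fn(good, label))  # small loss, about 0.06
print(loss_fn(bad, label))   # large loss, about 5.06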

We use an optimizer to update our parameters. Stochastic gradient descent (SGD) nudges each parameter in the direction that reduces the loss, using the gradients computed during backpropagation.

optimizer = torch.optim.SGD(model.parameters(), lr=lr)
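
Under the hood, optimizer.step() for plain SGD is roughly equivalent to this manual update (a sketch that ignores extras like momentum and weight decay):

with torch.no_grad():
    for p in model.parameters():
        if p.grad is not None:
            p -= lr * p.grad  # nudge each parameter against its gradient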

Here is how we train our model. First, we grab xb (a batch of images) and yb (a batch of labels) from train_dataloader. We make predictions and compute the loss. Then we zero the gradients, backpropagate to compute fresh gradients, and take an optimizer step to update our weights and biases.

def train_data(model):
    for xb, yb in train_dataloader:
        xb, yb = xb.to(device), yb.to(device)  # move the batch to the same device as the model
        preds = model(xb)
        loss = loss_fn(preds, yb)
        optimizer.zero_grad()  # clear gradients left over from the previous batch
        loss.backward()        # backpropagate to compute fresh gradients
        optimizer.step()       # update weights and biases
    loss = loss.item()         # report the loss of the last batch
    print(f"Train loss: {loss:>7f}")

Testing is almost the same as training, except we do not update the parameters. Instead, we accumulate the loss and the number of correct predictions. Then we divide the loss by the number of batches and the number of correct predictions by the size of our test dataset.

def test_data(model):
    num_batches = len(test_dataloader)
    size = len(test_dataloader.dataset)
    test_loss, corrects = 0, 0

    with torch.no_grad():  # no gradients needed since we are not updating parameters
        for xb, yb in test_dataloader:
            xb, yb = xb.to(device), yb.to(device)
            preds = model(xb)
            test_loss += loss_fn(preds, yb).item()
            corrects += (preds.argmax(1) == yb).type(torch.float).sum().item()

    test_loss /= num_batches  # average loss per batch
    corrects /= size          # fraction of correct predictions
    print(f"Test loss: \n Accuracy: {(100*corrects):>0.1f}%, Avg loss: {test_loss:>8f} \n")
for t in range(epochs):
    train_data(model)
    test_data(model)
Train loss: 2.105006
Test loss: 
 Accuracy: 65.3%, Avg loss: 2.078157 

Train loss: 1.838440
Test loss: 
 Accuracy: 73.9%, Avg loss: 1.783291 

Train loss: 1.515890
Test loss: 
 Accuracy: 76.9%, Avg loss: 1.447766 

Train loss: 1.216192
Test loss: 
 Accuracy: 79.7%, Avg loss: 1.162565 
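
Before wrapping up, we can sanity-check the trained model on a single test image. A minimal sketch: run the first test image through the model and take the index of the largest of the 10 activations as the prediction.

x, y = test_data[0]  # one image and its label
with torch.no_grad():
    logits = model(x.unsqueeze(0).to(device))  # add a batch dimension: (1, 1, 28, 28)
print(f"Predicted: {logits.argmax(1).item()}, Actual: {y}")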

Now we are done training our model. As we can see, PyTorch requires more code to train a model than FastAI does. But knowing PyTorch, we can guess how FastAI is implemented, and we can customize FastAI with PyTorch if we want to. With those tools, we can make great things!