How to train MNIST with FastAI
After reading chapter 4 of Deep Learning for Coders with Fastai and PyTorch: AI Applications Without a PhD, AKA "Fastbook", I can show how easily a model can be trained with FastAI.
!pip install -Uqq fastbook
Although import * is discouraged in ordinary Python programming, in a deep learning environment it is actually encouraged. Rather than importing libraries one by one as needed, it is easier to load everything up front before we start exploring. It is better to have it and not need it than to need it and not have it.
from fastai.vision.all import *
We are going to use MNIST handwritten data. With FastAI, it is very easy to download data into our path.
path = untar_data(URLs.MNIST)
path.ls()
Now that we have data, we need a datablock, which is a template for how data should be processed.
Here is how our template is made:

blocks=(ImageBlock, CategoryBlock) means the inputs are images and the labels are categories, one per image.

get_items=get_image_files specifies that the items are image files collected from the path.

splitter=RandomSplitter(seed=42) randomly sets aside 20 percent of the whole dataset for validation so that we can check for overfitting. The MNIST dataset already comes with its own validation set, but we do not have to use it (see the sketch after the code below).

get_y=parent_label specifies how each item gets its label. In this dataset, each image's parent directory tells us which digit it is.
digits = DataBlock(blocks=(ImageBlock, CategoryBlock),
get_items=get_image_files,
splitter=RandomSplitter(seed=42),
get_y=parent_label)
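As an aside: the MNIST download stores images under training and testing directories, so if we wanted to keep the dataset's own split instead of a random one, fastai's GrandparentSplitter could do it. A minimal sketch, assuming those directory names (digits_official is just a hypothetical name):

digits_official = DataBlock(blocks=(ImageBlock, CategoryBlock),
                            get_items=get_image_files,
                            # split by the grandparent directory: training vs testing
                            splitter=GrandparentSplitter(train_name='training', valid_name='testing'),
                            get_y=parent_label)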
dls = digits.dataloaders(path)
Now that we have the dataloaders, we can take a look at the data with dls.show_batch().
dls.show_batch()
It looks good. Each image has a correct label. It is time to train our model with the data. Instead of building a model from scratch, we will use a pretrained model, because that saves time and resources. With cnn_learner, we use resnet18 and set our metric to error_rate. Then we fine_tune our model: the last layer of resnet18 is removed and replaced with a new one, also called the 'head', which will classify which digit an image shows. fine_tune first trains only this new head while the pretrained layers stay frozen, then unfreezes the model and trains all the layers.
learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(2)
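Under the hood, and with the learning-rate details simplified, learn.fine_tune(2) behaves roughly like the following sketch (the real fine_tune also applies discriminative learning rates across layer groups):

learn.freeze()           # only the new head is trainable
learn.fit_one_cycle(1)   # one epoch to warm up the head
learn.unfreeze()         # make every layer trainable
learn.fit_one_cycle(2)   # the two requested epochs on the whole model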
FastAI will use the GPU automatically if one is available. On a Google Colab GPU server, it took about six minutes to train, with an error rate close to 1%.
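If we want to confirm which device is in use, a quick check along these lines works (torch comes in with the star import above):

print(torch.cuda.is_available())  # True if a CUDA GPU is visible
print(dls.device)                 # the device our batches are placed on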
It is very easy to get started with FastAI because everything is already tuned to best practices, without us having to work everything out at the start. When first training a model, this can serve as a quick baseline. With this baseline, we can tell how a more complex model is performing.