Open In Colab   Open in Kaggle

Transfer Learning

By Neuromatch Academy

Content creators: Jama Hussein Mohamud & Alex Hernandez-Garcia

Production editors: Saeed Salehi, Spiros Chavlis

Our 2021 Sponsors, including Presenting Sponsor Facebook Reality Labs


Objective

One desired capability for machines is the ability to transfer the knowledge (features) learned on one domain to another This can potentially save compute time, enable training when data is scarce, and even improve performance. Unfortunately, there is no single recipe for transfer learning and instead multiple options are possible and much remains to be well understood. In this project, you will explore how transfer learning works in different scenarios.


Setup

# imports
import os
import gc
import csv
import glob
import torch
import multiprocessing

import numpy as np
import pandas as pd
import torch.nn as nn
import matplotlib.pyplot as plt

import torch.optim as optim
import torch.nn.functional as F
import torch.backends.cudnn as cudnn
from torch.autograd import Variable

import torchvision
import torchvision.transforms as transforms

Set random seed

Executing set_seed(seed=seed) you are setting the seed

# @title Set random seed

# @markdown Executing `set_seed(seed=seed)` you are setting the seed

# for DL its critical to set the random seed so that students can have a
# baseline to compare their results to expected results.
# Read more here: https://pytorch.org/docs/stable/notes/randomness.html

# Call `set_seed` function in the exercises to ensure reproducibility.
import random
import torch

def set_seed(seed=None, seed_torch=True):
  if seed is None:
    seed = np.random.choice(2 ** 32)
  random.seed(seed)
  np.random.seed(seed)
  if seed_torch:
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

  print(f'Random seed {seed} has been set.')

# In case that `DataLoader` is used
def seed_worker(worker_id):
  worker_seed = torch.initial_seed() % 2**32
  np.random.seed(worker_seed)
  random.seed(worker_seed)

Set device (GPU or CPU)

# @title Set device (GPU or CPU)

# inform the user if the notebook uses GPU or CPU.

def set_device():
  device = "cuda" if torch.cuda.is_available() else "cpu"
  if device != "cuda":
    print("WARNING: For this notebook to perform best, "
        "if possible, in the menu under `Runtime` -> "
        "`Change runtime type.`  select `GPU` ")
  else:
    print("GPU is enabled in this notebook.")

  return device

Random seeds

If you want to obtain reproducible results, it is a good practice to set seeds for the random number generators of the various libraries

set_seed(seed=2021)
device = set_device()
Random seed 2021 has been set.
GPU is enabled in this notebook.

Training hyperparameters

Here we set some general training hyperparameters such as the learning rate, batch size, etc. as well as other training options such as including data augmentation (torchvision_transforms).

# hyper-parameters
use_cuda = torch.cuda.is_available()
best_acc = 0  # best test accuracy
start_epoch = 0  # start from epoch 0 or last checkpoint epoch
batch_size = 128
max_epochs = 15  # Please change this to 200
max_epochs_target = 10
base_learning_rate = 0.1
torchvision_transforms = True  # True/False if you want use torchvision augmentations

Data

Source dataset

We will train the source model using CIFAR-100 data set from PyTorch, but with small tweaks we can get any other data we are interested in.

Note that the data set is normalised by substracted the mean and dividing by the standard deviation (pre-computed) of the training set. Also, if torchvision_transforms is True, data augmentation will be applied during training.

Download and prepare Data

# @markdown Download and prepare Data
print('==> Preparing data..')
def percentageSplit(full_dataset, percent = 0.0):
  set1_size = int(percent * len(full_dataset))
  set2_size = len(full_dataset) - set1_size
  final_dataset, _ = torch.utils.data.random_split(full_dataset, [set1_size, set2_size])
  return final_dataset


# CIFAR100 normalizing
mean = [0.5071, 0.4866, 0.4409]
std = [0.2673, 0.2564, 0.2762]

# CIFAR10 normalizing
# mean = (0.4914, 0.4822, 0.4465)
# std = (0.2023, 0.1994, 0.2010)

# torchvision transforms
transform_train = transforms.Compose([])
if torchvision_transforms:
  transform_train.transforms.append(transforms.RandomCrop(32, padding=4))
  transform_train.transforms.append(transforms.RandomHorizontalFlip())

transform_train.transforms.append(transforms.ToTensor())
transform_train.transforms.append(transforms.Normalize(mean, std))

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])

trainset = torchvision.datasets.CIFAR100(
  root='./CIFAR100', train=True, download=True, transform=transform_train)

testset = torchvision.datasets.CIFAR100(
  root='./CIFAR100', train=False, download=True, transform=transform_test)
==> Preparing data..
Downloading https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz to ./CIFAR100/cifar-100-python.tar.gz
Extracting ./CIFAR100/cifar-100-python.tar.gz to ./CIFAR100
Files already downloaded and verified

CIFAR-100

CIFAR-100 is a data set of 50,000 colour (RGB) training images and 10,000 test images, of size 32 x 32 pixels. Each image is labelled as 1 of 100 possible classes.

The data set is stored as a custom torchvision.datasets.cifar.CIFAR object. You can check some of its properties with the following code:

print(f"Object type: {type(trainset)}")
print(f"Training data shape: {trainset.data.shape}")
print(f"Test data shape: {testset.data.shape}")
print(f"Number of classes: {np.unique(trainset.targets).shape[0]}")
Object type: <class 'torchvision.datasets.cifar.CIFAR100'>
Training data shape: (50000, 32, 32, 3)
Test data shape: (10000, 32, 32, 3)
Number of classes: 100

Data loaders

A dataloader is an optimized data iterator that provides functionality for efficient shuffling, transformation and batching of the data.

Dataloader

##@title Dataloader
num_workers = multiprocessing.cpu_count()

print(f'----> number of workers: {num_workers}')

trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=batch_size, shuffle=True, num_workers=num_workers)
testloader = torch.utils.data.DataLoader(
    testset, batch_size=batch_size, shuffle=False, num_workers=num_workers)
----> number of workers: 2

Architecture: ResNet

ResNet is a family of network architectures whose main property is that the network is organised as a stack of residual blocks. Residual blocks consist of a stack of layers whose output is added the input, making a shortcut connection.

See the original paper for more details.

ResNet is just a popular choice out of many others, but data augmentation works well in general. We just picked ResNet for illustration purposes.

ResNet model in PyTorch

# @title ResNet model in PyTorch

class BasicBlock(nn.Module):
  """ResNet in PyTorch.
      Reference:
      [1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
        Deep Residual Learning for Image Recognition. arXiv:1512.03385
  """

  expansion = 1

  def __init__(self, in_planes, planes, stride=1):
    super(BasicBlock, self).__init__()
    self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
    self.bn1 = nn.BatchNorm2d(planes)
    self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
    self.bn2 = nn.BatchNorm2d(planes)

    self.shortcut = nn.Sequential()
    if stride != 1 or in_planes != self.expansion*planes:
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(self.expansion*planes)
        )

  def forward(self, x):
    out = F.relu(self.bn1(self.conv1(x)))
    out = self.bn2(self.conv2(out))
    out += self.shortcut(x)
    out = F.relu(out)
    return out


class Bottleneck(nn.Module):
  expansion = 4

  def __init__(self, in_planes, planes, stride=1):
    super(Bottleneck, self).__init__()
    self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
    self.bn1 = nn.BatchNorm2d(planes)
    self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
    self.bn2 = nn.BatchNorm2d(planes)
    self.conv3 = nn.Conv2d(planes, self.expansion*planes, kernel_size=1, bias=False)
    self.bn3 = nn.BatchNorm2d(self.expansion*planes)

    self.shortcut = nn.Sequential()
    if stride != 1 or in_planes != self.expansion*planes:
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(self.expansion*planes)
        )

  def forward(self, x):
    out = F.relu(self.bn1(self.conv1(x)))
    out = F.relu(self.bn2(self.conv2(out)))
    out = self.bn3(self.conv3(out))
    out += self.shortcut(x)
    out = F.relu(out)
    return out


class ResNet(nn.Module):
  def __init__(self, block, num_blocks, num_classes=100):
    super(ResNet, self).__init__()
    self.in_planes = 64

    self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    self.bn1 = nn.BatchNorm2d(64)
    self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
    self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
    self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
    self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
    self.linear = nn.Linear(512*block.expansion, num_classes)

  def _make_layer(self, block, planes, num_blocks, stride):
    strides = [stride] + [1]*(num_blocks-1)
    layers = []
    for stride in strides:
      layers.append(block(self.in_planes, planes, stride))
      self.in_planes = planes * block.expansion
    return nn.Sequential(*layers)

  def forward(self, x):
    out = F.relu(self.bn1(self.conv1(x)))
    out = self.layer1(out)
    out = self.layer2(out)
    out = self.layer3(out)
    out = self.layer4(out)
    out = F.avg_pool2d(out, 4)
    out = out.view(out.size(0), -1)
    out = self.linear(out)
    return out


def ResNet18():
  return ResNet(BasicBlock, [2, 2, 2, 2])


def ResNet34():
  return ResNet(BasicBlock, [3, 4, 6, 3])


def ResNet50():
  return ResNet(Bottleneck, [3, 4, 6, 3])

Test on random data

# Load the Model
net = ResNet18()
print('-----> verify if model is run on random data')
y = net(Variable(torch.randn(1,3,32,32)))
print('model loaded')

result_folder = './results/'
if not os.path.exists(result_folder):
    os.makedirs(result_folder)

logname = result_folder + net.__class__.__name__ + '_pretrain' + '.csv'

if use_cuda:
  net.cuda()
  net = torch.nn.DataParallel(net)
  print('Using', torch.cuda.device_count(), 'GPUs.')
  cudnn.benchmark = True
  print('Using CUDA..')
-----> verify if model is run on random data
model loaded
Using 1 GPUs.
Using CUDA..

Set up training

Set loss function and optimizer

We use the cross entropy loss, commonly used for classification, and stochastic gradient descent (SGD) as optimizer, with momentum and weight decay.

# Optimizer and criterion

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=base_learning_rate, momentum=0.9, weight_decay=1e-4)

Train and test loops

# Training & Test functions

def train(net, epoch, use_cuda=True):
  print('\nEpoch: %d' % epoch)
  net.train()
  train_loss = 0
  correct = 0
  total = 0
  for batch_idx, (inputs, targets) in enumerate(trainloader):
    if use_cuda:
      inputs, targets = inputs.cuda(), targets.cuda()

    optimizer.zero_grad()
    inputs, targets = Variable(inputs), Variable(targets)
    outputs = net(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()

    train_loss += loss.item()
    _, predicted = torch.max(outputs.data, 1)
    total += targets.size(0)
    correct += predicted.eq(targets.data).cpu().sum()

    if batch_idx % 500 == 0:
      print(batch_idx, len(trainloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
          % (train_loss/(batch_idx+1), 100.*correct/total, correct, total))
  return (train_loss/batch_idx, 100.*correct/total)


def test(net, epoch, outModelName, use_cuda=True):
  global best_acc
  net.eval()
  test_loss, correct, total = 0, 0, 0
  with torch.no_grad():
    for batch_idx, (inputs, targets) in enumerate(testloader):
      if use_cuda:
        inputs, targets = inputs.cuda(), targets.cuda()

      outputs = net(inputs)
      loss = criterion(outputs, targets)

      test_loss += loss.item()
      _, predicted = torch.max(outputs.data, 1)
      total += targets.size(0)
      correct += predicted.eq(targets.data).cpu().sum()

      if batch_idx % 200 == 0:
        print(batch_idx, len(testloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
            % (test_loss/(batch_idx+1), 100.*correct/total, correct, total))

  # Save checkpoint.
  acc = 100.*correct/total
  if acc > best_acc:
    best_acc = acc
    checkpoint(net, acc, epoch, outModelName)
  return (test_loss/batch_idx, 100.*correct/total)

Auxiliary functions

  • checkpoint(): Store checkpoints of the model

  • adjust_learning_rate(): Decreases the learning rate (learning rate decay) at certain epochs of training.

# checkpoint & adjust_learning_rate
def checkpoint(model, acc, epoch, outModelName):
  # Save checkpoint.
  print('Saving..')
  state = {
      'state_dict': model.state_dict(),
      'acc': acc,
      'epoch': epoch,
      'rng_state': torch.get_rng_state()
  }
  if not os.path.isdir('checkpoint'):
      os.mkdir('checkpoint')
  torch.save(state, f'./checkpoint/{outModelName}.t7')

def adjust_learning_rate(optimizer, epoch):
  """decrease the learning rate at 100 and 150 epoch"""
  lr = base_learning_rate
  if epoch <= 9 and lr > 0.1:
    # warm-up training for large minibatch
    lr = 0.1 + (base_learning_rate - 0.1) * epoch / 10.
  if epoch >= 100:
    lr /= 10
  if epoch >= 150:
    lr /= 10
  for param_group in optimizer.param_groups:
    param_group['lr'] = lr

Train the model

This is the loop where the model is trained for max_epochs epochs.

# Start training
outModelName = 'pretrain'
if not os.path.exists(logname):
  with open(logname, 'w') as logfile:
      logwriter = csv.writer(logfile, delimiter=',')
      logwriter.writerow(['epoch', 'train loss', 'train acc', 'test loss', 'test acc'])

for epoch in range(start_epoch, max_epochs):
  adjust_learning_rate(optimizer, epoch)
  train_loss, train_acc = train(net, epoch, use_cuda=use_cuda)
  test_loss, test_acc = test(net, epoch, outModelName, use_cuda=use_cuda)
  with open(logname, 'a') as logfile:
    logwriter = csv.writer(logfile, delimiter=',')
    logwriter.writerow([epoch, train_loss, train_acc.item(), test_loss, test_acc.item()])
  print(f'Epoch: {epoch} | train acc: {train_acc} | test acc: {test_acc}')
Epoch: 0
0 391 Loss: 4.748 | Acc: 0.781% (1/128)
0 79 Loss: 3.545 | Acc: 12.500% (16/128)
Saving..
Epoch: 0 | train acc: 8.527999877929688 | test acc: 13.369999885559082

Epoch: 1
0 391 Loss: 3.597 | Acc: 16.406% (21/128)
0 79 Loss: 3.157 | Acc: 23.438% (30/128)
Saving..
Epoch: 1 | train acc: 18.392000198364258 | test acc: 21.829999923706055

Epoch: 2
0 391 Loss: 2.932 | Acc: 26.562% (34/128)
0 79 Loss: 2.450 | Acc: 39.844% (51/128)
Saving..
Epoch: 2 | train acc: 27.016000747680664 | test acc: 31.079999923706055

Epoch: 3
0 391 Loss: 2.649 | Acc: 35.938% (46/128)
0 79 Loss: 2.134 | Acc: 39.844% (51/128)
Saving..
Epoch: 3 | train acc: 35.84000015258789 | test acc: 35.70000076293945

Epoch: 4
0 391 Loss: 2.153 | Acc: 41.406% (53/128)
0 79 Loss: 1.911 | Acc: 49.219% (63/128)
Saving..
Epoch: 4 | train acc: 42.827999114990234 | test acc: 43.619998931884766

Epoch: 5
0 391 Loss: 1.878 | Acc: 50.000% (64/128)
0 79 Loss: 2.149 | Acc: 43.750% (56/128)
Saving..
Epoch: 5 | train acc: 48.87200164794922 | test acc: 45.380001068115234

Epoch: 6
0 391 Loss: 1.814 | Acc: 51.562% (66/128)
0 79 Loss: 1.847 | Acc: 46.875% (60/128)
Saving..
Epoch: 6 | train acc: 53.59000015258789 | test acc: 50.310001373291016

Epoch: 7
0 391 Loss: 1.514 | Acc: 56.250% (72/128)
0 79 Loss: 1.568 | Acc: 51.562% (66/128)
Saving..
Epoch: 7 | train acc: 57.35200119018555 | test acc: 54.209999084472656

Epoch: 8
0 391 Loss: 1.194 | Acc: 62.500% (80/128)
0 79 Loss: 1.403 | Acc: 59.375% (76/128)
Saving..
Epoch: 8 | train acc: 60.61600112915039 | test acc: 57.20000076293945

Epoch: 9
0 391 Loss: 1.124 | Acc: 69.531% (89/128)
0 79 Loss: 1.339 | Acc: 64.844% (83/128)
Saving..
Epoch: 9 | train acc: 63.55400085449219 | test acc: 58.900001525878906

Epoch: 10
0 391 Loss: 1.013 | Acc: 72.656% (93/128)
0 79 Loss: 1.225 | Acc: 66.406% (85/128)
Epoch: 10 | train acc: 65.91999816894531 | test acc: 58.83000183105469

Epoch: 11
0 391 Loss: 0.971 | Acc: 64.844% (83/128)
0 79 Loss: 1.491 | Acc: 63.281% (81/128)
Epoch: 11 | train acc: 68.05000305175781 | test acc: 57.560001373291016

Epoch: 12
0 391 Loss: 1.028 | Acc: 70.312% (90/128)
0 79 Loss: 1.358 | Acc: 63.281% (81/128)
Saving..
Epoch: 12 | train acc: 69.99600219726562 | test acc: 60.099998474121094

Epoch: 13
0 391 Loss: 0.699 | Acc: 82.812% (106/128)
0 79 Loss: 1.299 | Acc: 63.281% (81/128)
Saving..
Epoch: 13 | train acc: 71.9739990234375 | test acc: 60.220001220703125

Epoch: 14
0 391 Loss: 0.768 | Acc: 75.781% (97/128)
0 79 Loss: 1.182 | Acc: 74.219% (95/128)
Saving..
Epoch: 14 | train acc: 73.69400024414062 | test acc: 62.90999984741211

Transfer learning

Re-use the trained model to improve training on a different data set

Delete variables from the previous model

# delete the backbone network
delete = True
if delete:
  del net
  del trainset
  del testset
  del trainloader
  del testloader
  gc.collect()

Target dataset

We will now use CIFAR-10 as target data set. Again, with small tweaks we can get any other data we are interested in.

CIFAR-10 is very similar to CIFAR-100, but it contains only 10 classes instead of 100.

# Target domain Data
print('==> Preparing target domain data..')

# CIFAR10 normalizing
mean = (0.4914, 0.4822, 0.4465)
std = (0.2023, 0.1994, 0.2010)
num_classes = 10
lr = 0.0001

# torchvision transforms
transform_train = transforms.Compose([])
if torchvision_transforms:
    transform_train.transforms.append(transforms.RandomCrop(32, padding=4))
    transform_train.transforms.append(transforms.RandomHorizontalFlip())

transform_train.transforms.append(transforms.ToTensor())
transform_train.transforms.append(transforms.Normalize(mean, std))

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])

trainset = torchvision.datasets.CIFAR10(
    root='./CIFAR10', train=True, download=True, transform=transform_train)

testset = torchvision.datasets.CIFAR10(
    root='./CIFAR10', train=False, download=True, transform=transform_test)
==> Preparing target domain data..
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./CIFAR10/cifar-10-python.tar.gz
Extracting ./CIFAR10/cifar-10-python.tar.gz to ./CIFAR10
Files already downloaded and verified

Select a subset of the data

To simulate a lower data regime, where transfer learning can be useful.

Choose percentage from the trainset. Set percent = 1.0 to use the whole train data

percent = 0.6

trainset = percentageSplit(trainset, percent = percent)
print('size of the new trainset: ', len(trainset))
size of the new trainset:  30000

Dataloaders

As before

# Dataloader
num_workers = multiprocessing.cpu_count()

print(f'----> number of workers: {num_workers}')

trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=batch_size, shuffle=True, num_workers=num_workers)
testloader = torch.utils.data.DataLoader(
    testset, batch_size=batch_size, shuffle=False, num_workers=num_workers)
----> number of workers: 2

Load pre-trained model

Load the checkpoint of the model previously trained on CIFAR-100

model = ResNet18()

checkpointPath = '/content/checkpoint/pretrain.t7'

print(' ===> loading pretrained model from: ', checkpointPath)
if os.path.isfile(checkpointPath):
  state_dict = torch.load(checkpointPath)
  best_acc = state_dict['acc']
  print('Best Accuracy:', best_acc)
  if "state_dict" in state_dict:
      state_dict = state_dict["state_dict"]
  # remove prefixe "module."
  state_dict = {k.replace("module.", ""): v for k, v in state_dict.items()}
  for k, v in model.state_dict().items():
    if k not in list(state_dict):
      print('key "{}" could not be found in provided state dict'.format(k))
    elif state_dict[k].shape != v.shape:
      print('key "{}" is of different shape in model and provided state dict'.format(k))
      state_dict[k] = v
  msg = model.load_state_dict(state_dict, strict=False)
  print("Load pretrained model with msg: {}".format(msg))
else:
  raise Exception('No pretrained weights found')
 ===> loading pretrained model from:  /content/checkpoint/pretrain.t7
Best Accuracy: tensor(62.9100)
Load pretrained model with msg: <All keys matched successfully>

Freeze model parameters

In transfer learning, we usually do not re-train all the weights of the model, but only a subset of them, for instance the last layer. Here we first freeze all the parameters of the model, and we will unfreeze one layer below.

# Freeze the model parameters, you can also freeze some layers only

for param in model.parameters():
  param.requires_grad = False

Loss function, optimizer and unfreeze last layer

num_ftrs = model.linear.in_features
model.linear = nn.Linear(num_ftrs, num_classes)

model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(
    model.linear.parameters(),
    lr=lr,
    momentum=0.9,
    weight_decay=1e-4,
)

Check number of parameters

We can calculate the number of total parameters and the number of trainable parameters, that is those that will be updated during training. Since we have freezed most of the parameters, the number of training parameters should be much smaller.

total_params = sum(p.numel() for p in model.parameters())
trainable_total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print('Total Parameters:', total_params, 'Trainable parameters: ', trainable_total_params)
Total Parameters: 11173962 Trainable parameters:  5130

Train the target model

outModelName = 'finetuned'
logname = result_folder + model.__class__.__name__ + f'_{outModelName}.csv'

if not os.path.exists(logname):
  with open(logname, 'w') as logfile:
    logwriter = csv.writer(logfile, delimiter=',')
    logwriter.writerow(['epoch', 'train loss', 'train acc', 'test loss', 'test acc'])

for epoch in range(start_epoch, max_epochs_target):
  adjust_learning_rate(optimizer, epoch)
  train_loss, train_acc = train(model, epoch, use_cuda=use_cuda)
  test_loss, test_acc = test(model, epoch, outModelName, use_cuda=use_cuda)
  with open(logname, 'a') as logfile:
    logwriter = csv.writer(logfile, delimiter=',')
    logwriter.writerow([epoch, train_loss, train_acc.item(), test_loss, test_acc.item()])
  print(f'Epoch: {epoch} | train acc: {train_acc} | test acc: {test_acc}')
Epoch: 0
0 235 Loss: 2.302 | Acc: 16.406% (21/128)
0 79 Loss: 0.630 | Acc: 79.688% (102/128)
Saving..
Epoch: 0 | train acc: 71.086669921875 | test acc: 75.08999633789062

Epoch: 1
0 235 Loss: 0.757 | Acc: 72.656% (93/128)
0 79 Loss: 0.619 | Acc: 79.688% (102/128)
Saving..
Epoch: 1 | train acc: 75.86000061035156 | test acc: 76.54000091552734

Epoch: 2
0 235 Loss: 0.666 | Acc: 75.781% (97/128)
0 79 Loss: 0.640 | Acc: 78.125% (100/128)
Saving..
Epoch: 2 | train acc: 77.04000091552734 | test acc: 76.55000305175781

Epoch: 3
0 235 Loss: 0.579 | Acc: 81.250% (104/128)
0 79 Loss: 0.577 | Acc: 79.688% (102/128)
Saving..
Epoch: 3 | train acc: 77.56999969482422 | test acc: 77.2300033569336

Epoch: 4
0 235 Loss: 0.661 | Acc: 78.125% (100/128)
0 79 Loss: 0.613 | Acc: 76.562% (98/128)
Saving..
Epoch: 4 | train acc: 77.6866683959961 | test acc: 77.44999694824219

Epoch: 5
0 235 Loss: 0.627 | Acc: 80.469% (103/128)
0 79 Loss: 0.626 | Acc: 80.469% (103/128)
Epoch: 5 | train acc: 78.163330078125 | test acc: 77.37999725341797

Epoch: 6
0 235 Loss: 0.602 | Acc: 77.344% (99/128)
0 79 Loss: 0.607 | Acc: 78.125% (100/128)
Saving..
Epoch: 6 | train acc: 78.42333221435547 | test acc: 78.02999877929688

Epoch: 7
0 235 Loss: 0.537 | Acc: 75.781% (97/128)
0 79 Loss: 0.608 | Acc: 79.688% (102/128)
Saving..
Epoch: 7 | train acc: 78.49333190917969 | test acc: 78.1500015258789

Epoch: 8
0 235 Loss: 0.578 | Acc: 75.781% (97/128)
0 79 Loss: 0.650 | Acc: 76.562% (98/128)
Epoch: 8 | train acc: 78.15333557128906 | test acc: 76.83000183105469

Epoch: 9
0 235 Loss: 0.583 | Acc: 77.344% (99/128)
0 79 Loss: 0.616 | Acc: 77.344% (99/128)
Saving..
Epoch: 9 | train acc: 78.66999816894531 | test acc: 78.20999908447266

Plot results

# title plot results
results = pd.read_csv(f'/content/results/ResNet_{outModelName}.csv', sep =',')
results.head()
epoch train loss train acc test loss test acc
0 0 0.819690 71.086670 0.713260 75.089996
1 1 0.680940 75.860001 0.674431 76.540001
2 2 0.650245 77.040001 0.675883 76.550003
3 3 0.638555 77.570000 0.652776 77.230003
4 4 0.630500 77.686668 0.666428 77.449997
train_accuracy = results['train acc'].values
test_accuracy = results['test acc'].values

print(f'Average Accuracy over {max_epochs_target} epochs:', sum(test_accuracy)//len(test_accuracy))
print(f'best accuraccy over {max_epochs_target} epochs:', max(test_accuracy))
Average Accuracy over 10 epochs: 77.0
best accuraccy over 10 epochs: 78.20999908447266
figureName = 'figure' # change figure name

plt.plot(results['epoch'].values, train_accuracy, label='train')
plt.plot(results['epoch'].values, test_accuracy, label='test')
plt.xlabel('Number of epochs')
plt.ylabel('Accuracy')
plt.title(f'Train/Test Accuracy curve for {max_epochs} epochs')
plt.savefig(f'/content/results/{figureName}.png')
plt.legend()
plt.show()
../../_images/transfer_learning_60_0.png