Transfer Learning

Transfer Learning#

By Neuromatch Academy

Content creators: Jama Hussein Mohamud & Alex Hernandez-Garcia

Production editors: Saeed Salehi, Spiros Chavlis

Objective#

One desired capability for machines is the ability to transfer the knowledge (features) learned on one domain to another This can potentially save compute time, enable training when data is scarce, and even improve performance. Unfortunately, there is no single recipe for transfer learning and instead multiple options are possible and much remains to be well understood. In this project, you will explore how transfer learning works in different scenarios.

Setup#

# imports
import os
import gc
import csv
import glob
import torch
import multiprocessing

import numpy as np
import pandas as pd
import torch.nn as nn
import matplotlib.pyplot as plt

import torch.optim as optim
import torch.nn.functional as F
import torch.backends.cudnn as cudnn
from torch.autograd import Variable

import torchvision
import torchvision.transforms as transforms

Set random seed#

Executing set_seed(seed=seed) you are setting the seed

Set device (GPU or CPU)#

Random seeds#

If you want to obtain reproducible results, it is a good practice to set seeds for the random number generators of the various libraries

set_seed(seed=2021)
device = set_device()

Random seed 2021 has been set.
GPU is enabled in this notebook.

Training hyperparameters#

Here we set some general training hyperparameters such as the learning rate, batch size, etc. as well as other training options such as including data augmentation (torchvision_transforms).

# hyper-parameters
use_cuda = torch.cuda.is_available()
best_acc = 0  # best test accuracy
start_epoch = 0  # start from epoch 0 or last checkpoint epoch
batch_size = 128
max_epochs = 15  # Please change this to 200
max_epochs_target = 10
base_learning_rate = 0.1
torchvision_transforms = True  # True/False if you want use torchvision augmentations

Data#

Source dataset#

We will train the source model using CIFAR-100 data set from PyTorch, but with small tweaks we can get any other data we are interested in.

Note that the data set is normalised by substracted the mean and dividing by the standard deviation (pre-computed) of the training set. Also, if torchvision_transforms is True, data augmentation will be applied during training.

Download and prepare Data

==> Preparing data..
Downloading https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz to ./CIFAR100/cifar-100-python.tar.gz

Extracting ./CIFAR100/cifar-100-python.tar.gz to ./CIFAR100
Files already downloaded and verified

CIFAR-100#

CIFAR-100 is a data set of 50,000 colour (RGB) training images and 10,000 test images, of size 32 x 32 pixels. Each image is labelled as 1 of 100 possible classes.

The data set is stored as a custom torchvision.datasets.cifar.CIFAR object. You can check some of its properties with the following code:

print(f"Object type: {type(trainset)}")
print(f"Training data shape: {trainset.data.shape}")
print(f"Test data shape: {testset.data.shape}")
print(f"Number of classes: {np.unique(trainset.targets).shape[0]}")

Object type: <class 'torchvision.datasets.cifar.CIFAR100'>
Training data shape: (50000, 32, 32, 3)
Test data shape: (10000, 32, 32, 3)
Number of classes: 100

Data loaders#

A dataloader is an optimized data iterator that provides functionality for efficient shuffling, transformation and batching of the data.

Dataloader#

----> number of workers: 2

Architecture: ResNet#

ResNet is a family of network architectures whose main property is that the network is organised as a stack of residual blocks. Residual blocks consist of a stack of layers whose output is added the input, making a shortcut connection.

See the original paper for more details.

ResNet is just a popular choice out of many others, but data augmentation works well in general. We just picked ResNet for illustration purposes.

ResNet model in PyTorch#

Show code cell source Hide code cell source

# @title ResNet model in PyTorch

class BasicBlock(nn.Module):
  """ResNet in PyTorch.
      Reference:
      [1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
        Deep Residual Learning for Image Recognition. arXiv:1512.03385
  """

  expansion = 1

  def __init__(self, in_planes, planes, stride=1):
    super(BasicBlock, self).__init__()
    self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
    self.bn1 = nn.BatchNorm2d(planes)
    self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
    self.bn2 = nn.BatchNorm2d(planes)

    self.shortcut = nn.Sequential()
    if stride != 1 or in_planes != self.expansion*planes:
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(self.expansion*planes)
        )

  def forward(self, x):
    out = F.relu(self.bn1(self.conv1(x)))
    out = self.bn2(self.conv2(out))
    out += self.shortcut(x)
    out = F.relu(out)
    return out


class Bottleneck(nn.Module):
  expansion = 4

  def __init__(self, in_planes, planes, stride=1):
    super(Bottleneck, self).__init__()
    self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
    self.bn1 = nn.BatchNorm2d(planes)
    self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
    self.bn2 = nn.BatchNorm2d(planes)
    self.conv3 = nn.Conv2d(planes, self.expansion*planes, kernel_size=1, bias=False)
    self.bn3 = nn.BatchNorm2d(self.expansion*planes)

    self.shortcut = nn.Sequential()
    if stride != 1 or in_planes != self.expansion*planes:
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(self.expansion*planes)
        )

  def forward(self, x):
    out = F.relu(self.bn1(self.conv1(x)))
    out = F.relu(self.bn2(self.conv2(out)))
    out = self.bn3(self.conv3(out))
    out += self.shortcut(x)
    out = F.relu(out)
    return out


class ResNet(nn.Module):
  def __init__(self, block, num_blocks, num_classes=100):
    super(ResNet, self).__init__()
    self.in_planes = 64

    self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    self.bn1 = nn.BatchNorm2d(64)
    self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
    self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
    self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
    self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
    self.linear = nn.Linear(512*block.expansion, num_classes)

  def _make_layer(self, block, planes, num_blocks, stride):
    strides = [stride] + [1]*(num_blocks-1)
    layers = []
    for stride in strides:
      layers.append(block(self.in_planes, planes, stride))
      self.in_planes = planes * block.expansion
    return nn.Sequential(*layers)

  def forward(self, x):
    out = F.relu(self.bn1(self.conv1(x)))
    out = self.layer1(out)
    out = self.layer2(out)
    out = self.layer3(out)
    out = self.layer4(out)
    out = F.avg_pool2d(out, 4)
    out = out.view(out.size(0), -1)
    out = self.linear(out)
    return out


def ResNet18():
  return ResNet(BasicBlock, [2, 2, 2, 2])


def ResNet34():
  return ResNet(BasicBlock, [3, 4, 6, 3])


def ResNet50():
  return ResNet(Bottleneck, [3, 4, 6, 3])

Test on random data#

# Load the Model
net = ResNet18()
print('-----> verify if model is run on random data')
y = net(Variable(torch.randn(1,3,32,32)))
print('model loaded')

result_folder = './results/'
if not os.path.exists(result_folder):
    os.makedirs(result_folder)

logname = result_folder + net.__class__.__name__ + '_pretrain' + '.csv'

if use_cuda:
  net.cuda()
  net = torch.nn.DataParallel(net)
  print('Using', torch.cuda.device_count(), 'GPUs.')
  cudnn.benchmark = True
  print('Using CUDA..')

-----> verify if model is run on random data
model loaded
Using 1 GPUs.
Using CUDA..

Set up training#

Set loss function and optimizer#

We use the cross entropy loss, commonly used for classification, and stochastic gradient descent (SGD) as optimizer, with momentum and weight decay.

# Optimizer and criterion

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=base_learning_rate, momentum=0.9, weight_decay=1e-4)

Train and test loops#

# Training & Test functions

def train(net, epoch, use_cuda=True):
  print('\nEpoch: %d' % epoch)
  net.train()
  train_loss = 0
  correct = 0
  total = 0
  for batch_idx, (inputs, targets) in enumerate(trainloader):
    if use_cuda:
      inputs, targets = inputs.cuda(), targets.cuda()

    optimizer.zero_grad()
    inputs, targets = Variable(inputs), Variable(targets)
    outputs = net(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()

    train_loss += loss.item()
    _, predicted = torch.max(outputs.data, 1)
    total += targets.size(0)
    correct += predicted.eq(targets.data).cpu().sum()

    if batch_idx % 500 == 0:
      print(batch_idx, len(trainloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
          % (train_loss/(batch_idx+1), 100.*correct/total, correct, total))
  return (train_loss/batch_idx, 100.*correct/total)


def test(net, epoch, outModelName, use_cuda=True):
  global best_acc
  net.eval()
  test_loss, correct, total = 0, 0, 0
  with torch.no_grad():
    for batch_idx, (inputs, targets) in enumerate(testloader):
      if use_cuda:
        inputs, targets = inputs.cuda(), targets.cuda()

      outputs = net(inputs)
      loss = criterion(outputs, targets)

      test_loss += loss.item()
      _, predicted = torch.max(outputs.data, 1)
      total += targets.size(0)
      correct += predicted.eq(targets.data).cpu().sum()

      if batch_idx % 200 == 0:
        print(batch_idx, len(testloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
            % (test_loss/(batch_idx+1), 100.*correct/total, correct, total))

  # Save checkpoint.
  acc = 100.*correct/total
  if acc > best_acc:
    best_acc = acc
    checkpoint(net, acc, epoch, outModelName)
  return (test_loss/batch_idx, 100.*correct/total)

Auxiliary functions#

checkpoint(): Store checkpoints of the model
adjust_learning_rate(): Decreases the learning rate (learning rate decay) at certain epochs of training.

# checkpoint & adjust_learning_rate
def checkpoint(model, acc, epoch, outModelName):
  # Save checkpoint.
  print('Saving..')
  state = {
      'state_dict': model.state_dict(),
      'acc': acc,
      'epoch': epoch,
      'rng_state': torch.get_rng_state()
  }
  if not os.path.isdir('checkpoint'):
      os.mkdir('checkpoint')
  torch.save(state, f'./checkpoint/{outModelName}.t7')

def adjust_learning_rate(optimizer, epoch):
  """decrease the learning rate at 100 and 150 epoch"""
  lr = base_learning_rate
  if epoch <= 9 and lr > 0.1:
    # warm-up training for large minibatch
    lr = 0.1 + (base_learning_rate - 0.1) * epoch / 10.
  if epoch >= 100:
    lr /= 10
  if epoch >= 150:
    lr /= 10
  for param_group in optimizer.param_groups:
    param_group['lr'] = lr

Train the model#

This is the loop where the model is trained for max_epochs epochs.

# Start training
outModelName = 'pretrain'
if not os.path.exists(logname):
  with open(logname, 'w') as logfile:
      logwriter = csv.writer(logfile, delimiter=',')
      logwriter.writerow(['epoch', 'train loss', 'train acc', 'test loss', 'test acc'])

for epoch in range(start_epoch, max_epochs):
  adjust_learning_rate(optimizer, epoch)
  train_loss, train_acc = train(net, epoch, use_cuda=use_cuda)
  test_loss, test_acc = test(net, epoch, outModelName, use_cuda=use_cuda)
  with open(logname, 'a') as logfile:
    logwriter = csv.writer(logfile, delimiter=',')
    logwriter.writerow([epoch, train_loss, train_acc.item(), test_loss, test_acc.item()])
  print(f'Epoch: {epoch} | train acc: {train_acc} | test acc: {test_acc}')

Epoch: 0
0 391 Loss: 4.748 | Acc: 0.781% (1/128)
0 79 Loss: 3.545 | Acc: 12.500% (16/128)
Saving..
Epoch: 0 | train acc: 8.527999877929688 | test acc: 13.369999885559082

Epoch: 1
0 391 Loss: 3.597 | Acc: 16.406% (21/128)
0 79 Loss: 3.157 | Acc: 23.438% (30/128)
Saving..
Epoch: 1 | train acc: 18.392000198364258 | test acc: 21.829999923706055

Epoch: 2
0 391 Loss: 2.932 | Acc: 26.562% (34/128)
0 79 Loss: 2.450 | Acc: 39.844% (51/128)
Saving..
Epoch: 2 | train acc: 27.016000747680664 | test acc: 31.079999923706055

Epoch: 3
0 391 Loss: 2.649 | Acc: 35.938% (46/128)
0 79 Loss: 2.134 | Acc: 39.844% (51/128)
Saving..
Epoch: 3 | train acc: 35.84000015258789 | test acc: 35.70000076293945

Epoch: 4
0 391 Loss: 2.153 | Acc: 41.406% (53/128)
0 79 Loss: 1.911 | Acc: 49.219% (63/128)
Saving..
Epoch: 4 | train acc: 42.827999114990234 | test acc: 43.619998931884766

Epoch: 5
0 391 Loss: 1.878 | Acc: 50.000% (64/128)
0 79 Loss: 2.149 | Acc: 43.750% (56/128)
Saving..
Epoch: 5 | train acc: 48.87200164794922 | test acc: 45.380001068115234

Epoch: 6
0 391 Loss: 1.814 | Acc: 51.562% (66/128)
0 79 Loss: 1.847 | Acc: 46.875% (60/128)
Saving..
Epoch: 6 | train acc: 53.59000015258789 | test acc: 50.310001373291016

Epoch: 7
0 391 Loss: 1.514 | Acc: 56.250% (72/128)
0 79 Loss: 1.568 | Acc: 51.562% (66/128)
Saving..
Epoch: 7 | train acc: 57.35200119018555 | test acc: 54.209999084472656

Epoch: 8
0 391 Loss: 1.194 | Acc: 62.500% (80/128)
0 79 Loss: 1.403 | Acc: 59.375% (76/128)
Saving..
Epoch: 8 | train acc: 60.61600112915039 | test acc: 57.20000076293945

Epoch: 9
0 391 Loss: 1.124 | Acc: 69.531% (89/128)
0 79 Loss: 1.339 | Acc: 64.844% (83/128)
Saving..
Epoch: 9 | train acc: 63.55400085449219 | test acc: 58.900001525878906

Epoch: 10
0 391 Loss: 1.013 | Acc: 72.656% (93/128)
0 79 Loss: 1.225 | Acc: 66.406% (85/128)
Epoch: 10 | train acc: 65.91999816894531 | test acc: 58.83000183105469

Epoch: 11
0 391 Loss: 0.971 | Acc: 64.844% (83/128)
0 79 Loss: 1.491 | Acc: 63.281% (81/128)
Epoch: 11 | train acc: 68.05000305175781 | test acc: 57.560001373291016

Epoch: 12
0 391 Loss: 1.028 | Acc: 70.312% (90/128)
0 79 Loss: 1.358 | Acc: 63.281% (81/128)
Saving..
Epoch: 12 | train acc: 69.99600219726562 | test acc: 60.099998474121094

Epoch: 13
0 391 Loss: 0.699 | Acc: 82.812% (106/128)
0 79 Loss: 1.299 | Acc: 63.281% (81/128)
Saving..
Epoch: 13 | train acc: 71.9739990234375 | test acc: 60.220001220703125

Epoch: 14
0 391 Loss: 0.768 | Acc: 75.781% (97/128)
0 79 Loss: 1.182 | Acc: 74.219% (95/128)
Saving..
Epoch: 14 | train acc: 73.69400024414062 | test acc: 62.90999984741211

Transfer learning#

Re-use the trained model to improve training on a different data set#

Delete variables from the previous model#

# delete the backbone network
delete = True
if delete:
  del net
  del trainset
  del testset
  del trainloader
  del testloader
  gc.collect()

Target dataset#

We will now use CIFAR-10 as target data set. Again, with small tweaks we can get any other data we are interested in.

CIFAR-10 is very similar to CIFAR-100, but it contains only 10 classes instead of 100.

# Target domain Data
print('==> Preparing target domain data..')

# CIFAR10 normalizing
mean = (0.4914, 0.4822, 0.4465)
std = (0.2023, 0.1994, 0.2010)
num_classes = 10
lr = 0.0001

# torchvision transforms
transform_train = transforms.Compose([])
if torchvision_transforms:
    transform_train.transforms.append(transforms.RandomCrop(32, padding=4))
    transform_train.transforms.append(transforms.RandomHorizontalFlip())

transform_train.transforms.append(transforms.ToTensor())
transform_train.transforms.append(transforms.Normalize(mean, std))

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])

trainset = torchvision.datasets.CIFAR10(
    root='./CIFAR10', train=True, download=True, transform=transform_train)

testset = torchvision.datasets.CIFAR10(
    root='./CIFAR10', train=False, download=True, transform=transform_test)

==> Preparing target domain data..
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./CIFAR10/cifar-10-python.tar.gz

Extracting ./CIFAR10/cifar-10-python.tar.gz to ./CIFAR10
Files already downloaded and verified

Select a subset of the data#

To simulate a lower data regime, where transfer learning can be useful.

Choose percentage from the trainset. Set percent = 1.0 to use the whole train data

percent = 0.6

trainset = percentageSplit(trainset, percent = percent)
print('size of the new trainset: ', len(trainset))

size of the new trainset:  30000

Dataloaders#

As before

# Dataloader
num_workers = multiprocessing.cpu_count()

print(f'----> number of workers: {num_workers}')

trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=batch_size, shuffle=True, num_workers=num_workers)
testloader = torch.utils.data.DataLoader(
    testset, batch_size=batch_size, shuffle=False, num_workers=num_workers)

----> number of workers: 2

Load pre-trained model#

Load the checkpoint of the model previously trained on CIFAR-100

model = ResNet18()

checkpointPath = '/content/checkpoint/pretrain.t7'

print(' ===> loading pretrained model from: ', checkpointPath)
if os.path.isfile(checkpointPath):
  state_dict = torch.load(checkpointPath)
  best_acc = state_dict['acc']
  print('Best Accuracy:', best_acc)
  if "state_dict" in state_dict:
      state_dict = state_dict["state_dict"]
  # remove prefixe "module."
  state_dict = {k.replace("module.", ""): v for k, v in state_dict.items()}
  for k, v in model.state_dict().items():
    if k not in list(state_dict):
      print('key "{}" could not be found in provided state dict'.format(k))
    elif state_dict[k].shape != v.shape:
      print('key "{}" is of different shape in model and provided state dict'.format(k))
      state_dict[k] = v
  msg = model.load_state_dict(state_dict, strict=False)
  print("Load pretrained model with msg: {}".format(msg))
else:
  raise Exception('No pretrained weights found')

 ===> loading pretrained model from:  /content/checkpoint/pretrain.t7
Best Accuracy: tensor(62.9100)
Load pretrained model with msg: <All keys matched successfully>

Freeze model parameters#

In transfer learning, we usually do not re-train all the weights of the model, but only a subset of them, for instance the last layer. Here we first freeze all the parameters of the model, and we will unfreeze one layer below.

# Freeze the model parameters, you can also freeze some layers only

for param in model.parameters():
  param.requires_grad = False

Loss function, optimizer and unfreeze last layer#

num_ftrs = model.linear.in_features
model.linear = nn.Linear(num_ftrs, num_classes)

model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(
    model.linear.parameters(),
    lr=lr,
    momentum=0.9,
    weight_decay=1e-4,
)

Check number of parameters#

We can calculate the number of total parameters and the number of trainable parameters, that is those that will be updated during training. Since we have freezed most of the parameters, the number of training parameters should be much smaller.

total_params = sum(p.numel() for p in model.parameters())
trainable_total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print('Total Parameters:', total_params, 'Trainable parameters: ', trainable_total_params)

Total Parameters: 11173962 Trainable parameters:  5130

Train the target model#

outModelName = 'finetuned'
logname = result_folder + model.__class__.__name__ + f'_{outModelName}.csv'

if not os.path.exists(logname):
  with open(logname, 'w') as logfile:
    logwriter = csv.writer(logfile, delimiter=',')
    logwriter.writerow(['epoch', 'train loss', 'train acc', 'test loss', 'test acc'])

for epoch in range(start_epoch, max_epochs_target):
  adjust_learning_rate(optimizer, epoch)
  train_loss, train_acc = train(model, epoch, use_cuda=use_cuda)
  test_loss, test_acc = test(model, epoch, outModelName, use_cuda=use_cuda)
  with open(logname, 'a') as logfile:
    logwriter = csv.writer(logfile, delimiter=',')
    logwriter.writerow([epoch, train_loss, train_acc.item(), test_loss, test_acc.item()])
  print(f'Epoch: {epoch} | train acc: {train_acc} | test acc: {test_acc}')

Epoch: 0
0 235 Loss: 2.302 | Acc: 16.406% (21/128)
0 79 Loss: 0.630 | Acc: 79.688% (102/128)
Saving..
Epoch: 0 | train acc: 71.086669921875 | test acc: 75.08999633789062

Epoch: 1
0 235 Loss: 0.757 | Acc: 72.656% (93/128)
0 79 Loss: 0.619 | Acc: 79.688% (102/128)
Saving..
Epoch: 1 | train acc: 75.86000061035156 | test acc: 76.54000091552734

Epoch: 2
0 235 Loss: 0.666 | Acc: 75.781% (97/128)
0 79 Loss: 0.640 | Acc: 78.125% (100/128)
Saving..
Epoch: 2 | train acc: 77.04000091552734 | test acc: 76.55000305175781

Epoch: 3
0 235 Loss: 0.579 | Acc: 81.250% (104/128)
0 79 Loss: 0.577 | Acc: 79.688% (102/128)
Saving..
Epoch: 3 | train acc: 77.56999969482422 | test acc: 77.2300033569336

Epoch: 4
0 235 Loss: 0.661 | Acc: 78.125% (100/128)
0 79 Loss: 0.613 | Acc: 76.562% (98/128)
Saving..
Epoch: 4 | train acc: 77.6866683959961 | test acc: 77.44999694824219

Epoch: 5
0 235 Loss: 0.627 | Acc: 80.469% (103/128)
0 79 Loss: 0.626 | Acc: 80.469% (103/128)
Epoch: 5 | train acc: 78.163330078125 | test acc: 77.37999725341797

Epoch: 6
0 235 Loss: 0.602 | Acc: 77.344% (99/128)
0 79 Loss: 0.607 | Acc: 78.125% (100/128)
Saving..
Epoch: 6 | train acc: 78.42333221435547 | test acc: 78.02999877929688

Epoch: 7
0 235 Loss: 0.537 | Acc: 75.781% (97/128)
0 79 Loss: 0.608 | Acc: 79.688% (102/128)
Saving..
Epoch: 7 | train acc: 78.49333190917969 | test acc: 78.1500015258789

Epoch: 8
0 235 Loss: 0.578 | Acc: 75.781% (97/128)
0 79 Loss: 0.650 | Acc: 76.562% (98/128)
Epoch: 8 | train acc: 78.15333557128906 | test acc: 76.83000183105469

Epoch: 9
0 235 Loss: 0.583 | Acc: 77.344% (99/128)
0 79 Loss: 0.616 | Acc: 77.344% (99/128)
Saving..
Epoch: 9 | train acc: 78.66999816894531 | test acc: 78.20999908447266

Plot results#

# title plot results
results = pd.read_csv(f'/content/results/ResNet_{outModelName}.csv', sep =',')
results.head()

	epoch	train loss	train acc	test loss	test acc
0	0	0.819690	71.086670	0.713260	75.089996
1	1	0.680940	75.860001	0.674431	76.540001
2	2	0.650245	77.040001	0.675883	76.550003
3	3	0.638555	77.570000	0.652776	77.230003
4	4	0.630500	77.686668	0.666428	77.449997

train_accuracy = results['train acc'].values
test_accuracy = results['test acc'].values

print(f'Average Accuracy over {max_epochs_target} epochs:', sum(test_accuracy)//len(test_accuracy))
print(f'best accuraccy over {max_epochs_target} epochs:', max(test_accuracy))

Average Accuracy over 10 epochs: 77.0
best accuraccy over 10 epochs: 78.20999908447266

figureName = 'figure' # change figure name

plt.plot(results['epoch'].values, train_accuracy, label='train')
plt.plot(results['epoch'].values, test_accuracy, label='test')
plt.xlabel('Number of epochs')
plt.ylabel('Accuracy')
plt.title(f'Train/Test Accuracy curve for {max_epochs} epochs')
plt.savefig(f'/content/results/{figureName}.png')
plt.legend()
plt.show()

../../_images/21b84580d952f1fe3f22ec7e42676e4f68e56cf51227e679df82e555f47dab45.png