Tutorial 1: Learn how to use modern convnets

Tutorial 1: Learn how to use modern convnets#

Week 2, Day 3: Modern Convnets

By Neuromatch Academy

Content creators: Laura Pede, Richard Vogg, Marissa Weis, Timo Lüddecke, Alexander Ecker

Content reviewers: Arush Tagade, Polina Turishcheva, Yu-Fang Yang, Bettina Hein, Melvin Selim Atay, Kelson Shilling-Scrivo

Content editors: Gagana B, Roberto Guidotti, Spiros Chavlis

Production editors: Anoop Kulkarni, Roberto Guidotti, Cary Murray, Gagana B, Spiros Chavlis

Tutorial notebook is based on an initial version by Ben Heil

Tutorial Objectives#

In this tutorial we are going to learn more about Convnets. More specifically, we will:

Learn about modern CNNs and Transfer Learning.
Understand how architectures incorporate ideas we have about the world.
Understand the operating principles underlying the basic building blocks of modern CNNs.
Understand the concept of transfer learning and learn to recognize opportunities for applying it.
(Bonus) Understand the speed vs. accuracy trade-off.

Setup#

Install dependencies#

Install and import feedback gadget#

# Import libraries
import os
import time
import tqdm
import torch
import IPython
import torchvision

import numpy as np
import matplotlib.pyplot as plt

import torch.nn as nn
import torch.nn.functional as F

from torchvision import transforms
from torchvision.models import AlexNet
from torchvision.utils import make_grid
from torchvision.datasets import ImageFolder

from PIL import Image
from io import BytesIO

Figure settings#

Set random seed#

Executing set_seed(seed=seed) you are setting the seed

Set device (GPU or CPU). Execute `set_device()`#

SEED = 2021
set_seed(seed=SEED)
DEVICE = set_device()

Random seed 2021 has been set.
WARNING: For this notebook to perform best, if possible, in the menu under `Runtime` -> `Change runtime type.`  select `GPU` 

Section 1: Modern CNNs and Transfer Learning#

Time estimate: ~25mins

Video 1: Modern CNNs and Transfer Learning#

Submit your feedback#

Images are high dimensional. That is to say that image_length * image_width * image_channels is a big number, and multiplying that big number by a normal sized fully-connected layer leads to a ton of parameters to learn. Yesterday, we learned about convolutional neural networks, one way of working around high dimensionality in images and other domains.

The widget below (i.e., Interactive Demo 1) calculates the parameters required for a single convolutional or fully connected layer that operates on an image of a certain height and width.

Recall that, the number of parameters of a convolutional layer \(l\) are calculated as:

(83)#\[\begin{equation} \text{num_of_params}_l = \left[ \left( H \times W \times K_{l-1} \right) + 1 \right] \times K_l \end{equation}\]

where \(H\) denotes the shape of the height of the filter, \(W\) the shape of the width of the filter, and \(K_l\) denotes the number of the filters in the \(l\)-th layer. The added \(1\) is because of the bias term for each filter.

While a fully connected layer contains:

(84)#\[\begin{equation} \text{num_of_params}_l = \left[ \left( N_{l-1} \times N_l \right) + 1 \times N_l \right] \end{equation}\]

where \(N_l\) denotes the number of nodes in the \(l\)-th layer.

Adjust the sliders to gain an intuition for how different model and data characteristics affect the number of parameters your model need to fit.

Note: these classes are designed to show parameter scaling in the first layer of a network, to be actually useful they would need more layers, an activation function, etc.

class FullyConnectedNet(nn.Module):
  """
  Fully connected network with the following structure:
  nn.Linear(self.input_size, 256)
  """

  def __init__(self):
    """
    Initialize parameters of FullyConnectedNet

    Args:
      None

    Returns:
      Nothing
    """
    super(FullyConnectedNet, self).__init__()

    image_width = 128
    image_channels = 3
    self.input_size = image_channels * image_width ** 2

    self.fc1 = nn.Linear(self.input_size, 256)

  def forward(self, x):
    """
    Forward pass of FullyConnectedNet

    Args:
      x: torch.tensor
        Input data

    Returns:
      x: torch.tensor
        Output from FullyConnectedNet
    """
    x = x.view(-1, self.input_size)
    return self.fc1(x)

class ConvNet(nn.Module):
  """
  Convolutional Neural Network
  """

  def __init__(self):
    """
    Initialize parameters of ConvNet

    Args:
      None

    Returns:
      Nothing
    """
    super(ConvNet, self).__init__()

    self.conv1 = nn.Conv2d(in_channels=3,
                            out_channels=256,
                            kernel_size=(3, 3),
                            padding=1)

  def forward(self, x):
    """
    Forward pass of ConvNet

    Args:
      x: torch.tensor
        Input data

    Returns:
      x: torch.tensor
        Output after passing x through Conv2d layer
    """
    return self.conv1(x)

Coding Exercise 1: Calculate number of parameters in FCNN vs ConvNet#

Write a function that calculates the number of parameters of a given network. Apply the function to the above defined fully-connected network and convolutional network and compare the parameter counts.

Hint: torch.numel

def get_parameter_count(network):
  """
  Calculate the number of parameters used by the fully connected/convolutional network.
  Hint: Casting the result of network.parameters() to a list may make it
        easier to work with

  Args:
    network: nn.module
      Network to calculate the parameters of fully connected/convolutional network

  Returns:
    param_count: int
      The number of parameters in the network
  """

  ####################################################################
  # Fill in all missing code below (...),
  # then remove or comment the line below to test your function
  raise NotImplementedError("Convolution math")
  ####################################################################
  # Get the network's parameters
  parameters = ...

  param_count = 0
  # Loop over all layers
  for layer in parameters:
    param_count += ...

  return param_count



# Initialize networks
fccnet = FullyConnectedNet()
convnet = ConvNet()
## Apply the above defined function to both networks by uncommenting the following lines
# print(f"FCCN parameter count: {get_parameter_count(fccnet)}")
# print(f"ConvNet parameter count: {get_parameter_count(convnet)}")

FCCN parameter count: 12583168
ConvNet parameter count: 7168

Click for solution

Submit your feedback#

Interactive Demo 1: Check your results#

The widget below calculates the number of parameters in a FCNN and CNN with the same architecture as our models above. Our models had an input image that was 128x128, and used 256 filters (or 256 nodes in the FCNN case). Check that the calculations you made above are correct.

Note how few parameters the convolutional networks take, especially as you increase the input image size.

Parameter Calculator#

Run this cell to enable the widget!

Submit your feedback#

Section 2: The History of Convnets#

Time estimate: ~15mins

Convolutional neural networks have been around for a long time. The first CNN model was published in 1980, and was based on ideas in neuroscience that predated it by decades. Why is it then that AlexNet, a CNN model published in 2012, is generally considered to mark the start of the deep learning revolution?

Watch the video below to get a better idea of the role that hardware and the internet have played in progressing deep learning.

Video 2: History of convnets#

Submit your feedback#

Think! 2: Challenges of improving CNNs#

As we shall see today, the story of deep learning and CNNs has been one of scaling networks: making them bigger and deeper.

Based on what you know so far from previous days, what challenges might researchers have faced when trying to scale up CNNs and applying them to different visual recognition tasks? Do you already have some ideas how these challenges might have been addressed?

Discuss this with your group for ~10 minutes.

(Hint: labeled data, compute and memory are all finite)

Click for solution

Submit your feedback#

Section 3: Big and Deep Convnets#

Time estimate: 18mins

Video 3: AlexNet & VGG#

Submit your feedback#

Section 3.1: Introduction to AlexNet#

AlexNet arguably marked the start of the current age of deep learning. It incorporates a number of the defining characteristics of successful DL today: deep networks, GPU-powered paralellization, and building blocks encoding task-specific priors. In this section you’ll have the opportunity to play with AlexNet and see the world through its eyes.

Import Alexnet#

This cell gives you the alexnet model as well as the input_image and input_batch variables used below

Section 3.2: What does AlexNet learn?#

This code visualizes the top-layer filters learned by AlexNet. What do these filters remind you of?

with torch.no_grad():
  params = list(alexnet.parameters())
  fig, axs = plt.subplots(8, 8, figsize=(8, 8))
  filters = []
  for filter_index in range(params[0].shape[0]):
    row_index = filter_index // 8
    col_index = filter_index % 8

    filter = params[0][filter_index,:,:,:]
    filter_image = filter.permute(1, 2, 0).cpu()
    scale = np.abs(filter_image).max()
    scaled_image = filter_image / (2 * scale) + 0.5
    filters.append(scaled_image.cpu())
    axs[row_index, col_index].imshow(scaled_image.cpu())
    axs[row_index, col_index].axis('off')
  plt.show()

../../../_images/cad62490cc79bc2852f9bf1c709824a08b83aa341dd7972c67403496baa7af6c.png

Think! 3.2.1: Filter Similarity#

What do these filters remind you of?

Click for solution

Submit your feedback#

Interactive Demo 3.2: What does AlexNet see?#

One way of visualizing CNNs is to look at the output of individual filters for a given image. Below is a widget that lets you examine the outputs of various filters used in AlexNet.

Run this cell to enable the widget

Submit your feedback#

Think! 3.2.2 Filter Purpose#

What do these filters appear to be doing? Note that different filters play different roles so there are several good answers.

Click for solution

Submit your feedback#

Section 4: Convnets After AlexNet#

Time estimate: ~25mins

Video 4: Residual Networks (ResNets)#

Submit your feedback#

In this section we’ll be working with a state of the art CNN model called ResNet. ResNet has two particularly interesting features. First, it uses skip connections to avoid the vanishing gradient problem. Second, each block (collection of layers) in a ResNet can be treated as learning a residual function.

Mathematically, a neural network can be thought of as a series of operations that maps an input (like an image of a dog) to an output (like the label “dog”). In math-speak a mapping from an input to an output is called a function. Neural networks are a flexible way of expressing that function.

If you were to subtract out the true function mapping images to class labels from the function learned by a network, you’d be left with the residual error or “residual function”. ResNets try to learn the original function, then the residual function, then the residual of the residual, and so on, using their residual blocks and adding them to the output of the preceeding layers.

In this section we’ll run several images through a pre-trained ResNet and see what happens.

Download imagenette#

Data is being downloaded...

The download has been completed.

Set Up Textual ImageNet labels#

Show code cell source Hide code cell source

# @title Set Up Textual ImageNet labels
dict_map={0: 'tench, Tinca tinca',
'goldfish, Carassius auratus',
'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',
'tiger shark, Galeocerdo cuvieri',
'hammerhead, hammerhead shark',
'electric ray, crampfish, numbfish, torpedo',
'stingray',
'cock',
'hen',
'ostrich, Struthio camelus',
'brambling, Fringilla montifringilla',
'goldfinch, Carduelis carduelis',
'house finch, linnet, Carpodacus mexicanus',
'junco, snowbird',
'indigo bunting, indigo finch, indigo bird, Passerina cyanea',
'robin, American robin, Turdus migratorius',
'bulbul',
'jay',
'magpie',
'chickadee',
'water ouzel, dipper',
'kite',
'bald eagle, American eagle, Haliaeetus leucocephalus',
'vulture',
'great grey owl, great gray owl, Strix nebulosa',
'European fire salamander, Salamandra salamandra',
'common newt, Triturus vulgaris',
'eft',
'spotted salamander, Ambystoma maculatum',
'axolotl, mud puppy, Ambystoma mexicanum',
'bullfrog, Rana catesbeiana',
'tree frog, tree-frog',
'tailed frog, bell toad, ribbed toad, tailed toad, Ascaphus trui',
'loggerhead, loggerhead turtle, Caretta caretta',
'leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea',
'mud turtle',
'terrapin',
'box turtle, box tortoise',
'banded gecko',
'common iguana, iguana, Iguana iguana',
'American chameleon, anole, Anolis carolinensis',
'whiptail, whiptail lizard',
'agama',
'frilled lizard, Chlamydosaurus kingi',
'alligator lizard',
'Gila monster, Heloderma suspectum',
'green lizard, Lacerta viridis',
'African chameleon, Chamaeleo chamaeleon',
'Komodo dragon, Komodo lizard, dragon lizard, giant lizard, Varanus komodoensis',
'African crocodile, Nile crocodile, Crocodylus niloticus',
'American alligator, Alligator mississipiensis',
'triceratops',
'thunder snake, worm snake, Carphophis amoenus',
'ringneck snake, ring-necked snake, ring snake',
'hognose snake, puff adder, sand viper',
'green snake, grass snake',
'king snake, kingsnake',
'garter snake, grass snake',
'water snake',
'vine snake',
'night snake, Hypsiglena torquata',
'boa constrictor, Constrictor constrictor',
'rock python, rock snake, Python sebae',
'Indian cobra, Naja naja',
'green mamba',
'sea snake',
'horned viper, cerastes, sand viper, horned asp, Cerastes cornutus',
'diamondback, diamondback rattlesnake, Crotalus adamanteus',
'sidewinder, horned rattlesnake, Crotalus cerastes',
'trilobite',
'harvestman, daddy longlegs, Phalangium opilio',
'scorpion',
'black and gold garden spider, Argiope aurantia',
'barn spider, Araneus cavaticus',
'garden spider, Aranea diademata',
'black widow, Latrodectus mactans',
'tarantula',
'wolf spider, hunting spider',
'tick',
'centipede',
'black grouse',
'ptarmigan',
'ruffed grouse, partridge, Bonasa umbellus',
'prairie chicken, prairie grouse, prairie fowl',
'peacock',
'quail',
'partridge',
'African grey, African gray, Psittacus erithacus',
'macaw',
'sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita',
'lorikeet',
'coucal',
'bee eater',
'hornbill',
'hummingbird',
'jacamar',
'toucan',
'drake',
'red-breasted merganser, Mergus serrator',
'goose',
'black swan, Cygnus atratus',
'tusker',
'echidna, spiny anteater, anteater',
'platypus, duckbill, duckbilled platypus, duck-billed platypus, Ornithorhynchus anatinus',
'wallaby, brush kangaroo',
'koala, koala bear, kangaroo bear, native bear, Phascolarctos cinereus',
'wombat',
'jellyfish',
'sea anemone, anemone',
'brain coral',
'flatworm, platyhelminth',
'nematode, nematode worm, roundworm',
'conch',
'snail',
'slug',
'sea slug, nudibranch',
'chiton, coat-of-mail shell, sea cradle, polyplacophore',
'chambered nautilus, pearly nautilus, nautilus',
'Dungeness crab, Cancer magister',
'rock crab, Cancer irroratus',
'fiddler crab',
'king crab, Alaska crab, Alaskan king crab, Alaska king crab, Paralithodes camtschatica',
'American lobster, Northern lobster, Maine lobster, Homarus americanus',
'spiny lobster, langouste, rock lobster, crawfish, crayfish, sea crawfish',
'crayfish, crawfish, crawdad, crawdaddy',
'hermit crab',
'isopod',
'white stork, Ciconia ciconia',
'black stork, Ciconia nigra',
'spoonbill',
'flamingo',
'little blue heron, Egretta caerulea',
'American egret, great white heron, Egretta albus',
'bittern',
'crane',
'limpkin, Aramus pictus',
'European gallinule, Porphyrio porphyrio',
'American coot, marsh hen, mud hen, water hen, Fulica americana',
'bustard',
'ruddy turnstone, Arenaria interpres',
'red-backed sandpiper, dunlin, Erolia alpina',
'redshank, Tringa totanus',
'dowitcher',
'oystercatcher, oyster catcher',
'pelican',
'king penguin, Aptenodytes patagonica',
'albatross, mollymawk',
'grey whale, gray whale, devilfish, Eschrichtius gibbosus, Eschrichtius robustus',
'killer whale, killer, orca, grampus, sea wolf, Orcinus orca',
'dugong, Dugong dugon',
'sea lion',
'Chihuahua',
'Japanese spaniel',
'Maltese dog, Maltese terrier, Maltese',
'Pekinese, Pekingese, Peke',
'Shih-Tzu',
'Blenheim spaniel',
'papillon',
'toy terrier',
'Rhodesian ridgeback',
'Afghan hound, Afghan',
'basset, basset hound',
'beagle',
'bloodhound, sleuthhound',
'bluetick',
'black-and-tan coonhound',
'Walker hound, Walker foxhound',
'English foxhound',
'redbone',
'borzoi, Russian wolfhound',
'Irish wolfhound',
'Italian greyhound',
'whippet',
'Ibizan hound, Ibizan Podenco',
'Norwegian elkhound, elkhound',
'otterhound, otter hound',
'Saluki, gazelle hound',
'Scottish deerhound, deerhound',
'Weimaraner',
'Staffordshire bullterrier, Staffordshire bull terrier',
'American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier',
'Bedlington terrier',
'Border terrier',
'Kerry blue terrier',
'Irish terrier',
'Norfolk terrier',
'Norwich terrier',
'Yorkshire terrier',
'wire-haired fox terrier',
'Lakeland terrier',
'Sealyham terrier, Sealyham',
'Airedale, Airedale terrier',
'cairn, cairn terrier',
'Australian terrier',
'Dandie Dinmont, Dandie Dinmont terrier',
'Boston bull, Boston terrier',
'miniature schnauzer',
'giant schnauzer',
'standard schnauzer',
'Scotch terrier, Scottish terrier, Scottie',
'Tibetan terrier, chrysanthemum dog',
'silky terrier, Sydney silky',
'soft-coated wheaten terrier',
'West Highland white terrier',
'Lhasa, Lhasa apso',
'flat-coated retriever',
'curly-coated retriever',
'golden retriever',
'Labrador retriever',
'Chesapeake Bay retriever',
'German short-haired pointer',
'vizsla, Hungarian pointer',
'English setter',
'Irish setter, red setter',
'Gordon setter',
'Brittany spaniel',
'clumber, clumber spaniel',
'English springer, English springer spaniel',
'Welsh springer spaniel',
'cocker spaniel, English cocker spaniel, cocker',
'Sussex spaniel',
'Irish water spaniel',
'kuvasz',
'schipperke',
'groenendael',
'malinois',
'briard',
'kelpie',
'komondor',
'Old English sheepdog, bobtail',
'Shetland sheepdog, Shetland sheep dog, Shetland',
'collie',
'Border collie',
'Bouvier des Flandres, Bouviers des Flandres',
'Rottweiler',
'German shepherd, German shepherd dog, German police dog, alsatian',
'Doberman, Doberman pinscher',
'miniature pinscher',
'Greater Swiss Mountain dog',
'Bernese mountain dog',
'Appenzeller',
'EntleBucher',
'boxer',
'bull mastiff',
'Tibetan mastiff',
'French bulldog',
'Great Dane',
'Saint Bernard, St Bernard',
'Eskimo dog, husky',
'malamute, malemute, Alaskan malamute',
'Siberian husky',
'dalmatian, coach dog, carriage dog',
'affenpinscher, monkey pinscher, monkey dog',
'basenji',
'pug, pug-dog',
'Leonberg',
'Newfoundland, Newfoundland dog',
'Great Pyrenees',
'Samoyed, Samoyede',
'Pomeranian',
'chow, chow chow',
'keeshond',
'Brabancon griffon',
'Pembroke, Pembroke Welsh corgi',
'Cardigan, Cardigan Welsh corgi',
'toy poodle',
'miniature poodle',
'standard poodle',
'Mexican hairless',
'timber wolf, grey wolf, gray wolf, Canis lupus',
'white wolf, Arctic wolf, Canis lupus tundrarum',
'red wolf, maned wolf, Canis rufus, Canis niger',
'coyote, prairie wolf, brush wolf, Canis latrans',
'dingo, warrigal, warragal, Canis dingo',
'dhole, Cuon alpinus',
'African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus',
'hyena, hyaena',
'red fox, Vulpes vulpes',
'kit fox, Vulpes macrotis',
'Arctic fox, white fox, Alopex lagopus',
'grey fox, gray fox, Urocyon cinereoargenteus',
'tabby, tabby cat',
'tiger cat',
'Persian cat',
'Siamese cat, Siamese',
'Egyptian cat',
'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor',
'lynx, catamount',
'leopard, Panthera pardus',
'snow leopard, ounce, Panthera uncia',
'jaguar, panther, Panthera onca, Felis onca',
'lion, king of beasts, Panthera leo',
'tiger, Panthera tigris',
'cheetah, chetah, Acinonyx jubatus',
'brown bear, bruin, Ursus arctos',
'American black bear, black bear, Ursus americanus, Euarctos americanus',
'ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus',
'sloth bear, Melursus ursinus, Ursus ursinus',
'mongoose',
'meerkat, mierkat',
'tiger beetle',
'ladybug, ladybeetle, lady beetle, ladybird, ladybird beetle',
'ground beetle, carabid beetle',
'long-horned beetle, longicorn, longicorn beetle',
'leaf beetle, chrysomelid',
'dung beetle',
'rhinoceros beetle',
'weevil',
'fly',
'bee',
'ant, emmet, pismire',
'grasshopper, hopper',
'cricket',
'walking stick, walkingstick, stick insect',
'cockroach, roach',
'mantis, mantid',
'cicada, cicala',
'leafhopper',
'lacewing, lacewing fly',
"dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk",
'damselfly',
'admiral',
'ringlet, ringlet butterfly',
'monarch, monarch butterfly, milkweed butterfly, Danaus plexippus',
'cabbage butterfly',
'sulphur butterfly, sulfur butterfly',
'lycaenid, lycaenid butterfly',
'starfish, sea star',
'sea urchin',
'sea cucumber, holothurian',
'wood rabbit, cottontail, cottontail rabbit',
'hare',
'Angora, Angora rabbit',
'hamster',
'porcupine, hedgehog',
'fox squirrel, eastern fox squirrel, Sciurus niger',
'marmot',
'beaver',
'guinea pig, Cavia cobaya',
'sorrel',
'zebra',
'hog, pig, grunter, squealer, Sus scrofa',
'wild boar, boar, Sus scrofa',
'warthog',
'hippopotamus, hippo, river horse, Hippopotamus amphibius',
'ox',
'water buffalo, water ox, Asiatic buffalo, Bubalus bubalis',
'bison',
'ram, tup',
'bighorn, bighorn sheep, cimarron, Rocky Mountain bighorn, Rocky Mountain sheep, Ovis canadensis',
'ibex, Capra ibex',
'hartebeest',
'impala, Aepyceros melampus',
'gazelle',
'Arabian camel, dromedary, Camelus dromedarius',
'llama',
'weasel',
'mink',
'polecat, fitch, foulmart, foumart, Mustela putorius',
'black-footed ferret, ferret, Mustela nigripes',
'otter',
'skunk, polecat, wood pussy',
'badger',
'armadillo',
'three-toed sloth, ai, Bradypus tridactylus',
'orangutan, orang, orangutang, Pongo pygmaeus',
'gorilla, Gorilla gorilla',
'chimpanzee, chimp, Pan troglodytes',
'gibbon, Hylobates lar',
'siamang, Hylobates syndactylus, Symphalangus syndactylus',
'guenon, guenon monkey',
'patas, hussar monkey, Erythrocebus patas',
'baboon',
'macaque',
'langur',
'colobus, colobus monkey',
'proboscis monkey, Nasalis larvatus',
'marmoset',
'capuchin, ringtail, Cebus capucinus',
'howler monkey, howler',
'titi, titi monkey',
'spider monkey, Ateles geoffroyi',
'squirrel monkey, Saimiri sciureus',
'Madagascar cat, ring-tailed lemur, Lemur catta',
'indri, indris, Indri indri, Indri brevicaudatus',
'Indian elephant, Elephas maximus',
'African elephant, Loxodonta africana',
'lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens',
'giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca',
'barracouta, snoek',
'eel',
'coho, cohoe, coho salmon, blue jack, silver salmon, Oncorhynchus kisutch',
'rock beauty, Holocanthus tricolor',
'anemone fish',
'sturgeon',
'gar, garfish, garpike, billfish, Lepisosteus osseus',
'lionfish',
'puffer, pufferfish, blowfish, globefish',
'abacus',
'abaya',
"academic gown, academic robe, judge's robe",
'accordion, piano accordion, squeeze box',
'acoustic guitar',
'aircraft carrier, carrier, flattop, attack aircraft carrier',
'airliner',
'airship, dirigible',
'altar',
'ambulance',
'amphibian, amphibious vehicle',
'analog clock',
'apiary, bee house',
'apron',
'ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin',
'assault rifle, assault gun',
'backpack, back pack, knapsack, packsack, rucksack, haversack',
'bakery, bakeshop, bakehouse',
'balance beam, beam',
'balloon',
'ballpoint, ballpoint pen, ballpen, Biro',
'Band Aid',
'banjo',
'bannister, banister, balustrade, balusters, handrail',
'barbell',
'barber chair',
'barbershop',
'barn',
'barometer',
'barrel, cask',
'barrow, garden cart, lawn cart, wheelbarrow',
'baseball',
'basketball',
'bassinet',
'bassoon',
'bathing cap, swimming cap',
'bath towel',
'bathtub, bathing tub, bath, tub',
'beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon',
'beacon, lighthouse, beacon light, pharos',
'beaker',
'bearskin, busby, shako',
'beer bottle',
'beer glass',
'bell cote, bell cot',
'bib',
'bicycle-built-for-two, tandem bicycle, tandem',
'bikini, two-piece',
'binder, ring-binder',
'binoculars, field glasses, opera glasses',
'birdhouse',
'boathouse',
'bobsled, bobsleigh, bob',
'bolo tie, bolo, bola tie, bola',
'bonnet, poke bonnet',
'bookcase',
'bookshop, bookstore, bookstall',
'bottlecap',
'bow',
'bow tie, bow-tie, bowtie',
'brass, memorial tablet, plaque',
'brassiere, bra, bandeau',
'breakwater, groin, groyne, mole, bulwark, seawall, jetty',
'breastplate, aegis, egis',
'broom',
'bucket, pail',
'buckle',
'bulletproof vest',
'bullet train, bullet',
'butcher shop, meat market',
'cab, hack, taxi, taxicab',
'caldron, cauldron',
'candle, taper, wax light',
'cannon',
'canoe',
'can opener, tin opener',
'cardigan',
'car mirror',
'carousel, carrousel, merry-go-round, roundabout, whirligig',
"carpenter's kit, tool kit",
'carton',
'car wheel',
'cash machine, cash dispenser, automated teller machine, automatic teller machine, automated teller, automatic teller, ATM',
'cassette',
'cassette player',
'castle',
'catamaran',
'CD player',
'cello, violoncello',
'cellular telephone, cellular phone, cellphone, cell, mobile phone',
'chain',
'chainlink fence',
'chain mail, ring mail, mail, chain armor, chain armour, ring armor, ring armour',
'chain saw, chainsaw',
'chest',
'chiffonier, commode',
'chime, bell, gong',
'china cabinet, china closet',
'Christmas stocking',
'church, church building',
'cinema, movie theater, movie theatre, movie house, picture palace',
'cleaver, meat cleaver, chopper',
'cliff dwelling',
'cloak',
'clog, geta, patten, sabot',
'cocktail shaker',
'coffee mug',
'coffeepot',
'coil, spiral, volute, whorl, helix',
'combination lock',
'computer keyboard, keypad',
'confectionery, confectionary, candy store',
'container ship, containership, container vessel',
'convertible',
'corkscrew, bottle screw',
'cornet, horn, trumpet, trump',
'cowboy boot',
'cowboy hat, ten-gallon hat',
'cradle',
'crane',
'crash helmet',
'crate',
'crib, cot',
'Crock Pot',
'croquet ball',
'crutch',
'cuirass',
'dam, dike, dyke',
'desk',
'desktop computer',
'dial telephone, dial phone',
'diaper, nappy, napkin',
'digital clock',
'digital watch',
'dining table, board',
'dishrag, dishcloth',
'dishwasher, dish washer, dishwashing machine',
'disk brake, disc brake',
'dock, dockage, docking facility',
'dogsled, dog sled, dog sleigh',
'dome',
'doormat, welcome mat',
'drilling platform, offshore rig',
'drum, membranophone, tympan',
'drumstick',
'dumbbell',
'Dutch oven',
'electric fan, blower',
'electric guitar',
'electric locomotive',
'entertainment center',
'envelope',
'espresso maker',
'face powder',
'feather boa, boa',
'file, file cabinet, filing cabinet',
'fireboat',
'fire engine, fire truck',
'fire screen, fireguard',
'flagpole, flagstaff',
'flute, transverse flute',
'folding chair',
'football helmet',
'forklift',
'fountain',
'fountain pen',
'four-poster',
'freight car',
'French horn, horn',
'frying pan, frypan, skillet',
'fur coat',
'garbage truck, dustcart',
'gasmask, respirator, gas helmet',
'gas pump, gasoline pump, petrol pump, island dispenser',
'goblet',
'go-kart',
'golf ball',
'golfcart, golf cart',
'gondola',
'gong, tam-tam',
'gown',
'grand piano, grand',
'greenhouse, nursery, glasshouse',
'grille, radiator grille',
'grocery store, grocery, food market, market',
'guillotine',
'hair slide',
'hair spray',
'half track',
'hammer',
'hamper',
'hand blower, blow dryer, blow drier, hair dryer, hair drier',
'hand-held computer, hand-held microcomputer',
'handkerchief, hankie, hanky, hankey',
'hard disc, hard disk, fixed disk',
'harmonica, mouth organ, harp, mouth harp',
'harp',
'harvester, reaper',
'hatchet',
'holster',
'home theater, home theatre',
'honeycomb',
'hook, claw',
'hoopskirt, crinoline',
'horizontal bar, high bar',
'horse cart, horse-cart',
'hourglass',
'iPod',
'iron, smoothing iron',
"jack-o'-lantern",
'jean, blue jean, denim',
'jeep, landrover',
'jersey, T-shirt, tee shirt',
'jigsaw puzzle',
'jinrikisha, ricksha, rickshaw',
'joystick',
'kimono',
'knee pad',
'knot',
'lab coat, laboratory coat',
'ladle',
'lampshade, lamp shade',
'laptop, laptop computer',
'lawn mower, mower',
'lens cap, lens cover',
'letter opener, paper knife, paperknife',
'library',
'lifeboat',
'lighter, light, igniter, ignitor',
'limousine, limo',
'liner, ocean liner',
'lipstick, lip rouge',
'Loafer',
'lotion',
'loudspeaker, speaker, speaker unit, loudspeaker system, speaker system',
"loupe, jeweler's loupe",
'lumbermill, sawmill',
'magnetic compass',
'mailbag, postbag',
'mailbox, letter box',
'maillot',
'maillot, tank suit',
'manhole cover',
'maraca',
'marimba, xylophone',
'mask',
'matchstick',
'maypole',
'maze, labyrinth',
'measuring cup',
'medicine chest, medicine cabinet',
'megalith, megalithic structure',
'microphone, mike',
'microwave, microwave oven',
'military uniform',
'milk can',
'minibus',
'miniskirt, mini',
'minivan',
'missile',
'mitten',
'mixing bowl',
'mobile home, manufactured home',
'Model T',
'modem',
'monastery',
'monitor',
'moped',
'mortar',
'mortarboard',
'mosque',
'mosquito net',
'motor scooter, scooter',
'mountain bike, all-terrain bike, off-roader',
'mountain tent',
'mouse, computer mouse',
'mousetrap',
'moving van',
'muzzle',
'nail',
'neck brace',
'necklace',
'nipple',
'notebook, notebook computer',
'obelisk',
'oboe, hautboy, hautbois',
'ocarina, sweet potato',
'odometer, hodometer, mileometer, milometer',
'oil filter',
'organ, pipe organ',
'oscilloscope, scope, cathode-ray oscilloscope, CRO',
'overskirt',
'oxcart',
'oxygen mask',
'packet',
'paddle, boat paddle',
'paddlewheel, paddle wheel',
'padlock',
'paintbrush',
"pajama, pyjama, pj's, jammies",
'palace',
'panpipe, pandean pipe, syrinx',
'paper towel',
'parachute, chute',
'parallel bars, bars',
'park bench',
'parking meter',
'passenger car, coach, carriage',
'patio, terrace',
'pay-phone, pay-station',
'pedestal, plinth, footstall',
'pencil box, pencil case',
'pencil sharpener',
'perfume, essence',
'Petri dish',
'photocopier',
'pick, plectrum, plectron',
'pickelhaube',
'picket fence, paling',
'pickup, pickup truck',
'pier',
'piggy bank, penny bank',
'pill bottle',
'pillow',
'ping-pong ball',
'pinwheel',
'pirate, pirate ship',
'pitcher, ewer',
"plane, carpenter's plane, woodworking plane",
'planetarium',
'plastic bag',
'plate rack',
'plow, plough',
"plunger, plumber's helper",
'Polaroid camera, Polaroid Land camera',
'pole',
'police van, police wagon, paddy wagon, patrol wagon, wagon, black Maria',
'poncho',
'pool table, billiard table, snooker table',
'pop bottle, soda bottle',
'pot, flowerpot',
"potter's wheel",
'power drill',
'prayer rug, prayer mat',
'printer',
'prison, prison house',
'projectile, missile',
'projector',
'puck, hockey puck',
'punching bag, punch bag, punching ball, punchball',
'purse',
'quill, quill pen',
'quilt, comforter, comfort, puff',
'racer, race car, racing car',
'racket, racquet',
'radiator',
'radio, wireless',
'radio telescope, radio reflector',
'rain barrel',
'recreational vehicle, RV, R.V.',
'reel',
'reflex camera',
'refrigerator, icebox',
'remote control, remote',
'restaurant, eating house, eating place, eatery',
'revolver, six-gun, six-shooter',
'rifle',
'rocking chair, rocker',
'rotisserie',
'rubber eraser, rubber, pencil eraser',
'rugby ball',
'rule, ruler',
'running shoe',
'safe',
'safety pin',
'saltshaker, salt shaker',
'sandal',
'sarong',
'sax, saxophone',
'scabbard',
'scale, weighing machine',
'school bus',
'schooner',
'scoreboard',
'screen, CRT screen',
'screw',
'screwdriver',
'seat belt, seatbelt',
'sewing machine',
'shield, buckler',
'shoe shop, shoe-shop, shoe store',
'shoji',
'shopping basket',
'shopping cart',
'shovel',
'shower cap',
'shower curtain',
'ski',
'ski mask',
'sleeping bag',
'slide rule, slipstick',
'sliding door',
'slot, one-armed bandit',
'snorkel',
'snowmobile',
'snowplow, snowplough',
'soap dispenser',
'soccer ball',
'sock',
'solar dish, solar collector, solar furnace',
'sombrero',
'soup bowl',
'space bar',
'space heater',
'space shuttle',
'spatula',
'speedboat',
"spider web, spider's web",
'spindle',
'sports car, sport car',
'spotlight, spot',
'stage',
'steam locomotive',
'steel arch bridge',
'steel drum',
'stethoscope',
'stole',
'stone wall',
'stopwatch, stop watch',
'stove',
'strainer',
'streetcar, tram, tramcar, trolley, trolley car',
'stretcher',
'studio couch, day bed',
'stupa, tope',
'submarine, pigboat, sub, U-boat',
'suit, suit of clothes',
'sundial',
'sunglass',
'sunglasses, dark glasses, shades',
'sunscreen, sunblock, sun blocker',
'suspension bridge',
'swab, swob, mop',
'sweatshirt',
'swimming trunks, bathing trunks',
'swing',
'switch, electric switch, electrical switch',
'syringe',
'table lamp',
'tank, army tank, armored combat vehicle, armoured combat vehicle',
'tape player',
'teapot',
'teddy, teddy bear',
'television, television system',
'tennis ball',
'thatch, thatched roof',
'theater curtain, theatre curtain',
'thimble',
'thresher, thrasher, threshing machine',
'throne',
'tile roof',
'toaster',
'tobacco shop, tobacconist shop, tobacconist',
'toilet seat',
'torch',
'totem pole',
'tow truck, tow car, wrecker',
'toyshop',
'tractor',
'trailer truck, tractor trailer, trucking rig, rig, articulated lorry, semi',
'tray',
'trench coat',
'tricycle, trike, velocipede',
'trimaran',
'tripod',
'triumphal arch',
'trolleybus, trolley coach, trackless trolley',
'trombone',
'tub, vat',
'turnstile',
'typewriter keyboard',
'umbrella',
'unicycle, monocycle',
'upright, upright piano',
'vacuum, vacuum cleaner',
'vase',
'vault',
'velvet',
'vending machine',
'vestment',
'viaduct',
'violin, fiddle',
'volleyball',
'waffle iron',
'wall clock',
'wallet, billfold, notecase, pocketbook',
'wardrobe, closet, press',
'warplane, military plane',
'washbasin, handbasin, washbowl, lavabo, wash-hand basin',
'washer, automatic washer, washing machine',
'water bottle',
'water jug',
'water tower',
'whiskey jug',
'whistle',
'wig',
'window screen',
'window shade',
'Windsor tie',
'wine bottle',
'wing',
'wok',
'wooden spoon',
'wool, woolen, woollen',
'worm fence, snake fence, snake-rail fence, Virginia fence',
'wreck',
'yawl',
'yurt',
'web site, website, internet site, site',
'comic book',
'crossword puzzle, crossword',
'street sign',
'traffic light, traffic signal, stoplight',
'book jacket, dust cover, dust jacket, dust wrapper',
'menu',
'plate',
'guacamole',
'consomme',
'hot pot, hotpot',
'trifle',
'ice cream, icecream',
'ice lolly, lolly, lollipop, popsicle',
'French loaf',
'bagel, beigel',
'pretzel',
'cheeseburger',
'hotdog, hot dog, red hot',
'mashed potato',
'head cabbage',
'broccoli',
'cauliflower',
'zucchini, courgette',
'spaghetti squash',
'acorn squash',
'butternut squash',
'cucumber, cuke',
'artichoke, globe artichoke',
'bell pepper',
'cardoon',
'mushroom',
'Granny Smith',
'strawberry',
'orange',
'lemon',
'fig',
'pineapple, ananas',
'banana',
'jackfruit, jak, jack',
'custard apple',
'pomegranate',
'hay',
'carbonara',
'chocolate sauce, chocolate syrup',
'dough',
'meat loaf, meatloaf',
'pizza, pizza pie',
'potpie',
'burrito',
'red wine',
'espresso',
'cup',
'eggnog',
'alp',
'bubble',
'cliff, drop, drop-off',
'coral reef',
'geyser',
'lakeside, lakeshore',
'promontory, headland, head, foreland',
'sandbar, sand bar',
'seashore, coast, seacoast, sea-coast',
'valley, vale',
'volcano',
'ballplayer, baseball player',
'groom, bridegroom',
'scuba diver',
'rapeseed',
'daisy',
"yellow lady's slipper, yellow lady-slipper, Cypripedium calceolus, Cypripedium parviflorum",
'corn',
'acorn',
'hip, rose hip, rosehip',
'buckeye, horse chestnut, conker',
'coral fungus',
'agaric',
'gyromitra',
'stinkhorn, carrion fungus',
'earthstar',
'hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa',
'bolete',
'ear, spike, capitulum',
'toilet tissue, toilet paper, bathroom tissue'}

Map Imagenette Labels to Imagenet Labels#

Prepare Imagenette Data#

# To preserve reproducibility
g_seed = torch.Generator()
g_seed.manual_seed(SEED)

imagenette_train_loader = torch.utils.data.DataLoader(imagenette_train_subset,
                                                      batch_size=16,
                                                      shuffle=True,
                                                      num_workers=2,
                                                      worker_init_fn=seed_worker,
                                                      generator=g_seed
                                                      )

imagenette_val_loader = torch.utils.data.DataLoader(imagenette_val,
                                                    batch_size=16,
                                                    shuffle=False,
                                                    num_workers=2,
                                                    worker_init_fn=seed_worker,
                                                    generator=g_seed)

dataiter = iter(imagenette_val_loader)
images, labels = next(dataiter)

# Show images
plt.figure(figsize=(8, 8))
plt.imshow(make_grid(images, nrow=4).permute(1, 2, 0))
plt.axis('off')
plt.show()

../../../_images/74e1cd8b6df29a34cc2ab3cca5a960ea330bdcead2d9dbe8b7ce5d58bcc698c8.png

eval_imagenette function#

Imagenette Train Loop#

This cell creates a ResNet model pretrained on ImageNet, a 1000 class image prediction dataset. The model is then trained to make predictions on Imagenette, a small subset of ImageNet classes that is useful for demonstrations and prototyping.

# Original network
top_1_accuracies = []
top_5_accuracies = []

# Instantiate a pretrained resnet model
set_seed(seed=SEED)
resnet = torchvision.models.resnet18(weights='ResNet18_Weights.DEFAULT').to(DEVICE)
resnet_opt = torch.optim.Adam(resnet.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

imagenette_train_loop(resnet,
                      resnet_opt,
                      imagenette_train_loader,
                      loss_fn,
                      device=DEVICE)

top_1_acc, top_5_acc = eval_imagenette(resnet,
                                       imagenette_val_loader,
                                       len(imagenette_val),
                                       device=DEVICE)
top_1_accuracies.append(top_1_acc.item())
top_5_accuracies.append(top_5_acc.item())

Random seed 2021 has been set.

Coding Exercise 4.1: Use the ResNet model#

Complete the function below that runs a batch of images through the trained ResNet and returns the Top 5 class predictions and their probabilities. Note that the ResNet model returns unnormalized logits\(^\dagger\). To obtain probabilities, you need to normalize the logits using softmax.

\(^\dagger\) \( \text{logit}(p) = \sigma^{-1}(p) = \text{log} \left( \frac{p}{1-p} \right), \, \text{for} \, p \in (0,1)\), where \(\sigma(\cdot)\) is the sigmoid function, i.e., \(\sigma(z) = 1/(1+e^{-z})\). For more information see here.

def predict_top5(images, device, seed):
  """
  Function to predict top 5 classes

  Args:
    images: torch.tensor
      Image data with dimensionality B x C x H x W batch size x number of channels x height x width)
    device: STRING
      `cuda` if GPU is available, else `cpu`.

  Output:
    top5_probs: torch.tensor
      Tensor(B, 5) with top 5 class probabilities
    top5_names: list
      List of top 5 class names (B, 5)
  """
  ####################################################################
  # Fill in all missing code below (...),
  # then remove or comment the line below to test your function
  raise NotImplementedError("Predict top 5")
  ####################################################################
  set_seed(seed=seed)

  B = images.size(0)
  with torch.no_grad():
    # Run images through model
    images = ...
    output = ...
    # The model output is unnormalized. To get probabilities, run a softmax on it.
    probs = ...
    # Fetch output from GPU and convert to numpy array
    probs = ...

  # Get top 5 predictions
  _, top5_idcs = output.topk(5, 1, True, True)
  top5_idcs = top5_idcs.t().cpu().numpy()
  top5_probs = probs[torch.arange(B), top5_idcs]

  # Convert indices to class names
  top5_names = []
  for b in range(B):
    temp = [dict_map[key].split(',')[0] for key in top5_idcs[:, b]]
    top5_names.append(temp)

  return top5_names, top5_probs


# Get batch of images
dataiter = iter(imagenette_val_loader)
images, labels = next(dataiter)

## Uncomment to test your function and retrieve top 5 predictions
# top5_names, top5_probs = predict_top5(images, DEVICE, SEED)
# print(top5_names[1])

You will see something like this:

Random seed 2021 has been set.
['gas pump', 'chain saw', 'jinrikisha', 'rifle', 'turnstile']

Click for solution

Submit your feedback#

# Visualize probabilities of top 5 predictions
fig, ax = plt.subplots(5, 2, figsize=(10, 20))

for i in range(5):
  ax[i, 0].imshow(np.moveaxis(images[i].numpy(), 0, -1))
  ax[i, 0].axis('off')

  ax[i, 1].bar(np.arange(5), top5_probs[:, i])
  ax[i, 1].set_xticks(np.arange(5))
  ax[i, 1].set_xticklabels(top5_names[i], rotation=30)

fig.tight_layout()
plt.show()

Out-of-distribution examples#

The code below runs two out-of-distribution examples through the trained ResNet. Look at the predictions and discuss, why the model might fail to make accurate predictions on these images.

loc = 'https://raw.githubusercontent.com/NeuromatchAcademy/course-content-dl/main/tutorials/W2D3_ModernConvnets/static/'

fname1 = 'bonsai-svg-5.png'
response = requests.get(loc + fname1)
image = Image.open(BytesIO(response.content)).resize((256, 256))
data = torch.from_numpy(np.asarray(image)[:, :, :3]) / 255.

fname2 = 'Pokémon_Pikachu_art.png'
response = requests.get(loc + fname2)
image = Image.open(BytesIO(response.content)).resize((256, 256))
data2 = torch.from_numpy(np.asarray(image)[:, :, :3]) / 255.

images = torch.stack([data, data2]).permute(0, 3, 1, 2)

# Retrieve top 5 predictions
top5_names, top5_probs = predict_top5(images, DEVICE, SEED)

# Visualize probabilities of top 5 predictions
fig, ax = plt.subplots(2, 2, figsize=(10, 10))

for i in range(2):
  ax[i, 0].imshow(np.moveaxis(images[i].numpy(), 0, -1))
  ax[i, 0].axis('off')

  ax[i, 1].bar(np.arange(5), top5_probs[:, i])
  ax[i, 1].set_xticks(np.arange(5))
  ax[i, 1].set_xticklabels(top5_names[i], rotation=30)

fig.tight_layout()
plt.show()

Section 5: Inception + ResNeXt#

Time estimate: ~27mins

Video 5: Improving efficiency: Inception and ResNeXt#

Submit your feedback#

ResNet vs ResNeXt#

https://raw.githubusercontent.com/NeuromatchAcademy/course-content-dl/main/tutorials/W2D3_ModernConvnets/static/ResNets.png

Xie et al., 2016

Interactive Demo 5: ResNet vs. ResNeXt#

The widgets below calculate the number of parameters in a ResNet (top) and the parameters in a ResNeXt (bottom). We assume that the number of input and output channels (or feature maps) is the same (labeled “Channels in+out” in the widget). We refer to the number of channels after the first and the second layer of one block of either ResNet or ResNeXt as “bottleneck channels”.

The sliders are currently in the position that is displayed in the figure above. The goal of the following tasks is to investigate the difference in expressiveness and numbers of parameters in ResNet and ResNeXt.

Parameter Calculator#

Run this cell to enable the widget

Show code cell source Hide code cell source

# @title Parameter Calculator
# @markdown Run this cell to enable the widget
from IPython.display import display as dis

def calculate_parameters_resnet(d_in, resnet_channels):
    """
    ResNet math: Implement how parameters scale

    Args:
      d_in: int
        Input dimensionality
      resnet_channels: int
        Number of channels in ResNet

    Returns:
      None
    """
    d_out = d_in
    resnet_parameters = d_in*resnet_channels + 3*3*resnet_channels*resnet_channels + resnet_channels*d_out

    print('ResNet parameters: {}'.format(resnet_parameters))
    return None


def calculate_parameters_resnext(d_in, resnext_channels,
                                 num_paths):
    """
    ResNext math: Implement how parameters scale

    Args:
      d_in: int
        Input dimensionality
      resnet_channels: int
        Number of channels in ResNext
      num_paths: int
        Number of pathways in ResNext

    Returns:
      None
    """
    d_out = d_in
    d = resnext_channels

    resnext_parameters = (d_in*d + 3*3*d*d + d*d_out)*num_paths

    print('ResNeXt parameters: {}'.format(resnext_parameters))
    return None


labels = ['ResNet', 'ResNeXt']
descriptions_resnet = ['Channels in+out', 'Bottleneck channels']
descriptions_resnext = ['Channels in+out', 'Bottleneck channels',
                        'Number of paths (cardinality)']
lbox_resnet = widgets.VBox([widgets.Label(description) for description in descriptions_resnet])
lbox_resnext = widgets.VBox([widgets.Label(description) for description in descriptions_resnext])

d_in = widgets.FloatLogSlider(
    value=256,
    base=2,
    min=1, # Max exponent of base
    max=10, # Min exponent of base
    step=1, # Exponent step
)
resnet_channels = widgets.FloatLogSlider(
    value=64,
    base=2,
    min=5, # Max exponent of base
    max=10, # Min exponent of base
    step=1, # Exponent step
)
resnext_channels = widgets.FloatLogSlider(
    value=4,
    base=2,
    min=1, # Max exponent of base
    max=10, # Min exponent of base
    step=1, # Exponent step
)
num_paths = widgets.FloatLogSlider(
    value=32,
    base=2,
    min=0, # Max exponent of base
    max=7, # Min exponent of base
    step=1, # Exponent step
)

rbox_resnet = widgets.VBox([d_in, resnet_channels])
rbox_resnext = widgets.VBox([d_in, resnext_channels, num_paths])
ui_resnet = widgets.HBox([lbox_resnet, rbox_resnet])
ui_resnet_labeled = widgets.VBox(
    [widgets.HTML(value="<b>" + labels[0] + "</b>"), ui_resnet],
    layout=widgets.Layout(border='1px solid black'))
ui_resnext = widgets.HBox([lbox_resnext, rbox_resnext])
ui_resnext_labeled = widgets.VBox(
    [widgets.HTML(value="<b>" + labels[1] + "</b>"), ui_resnext],
    layout=widgets.Layout(border='1px solid black'))
ui = widgets.VBox([ui_resnet_labeled, ui_resnext_labeled])

out_resnet = widgets.interactive_output(calculate_parameters_resnet,
                     {'d_in':d_in,
                     'resnet_channels':resnet_channels})

out_resnext = widgets.interactive_output(calculate_parameters_resnext,
                     {'d_in':d_in,
                     'resnext_channels':resnext_channels,
                     'num_paths':num_paths})

d1 = dis(ui, out_resnet, out_resnext)

Submit your feedback#

Think! 5: ResNet vs. ResNeXt#

In the figure above, both networks, i.e., ResNet and ResNeXt, have a similar number of parameters.

How many channels are there in the bottleneck of the two networks, respectively?
How are these channels connected to each other from the first to the second layer in the blocks of the two networks, respectively?
What does it mean for the expressiveness of the two models relative to each other?

Click for solution

Submit your feedback#

Now we want to look at the number of parameters.

How does the difference in number of parameters change if we fix the number of channels in the bottleneck of both ResNet and ResNeXt to be 64, but vary the number of paths in ResNeXt? (8 paths with 8 channels each would be one such example)
Which number of paths results in the biggest parameter savings?

Click for solution

Section 6: Depthwise separable convolutions#

Time estimate: ~23mins

Video 6: Improving efficiency: MobileNet#

Submit your feedback#

Section 6.1: Depthwise separable convolutions#

Another way to reduce the computational cost of large models is the use of depthwise separable convolutions (introduced here). Depthwise separable convolutions are the key component making MobileNets efficient.

https://raw.githubusercontent.com/NeuromatchAcademy/course-content-dl/main/tutorials/W2D3_ModernConvnets/static/SchematicCNN.png

Coding Exercise 6.1: Calculation of parameters#

Fill in the calculation of the parameters of regular convolution and depthwise separable convolution in the function below. Above you can see the example given in the video for you to check if your calculation is correct.

def convolution_math(in_channels, filter_size, out_channels):
  """
  Convolution math: Implement how parameters scale as a function of feature maps
  and filter size in convolution vs depthwise separable convolution.

  Args:
    in_channels : int
      Number of input channels
    filter_size : int
      Size of the filter
    out_channels : int
      Number of output channels

  Returns:
    None
  """
  ####################################################################
  # Fill in all missing code below (...),
  # then remove or comment the line below to test your function
  raise NotImplementedError("Convolution math")
  ####################################################################
  # Calculate the number of parameters for regular convolution
  conv_parameters = ...
  # Calculate the number of parameters for depthwise separable convolution
  depthwise_conv_parameters = ...

  print(f"Depthwise separable: {depthwise_conv_parameters} parameters")
  print(f"Regular convolution: {conv_parameters} parameters")

  return None



## Uncomment to test your function
# convolution_math(in_channels=4, filter_size=3, out_channels=2)

Depthwise separable: 44 parameters
Regular convolution: 72 parameters

Click for solution

Submit your feedback#

Think! 6.1: How do parameter savings depend the on number of input feature maps, 4 vs. 64?#

Click for solution

Submit your feedback#

Section 7: Transfer Learning#

Time estimate: ~24mins

Video 7: Transfer Learning#

Submit your feedback#

The most common way large image models are trained in practice is via transfer learning. One first pretrains a network on a large classification dataset like ImageNet, then uses the weights of this network as initialization for training (“fine-tuning”) that network on your task of choice.

While training a network twice sounds like a strange thing to do, the model ends up training faster on the target dataset and often outperforms training “from scratch”. There are also other benefits such as robustness to noise that are the subject of active research.

In this section we will demonstrate transfer learning by taking a model trained on ImageNet and teaching it to classify Pokemon.

Section 7.1: Download and prepare the data#

Download Data#

Data is being downloaded...

The download has been completed.

# List the different Pokemon
os.listdir("small_pokemon_dataset/")

['Charizard',
 'Bulbasaur',
 'Venusaur',
 'Ivysaur',
 'Charmander',
 'Squirtle',
 'Charmeleon',
 'Wartortle',
 'Blastoise']

Determine number of classes#

9 types of Pokemon

Display Example Images#

../../../_images/60cc8572f3991abec8f1730c9009b905fe852eaa23898e8e32c398263969ad1b.png

Section 7.2: Fine-tuning a ResNet#

It is common in computer vision to take a large model trained on a large dataset (often ImageNet), replace the classification layer and fine-tune the entire network to perform a different task.

Here we’ll be using a pre-trained ResNet model to classify types of Pokemon.

resnet = torchvision.models.resnet18(weights='ResNet18_Weights.DEFAULT')
num_ftrs = resnet.fc.in_features
# Reset final fully connected layer, number of classes = types of Pokemon = 9
resnet.fc = nn.Linear(num_ftrs, num_classes)
resnet.to(DEVICE)
optimizer = torch.optim.Adam(resnet.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

Finetune ResNet#

../../../_images/ce648fc0850f246118f25cb0cffb6df877b4743cb16e7a18e369db9ae9d7ff3f.png

Section 7.3: Train only classification layer#

Another possible way to make use of transfer learning is to take a pre-trained model and replace the last layer, the classification layer (sometimes also called the “linear readout”). Instead of fine-tuning the whole model as before, we train only the classification layer.

resnet = torchvision.models.resnet18(weights='ResNet18_Weights.DEFAULT')
for param in resnet.parameters():
  param.requires_grad = False
num_ftrs = resnet.fc.in_features
# ResNet final fully connected layer
resnet.fc = nn.Linear(num_ftrs, num_classes)
resnet.to(DEVICE)
optimizer = torch.optim.Adam(resnet.fc.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

Finetune readout of ResNet#

../../../_images/9e6930b478ef6b080e15cdfe8a4e5146359a70ed0e5717527b84a5679553d6b3.png

Section 7.4: Training ResNet from scratch#

As a baseline and for comparison reasons we will also train the ResNet “from scratch” – that is: initialize the weights randomly and train the entire network exclusively on the Pokemon dataset.

resnet = torchvision.models.resnet18(weights=None)
num_ftrs = resnet.fc.in_features
# ResNet final fully connected layer
resnet.fc = nn.Linear(num_ftrs, num_classes)
resnet.to(DEVICE)
optimizer = torch.optim.Adam(resnet.parameters(), lr=1e-4)

loss_fn = nn.CrossEntropyLoss()

Train ResNet from scratch#

../../../_images/9ebeffadeac1af52406dfc62a7814eef68f962d95d2a9ebb881340685aa82ea7.png

Section 7.5: Head to Head Comparison#

Starting from a randomly initialized network works less well, especially in the case of small datsets. Note that the model converges more slowly and less evenly.

Plot Accuracies#

../../../_images/703134423b825f78ae06058ff9c4d0fb8d393160931df82f0be334f1ec7356a7.png

Exercise 7.5.1: Pretrained ResNet vs. ResNet trained from scratch#

First, we compare the Pretrained ResNet with the ResNet trained from scratch. Why might pretrained models outperform models trained from scratch? In what cases would you expect them to be worse?

Click for solution

Submit your feedback#

Exercise 7.5.2: Training only the classification layer#

Second, take a look at the different transfer learning methods - fine-tuning the whole network and training only the classification layer. Why might fine-tuning the whole network outperform training only the classification layer? What are the benefits of training only the classification layer? In what cases would you expect a similar performance of both methods?

Click for solution

Submit your feedback#

Summary#

In this tutorial, you have learned about the modern Convnets (CNNs), their architecture, and operating principles. Also, you are now familiar with the notion of Transfer Learning, and you have learned when to apply it. If you have time left, you will learn more about the speed vs. accuracy trade-off. In the next tutorial, we will see the modern convnets in a facial recognition task.

Video 8: Summary and Outlook#

Submit your feedback#

Daily survey#

Don’t forget to complete your reflections and content check in the daily survey! Please be patient after logging in as there is a small delay before you will be redirected to the survey.

Bonus: Speed-Accuracy Trade-Off / Different Backbones#

Time estimate: ~ 21mins

Video 9: Speed-accuracy trade-off#

Submit your feedback#

As the models got larger and the number of connections increased so did the computational costs involved. In the modern era of image processing, there is a tradeoff between model performance and computational cost. Models can reach extremely high performance on many problems, but achieving state of the art results requires huge amounts of compute power.

https://raw.githubusercontent.com/NeuromatchAcademy/course-content-dl/main/tutorials/W2D3_ModernConvnets/static/compute_vs_performance.png

Bonus Coding Exercise: Compare accuracy and training speed of different models#

The goal is to load three pretrained models and fine-tune them. models is a dictionary where the keys are the names of the models and the values are the corresponding model objects. Currently the names are ResNet18, AlexNet and VGG-19. For a start, load these models from torchvision.models and make sure they are pretrained.

If you want to try other models, just change the dictionary, or if you want to even try out more than three models, just add them to the dictionary and add their learning rates in the array below.

Imagenette Train Loop: `train_loop(model, optimizer, train_loader, loss_fn, device)`#

Run the models: `run_models(models, lr_rates)`#

Plot accuracies vs. training speed#

def create_models(weights):
  """
  Creates models

  Args:
    weights: list of strings
      If True, load pretrained models.

  Returns:
    models: dict
      Log of models
    lr_rates: list
      Log of learning rates
  """
  ####################################################################
  # Fill in all missing code below (...),
  # then remove or comment the line below to test your function
  raise NotImplementedError("create pretrained models")
  ####################################################################
  # Load three pretrained models from torchvision.models
  # [these are just examples, other models are possible as well]
  model1 = ...
  model2 = ...
  model3 = ...

  models = {'...': model1, '...': model2, '...': model3}
  lr_rates = [1e-4, 1e-4, 1e-4]

  return models, lr_rates


weight_list = ['ResNet18_Weights.DEFAULT', 'AlexNet_Weights.DEFAULT', 'VGG19_Weights.DEFAULT']
## Uncomment below to test your function
# models, lr_rates = create_models(weights=weight_list)
# times, top_1_accuracies = run_models(models, lr_rates)
# plot_acc_speed(times, top_1_accuracies, models)

Click for solution

Example output:

Submit your feedback#

Bonus Exercise 1: Finding the best model#

Look at the plot above. It shows the training speed vs. the accuracy of the models you chose. The training speed is measured as the mean time the training takes per epoch. The size of the marker visualizes the number of parameters of the model.

Which model seems to be the best for this task and why? Explain your conclusion based on speed, accuracy and number of parameters.

Click for solution

Submit your feedback#

Bonus Exercise 2: Speed and accuracy correlation#

How does the speed correlate with the accuracy? Are faster models also more accurate?

Click for solution

Tutorial 1: Learn how to use modern convnets

Contents

Tutorial 1: Learn how to use modern convnets#

Tutorial Objectives#

Setup#

Install dependencies#

Install and import feedback gadget#

Figure settings#

Set random seed#

Set device (GPU or CPU). Execute set_device()#

Section 1: Modern CNNs and Transfer Learning#

Video 1: Modern CNNs and Transfer Learning#

Submit your feedback#

Coding Exercise 1: Calculate number of parameters in FCNN vs ConvNet#

Submit your feedback#

Interactive Demo 1: Check your results#

Parameter Calculator#

Submit your feedback#

Section 2: The History of Convnets#

Video 2: History of convnets#

Submit your feedback#

Think! 2: Challenges of improving CNNs#

Submit your feedback#

Section 3: Big and Deep Convnets#

Video 3: AlexNet & VGG#

Submit your feedback#

Section 3.1: Introduction to AlexNet#

Import Alexnet#

Section 3.2: What does AlexNet learn?#

Think! 3.2.1: Filter Similarity#

Submit your feedback#

Interactive Demo 3.2: What does AlexNet see?#

Submit your feedback#

Think! 3.2.2 Filter Purpose#

Submit your feedback#

Further Reading#

Section 4: Convnets After AlexNet#

Video 4: Residual Networks (ResNets)#

Submit your feedback#

Download imagenette#

Set Up Textual ImageNet labels#

Map Imagenette Labels to Imagenet Labels#

Prepare Imagenette Data#

eval_imagenette function#

Imagenette Train Loop#

Coding Exercise 4.1: Use the ResNet model#

Submit your feedback#

Out-of-distribution examples#

Section 5: Inception + ResNeXt#

Video 5: Improving efficiency: Inception and ResNeXt#

Submit your feedback#

ResNet vs ResNeXt#

Interactive Demo 5: ResNet vs. ResNeXt#

Parameter Calculator#

Submit your feedback#

Think! 5: ResNet vs. ResNeXt#

Submit your feedback#

Section 6: Depthwise separable convolutions#

Video 6: Improving efficiency: MobileNet#

Submit your feedback#

Section 6.1: Depthwise separable convolutions#

Coding Exercise 6.1: Calculation of parameters#

Submit your feedback#

Think! 6.1: How do parameter savings depend the on number of input feature maps, 4 vs. 64?#

Submit your feedback#

Section 7: Transfer Learning#

Video 7: Transfer Learning#

Submit your feedback#

Section 7.1: Download and prepare the data#

Download Data#

Determine number of classes#

Display Example Images#

Section 7.2: Fine-tuning a ResNet#

Finetune ResNet#

Section 7.3: Train only classification layer#

Finetune readout of ResNet#

Section 7.4: Training ResNet from scratch#

Train ResNet from scratch#

Section 7.5: Head to Head Comparison#

Plot Accuracies#

Set device (GPU or CPU). Execute `set_device()`#

Imagenette Train Loop: `train_loop(model, optimizer, train_loader, loss_fn, device)`#

Run the models: `run_models(models, lr_rates)`#