Bonus Tutorial: Modern ConvNets and Facial Recognition (Bonus)

Bonus Tutorial: Modern ConvNets and Facial Recognition (Bonus)#

Week 2, Day 2: ConvNets

By Neuromatch Academy

Content creators: Laura Pede, Richard Vogg, Marissa Weis, Timo Lüddecke, Alexander Ecker

Content reviewers: Arush Tagade, Polina Turishcheva, Yu-Fang Yang, Bettina Hein, Melvin Selim Atay, Kelson Shilling-Scrivo, Jiaxin Cindy Tu

Content editors: Gagana B, Roberto Guidotti, Spiros Chavlis, Jiaxin Cindy Tu

Production editors: Anoop Kulkarni, Roberto Guidotti, Cary Murray, Gagana B, Spiros Chavlis, Konstantine Tsafatinos

Tutorial notebook is based on an initial version by Ben Heil

Tutorial Objectives#

In this bonus lecture you will:

Learn about the history of CNNs and modern CNN architectures (AlexNet, ResNet, Inception, ResNeXt, MobileNet)
Compare accuracy and training speed across different model backbones
Apply modern ConvNets to facial recognition
Understand ethical considerations in facial recognition systems

Setup#

Install and import feedback gadget#

Uncomment to install dependencies#

# Import libraries
import os
import glob
import time
import torch
import random
import torchvision

import numpy as np
import sklearn.decomposition
import matplotlib.pyplot as plt

import torch.nn as nn
import torch.nn.functional as F

from torchvision import transforms
from torchvision.models import AlexNet
from torchvision.utils import make_grid
from torchvision.datasets import ImageFolder

from PIL import Image
from io import BytesIO
from tqdm import tqdm
from facenet_pytorch import MTCNN, InceptionResnetV1

Figure settings#

Set random seed#

Executing set_seed(seed=seed) you are setting the seed

Set device (GPU or CPU). Execute `set_device()`#

SEED = 2021
set_seed(seed=SEED)
DEVICE = set_device()

Random seed 2021 has been set.
WARNING: For this notebook to perform best, if possible, in the menu under `Runtime` -> `Change runtime type.`  select `GPU` 

Section 1: The History of Convnets#

Time estimate: ~15mins

Convolutional neural networks have been around for a long time. The first CNN model was published in 1980, and was based on ideas in neuroscience that predated it by decades. Why is it then that AlexNet, a CNN model published in 2012, is generally considered to mark the start of the deep learning revolution?

Watch the video below to get a better idea of the role that hardware and the internet have played in progressing deep learning.

Video 1: History of convnets#

Submit your feedback#

Think! 1: Challenges of improving CNNs#

As we shall see today, the story of deep learning and CNNs has been one of scaling networks: making them bigger and deeper.

Based on what you know so far from previous days, what challenges might researchers have faced when trying to scale up CNNs and applying them to different visual recognition tasks? Do you already have some ideas how these challenges might have been addressed?

Discuss this with your group for ~10 minutes.

(Hint: labeled data, compute and memory are all finite)

Click for solution

Submit your feedback#

Section 2: Big and Deep Convnets#

Time estimate: 18mins

Video 2: AlexNet & VGG#

Submit your feedback#

Section 2.1: Introduction to AlexNet#

AlexNet arguably marked the start of the current age of deep learning. It incorporates a number of the defining characteristics of successful DL today: deep networks, GPU-powered paralellization, and building blocks encoding task-specific priors. In this section you’ll have the opportunity to play with AlexNet and see the world through its eyes.

Import Alexnet#

Downloading weights...

Saved 244.4 MB
File size on disk: 244.4 MB

Model loaded successfully!

Section 2.2: What does AlexNet learn?#

This code visualizes the top-layer filters learned by AlexNet. What do these filters remind you of?

with torch.no_grad():
  params = list(alexnet.parameters())
  fig, axs = plt.subplots(8, 8, figsize=(8, 8))
  filters = []
  for filter_index in range(params[0].shape[0]):
    row_index = filter_index // 8
    col_index = filter_index % 8

    filter = params[0][filter_index,:,:,:]
    filter_image = filter.permute(1, 2, 0).cpu()
    scale = np.abs(filter_image).max()
    scaled_image = filter_image / (2 * scale) + 0.5
    filters.append(scaled_image.cpu())
    axs[row_index, col_index].imshow(scaled_image.cpu())
    axs[row_index, col_index].axis('off')
  plt.show()

../../../_images/1b7945355af72d8c72c791108a8cbb47eb159a372fe6e411a7f3c756f27c4ff8.png

Think! 2.2.1: Filter Similarity#

What do these filters remind you of?

Click for solution

Submit your feedback#

Interactive Demo 3.2: What does AlexNet see?#

One way of visualizing CNNs is to look at the output of individual filters for a given image. Below is a widget that lets you examine the outputs of various filters used in AlexNet.

Run this cell to enable the widget

Submit your feedback#

Think! 2.2.2 Filter Purpose#

What do these filters appear to be doing? Note that different filters play different roles so there are several good answers.

Click for solution

Submit your feedback#

Section 3: Convnets After AlexNet#

Time estimate: ~25mins

Video 3: Residual Networks (ResNets)#

Submit your feedback#

In this section we’ll be working with a state of the art CNN model called ResNet. ResNet has two particularly interesting features. First, it uses skip connections to avoid the vanishing gradient problem. Second, each block (collection of layers) in a ResNet can be treated as learning a residual function.

Mathematically, a neural network can be thought of as a series of operations that maps an input (like an image of a dog) to an output (like the label “dog”). In math-speak a mapping from an input to an output is called a function. Neural networks are a flexible way of expressing that function.

If you were to subtract out the true function mapping images to class labels from the function learned by a network, you’d be left with the residual error or “residual function”. ResNets try to learn the original function, then the residual function, then the residual of the residual, and so on, using their residual blocks and adding them to the output of the preceeding layers.

In this section we’ll run several images through a pre-trained ResNet and see what happens.

Download imagenette#

Data is being downloaded...

The download has been completed.

Set Up Textual ImageNet labels#

Show code cell source Hide code cell source

# @title Set Up Textual ImageNet labels
dict_map={0: 'tench, Tinca tinca',
'goldfish, Carassius auratus',
'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',
'tiger shark, Galeocerdo cuvieri',
'hammerhead, hammerhead shark',
'electric ray, crampfish, numbfish, torpedo',
'stingray',
'cock',
'hen',
'ostrich, Struthio camelus',
'brambling, Fringilla montifringilla',
'goldfinch, Carduelis carduelis',
'house finch, linnet, Carpodacus mexicanus',
'junco, snowbird',
'indigo bunting, indigo finch, indigo bird, Passerina cyanea',
'robin, American robin, Turdus migratorius',
'bulbul',
'jay',
'magpie',
'chickadee',
'water ouzel, dipper',
'kite',
'bald eagle, American eagle, Haliaeetus leucocephalus',
'vulture',
'great grey owl, great gray owl, Strix nebulosa',
'European fire salamander, Salamandra salamandra',
'common newt, Triturus vulgaris',
'eft',
'spotted salamander, Ambystoma maculatum',
'axolotl, mud puppy, Ambystoma mexicanum',
'bullfrog, Rana catesbeiana',
'tree frog, tree-frog',
'tailed frog, bell toad, ribbed toad, tailed toad, Ascaphus trui',
'loggerhead, loggerhead turtle, Caretta caretta',
'leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea',
'mud turtle',
'terrapin',
'box turtle, box tortoise',
'banded gecko',
'common iguana, iguana, Iguana iguana',
'American chameleon, anole, Anolis carolinensis',
'whiptail, whiptail lizard',
'agama',
'frilled lizard, Chlamydosaurus kingi',
'alligator lizard',
'Gila monster, Heloderma suspectum',
'green lizard, Lacerta viridis',
'African chameleon, Chamaeleo chamaeleon',
'Komodo dragon, Komodo lizard, dragon lizard, giant lizard, Varanus komodoensis',
'African crocodile, Nile crocodile, Crocodylus niloticus',
'American alligator, Alligator mississipiensis',
'triceratops',
'thunder snake, worm snake, Carphophis amoenus',
'ringneck snake, ring-necked snake, ring snake',
'hognose snake, puff adder, sand viper',
'green snake, grass snake',
'king snake, kingsnake',
'garter snake, grass snake',
'water snake',
'vine snake',
'night snake, Hypsiglena torquata',
'boa constrictor, Constrictor constrictor',
'rock python, rock snake, Python sebae',
'Indian cobra, Naja naja',
'green mamba',
'sea snake',
'horned viper, cerastes, sand viper, horned asp, Cerastes cornutus',
'diamondback, diamondback rattlesnake, Crotalus adamanteus',
'sidewinder, horned rattlesnake, Crotalus cerastes',
'trilobite',
'harvestman, daddy longlegs, Phalangium opilio',
'scorpion',
'black and gold garden spider, Argiope aurantia',
'barn spider, Araneus cavaticus',
'garden spider, Aranea diademata',
'black widow, Latrodectus mactans',
'tarantula',
'wolf spider, hunting spider',
'tick',
'centipede',
'black grouse',
'ptarmigan',
'ruffed grouse, partridge, Bonasa umbellus',
'prairie chicken, prairie grouse, prairie fowl',
'peacock',
'quail',
'partridge',
'African grey, African gray, Psittacus erithacus',
'macaw',
'sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita',
'lorikeet',
'coucal',
'bee eater',
'hornbill',
'hummingbird',
'jacamar',
'toucan',
'drake',
'red-breasted merganser, Mergus serrator',
'goose',
'black swan, Cygnus atratus',
'tusker',
'echidna, spiny anteater, anteater',
'platypus, duckbill, duckbilled platypus, duck-billed platypus, Ornithorhynchus anatinus',
'wallaby, brush kangaroo',
'koala, koala bear, kangaroo bear, native bear, Phascolarctos cinereus',
'wombat',
'jellyfish',
'sea anemone, anemone',
'brain coral',
'flatworm, platyhelminth',
'nematode, nematode worm, roundworm',
'conch',
'snail',
'slug',
'sea slug, nudibranch',
'chiton, coat-of-mail shell, sea cradle, polyplacophore',
'chambered nautilus, pearly nautilus, nautilus',
'Dungeness crab, Cancer magister',
'rock crab, Cancer irroratus',
'fiddler crab',
'king crab, Alaska crab, Alaskan king crab, Alaska king crab, Paralithodes camtschatica',
'American lobster, Northern lobster, Maine lobster, Homarus americanus',
'spiny lobster, langouste, rock lobster, crawfish, crayfish, sea crawfish',
'crayfish, crawfish, crawdad, crawdaddy',
'hermit crab',
'isopod',
'white stork, Ciconia ciconia',
'black stork, Ciconia nigra',
'spoonbill',
'flamingo',
'little blue heron, Egretta caerulea',
'American egret, great white heron, Egretta albus',
'bittern',
'crane',
'limpkin, Aramus pictus',
'European gallinule, Porphyrio porphyrio',
'American coot, marsh hen, mud hen, water hen, Fulica americana',
'bustard',
'ruddy turnstone, Arenaria interpres',
'red-backed sandpiper, dunlin, Erolia alpina',
'redshank, Tringa totanus',
'dowitcher',
'oystercatcher, oyster catcher',
'pelican',
'king penguin, Aptenodytes patagonica',
'albatross, mollymawk',
'grey whale, gray whale, devilfish, Eschrichtius gibbosus, Eschrichtius robustus',
'killer whale, killer, orca, grampus, sea wolf, Orcinus orca',
'dugong, Dugong dugon',
'sea lion',
'Chihuahua',
'Japanese spaniel',
'Maltese dog, Maltese terrier, Maltese',
'Pekinese, Pekingese, Peke',
'Shih-Tzu',
'Blenheim spaniel',
'papillon',
'toy terrier',
'Rhodesian ridgeback',
'Afghan hound, Afghan',
'basset, basset hound',
'beagle',
'bloodhound, sleuthhound',
'bluetick',
'black-and-tan coonhound',
'Walker hound, Walker foxhound',
'English foxhound',
'redbone',
'borzoi, Russian wolfhound',
'Irish wolfhound',
'Italian greyhound',
'whippet',
'Ibizan hound, Ibizan Podenco',
'Norwegian elkhound, elkhound',
'otterhound, otter hound',
'Saluki, gazelle hound',
'Scottish deerhound, deerhound',
'Weimaraner',
'Staffordshire bullterrier, Staffordshire bull terrier',
'American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier',
'Bedlington terrier',
'Border terrier',
'Kerry blue terrier',
'Irish terrier',
'Norfolk terrier',
'Norwich terrier',
'Yorkshire terrier',
'wire-haired fox terrier',
'Lakeland terrier',
'Sealyham terrier, Sealyham',
'Airedale, Airedale terrier',
'cairn, cairn terrier',
'Australian terrier',
'Dandie Dinmont, Dandie Dinmont terrier',
'Boston bull, Boston terrier',
'miniature schnauzer',
'giant schnauzer',
'standard schnauzer',
'Scotch terrier, Scottish terrier, Scottie',
'Tibetan terrier, chrysanthemum dog',
'silky terrier, Sydney silky',
'soft-coated wheaten terrier',
'West Highland white terrier',
'Lhasa, Lhasa apso',
'flat-coated retriever',
'curly-coated retriever',
'golden retriever',
'Labrador retriever',
'Chesapeake Bay retriever',
'German short-haired pointer',
'vizsla, Hungarian pointer',
'English setter',
'Irish setter, red setter',
'Gordon setter',
'Brittany spaniel',
'clumber, clumber spaniel',
'English springer, English springer spaniel',
'Welsh springer spaniel',
'cocker spaniel, English cocker spaniel, cocker',
'Sussex spaniel',
'Irish water spaniel',
'kuvasz',
'schipperke',
'groenendael',
'malinois',
'briard',
'kelpie',
'komondor',
'Old English sheepdog, bobtail',
'Shetland sheepdog, Shetland sheep dog, Shetland',
'collie',
'Border collie',
'Bouvier des Flandres, Bouviers des Flandres',
'Rottweiler',
'German shepherd, German shepherd dog, German police dog, alsatian',
'Doberman, Doberman pinscher',
'miniature pinscher',
'Greater Swiss Mountain dog',
'Bernese mountain dog',
'Appenzeller',
'EntleBucher',
'boxer',
'bull mastiff',
'Tibetan mastiff',
'French bulldog',
'Great Dane',
'Saint Bernard, St Bernard',
'Eskimo dog, husky',
'malamute, malemute, Alaskan malamute',
'Siberian husky',
'dalmatian, coach dog, carriage dog',
'affenpinscher, monkey pinscher, monkey dog',
'basenji',
'pug, pug-dog',
'Leonberg',
'Newfoundland, Newfoundland dog',
'Great Pyrenees',
'Samoyed, Samoyede',
'Pomeranian',
'chow, chow chow',
'keeshond',
'Brabancon griffon',
'Pembroke, Pembroke Welsh corgi',
'Cardigan, Cardigan Welsh corgi',
'toy poodle',
'miniature poodle',
'standard poodle',
'Mexican hairless',
'timber wolf, grey wolf, gray wolf, Canis lupus',
'white wolf, Arctic wolf, Canis lupus tundrarum',
'red wolf, maned wolf, Canis rufus, Canis niger',
'coyote, prairie wolf, brush wolf, Canis latrans',
'dingo, warrigal, warragal, Canis dingo',
'dhole, Cuon alpinus',
'African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus',
'hyena, hyaena',
'red fox, Vulpes vulpes',
'kit fox, Vulpes macrotis',
'Arctic fox, white fox, Alopex lagopus',
'grey fox, gray fox, Urocyon cinereoargenteus',
'tabby, tabby cat',
'tiger cat',
'Persian cat',
'Siamese cat, Siamese',
'Egyptian cat',
'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor',
'lynx, catamount',
'leopard, Panthera pardus',
'snow leopard, ounce, Panthera uncia',
'jaguar, panther, Panthera onca, Felis onca',
'lion, king of beasts, Panthera leo',
'tiger, Panthera tigris',
'cheetah, chetah, Acinonyx jubatus',
'brown bear, bruin, Ursus arctos',
'American black bear, black bear, Ursus americanus, Euarctos americanus',
'ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus',
'sloth bear, Melursus ursinus, Ursus ursinus',
'mongoose',
'meerkat, mierkat',
'tiger beetle',
'ladybug, ladybeetle, lady beetle, ladybird, ladybird beetle',
'ground beetle, carabid beetle',
'long-horned beetle, longicorn, longicorn beetle',
'leaf beetle, chrysomelid',
'dung beetle',
'rhinoceros beetle',
'weevil',
'fly',
'bee',
'ant, emmet, pismire',
'grasshopper, hopper',
'cricket',
'walking stick, walkingstick, stick insect',
'cockroach, roach',
'mantis, mantid',
'cicada, cicala',
'leafhopper',
'lacewing, lacewing fly',
"dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk",
'damselfly',
'admiral',
'ringlet, ringlet butterfly',
'monarch, monarch butterfly, milkweed butterfly, Danaus plexippus',
'cabbage butterfly',
'sulphur butterfly, sulfur butterfly',
'lycaenid, lycaenid butterfly',
'starfish, sea star',
'sea urchin',
'sea cucumber, holothurian',
'wood rabbit, cottontail, cottontail rabbit',
'hare',
'Angora, Angora rabbit',
'hamster',
'porcupine, hedgehog',
'fox squirrel, eastern fox squirrel, Sciurus niger',
'marmot',
'beaver',
'guinea pig, Cavia cobaya',
'sorrel',
'zebra',
'hog, pig, grunter, squealer, Sus scrofa',
'wild boar, boar, Sus scrofa',
'warthog',
'hippopotamus, hippo, river horse, Hippopotamus amphibius',
'ox',
'water buffalo, water ox, Asiatic buffalo, Bubalus bubalis',
'bison',
'ram, tup',
'bighorn, bighorn sheep, cimarron, Rocky Mountain bighorn, Rocky Mountain sheep, Ovis canadensis',
'ibex, Capra ibex',
'hartebeest',
'impala, Aepyceros melampus',
'gazelle',
'Arabian camel, dromedary, Camelus dromedarius',
'llama',
'weasel',
'mink',
'polecat, fitch, foulmart, foumart, Mustela putorius',
'black-footed ferret, ferret, Mustela nigripes',
'otter',
'skunk, polecat, wood pussy',
'badger',
'armadillo',
'three-toed sloth, ai, Bradypus tridactylus',
'orangutan, orang, orangutang, Pongo pygmaeus',
'gorilla, Gorilla gorilla',
'chimpanzee, chimp, Pan troglodytes',
'gibbon, Hylobates lar',
'siamang, Hylobates syndactylus, Symphalangus syndactylus',
'guenon, guenon monkey',
'patas, hussar monkey, Erythrocebus patas',
'baboon',
'macaque',
'langur',
'colobus, colobus monkey',
'proboscis monkey, Nasalis larvatus',
'marmoset',
'capuchin, ringtail, Cebus capucinus',
'howler monkey, howler',
'titi, titi monkey',
'spider monkey, Ateles geoffroyi',
'squirrel monkey, Saimiri sciureus',
'Madagascar cat, ring-tailed lemur, Lemur catta',
'indri, indris, Indri indri, Indri brevicaudatus',
'Indian elephant, Elephas maximus',
'African elephant, Loxodonta africana',
'lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens',
'giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca',
'barracouta, snoek',
'eel',
'coho, cohoe, coho salmon, blue jack, silver salmon, Oncorhynchus kisutch',
'rock beauty, Holocanthus tricolor',
'anemone fish',
'sturgeon',
'gar, garfish, garpike, billfish, Lepisosteus osseus',
'lionfish',
'puffer, pufferfish, blowfish, globefish',
'abacus',
'abaya',
"academic gown, academic robe, judge's robe",
'accordion, piano accordion, squeeze box',
'acoustic guitar',
'aircraft carrier, carrier, flattop, attack aircraft carrier',
'airliner',
'airship, dirigible',
'altar',
'ambulance',
'amphibian, amphibious vehicle',
'analog clock',
'apiary, bee house',
'apron',
'ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin',
'assault rifle, assault gun',
'backpack, back pack, knapsack, packsack, rucksack, haversack',
'bakery, bakeshop, bakehouse',
'balance beam, beam',
'balloon',
'ballpoint, ballpoint pen, ballpen, Biro',
'Band Aid',
'banjo',
'bannister, banister, balustrade, balusters, handrail',
'barbell',
'barber chair',
'barbershop',
'barn',
'barometer',
'barrel, cask',
'barrow, garden cart, lawn cart, wheelbarrow',
'baseball',
'basketball',
'bassinet',
'bassoon',
'bathing cap, swimming cap',
'bath towel',
'bathtub, bathing tub, bath, tub',
'beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon',
'beacon, lighthouse, beacon light, pharos',
'beaker',
'bearskin, busby, shako',
'beer bottle',
'beer glass',
'bell cote, bell cot',
'bib',
'bicycle-built-for-two, tandem bicycle, tandem',
'bikini, two-piece',
'binder, ring-binder',
'binoculars, field glasses, opera glasses',
'birdhouse',
'boathouse',
'bobsled, bobsleigh, bob',
'bolo tie, bolo, bola tie, bola',
'bonnet, poke bonnet',
'bookcase',
'bookshop, bookstore, bookstall',
'bottlecap',
'bow',
'bow tie, bow-tie, bowtie',
'brass, memorial tablet, plaque',
'brassiere, bra, bandeau',
'breakwater, groin, groyne, mole, bulwark, seawall, jetty',
'breastplate, aegis, egis',
'broom',
'bucket, pail',
'buckle',
'bulletproof vest',
'bullet train, bullet',
'butcher shop, meat market',
'cab, hack, taxi, taxicab',
'caldron, cauldron',
'candle, taper, wax light',
'cannon',
'canoe',
'can opener, tin opener',
'cardigan',
'car mirror',
'carousel, carrousel, merry-go-round, roundabout, whirligig',
"carpenter's kit, tool kit",
'carton',
'car wheel',
'cash machine, cash dispenser, automated teller machine, automatic teller machine, automated teller, automatic teller, ATM',
'cassette',
'cassette player',
'castle',
'catamaran',
'CD player',
'cello, violoncello',
'cellular telephone, cellular phone, cellphone, cell, mobile phone',
'chain',
'chainlink fence',
'chain mail, ring mail, mail, chain armor, chain armour, ring armor, ring armour',
'chain saw, chainsaw',
'chest',
'chiffonier, commode',
'chime, bell, gong',
'china cabinet, china closet',
'Christmas stocking',
'church, church building',
'cinema, movie theater, movie theatre, movie house, picture palace',
'cleaver, meat cleaver, chopper',
'cliff dwelling',
'cloak',
'clog, geta, patten, sabot',
'cocktail shaker',
'coffee mug',
'coffeepot',
'coil, spiral, volute, whorl, helix',
'combination lock',
'computer keyboard, keypad',
'confectionery, confectionary, candy store',
'container ship, containership, container vessel',
'convertible',
'corkscrew, bottle screw',
'cornet, horn, trumpet, trump',
'cowboy boot',
'cowboy hat, ten-gallon hat',
'cradle',
'crane',
'crash helmet',
'crate',
'crib, cot',
'Crock Pot',
'croquet ball',
'crutch',
'cuirass',
'dam, dike, dyke',
'desk',
'desktop computer',
'dial telephone, dial phone',
'diaper, nappy, napkin',
'digital clock',
'digital watch',
'dining table, board',
'dishrag, dishcloth',
'dishwasher, dish washer, dishwashing machine',
'disk brake, disc brake',
'dock, dockage, docking facility',
'dogsled, dog sled, dog sleigh',
'dome',
'doormat, welcome mat',
'drilling platform, offshore rig',
'drum, membranophone, tympan',
'drumstick',
'dumbbell',
'Dutch oven',
'electric fan, blower',
'electric guitar',
'electric locomotive',
'entertainment center',
'envelope',
'espresso maker',
'face powder',
'feather boa, boa',
'file, file cabinet, filing cabinet',
'fireboat',
'fire engine, fire truck',
'fire screen, fireguard',
'flagpole, flagstaff',
'flute, transverse flute',
'folding chair',
'football helmet',
'forklift',
'fountain',
'fountain pen',
'four-poster',
'freight car',
'French horn, horn',
'frying pan, frypan, skillet',
'fur coat',
'garbage truck, dustcart',
'gasmask, respirator, gas helmet',
'gas pump, gasoline pump, petrol pump, island dispenser',
'goblet',
'go-kart',
'golf ball',
'golfcart, golf cart',
'gondola',
'gong, tam-tam',
'gown',
'grand piano, grand',
'greenhouse, nursery, glasshouse',
'grille, radiator grille',
'grocery store, grocery, food market, market',
'guillotine',
'hair slide',
'hair spray',
'half track',
'hammer',
'hamper',
'hand blower, blow dryer, blow drier, hair dryer, hair drier',
'hand-held computer, hand-held microcomputer',
'handkerchief, hankie, hanky, hankey',
'hard disc, hard disk, fixed disk',
'harmonica, mouth organ, harp, mouth harp',
'harp',
'harvester, reaper',
'hatchet',
'holster',
'home theater, home theatre',
'honeycomb',
'hook, claw',
'hoopskirt, crinoline',
'horizontal bar, high bar',
'horse cart, horse-cart',
'hourglass',
'iPod',
'iron, smoothing iron',
"jack-o'-lantern",
'jean, blue jean, denim',
'jeep, landrover',
'jersey, T-shirt, tee shirt',
'jigsaw puzzle',
'jinrikisha, ricksha, rickshaw',
'joystick',
'kimono',
'knee pad',
'knot',
'lab coat, laboratory coat',
'ladle',
'lampshade, lamp shade',
'laptop, laptop computer',
'lawn mower, mower',
'lens cap, lens cover',
'letter opener, paper knife, paperknife',
'library',
'lifeboat',
'lighter, light, igniter, ignitor',
'limousine, limo',
'liner, ocean liner',
'lipstick, lip rouge',
'Loafer',
'lotion',
'loudspeaker, speaker, speaker unit, loudspeaker system, speaker system',
"loupe, jeweler's loupe",
'lumbermill, sawmill',
'magnetic compass',
'mailbag, postbag',
'mailbox, letter box',
'maillot',
'maillot, tank suit',
'manhole cover',
'maraca',
'marimba, xylophone',
'mask',
'matchstick',
'maypole',
'maze, labyrinth',
'measuring cup',
'medicine chest, medicine cabinet',
'megalith, megalithic structure',
'microphone, mike',
'microwave, microwave oven',
'military uniform',
'milk can',
'minibus',
'miniskirt, mini',
'minivan',
'missile',
'mitten',
'mixing bowl',
'mobile home, manufactured home',
'Model T',
'modem',
'monastery',
'monitor',
'moped',
'mortar',
'mortarboard',
'mosque',
'mosquito net',
'motor scooter, scooter',
'mountain bike, all-terrain bike, off-roader',
'mountain tent',
'mouse, computer mouse',
'mousetrap',
'moving van',
'muzzle',
'nail',
'neck brace',
'necklace',
'nipple',
'notebook, notebook computer',
'obelisk',
'oboe, hautboy, hautbois',
'ocarina, sweet potato',
'odometer, hodometer, mileometer, milometer',
'oil filter',
'organ, pipe organ',
'oscilloscope, scope, cathode-ray oscilloscope, CRO',
'overskirt',
'oxcart',
'oxygen mask',
'packet',
'paddle, boat paddle',
'paddlewheel, paddle wheel',
'padlock',
'paintbrush',
"pajama, pyjama, pj's, jammies",
'palace',
'panpipe, pandean pipe, syrinx',
'paper towel',
'parachute, chute',
'parallel bars, bars',
'park bench',
'parking meter',
'passenger car, coach, carriage',
'patio, terrace',
'pay-phone, pay-station',
'pedestal, plinth, footstall',
'pencil box, pencil case',
'pencil sharpener',
'perfume, essence',
'Petri dish',
'photocopier',
'pick, plectrum, plectron',
'pickelhaube',
'picket fence, paling',
'pickup, pickup truck',
'pier',
'piggy bank, penny bank',
'pill bottle',
'pillow',
'ping-pong ball',
'pinwheel',
'pirate, pirate ship',
'pitcher, ewer',
"plane, carpenter's plane, woodworking plane",
'planetarium',
'plastic bag',
'plate rack',
'plow, plough',
"plunger, plumber's helper",
'Polaroid camera, Polaroid Land camera',
'pole',
'police van, police wagon, paddy wagon, patrol wagon, wagon, black Maria',
'poncho',
'pool table, billiard table, snooker table',
'pop bottle, soda bottle',
'pot, flowerpot',
"potter's wheel",
'power drill',
'prayer rug, prayer mat',
'printer',
'prison, prison house',
'projectile, missile',
'projector',
'puck, hockey puck',
'punching bag, punch bag, punching ball, punchball',
'purse',
'quill, quill pen',
'quilt, comforter, comfort, puff',
'racer, race car, racing car',
'racket, racquet',
'radiator',
'radio, wireless',
'radio telescope, radio reflector',
'rain barrel',
'recreational vehicle, RV, R.V.',
'reel',
'reflex camera',
'refrigerator, icebox',
'remote control, remote',
'restaurant, eating house, eating place, eatery',
'revolver, six-gun, six-shooter',
'rifle',
'rocking chair, rocker',
'rotisserie',
'rubber eraser, rubber, pencil eraser',
'rugby ball',
'rule, ruler',
'running shoe',
'safe',
'safety pin',
'saltshaker, salt shaker',
'sandal',
'sarong',
'sax, saxophone',
'scabbard',
'scale, weighing machine',
'school bus',
'schooner',
'scoreboard',
'screen, CRT screen',
'screw',
'screwdriver',
'seat belt, seatbelt',
'sewing machine',
'shield, buckler',
'shoe shop, shoe-shop, shoe store',
'shoji',
'shopping basket',
'shopping cart',
'shovel',
'shower cap',
'shower curtain',
'ski',
'ski mask',
'sleeping bag',
'slide rule, slipstick',
'sliding door',
'slot, one-armed bandit',
'snorkel',
'snowmobile',
'snowplow, snowplough',
'soap dispenser',
'soccer ball',
'sock',
'solar dish, solar collector, solar furnace',
'sombrero',
'soup bowl',
'space bar',
'space heater',
'space shuttle',
'spatula',
'speedboat',
"spider web, spider's web",
'spindle',
'sports car, sport car',
'spotlight, spot',
'stage',
'steam locomotive',
'steel arch bridge',
'steel drum',
'stethoscope',
'stole',
'stone wall',
'stopwatch, stop watch',
'stove',
'strainer',
'streetcar, tram, tramcar, trolley, trolley car',
'stretcher',
'studio couch, day bed',
'stupa, tope',
'submarine, pigboat, sub, U-boat',
'suit, suit of clothes',
'sundial',
'sunglass',
'sunglasses, dark glasses, shades',
'sunscreen, sunblock, sun blocker',
'suspension bridge',
'swab, swob, mop',
'sweatshirt',
'swimming trunks, bathing trunks',
'swing',
'switch, electric switch, electrical switch',
'syringe',
'table lamp',
'tank, army tank, armored combat vehicle, armoured combat vehicle',
'tape player',
'teapot',
'teddy, teddy bear',
'television, television system',
'tennis ball',
'thatch, thatched roof',
'theater curtain, theatre curtain',
'thimble',
'thresher, thrasher, threshing machine',
'throne',
'tile roof',
'toaster',
'tobacco shop, tobacconist shop, tobacconist',
'toilet seat',
'torch',
'totem pole',
'tow truck, tow car, wrecker',
'toyshop',
'tractor',
'trailer truck, tractor trailer, trucking rig, rig, articulated lorry, semi',
'tray',
'trench coat',
'tricycle, trike, velocipede',
'trimaran',
'tripod',
'triumphal arch',
'trolleybus, trolley coach, trackless trolley',
'trombone',
'tub, vat',
'turnstile',
'typewriter keyboard',
'umbrella',
'unicycle, monocycle',
'upright, upright piano',
'vacuum, vacuum cleaner',
'vase',
'vault',
'velvet',
'vending machine',
'vestment',
'viaduct',
'violin, fiddle',
'volleyball',
'waffle iron',
'wall clock',
'wallet, billfold, notecase, pocketbook',
'wardrobe, closet, press',
'warplane, military plane',
'washbasin, handbasin, washbowl, lavabo, wash-hand basin',
'washer, automatic washer, washing machine',
'water bottle',
'water jug',
'water tower',
'whiskey jug',
'whistle',
'wig',
'window screen',
'window shade',
'Windsor tie',
'wine bottle',
'wing',
'wok',
'wooden spoon',
'wool, woolen, woollen',
'worm fence, snake fence, snake-rail fence, Virginia fence',
'wreck',
'yawl',
'yurt',
'web site, website, internet site, site',
'comic book',
'crossword puzzle, crossword',
'street sign',
'traffic light, traffic signal, stoplight',
'book jacket, dust cover, dust jacket, dust wrapper',
'menu',
'plate',
'guacamole',
'consomme',
'hot pot, hotpot',
'trifle',
'ice cream, icecream',
'ice lolly, lolly, lollipop, popsicle',
'French loaf',
'bagel, beigel',
'pretzel',
'cheeseburger',
'hotdog, hot dog, red hot',
'mashed potato',
'head cabbage',
'broccoli',
'cauliflower',
'zucchini, courgette',
'spaghetti squash',
'acorn squash',
'butternut squash',
'cucumber, cuke',
'artichoke, globe artichoke',
'bell pepper',
'cardoon',
'mushroom',
'Granny Smith',
'strawberry',
'orange',
'lemon',
'fig',
'pineapple, ananas',
'banana',
'jackfruit, jak, jack',
'custard apple',
'pomegranate',
'hay',
'carbonara',
'chocolate sauce, chocolate syrup',
'dough',
'meat loaf, meatloaf',
'pizza, pizza pie',
'potpie',
'burrito',
'red wine',
'espresso',
'cup',
'eggnog',
'alp',
'bubble',
'cliff, drop, drop-off',
'coral reef',
'geyser',
'lakeside, lakeshore',
'promontory, headland, head, foreland',
'sandbar, sand bar',
'seashore, coast, seacoast, sea-coast',
'valley, vale',
'volcano',
'ballplayer, baseball player',
'groom, bridegroom',
'scuba diver',
'rapeseed',
'daisy',
"yellow lady's slipper, yellow lady-slipper, Cypripedium calceolus, Cypripedium parviflorum",
'corn',
'acorn',
'hip, rose hip, rosehip',
'buckeye, horse chestnut, conker',
'coral fungus',
'agaric',
'gyromitra',
'stinkhorn, carrion fungus',
'earthstar',
'hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa',
'bolete',
'ear, spike, capitulum',
'toilet tissue, toilet paper, bathroom tissue'}

Map Imagenette Labels to Imagenet Labels#

Prepare Imagenette Data#

# To preserve reproducibility
g_seed = torch.Generator()
g_seed.manual_seed(SEED)

imagenette_train_loader = torch.utils.data.DataLoader(imagenette_train_subset,
                                                      batch_size=16,
                                                      shuffle=True,
                                                      num_workers=2,
                                                      worker_init_fn=seed_worker,
                                                      generator=g_seed
                                                      )

imagenette_val_loader = torch.utils.data.DataLoader(imagenette_val,
                                                    batch_size=16,
                                                    shuffle=False,
                                                    num_workers=2,
                                                    worker_init_fn=seed_worker,
                                                    generator=g_seed)

dataiter = iter(imagenette_val_loader)
images, labels = next(dataiter)

# Show images
plt.figure(figsize=(8, 8))
plt.imshow(make_grid(images, nrow=4).permute(1, 2, 0))
plt.axis('off')
plt.show()

../../../_images/e304a1a7d587e489c4eff92a3af11264b0f98d440b40ee9876bea10f6fcbce3f.png

eval_imagenette function#

Imagenette Train Loop#

This cell creates a ResNet model pretrained on ImageNet, a 1000 class image prediction dataset. The model is then trained to make predictions on Imagenette, a small subset of ImageNet classes that is useful for demonstrations and prototyping.

# Original network
top_1_accuracies = []
top_5_accuracies = []

# Instantiate a pretrained resnet model
set_seed(seed=SEED)
resnet = torchvision.models.resnet18(weights='ResNet18_Weights.DEFAULT').to(DEVICE)
resnet_opt = torch.optim.Adam(resnet.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

imagenette_train_loop(resnet,
                      resnet_opt,
                      imagenette_train_loader,
                      loss_fn,
                      device=DEVICE)

top_1_acc, top_5_acc = eval_imagenette(resnet,
                                       imagenette_val_loader,
                                       len(imagenette_val),
                                       device=DEVICE)
top_1_accuracies.append(top_1_acc.item())
top_5_accuracies.append(top_5_acc.item())

Random seed 2021 has been set.

Coding Exercise 3.1: Use the ResNet model#

Complete the function below that runs a batch of images through the trained ResNet and returns the Top 5 class predictions and their probabilities. Note that the ResNet model returns unnormalized logits\(^\dagger\). To obtain probabilities, you need to normalize the logits using softmax.

\(^\dagger\) \( \text{logit}(p) = \sigma^{-1}(p) = \text{log} \left( \frac{p}{1-p} \right), \, \text{for} \, p \in (0,1)\), where \(\sigma(\cdot)\) is the sigmoid function, i.e., \(\sigma(z) = 1/(1+e^{-z})\). For more information see here.

def predict_top5(images, device, seed):
  """
  Function to predict top 5 classes

  Args:
    images: torch.tensor
      Image data with dimensionality B x C x H x W batch size x number of channels x height x width)
    device: STRING
      `cuda` if GPU is available, else `cpu`.

  Output:
    top5_probs: torch.tensor
      Tensor(B, 5) with top 5 class probabilities
    top5_names: list
      List of top 5 class names (B, 5)
  """
  ####################################################################
  # Fill in all missing code below (...),
  # then remove or comment the line below to test your function
  raise NotImplementedError("Predict top 5")
  ####################################################################
  set_seed(seed=seed)

  B = images.size(0)
  with torch.no_grad():
    # Run images through model
    images = ...
    output = ...
    # The model output is unnormalized. To get probabilities, run a softmax on it.
    probs = ...
    # Fetch output from GPU and convert to numpy array
    probs = ...

  # Get top 5 predictions
  _, top5_idcs = output.topk(5, 1, True, True)
  top5_idcs = top5_idcs.t().cpu().numpy()
  top5_probs = probs[torch.arange(B), top5_idcs]

  # Convert indices to class names
  top5_names = []
  for b in range(B):
    temp = [dict_map[key].split(',')[0] for key in top5_idcs[:, b]]
    top5_names.append(temp)

  return top5_names, top5_probs


# Get batch of images
dataiter = iter(imagenette_val_loader)
images, labels = next(dataiter)

## Uncomment to test your function and retrieve top 5 predictions
# top5_names, top5_probs = predict_top5(images, DEVICE, SEED)
# print(top5_names[1])

You will see something like this:

Random seed 2021 has been set.
['gas pump', 'chain saw', 'jinrikisha', 'rifle', 'turnstile']

Click for solution

Submit your feedback#

# Visualize probabilities of top 5 predictions
fig, ax = plt.subplots(5, 2, figsize=(10, 20))

for i in range(5):
  ax[i, 0].imshow(np.moveaxis(images[i].numpy(), 0, -1))
  ax[i, 0].axis('off')

  ax[i, 1].bar(np.arange(5), top5_probs[:, i])
  ax[i, 1].set_xticks(np.arange(5))
  ax[i, 1].set_xticklabels(top5_names[i], rotation=30)

fig.tight_layout()
plt.show()

Out-of-distribution examples#

The code below runs two out-of-distribution examples through the trained ResNet. Look at the predictions and discuss, why the model might fail to make accurate predictions on these images.

loc = 'https://raw.githubusercontent.com/NeuromatchAcademy/course-content-dl/main/tutorials/W2D2_Convnets/static/'

fname1 = 'bonsai-svg-5.png'
response = requests.get(loc + fname1)
image = Image.open(BytesIO(response.content)).resize((256, 256))
data = torch.from_numpy(np.asarray(image)[:, :, :3]) / 255.

fname2 = 'Pokémon_Pikachu_art.png'
response = requests.get(loc + fname2)
image = Image.open(BytesIO(response.content)).resize((256, 256))
data2 = torch.from_numpy(np.asarray(image)[:, :, :3]) / 255.

images = torch.stack([data, data2]).permute(0, 3, 1, 2)

# Retrieve top 5 predictions
top5_names, top5_probs = predict_top5(images, DEVICE, SEED)

# Visualize probabilities of top 5 predictions
fig, ax = plt.subplots(2, 2, figsize=(10, 10))

for i in range(2):
  ax[i, 0].imshow(np.moveaxis(images[i].numpy(), 0, -1))
  ax[i, 0].axis('off')

  ax[i, 1].bar(np.arange(5), top5_probs[:, i])
  ax[i, 1].set_xticks(np.arange(5))
  ax[i, 1].set_xticklabels(top5_names[i], rotation=30)

fig.tight_layout()
plt.show()

Section 4: Inception + ResNeXt#

Time estimate: ~27mins

Video 4: Improving efficiency: Inception and ResNeXt#

Submit your feedback#

ResNet vs ResNeXt#

https://raw.githubusercontent.com/NeuromatchAcademy/course-content-dl/main/tutorials/W2D2_Convnets/static/ResNets.png

Xie et al., 2016

Interactive Demo 4: ResNet vs. ResNeXt#

The widgets below calculate the number of parameters in a ResNet (top) and the parameters in a ResNeXt (bottom). We assume that the number of input and output channels (or feature maps) is the same (labeled “Channels in+out” in the widget). We refer to the number of channels after the first and the second layer of one block of either ResNet or ResNeXt as “bottleneck channels”.

The sliders are currently in the position that is displayed in the figure above. The goal of the following tasks is to investigate the difference in expressiveness and numbers of parameters in ResNet and ResNeXt.

Parameter Calculator#

Run this cell to enable the widget

Show code cell source Hide code cell source

# @title Parameter Calculator
# @markdown Run this cell to enable the widget
from IPython.display import display as dis

def calculate_parameters_resnet(d_in, resnet_channels):
    """
    ResNet math: Implement how parameters scale

    Args:
      d_in: int
        Input dimensionality
      resnet_channels: int
        Number of channels in ResNet

    Returns:
      None
    """
    d_out = d_in
    resnet_parameters = d_in*resnet_channels + 3*3*resnet_channels*resnet_channels + resnet_channels*d_out

    print('ResNet parameters: {}'.format(resnet_parameters))
    return None


def calculate_parameters_resnext(d_in, resnext_channels,
                                 num_paths):
    """
    ResNext math: Implement how parameters scale

    Args:
      d_in: int
        Input dimensionality
      resnet_channels: int
        Number of channels in ResNext
      num_paths: int
        Number of pathways in ResNext

    Returns:
      None
    """
    d_out = d_in
    d = resnext_channels

    resnext_parameters = (d_in*d + 3*3*d*d + d*d_out)*num_paths

    print('ResNeXt parameters: {}'.format(resnext_parameters))
    return None


labels = ['ResNet', 'ResNeXt']
descriptions_resnet = ['Channels in+out', 'Bottleneck channels']
descriptions_resnext = ['Channels in+out', 'Bottleneck channels',
                        'Number of paths (cardinality)']
lbox_resnet = widgets.VBox([widgets.Label(description) for description in descriptions_resnet])
lbox_resnext = widgets.VBox([widgets.Label(description) for description in descriptions_resnext])

d_in = widgets.FloatLogSlider(
    value=256,
    base=2,
    min=1, # Max exponent of base
    max=10, # Min exponent of base
    step=1, # Exponent step
)
resnet_channels = widgets.FloatLogSlider(
    value=64,
    base=2,
    min=5, # Max exponent of base
    max=10, # Min exponent of base
    step=1, # Exponent step
)
resnext_channels = widgets.FloatLogSlider(
    value=4,
    base=2,
    min=1, # Max exponent of base
    max=10, # Min exponent of base
    step=1, # Exponent step
)
num_paths = widgets.FloatLogSlider(
    value=32,
    base=2,
    min=0, # Max exponent of base
    max=7, # Min exponent of base
    step=1, # Exponent step
)

rbox_resnet = widgets.VBox([d_in, resnet_channels])
rbox_resnext = widgets.VBox([d_in, resnext_channels, num_paths])
ui_resnet = widgets.HBox([lbox_resnet, rbox_resnet])
ui_resnet_labeled = widgets.VBox(
    [widgets.HTML(value="<b>" + labels[0] + "</b>"), ui_resnet],
    layout=widgets.Layout(border='1px solid black'))
ui_resnext = widgets.HBox([lbox_resnext, rbox_resnext])
ui_resnext_labeled = widgets.VBox(
    [widgets.HTML(value="<b>" + labels[1] + "</b>"), ui_resnext],
    layout=widgets.Layout(border='1px solid black'))
ui = widgets.VBox([ui_resnet_labeled, ui_resnext_labeled])

out_resnet = widgets.interactive_output(calculate_parameters_resnet,
                     {'d_in':d_in,
                     'resnet_channels':resnet_channels})

out_resnext = widgets.interactive_output(calculate_parameters_resnext,
                     {'d_in':d_in,
                     'resnext_channels':resnext_channels,
                     'num_paths':num_paths})

d1 = dis(ui, out_resnet, out_resnext)

Submit your feedback#

Think! 4: ResNet vs. ResNeXt#

In the figure above, both networks, i.e., ResNet and ResNeXt, have a similar number of parameters.

How many channels are there in the bottleneck of the two networks, respectively?
How are these channels connected to each other from the first to the second layer in the blocks of the two networks, respectively?
What does it mean for the expressiveness of the two models relative to each other?

Click for solution

Submit your feedback#

Now we want to look at the number of parameters.

How does the difference in number of parameters change if we fix the number of channels in the bottleneck of both ResNet and ResNeXt to be 64, but vary the number of paths in ResNeXt? (8 paths with 8 channels each would be one such example)
Which number of paths results in the biggest parameter savings?

Click for solution

Section 5: Depthwise separable convolutions#

Time estimate: ~23mins

Video 5: Improving efficiency: MobileNet#

Submit your feedback#

Section 5.1: Depthwise separable convolutions#

Another way to reduce the computational cost of large models is the use of depthwise separable convolutions (introduced here). Depthwise separable convolutions are the key component making MobileNets efficient.

https://raw.githubusercontent.com/NeuromatchAcademy/course-content-dl/main/tutorials/W2D2_Convnets/static/SchematicCNN.png

Coding Exercise 5.1: Calculation of parameters#

Fill in the calculation of the parameters of regular convolution and depthwise separable convolution in the function below. Above you can see the example given in the video for you to check if your calculation is correct.

def convolution_math(in_channels, filter_size, out_channels):
  """
  Convolution math: Implement how parameters scale as a function of feature maps
  and filter size in convolution vs depthwise separable convolution.

  Args:
    in_channels : int
      Number of input channels
    filter_size : int
      Size of the filter
    out_channels : int
      Number of output channels

  Returns:
    None
  """
  ####################################################################
  # Fill in all missing code below (...),
  # then remove or comment the line below to test your function
  raise NotImplementedError("Convolution math")
  ####################################################################
  # Calculate the number of parameters for regular convolution
  conv_parameters = ...
  # Calculate the number of parameters for depthwise separable convolution
  depthwise_conv_parameters = ...

  print(f"Depthwise separable: {depthwise_conv_parameters} parameters")
  print(f"Regular convolution: {conv_parameters} parameters")

  return None



## Uncomment to test your function
# convolution_math(in_channels=4, filter_size=3, out_channels=2)

Depthwise separable: 44 parameters
Regular convolution: 72 parameters

Click for solution

Submit your feedback#

Think! 5.1: How do parameter savings depend the on number of input feature maps, 4 vs. 64?#

Click for solution

Submit your feedback#

Summary#

So far, you have learned about the modern Convnets (CNNs), their architecture, and operating principles. If you have time left, you can continue to learn more about the speed vs. accuracy trade-off, as well as the modern convnets in a facial recognition task.

Video 6: Summary and Outlook#

Submit your feedback#

Section 6: Speed-Accuracy Trade-Off / Different Backbones#

Time estimate: ~ 21mins

Video 7: Speed-accuracy trade-off#

Submit your feedback#

As the models got larger and the number of connections increased so did the computational costs involved. In the modern era of image processing, there is a tradeoff between model performance and computational cost. Models can reach extremely high performance on many problems, but achieving state of the art results requires huge amounts of compute power.

https://raw.githubusercontent.com/NeuromatchAcademy/course-content-dl/main/tutorials/W2D2_Convnets/static/compute_vs_performance.png

Coding Exercise 6: Compare accuracy and training speed of different models#

The goal is to load three pretrained models and fine-tune them. models is a dictionary where the keys are the names of the models and the values are the corresponding model objects. Currently the names are ResNet18, AlexNet and VGG-19. For a start, load these models from torchvision.models and make sure they are pretrained.

If you want to try other models, just change the dictionary, or if you want to even try out more than three models, just add them to the dictionary and add their learning rates in the array below.

Imagenette Train Loop: `train_loop(model, optimizer, train_loader, loss_fn, device)`#

Run the models: `run_models(models, lr_rates)`#

Plot accuracies vs. training speed#

def create_models(weights):
  """
  Creates models

  Args:
    weights: list of strings
      If True, load pretrained models.

  Returns:
    models: dict
      Log of models
    lr_rates: list
      Log of learning rates
  """
  ####################################################################
  # Fill in all missing code below (...),
  # then remove or comment the line below to test your function
  raise NotImplementedError("create pretrained models")
  ####################################################################
  # Load three pretrained models from torchvision.models
  # [these are just examples, other models are possible as well]
  model1 = ...
  model2 = ...
  model3 = ...

  models = {'...': model1, '...': model2, '...': model3}
  lr_rates = [1e-4, 1e-4, 1e-4]

  return models, lr_rates


weight_list = ['ResNet18_Weights.DEFAULT', 'AlexNet_Weights.DEFAULT', 'VGG19_Weights.DEFAULT']
## Uncomment below to test your function
# models, lr_rates = create_models(weights=weight_list)
# times, top_1_accuracies = run_models(models, lr_rates)
# plot_acc_speed(times, top_1_accuracies, models)

Click for solution

Example output:

Submit your feedback#

Exercise 6.1: Finding the best model#

Look at the plot above. It shows the training speed vs. the accuracy of the models you chose. The training speed is measured as the mean time the training takes per epoch. The size of the marker visualizes the number of parameters of the model.

Which model seems to be the best for this task and why? Explain your conclusion based on speed, accuracy and number of parameters.

Click for solution

Submit your feedback#

Exercise 6.2: Speed and accuracy correlation#

How does the speed correlate with the accuracy? Are faster models also more accurate?

Click for solution

Submit your feedback#

Section 7: Face Recognition#

Time estimate: ~12mins

Section 7.1: Download and prepare the data#

Download Faces Data#

Data is being downloaded...

The download has been completed.

Video 8: Face Recognition using CNNs#

Submit your feedback#

One application of large CNNs is facial recognition. The problem formulation in facial recognition is a little different from the image classification we’ve seen so far. In facial recognition, we don’t want to have a fixed number of individuals that the model can learn. If that were the case then to learn a new person it would be necessary to modify the output portion of the architecture and retrain to account for the new person.

Instead, we train a model to learn an embedding where images from the same individual are close to each other in an embedded space, and images corresponding to different people are far apart. When the model is trained, it takes as input an image and outputs an embedding vector corresponding to the image.

To achieve this, facial recognitions typically use a triplet loss that compares two images from the same individual (i.e., “anchor” and “positive” images) and a negative image from a different individual (i.e., “negative” image). The loss requires the distance between the anchor and negative points to be greater than a margin \(\alpha\) + the distance between the anchor and positive points.

Section 7.2: View and transform the data#

A well-trained facial recognition system should be able to map different images of the same individual relatively close together. We will load 15 images of three individuals (maybe you know them - then you can see that your brain is quite well in facial recognition).

After viewing the images, we will transform them: MTCNN (github repo) detects the face and crops the image around the face. Then we stack all the images together in a tensor.

Display Images#

Here are the source images of Bruce Lee, Neil Patrick Harris, and Pam Grier

../../../_images/7b79f623d38851b0539278b16584cf8fbcad0c8b6cc7c408d9d8be366f1092b6.png

Image Preprocessing Function#

Now that we have our images loaded, we need to preprocess them. To make the images easier for the network to learn, we crop them to include just faces.

bruce_tensor, bruce_display = process_images('faces/bruce/*.jpg')
neil_tensor, neil_display = process_images('faces/neil/*.jpg')
pam_tensor, pam_display = process_images('faces/pam/*.jpg')

tensor_to_display = torch.cat((bruce_display, neil_display, pam_display))

plt.figure(figsize=(15, 15))
plt.imshow(make_grid(tensor_to_display, nrow=15).permute(1, 2, 0))
plt.axis('off')
plt.show()

../../../_images/ecd2d804646037bd26627d1c7707d3dc25d97c68e28d2126383e58dee04d15e1.png

Section 7.3: Embedding with a pretrained network#

We load a pretrained facial recognition model called FaceNet. It was trained on the VGGFace2 dataset which contains 3.31 million images of 9131 individuals.

We use the pretrained model to calculate embeddings for all of our input images.

resnet = InceptionResnetV1(pretrained='vggface2').eval().to(DEVICE)

# Calculate embedding
resnet.classify = False
bruce_embeddings = resnet(bruce_tensor.to(DEVICE))
neil_embeddings = resnet(neil_tensor.to(DEVICE))
pam_embeddings = resnet(pam_tensor.to(DEVICE))

Think! 7.3: Embedding vectors#

We want to understand what happens when the model receives an image and returns the corresponding embedding vector.

What are the height, width and number of channels of one input image?
What are the dimensions of one stack of images (e.g. bruce_tensor)?
What are the dimensions of the corresponding embedding (e.g. bruce_embeddings)?
What would be the dimensions of the embedding of one input image?

Hints:

You can double click on a variable name and hover over it to see the dimensions of tensors.
You do not have to answer the questions in the order they are asked.

Click for solution

Submit your feedback#

We cannot show 512-dimensional vectors visually, but using Principal Component Analysis (PCA) we can project the 512 dimensions onto a 2-dimensional space while preserving the maximum amount of data variation possible. This is just a visual aid for us to understand the concept. Note that if you would like to do any calculation, like distances between two images, this would be done with the whole 512-dimensional embedding vectors.

embedding_tensor = torch.cat((bruce_embeddings,
                              neil_embeddings,
                              pam_embeddings)).to(device='cpu')

pca = sklearn.decomposition.PCA(n_components=2)
pca_tensor = pca.fit_transform(embedding_tensor.detach().cpu().numpy())

num = 15
categs = 3
colors = ['blue', 'orange', 'magenta']
labels = ['Bruce Lee', 'Neil Patrick Harris', 'Pam Grier']
markers = ['o', 'x', 's']
plt.figure(figsize=(8, 8))
for i in range(categs):
   plt.scatter(pca_tensor[i*num:(i+1)*num, 0],
               pca_tensor[i*num:(i+1)*num, 1],
               c=colors[i],
               marker=markers[i], label=labels[i])
plt.legend()
plt.title('PCA Representation of the Image Embeddings')
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.show()

../../../_images/806d9d0377a24bc63a19742e162c5b7a3fc16c568ca4e28e0dd7034092036626.png

Great! The images corresponding to each individual are separated from each other in the embedding space!

If Neil Patrick Harris wants to unlock his phone with facial recognition, the phone takes the image from the camera, calculates the embedding and checks if it is close to the registered embeddings corresponding to Neil Patrick Harris.

Section 8: Ethics – bias/discrimination due to pre-training datasets#

Time estimate: ~19mins

Popular facial recognition datasets like VGGFace2 and CASIA-WebFace consist primarily of caucasian faces. As a result, even state of the art facial recognition models substantially underperform when attempting to recognize faces of other races.

Given the implications that poor model performance can have in fields like security and criminal justice, it’s very important to be aware of these limitations if you’re going to be building facial recognition systems.

In this example we will work with a small subset from the UTKFace dataset with 49 pictures of black women and 49 picture of white women. We will use the same pretrained model as in Section 8 of Tutorial 1, see and discuss the consequences of the model being trained on an imbalanced dataset.

Video 9: Ethical aspects#

Submit your feedback#

Section 8.1: Download the Data#

Run this cell to get the data#

Data is being downloaded...

The download has been completed.

Section 8.2: Load, view and transform the data#

black_female_tensor, black_female_display = process_images('face_sample2/??_1_1_*.jpg', size=150)
white_female_tensor, white_female_display = process_images('face_sample2/??_1_0_*.jpg', size=150)

We can check the dimensions of these tensors and see that for each group we have images of size \(150 \times 150\) and three channels (RGB) of 49 individuals.

Note: Originally, the size of images was \(200 \times 200\), but due to RAM resources, we have reduced it. You can change it back, i.e., size=200.

print(white_female_tensor.shape)
print(black_female_tensor.shape)

torch.Size([49, 3, 150, 150])
torch.Size([49, 3, 150, 150])

Visualize some example faces#

../../../_images/1bd9dfb9240eb96ddc09e819a0a901d4b04ff6b0e395393ff02fa7aea4670d80.png

Section 8.3: Calculate embeddings#

We use the same pretrained facial recognition network as in section 8 to calculate embeddings. If you have memory issues running this part, go to Edit > Notebook settings and check if GPU is selected as Hardware accelerator. If this does not help you can restart the notebook, go to Runtime -> Restart runtime.

resnet.classify = False
black_female_embeddings = resnet(black_female_tensor.to(DEVICE))
white_female_embeddings = resnet(white_female_tensor.to(DEVICE))

We will use the embeddings to show that the model was trained on an imbalanced dataset. For this, we are going to calculate a distance matrix of all combinations of images, like in this small example with \(n=3\) (in our case \(n=98\)).

https://raw.githubusercontent.com/richardvogg/face_sample/main/04_DistanceMatrix.png

Calculate the distance between each pair of image embeddings in our tensor and visualize all the distances. Remember that two embeddings are vectors and the distance between two vectors is the Euclidean distance.

Function to calculate pairwise distances#

torch.cdist is used

Visualize the distances#

../../../_images/be676570b248fe9b3d404d8202d25a5049c2b569eb7fe3f12d1f810c427b3a45.png

Exercise 8.1: Face similarity#

What do you observe? The faces of which group are more similar to each other for the Face Detection algorithm?

Click for solution

Submit your feedback#

Exercise 8.2: Embeddings#

What does it mean in real life applications that the distance is smaller between the embeddings of one group?
Can you come up with example situations/applications where this has a negative impact?
What could you do to avoid these problems?

Click for solution

Submit your feedback#

Lastly, to show the importance of the dataset which you use to pretrain your model, look at how much space white men and women take in different embeddings. FairFace is a dataset which is specifically created with completely balanced classes. The blue dots in all visualizations are white male and white female.

Adopted from Kärkkäinen and Joo, 2019, arXiv

Section 8.4: Within Sum of Squares#

Time estimate: ~10mins

We can try to put this observation in numbers. For this we work with the embeddings. We want to calculate the centroid of each group, which is the average of the 49 embeddings of the group. As each embedding vector has a dimension of 512, the centroid will also have this dimension.

Now we can calculate how far away the observations \(x\) of each group \(S_i\) are from the centroid \(\mu_i\). This concept is known as Within Sum of Squares (WSS) from cluster analysis.

(68)#\[\begin{equation} \text{WSS} = \sum_{x\in S_i} ||x - \mu_i||^2 \end{equation}\]

where \(|| \cdot ||\) is the Euclidean norm.

The Within Sum of Squares (WSS) is a number which measures this variability of a group in the embedding space. If all embeddings of one group were very close to each other, the WSS would be very small. In our case we see that the WSS for the black females is much smaller than for the white females. This means that it is much harder for the model to distinguish two black females than to distinguish two white females. The WSS complements the observation from the distance matrix, where we observed overall smaller pairwise distances between black females.

Function to calculate WSS#

Let’s calculate the WSS for the two groups of our example.

Black female embedding WSS: 34.23
White female embedding WSS: 44.59

Summary#

In this tutorial we have learned how to apply a modern convnet in real application such as facial recognition. However, as the state-of-the-art tools for facial recognition are trained mostly with caucasian faces, they fail or they perform much worse when they have to deal with faces from other races.

Bonus Tutorial: Modern ConvNets and Facial Recognition (Bonus)

Contents

Bonus Tutorial: Modern ConvNets and Facial Recognition (Bonus)#

Tutorial Objectives#

Setup#

Install and import feedback gadget#

Uncomment to install dependencies#

Figure settings#

Set random seed#

Set device (GPU or CPU). Execute set_device()#

Section 1: The History of Convnets#

Video 1: History of convnets#

Submit your feedback#

Think! 1: Challenges of improving CNNs#

Submit your feedback#

Section 2: Big and Deep Convnets#

Video 2: AlexNet & VGG#

Submit your feedback#

Section 2.1: Introduction to AlexNet#

Import Alexnet#

Section 2.2: What does AlexNet learn?#

Think! 2.2.1: Filter Similarity#

Submit your feedback#

Interactive Demo 3.2: What does AlexNet see?#

Submit your feedback#

Think! 2.2.2 Filter Purpose#

Submit your feedback#

Further Reading#

Section 3: Convnets After AlexNet#

Video 3: Residual Networks (ResNets)#

Submit your feedback#

Download imagenette#

Set Up Textual ImageNet labels#

Map Imagenette Labels to Imagenet Labels#

Prepare Imagenette Data#

eval_imagenette function#

Imagenette Train Loop#

Coding Exercise 3.1: Use the ResNet model#

Submit your feedback#

Out-of-distribution examples#

Section 4: Inception + ResNeXt#

Video 4: Improving efficiency: Inception and ResNeXt#

Submit your feedback#

ResNet vs ResNeXt#

Interactive Demo 4: ResNet vs. ResNeXt#

Parameter Calculator#

Submit your feedback#

Think! 4: ResNet vs. ResNeXt#

Submit your feedback#

Section 5: Depthwise separable convolutions#

Video 5: Improving efficiency: MobileNet#

Submit your feedback#

Section 5.1: Depthwise separable convolutions#

Coding Exercise 5.1: Calculation of parameters#

Submit your feedback#

Think! 5.1: How do parameter savings depend the on number of input feature maps, 4 vs. 64?#

Submit your feedback#

Summary#

Video 6: Summary and Outlook#

Submit your feedback#

Section 6: Speed-Accuracy Trade-Off / Different Backbones#

Video 7: Speed-accuracy trade-off#

Submit your feedback#

Coding Exercise 6: Compare accuracy and training speed of different models#

Imagenette Train Loop: train_loop(model, optimizer, train_loader, loss_fn, device)#

Run the models: run_models(models, lr_rates)#

Plot accuracies vs. training speed#

Submit your feedback#

Exercise 6.1: Finding the best model#

Submit your feedback#

Exercise 6.2: Speed and accuracy correlation#

Submit your feedback#

Section 7: Face Recognition#

Section 7.1: Download and prepare the data#

Download Faces Data#

Video 8: Face Recognition using CNNs#

Submit your feedback#

Section 7.2: View and transform the data#

Display Images#

Image Preprocessing Function#

Set device (GPU or CPU). Execute `set_device()`#

Imagenette Train Loop: `train_loop(model, optimizer, train_loader, loss_fn, device)`#

Run the models: `run_models(models, lr_rates)`#