{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {},
"id": "view-in-github"
},
"source": [
"
"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"# Bonus Tutorial: Facial recognition using modern convnets\n",
"\n",
"**Week 2, Day 3: Modern Convnets**\n",
"\n",
"**By Neuromatch Academy**\n",
"\n",
"__Content creators:__ Laura Pede, Richard Vogg, Marissa Weis, Timo Lüddecke, Alexander Ecker\n",
"\n",
"__Content reviewers:__ Arush Tagade, Polina Turishcheva, Yu-Fang Yang, Bettina Hein, Melvin Selim Atay, Kelson Shilling-Scrivo\n",
"\n",
"__Content editors:__ Roberto Guidotti, Spiros Chavlis\n",
"\n",
"__Production editors:__ Anoop Kulkarni, Roberto Guidotti, Cary Murray, Gagana B, Spiros Chavlis\n",
"\n",
"
\n",
"\n",
"*Notebook is based on an initial version by Ben Heil*"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Tutorial Objectives\n",
"\n",
"In this tutorial you will learn about:\n",
"\n",
"1. An application of modern CNNs in facial recognition.\n",
"2. Ethical aspects of facial recognition."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @markdown\n",
"from IPython.display import IFrame\n",
"from ipywidgets import widgets\n",
"out = widgets.Output()\n",
"with out:\n",
" print(f\"If you want to download the slides: https://osf.io/download/4r2dp/\")\n",
" display(IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/4r2dp/?direct%26mode=render%26action=download%26mode=render\", width=730, height=410))\n",
"display(out)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"These are the slides for the videos in this tutorial. If you want to download locally the slides, click [here](https://osf.io/4r2dp/download)."
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install dependencies\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Install `facenet` - A model used to do facial recognition\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Install dependencies\n",
"# @markdown Install `facenet` - A model used to do facial recognition\n",
"!pip install facenet-pytorch --quiet\n",
"!pip install Pillow --quiet"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install and import feedback gadget\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Install and import feedback gadget\n",
"\n",
"!pip3 install vibecheck datatops --quiet\n",
"\n",
"from vibecheck import DatatopsContentReviewContainer\n",
"def content_review(notebook_section: str):\n",
" return DatatopsContentReviewContainer(\n",
" \"\", # No text prompt\n",
" notebook_section,\n",
" {\n",
" \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n",
" \"name\": \"neuromatch_dl\",\n",
" \"user_key\": \"f379rz8y\",\n",
" },\n",
" ).render()\n",
"\n",
"\n",
"feedback_prefix = \"W2D3_T2_Bonus\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Imports\n",
"import glob\n",
"import torch\n",
"\n",
"import numpy as np\n",
"import sklearn.decomposition\n",
"import matplotlib.pyplot as plt\n",
"\n",
"from PIL import Image\n",
"\n",
"from torchvision import transforms\n",
"from torchvision.utils import make_grid\n",
"from torchvision.datasets import ImageFolder\n",
"\n",
"from facenet_pytorch import MTCNN, InceptionResnetV1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set random seed\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Executing `set_seed(seed=seed)` you are setting the seed\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Set random seed\n",
"\n",
"# @markdown Executing `set_seed(seed=seed)` you are setting the seed\n",
"\n",
"# For DL its critical to set the random seed so that students can have a\n",
"# baseline to compare their results to expected results.\n",
"# Read more here: https://pytorch.org/docs/stable/notes/randomness.html\n",
"\n",
"# Call `set_seed` function in the exercises to ensure reproducibility.\n",
"import random\n",
"import torch\n",
"\n",
"def set_seed(seed=None, seed_torch=True):\n",
" \"\"\"\n",
" Function that controls randomness. NumPy and random modules must be imported.\n",
"\n",
" Args:\n",
" seed : Integer\n",
" A non-negative integer that defines the random state. Default is `None`.\n",
" seed_torch : Boolean\n",
" If `True` sets the random seed for pytorch tensors, so pytorch module\n",
" must be imported. Default is `True`.\n",
"\n",
" Returns:\n",
" Nothing.\n",
" \"\"\"\n",
" if seed is None:\n",
" seed = np.random.choice(2 ** 32)\n",
" random.seed(seed)\n",
" np.random.seed(seed)\n",
" if seed_torch:\n",
" torch.manual_seed(seed)\n",
" torch.cuda.manual_seed_all(seed)\n",
" torch.cuda.manual_seed(seed)\n",
" torch.backends.cudnn.benchmark = False\n",
" torch.backends.cudnn.deterministic = True\n",
"\n",
" print(f'Random seed {seed} has been set.')\n",
"\n",
"\n",
"# In case that `DataLoader` is used\n",
"def seed_worker(worker_id):\n",
" \"\"\"\n",
" DataLoader will reseed workers following randomness in\n",
" multi-process data loading algorithm.\n",
"\n",
" Args:\n",
" worker_id: integer\n",
" ID of subprocess to seed. 0 means that\n",
" the data will be loaded in the main process\n",
" Refer: https://pytorch.org/docs/stable/data.html#data-loading-randomness for more details\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" worker_seed = torch.initial_seed() % 2**32\n",
" np.random.seed(worker_seed)\n",
" random.seed(worker_seed)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set device (GPU or CPU). Execute `set_device()`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Set device (GPU or CPU). Execute `set_device()`\n",
"# especially if torch modules used.\n",
"\n",
"# Inform the user if the notebook uses GPU or CPU.\n",
"\n",
"def set_device():\n",
" \"\"\"\n",
" Set the device. CUDA if available, CPU otherwise\n",
"\n",
" Args:\n",
" None\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
" if device != \"cuda\":\n",
" print(\"WARNING: For this notebook to perform best, \"\n",
" \"if possible, in the menu under `Runtime` -> \"\n",
" \"`Change runtime type.` select `GPU` \")\n",
" else:\n",
" print(\"GPU is enabled in this notebook.\")\n",
"\n",
" return device"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"SEED = 2021\n",
"set_seed(seed=SEED)\n",
"DEVICE = set_device()"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Section 1: Face Recognition\n",
"\n",
"*Time estimate: ~12mins*"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 1.1: Download and prepare the data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download Faces Data\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Download Faces Data\n",
"import requests, zipfile, io, os\n",
"\n",
"# Original link: https://github.com/ben-heil/cis_522_data.git\n",
"url = 'https://osf.io/2kyfb/download'\n",
"\n",
"fname = 'faces'\n",
"\n",
"if not os.path.exists(fname+'zip'):\n",
" print(\"Data is being downloaded...\")\n",
" r = requests.get(url, stream=True)\n",
" z = zipfile.ZipFile(io.BytesIO(r.content))\n",
" z.extractall()\n",
" print(\"The download has been completed.\")\n",
"else:\n",
" print(\"Data has already been downloaded.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Video 1: Face Recognition using CNNs\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 1: Face Recognition using CNNs\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'jJqEv8hpRa4'), ('Bilibili', 'BV17B4y1K7WV')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Face_Recognition_using_CNNs_Video\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"One application of large CNNs is **facial recognition**. The problem formulation in facial recognition is a little different from the image classification we've seen so far. In facial recognition, we don't want to have a fixed number of individuals that the model can learn. If that were the case then to learn a new person it would be necessary to modify the output portion of the architecture and retrain to account for the new person.\n",
"\n",
"Instead, we train a model to learn an **embedding** where images from the same individual are close to each other in an embedded space, and images corresponding to different people are far apart. When the model is trained, it takes as input an image and outputs an embedding vector corresponding to the image.\n",
"\n",
"To achieve this, facial recognitions typically use a **triplet loss** that compares two images from the same individual (i.e., \"anchor\" and \"positive\" images) and a negative image from a different individual (i.e., \"negative\" image). The loss requires the distance between the anchor and negative points to be greater than a margin $\\alpha$ + the distance between the anchor and positive points."
]
},
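{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"To make the triplet loss concrete, the cell below sketches a minimal version in PyTorch. This is an illustration of the idea only, not the pipeline FaceNet was actually trained with; `anchor`, `positive`, and `negative` are hypothetical stand-ins for embedding tensors produced by the network."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# A minimal triplet-loss sketch (illustration only; not the exact training code used for FaceNet).\n",
"# `anchor`, `positive`, and `negative` are hypothetical embedding tensors of shape (batch, 512).\n",
"# PyTorch also provides a built-in `torch.nn.TripletMarginLoss` that could be used instead.\n",
"import torch\n",
"\n",
"def triplet_loss(anchor, positive, negative, alpha=0.2):\n",
"  \"\"\"Hinge loss on d(anchor, positive) + alpha - d(anchor, negative).\"\"\"\n",
"  pos_dist = torch.sum((anchor - positive)**2, dim=1)  # squared distance to an image of the same person\n",
"  neg_dist = torch.sum((anchor - negative)**2, dim=1)  # squared distance to an image of a different person\n",
"  return torch.mean(torch.clamp(pos_dist - neg_dist + alpha, min=0.0))\n",
"\n",
"\n",
"# Quick check with random \"embeddings\"\n",
"a, p, n = torch.randn(4, 512), torch.randn(4, 512), torch.randn(4, 512)\n",
"print(triplet_loss(a, p, n))"
]
},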
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 1.2: View and transform the data\n",
"\n",
"A well-trained facial recognition system should be able to map different images of the same individual relatively close together. We will load 15 images of three individuals (maybe you know them - then you can see that your brain is quite well in facial recognition).\n",
"\n",
"After viewing the images, we will transform them: MTCNN ([github repo](https://github.com/ipazc/mtcnn)) detects the face and crops the image around the face. Then we stack all the images together in a tensor."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Display Images\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Here are the source images of Bruce Lee, Neil Patrick Harris, and Pam Grier\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Display Images\n",
"# @markdown Here are the source images of Bruce Lee, Neil Patrick Harris, and Pam Grier\n",
"train_transform = transforms.Compose((transforms.Resize((256, 256)),\n",
" transforms.ToTensor()))\n",
"\n",
"face_dataset = ImageFolder('faces', transform=train_transform)\n",
"\n",
"image_count = len(face_dataset)\n",
"\n",
"face_loader = torch.utils.data.DataLoader(face_dataset,\n",
" batch_size=45,\n",
" shuffle=False)\n",
"\n",
"dataiter = iter(face_loader)\n",
"images, labels = next(dataiter)\n",
"\n",
"# Show images\n",
"plt.figure(figsize=(15, 15))\n",
"plt.imshow(make_grid(images, nrow=15).permute(1, 2, 0))\n",
"plt.axis('off')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Image Preprocessing Function\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Image Preprocessing Function\n",
"def process_images(image_dir: str, size=256):\n",
" \"\"\"\n",
" This function returns two tensors for the\n",
" given image dir: one usable for inputting into the\n",
" facenet model, and one that is [0,1] scaled for\n",
" visualizing\n",
"\n",
" Parameters:\n",
" image_dir: string\n",
" The glob corresponding to images in a directory\n",
" size: int\n",
" Size [default: 256]\n",
"\n",
" Returns:\n",
" model_tensor: torch.tensor\n",
" A image_count x channels x height x width\n",
" tensor scaled to between -1 and 1,\n",
" with the faces detected and cropped to the center\n",
" using mtcnn\n",
" display_tensor: torch.tensor\n",
" A transformed version of the model\n",
" tensor scaled to between 0 and 1\n",
" \"\"\"\n",
" mtcnn = MTCNN(image_size=size, margin=32)\n",
" images = []\n",
" for img_path in glob.glob(image_dir):\n",
" img = Image.open(img_path)\n",
" # Normalize and crop image\n",
" img_cropped = mtcnn(img)\n",
" images.append(img_cropped)\n",
"\n",
" model_tensor = torch.stack(images)\n",
" display_tensor = model_tensor / (model_tensor.max() * 2)\n",
" display_tensor += .5\n",
"\n",
" return model_tensor, display_tensor"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Now that we have our images loaded, we need to preprocess them. To make the images easier for the network to learn, we crop them to include just faces."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"bruce_tensor, bruce_display = process_images('faces/bruce/*.jpg')\n",
"neil_tensor, neil_display = process_images('faces/neil/*.jpg')\n",
"pam_tensor, pam_display = process_images('faces/pam/*.jpg')\n",
"\n",
"tensor_to_display = torch.cat((bruce_display, neil_display, pam_display))\n",
"\n",
"plt.figure(figsize=(15, 15))\n",
"plt.imshow(make_grid(tensor_to_display, nrow=15).permute(1, 2, 0))\n",
"plt.axis('off')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 1.3: Embedding with a pretrained network\n",
"\n",
"We load a pretrained facial recognition model called [FaceNet](https://github.com/timesler/facenet-pytorch). It was trained on the [VGGFace2](https://github.com/ox-vgg/vgg_face2) dataset which contains 3.31 million images of 9131 individuals.\n",
"\n",
"We use the pretrained model to calculate embeddings for all of our input images."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"resnet = InceptionResnetV1(pretrained='vggface2').eval().to(DEVICE)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Calculate embedding\n",
"resnet.classify = False\n",
"bruce_embeddings = resnet(bruce_tensor.to(DEVICE))\n",
"neil_embeddings = resnet(neil_tensor.to(DEVICE))\n",
"pam_embeddings = resnet(pam_tensor.to(DEVICE))"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Think! 1.3: Embedding vectors\n",
"\n",
"We want to understand what happens when the model receives an image and returns the corresponding embedding vector.\n",
"\n",
"- What are the height, width and number of channels of one input image?\n",
"- What are the dimensions of one stack of images (e.g. bruce_tensor)?\n",
"- What are the dimensions of the corresponding embedding (e.g. bruce_embeddings)?\n",
"- What would be the dimensions of the embedding of one input image?\n",
"\n",
"\n",
"**Hints:**\n",
"- You can double click on a variable name and hover over it to see the dimensions of tensors.\n",
"- You do not have to answer the questions in the order they are asked."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {}
},
"source": [
"[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W2D3_ModernConvnets/solutions/W2D3_Tutorial2_Solution_63b9d19f.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Embedding_vectors_Discussion\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"We cannot show 512-dimensional vectors visually, but using **Principal Component Analysis (PCA)** we can project the 512 dimensions onto a 2-dimensional space while preserving the maximum amount of data variation possible. This is just a visual aid for us to understand the concept. Note that if you would like to do any calculation, like distances between two images, this would be done with the whole 512-dimensional embedding vectors."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"embedding_tensor = torch.cat((bruce_embeddings,\n",
" neil_embeddings,\n",
" pam_embeddings)).to(device='cpu')\n",
"\n",
"pca = sklearn.decomposition.PCA(n_components=2)\n",
"pca_tensor = pca.fit_transform(embedding_tensor.detach().cpu().numpy())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"num = 15\n",
"categs = 3\n",
"colors = ['blue', 'orange', 'magenta']\n",
"labels = ['Bruce Lee', 'Neil Patrick Harris', 'Pam Grier']\n",
"markers = ['o', 'x', 's']\n",
"plt.figure(figsize=(8, 8))\n",
"for i in range(categs):\n",
" plt.scatter(pca_tensor[i*num:(i+1)*num, 0],\n",
" pca_tensor[i*num:(i+1)*num, 1],\n",
" c=colors[i],\n",
" marker=markers[i], label=labels[i])\n",
"plt.legend()\n",
"plt.title('PCA Representation of the Image Embeddings')\n",
"plt.xlabel('PC 1')\n",
"plt.ylabel('PC 2')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Great! The images corresponding to each individual are separated from each other in the embedding space!\n",
"\n",
"If Neil Patrick Harris wants to unlock his phone with facial recognition, the phone takes the image from the camera, calculates the embedding and checks if it is close to the registered embeddings corresponding to Neil Patrick Harris."
]
},
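{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"The cell below sketches this verification step using the embeddings computed above. The distance threshold of `1.0` is an arbitrary value chosen for illustration; a real system would tune it on held-out data to balance false accepts against false rejects."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Toy verification sketch using `neil_embeddings` and `bruce_embeddings` computed above.\n",
"# NOTE: the threshold is an arbitrary illustrative value, not one used by any real system.\n",
"def is_same_person(query_embedding, registered_embeddings, threshold=1.0):\n",
"  \"\"\"Accept if the query embedding is closer than `threshold` to any registered embedding.\"\"\"\n",
"  dists = torch.linalg.norm(registered_embeddings - query_embedding, dim=1)\n",
"  return torch.min(dists).item() < threshold\n",
"\n",
"\n",
"registered = neil_embeddings[:-1]     # embeddings \"enrolled\" on the phone\n",
"genuine_query = neil_embeddings[-1]   # a held-out image of the same person\n",
"impostor_query = bruce_embeddings[0]  # an image of a different person\n",
"\n",
"print(\"Genuine attempt accepted: \", is_same_person(genuine_query, registered))\n",
"print(\"Impostor attempt accepted:\", is_same_person(impostor_query, registered))"
]
},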
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Section 2: Ethics – bias/discrimination due to pre-training datasets\n",
"\n",
"*Time estimate: ~19mins*"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Popular facial recognition datasets like VGGFace2 and CASIA-WebFace consist primarily of caucasian faces.\n",
"As a result, even state of the art facial recognition models [substantially underperform](https://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Racial_Faces_in_the_Wild_Reducing_Racial_Bias_by_Information_ICCV_2019_paper.pdf) when attempting to recognize faces of other races.\n",
"\n",
"Given the implications that poor model performance can have in fields like security and criminal justice, it's very important to be aware of these limitations if you're going to be building facial recognition systems.\n",
"\n",
"In this example we will work with a small subset from the [UTKFace](https://susanqq.github.io/UTKFace/) dataset with 49 pictures of black women and 49 picture of white women. We will use the same pretrained model as in Section 8 of Tutorial 1, see and discuss the consequences of the model being trained on an imbalanced dataset."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Video 2: Ethical aspects\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 2: Ethical aspects\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'vYilJV3PqUM'), ('Bilibili', 'BV1Jo4y1Q7K3')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Ethical_aspects_Video\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 2.1: Download the Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Run this cell to get the data\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Run this cell to get the data\n",
"\n",
"# Original link: https://github.com/richardvogg/face_sample.git\n",
"url = 'https://osf.io/36wyh/download'\n",
"fname = 'face_sample2'\n",
"\n",
"if not os.path.exists(fname+'zip'):\n",
" print(\"Data is being downloaded...\")\n",
" r = requests.get(url, stream=True)\n",
" z = zipfile.ZipFile(io.BytesIO(r.content))\n",
" z.extractall()\n",
" print(\"The download has been completed.\")\n",
"else:\n",
" print(\"Data has already been downloaded.\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 2.2: Load, view and transform the data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"black_female_tensor, black_female_display = process_images('face_sample2/??_1_1_*.jpg', size=150)\n",
"white_female_tensor, white_female_display = process_images('face_sample2/??_1_0_*.jpg', size=150)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"We can check the dimensions of these tensors and see that for each group we have images of size $150 \\times 150$ and three channels (RGB) of 49 individuals.\n",
"\n",
"**Note:** Originally, the size of images was $200 \\times 200$, but due to RAM resources, we have reduced it. You can change it back, i.e., `size=200`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"print(white_female_tensor.shape)\n",
"print(black_female_tensor.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Visualize some example faces\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Visualize some example faces\n",
"tensor_to_display = torch.cat((white_female_display[:15],\n",
" black_female_display[:15]))\n",
"\n",
"plt.figure(figsize=(12, 12))\n",
"plt.imshow(make_grid(tensor_to_display, nrow = 15).permute(1, 2, 0))\n",
"plt.axis('off')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 2.3: Calculate embeddings\n",
"\n",
"We use the same pretrained facial recognition network as in section 8 to calculate embeddings. If you have memory issues running this part, go to `Edit > Notebook settings` and check if GPU is selected as `Hardware accelerator`. If this does not help you can restart the notebook, go to `Runtime -> Restart runtime`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"resnet.classify = False\n",
"black_female_embeddings = resnet(black_female_tensor.to(DEVICE))\n",
"white_female_embeddings = resnet(white_female_tensor.to(DEVICE))"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"We will use the embeddings to show that the model was trained on an imbalanced dataset. For this, we are going to calculate a distance matrix of all combinations of images, like in this small example with $n=3$ (in our case $n=98$).\n",
"\n",
"
\n",
"\n",
"Calculate the distance between each pair of image embeddings in our tensor and visualize all the distances. Remember that two embeddings are vectors and the distance between two vectors is the Euclidean distance."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Function to calculate pairwise distances\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" [`torch.cdist`](https://pytorch.org/docs/stable/generated/torch.cdist.html) is used\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Function to calculate pairwise distances\n",
"\n",
"# @markdown [`torch.cdist`](https://pytorch.org/docs/stable/generated/torch.cdist.html) is used\n",
"\n",
"def calculate_pairwise_distances(embedding_tensor):\n",
" \"\"\"\n",
" This function calculates the distance\n",
" between each pair of image embeddings\n",
" in a tensor using the `torch.cdist`.\n",
"\n",
" Parameters:\n",
" embedding_tensor : torch.Tensor\n",
" A num_images x embedding_dimension tensor\n",
"\n",
" Returns:\n",
" distances : torch.Tensor\n",
" A num_images x num_images tensor\n",
" containing the pairwise distances between\n",
" each to image embedding\n",
" \"\"\"\n",
"\n",
" distances = torch.cdist(embedding_tensor, embedding_tensor)\n",
"\n",
" return distances"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Visualize the distances\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Visualize the distances\n",
"\n",
"embedding_tensor = torch.cat((black_female_embeddings,\n",
" white_female_embeddings)).to(device='cpu')\n",
"\n",
"distances = calculate_pairwise_distances(embedding_tensor)\n",
"\n",
"plt.figure(figsize=(8, 8))\n",
"plt.imshow(distances.detach().cpu().numpy())\n",
"plt.annotate('Black female', (2, -0.5), fontsize=20, va='bottom')\n",
"plt.annotate('White female', (52, -0.5), fontsize=20, va='bottom')\n",
"plt.annotate('Black female', (-0.5, 45), fontsize=20, rotation=90, ha='right')\n",
"plt.annotate('White female', (-0.5, 90), fontsize=20, rotation=90, ha='right')\n",
"cbar = plt.colorbar()\n",
"cbar.set_label('Distance', fontsize=16)\n",
"plt.axis('off')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Exercise 2.1: Face similarity\n",
"\n",
"What do you observe? The faces of which group are more similar to each other for the Face Detection algorithm?"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {}
},
"source": [
"[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W2D3_ModernConvnets/solutions/W2D3_Tutorial2_Solution_2309aa23.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Face_Similarity_Discussion\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Exercise 2.2: Embeddings\n",
"\n",
"- What does it mean in real life applications that the distance is smaller between the embeddings of one group?\n",
"- Can you come up with example situations/applications where this has a negative impact?\n",
"- What could you do to avoid these problems?"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {}
},
"source": [
"[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W2D3_ModernConvnets/solutions/W2D3_Tutorial2_Solution_9a053438.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Embeddings_Discussion\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Lastly, to show the importance of the dataset which you use to pretrain your model, look at how much space white men and women take in different embeddings. *FairFace* is a dataset which is specifically created with completely balanced classes. The blue dots in all visualizations are white male and white female."
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"
\n",
"\n",
"Adopted from [Kärkkäinen and Joo, 2019, arXiv](https://arxiv.org/abs/1908.04913)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Section 3: Within Sum of Squares\n",
"\n",
"*Time estimate: ~10mins*\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"We can try to put this observation in numbers. For this we work with the embeddings.\n",
"We want to calculate the centroid of each group, which is the average of the 49 embeddings of the group. As each embedding vector has a dimension of 512, the centroid will also have this dimension.\n",
"\n",
"Now we can calculate how far away the observations $x$ of each group $S_i$ are from the centroid $\\mu_i$. This concept is known as Within Sum of Squares (WSS) from cluster analysis.\n",
"\n",
"\\begin{equation}\n",
"\\text{WSS} = \\sum_{x\\in S_i} ||x - \\mu_i||^2\n",
"\\end{equation}\n",
"\n",
"where $|| \\cdot ||$ is the Euclidean norm.\n",
"\n",
"The Within Sum of Squares (WSS) is a number which measures this variability of a group in the embedding space. If all embeddings of one group were very close to each other, the WSS would be very small. In our case we see that the WSS for the black females is much smaller than for the white females. This means that it is much harder for the model to distinguish two black females than to distinguish two white females. The WSS complements the observation from the distance matrix, where we observed overall smaller pairwise distances between black females."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Function to calculate WSS\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Function to calculate WSS\n",
"\n",
"def wss(group):\n",
" \"\"\"\n",
" This function returns the sum of squared distances\n",
" of the N vectors of a\n",
" group tensor (N x K) to its centroid (1 x K).\n",
" Hints:\n",
" - to calculate the centroid, torch.mean()\n",
" will be of use.\n",
" - We need the mean of the N=49 observations.\n",
" If our input tensor is of size\n",
" N x K, we expect the centroid to be of\n",
" dimensions 1 x K.\n",
" Use the axis argument within torch.mean\n",
"\n",
" Args:\n",
" group: torch.tensor\n",
" A image_count x embedding_size tensor\n",
"\n",
" Returns:\n",
" sum_sq: torch.tensor\n",
" A 1x1 tensor with the sum of squared distances.\n",
" \"\"\"\n",
"\n",
" centroid = torch.mean(group, axis=0)\n",
" distance = torch.linalg.norm(group - centroid.view(1, -1), axis=1)\n",
" sum_sq = torch.sum(distance**2)\n",
" return sum_sq"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Let's calculate the WSS for the two groups of our example.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @markdown Let's calculate the WSS for the two groups of our example.\n",
"\n",
"print(f\"Black female embedding WSS: {np.round(wss(black_female_embeddings).item(), 2)}\")\n",
"print(f\"White female embedding WSS: {np.round(wss(white_female_embeddings).item(), 2)}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Summary\n",
"\n",
"In this tutorial we have learned how to apply a modern convnet in real application such as facial recognition. However, as the state-of-the-art tools for facial recognition are trained mostly with caucasian faces, they fail or they perform much worse when they have to deal with faces from other races."
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"collapsed_sections": [],
"include_colab_link": true,
"name": "W2D3_Tutorial2",
"provenance": [],
"toc_visible": true
},
"kernel": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
},
"nteract": {
"version": "0.28.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}