{ "cells": [ { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Tutorial 1: Biological vs. Artificial Neural Networks\n", "\n", "**Week 1, Day 3: Multi Layer Perceptrons**\n", "\n", "**By Neuromatch Academy**\n", "\n", "__Content creators:__ Arash Ash, Surya Ganguli\n", "\n", "__Content reviewers:__ Saeed Salehi, Felix Bartsch, Yu-Fang Yang, Antoine De Comite, Melvin Selim Atay, Kelson Shilling-Scrivo\n", "\n", "__Content editors:__ Gagana B, Kelson Shilling-Scrivo, Spiros Chavlis\n", "\n", "__Production editors:__ Anoop Kulkarni, Kelson Shilling-Scrivo, Gagana B, Spiros Chavlis" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Tutorial objectives\n", "\n", "In this tutorial, we will explore Multi-layer Perceptrons (MLPs). MLPs are arguably one of the most tractable models (due to their flexibility) that we can use to study deep learning fundamentals. Here we will learn why they are:\n", "\n", "* Similar to biological networks\n", "* Good at function approximation\n", "* Implemented the way they are in PyTorch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @markdown\n", "from IPython.display import IFrame\n", "from ipywidgets import widgets\n", "out = widgets.Output()\n", "with out:\n", " print(f\"If you want to download the slides: https://osf.io/download/4ye56/\")\n", " display(IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/4ye56/?direct%26mode=render%26action=download%26mode=render\", width=730, height=410))\n", "display(out)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Setup\n", "\n", "This is a GPU-free notebook!"
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install and import feedback gadget\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Install and import feedback gadget\n", "\n", "!pip3 install vibecheck datatops --quiet\n", "\n", "from vibecheck import DatatopsContentReviewContainer\n", "def content_review(notebook_section: str):\n", " return DatatopsContentReviewContainer(\n", " \"\", # No text prompt\n", " notebook_section,\n", " {\n", " \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n", " \"name\": \"neuromatch_dl\",\n", " \"user_key\": \"f379rz8y\",\n", " },\n", " ).render()\n", "\n", "\n", "feedback_prefix = \"W1D3_T1\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# Imports\n", "import random\n", "\n", "import torch\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "import torch.nn as nn\n", "import torch.optim as optim\n", "from tqdm.auto import tqdm\n", "from IPython.display import display\n", "from torch.utils.data import DataLoader, TensorDataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Figure settings\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Figure settings\n", "import logging\n", "logging.getLogger('matplotlib.font_manager').disabled = True\n", "\n", "import ipywidgets as widgets # Interactive display\n", "%config InlineBackend.figure_format = 'retina'\n", "plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/content-creation/main/nma.mplstyle\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting functions\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, 
"tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Plotting functions\n", "\n", "def imshow(img):\n", " \"\"\"\n", " Helper function to plot unnormalised image\n", "\n", " Args:\n", " img: torch.tensor\n", " Image to be displayed\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " img = img / 2 + 0.5 # Unnormalize\n", " npimg = img.numpy()\n", " plt.imshow(np.transpose(npimg, (1, 2, 0)))\n", " plt.axis(False)\n", " plt.show()\n", "\n", "\n", "def plot_function_approximation(x, relu_acts, y_hat):\n", " \"\"\"\n", " Helper function to plot ReLU activations and\n", " function approximations\n", "\n", " Args:\n", " x: torch.tensor\n", " Incoming Data\n", " relu_acts: torch.tensor\n", " Computed ReLU activations for each point along the x axis (x)\n", " y_hat: torch.tensor\n", " Estimated function values, i.e., the weighted sum of\n", " ReLU activations for every point along the x axis\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " fig, axes = plt.subplots(2, 1)\n", "\n", " # Plot ReLU Activations\n", " axes[0].plot(x, relu_acts.T);\n", " axes[0].set(xlabel='x',\n", " ylabel='Activation',\n", " title='ReLU Activations - Basis Functions')\n", " labels = [f\"ReLU {i + 1}\" for i in range(relu_acts.shape[0])]\n", " axes[0].legend(labels, ncol = 2)\n", "\n", " # Plot Function Approximation\n", " axes[1].plot(x, torch.sin(x), label='truth')\n", " axes[1].plot(x, y_hat, label='estimated')\n", " axes[1].legend()\n", " axes[1].set(xlabel='x',\n", " ylabel='y(x)',\n", " title='Function Approximation')\n", "\n", " plt.tight_layout()\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set random seed\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By executing `set_seed(seed=seed)`, you set the random seed.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Set random seed\n", "\n", "# @markdown
By executing `set_seed(seed=seed)`, you set the random seed.\n", "\n", "# For DL, it's critical to set the random seed so that students can have a\n", "# baseline to compare their results to expected results.\n", "# Read more here: https://pytorch.org/docs/stable/notes/randomness.html\n", "\n", "# Call `set_seed` function in the exercises to ensure reproducibility.\n", "import random\n", "import torch\n", "\n", "def set_seed(seed=None, seed_torch=True):\n", " \"\"\"\n", " Function that controls randomness. NumPy and random modules must be imported.\n", "\n", " Args:\n", " seed : Integer\n", " A non-negative integer that defines the random state. Default is `None`.\n", " seed_torch : Boolean\n", " If `True`, sets the random seed for pytorch tensors, so pytorch module\n", " must be imported. Default is `True`.\n", "\n", " Returns:\n", " Nothing.\n", " \"\"\"\n", " if seed is None:\n", " seed = np.random.choice(2 ** 32)\n", " random.seed(seed)\n", " np.random.seed(seed)\n", " if seed_torch:\n", " torch.manual_seed(seed)\n", " torch.cuda.manual_seed_all(seed)\n", " torch.cuda.manual_seed(seed)\n", " torch.backends.cudnn.benchmark = False\n", " torch.backends.cudnn.deterministic = True\n", "\n", " print(f'Random seed {seed} has been set.')\n", "\n", "\n", "# In case `DataLoader` is used\n", "def seed_worker(worker_id):\n", " \"\"\"\n", " DataLoader will reseed workers following randomness in\n", " multi-process data loading algorithm.\n", "\n", " Args:\n", " worker_id: integer\n", " ID of subprocess to seed. 0 means that\n", " the data will be loaded in the main process\n", " Refer: https://pytorch.org/docs/stable/data.html#data-loading-randomness for more details\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " worker_seed = torch.initial_seed() % 2**32\n", " np.random.seed(worker_seed)\n", " random.seed(worker_seed)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set device (GPU or CPU).
Execute `set_device()`\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Set device (GPU or CPU). Execute `set_device()`\n", "# especially if torch modules used.\n", "\n", "# Inform the user if the notebook uses GPU or CPU.\n", "# NOTE: This is mostly a GPU free tutorial.\n", "\n", "def set_device():\n", " \"\"\"\n", " Set the device. CUDA if available, CPU otherwise\n", "\n", " Args:\n", " None\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n", " if device != \"cuda\":\n", " print(\"GPU is not enabled in this notebook. \\n\"\n", " \"If you want to enable it, in the menu under `Runtime` -> \\n\"\n", " \"`Hardware accelerator.` and select `GPU` from the dropdown menu\")\n", " else:\n", " print(\"GPU is enabled in this notebook. \\n\"\n", " \"If you want to disable it, in the menu under `Runtime` -> \\n\"\n", " \"`Hardware accelerator.` and select `None` from the dropdown menu\")\n", "\n", " return device" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "SEED = 2021\n", "set_seed(seed=SEED)\n", "DEVICE = set_device()" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Section 0: Introduction to MLPs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 0: Introduction\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 0: Introduction\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " 
if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'Gh0KYl7ViAc'), ('Bilibili', 'BV1E3411r7TL')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Introduction_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Section 1: The Need for MLPs\n", "\n", "*Time estimate: ~35 mins*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 1: Universal 
Approximation Theorem\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 1: Universal Approximation Theorem\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'tg8HHKo1aH4'), ('Bilibili', 'BV1SP4y147Uv')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Universal_Approximation_Theorem_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Coding Exercise 1: Function approximation with ReLU\n", "Through the Universal Approximation Theorem, we learned that an MLP with a single hidden layer can approximate any continuous function on a closed, bounded domain to arbitrary accuracy, given enough hidden units. Now let's manually fit a sine function using ReLU activations.\n", "\n", "We will approximate the sine function using a linear combination (a weighted sum) of ReLUs with slope 1. We need to determine the bias terms (which determine where each ReLU's kink, the switch from 0 to linear, occurs) and how to weight each ReLU. The idea is to set the weights one at a time, from left to right, so that at each training point the slope of the combined function changes to point toward the next sample.\n", "\n", "First, we generate our \"training data\" from a sine function using the `torch.sin` function, which applies the sine elementwise:\n", "\n", "```python\n", ">>> import torch\n", ">>> _ = torch.manual_seed(2021)\n", "\n", ">>> a = torch.randn(5)\n", ">>> print(a)\n", "tensor([ 2.2871, 0.6413, -0.8615, -0.3649, -0.6931])\n", ">>> torch.sin(a)\n", "tensor([ 0.7542, 0.5983, -0.7588, -0.3569, -0.6389])\n", "```\n", "\n", "In the exercise, the training data are 10 evenly spaced samples of the sine function on $[0, 2\\pi]$; these are the points we will use to approximate the function. With 10 training data points we need 9 ReLUs (we don't need a ReLU for the last data point, as there is nothing to its right to model).\n", "\n", "We first need to figure out the bias term for each ReLU and compute the activation of each ReLU, where:\n", "\n", "\\begin{equation}\n", " y(x) = \\max(0, x+b)\n", "\\end{equation}\n", "\n", "We then need to figure out the correct weight on each ReLU so that the linear combination approximates the desired function."
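, "\n",
"\n",
"The following minimal sketch illustrates the idea with just two ReLUs. The kink positions and weights are arbitrary, hypothetical values chosen for illustration, not the exercise solution. Each bias $b$ places a ReLU's kink at $x = -b$, and each weight sets how much the slope of the sum changes at that kink.\n",
"\n",
"```python\n",
"import torch\n",
"\n",
"x = torch.linspace(-2, 2, 5)    # the points -2, -1, 0, 1, 2\n",
"relu_a = torch.relu(x + 1.0)    # bias b = 1, so the kink is at x = -1\n",
"relu_b = torch.relu(x - 1.0)    # bias b = -1, so the kink is at x = 1\n",
"\n",
"# Hypothetical weights: the slope goes 0 -> 2 at x = -1, then 2 -> 2 - 3 = -1 at x = 1\n",
"y = 2.0 * relu_a - 3.0 * relu_b\n",
"print(y)  # tensor([0., 0., 2., 4., 3.])\n",
"```"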
] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "def approximate_function(x_train, y_train):\n", " \"\"\"\n", " Function to compute and combine ReLU activations\n", "\n", " Args:\n", " x_train: torch.tensor\n", " Training data\n", " y_train: torch.tensor\n", " Ground-truth function values at the training points\n", "\n", " Returns:\n", " y_hat: torch.tensor\n", " Estimated function values, i.e., the weighted sum of\n", " ReLU activations for every point along the x axis\n", " relu_acts: torch.tensor\n", " Computed ReLU activations for each point along the x axis (x)\n", " x: torch.tensor\n", " x-axis points\n", " \"\"\"\n", " ####################################################################\n", " # Fill in missing code below (...),\n", " # then remove or comment the line below to test your function\n", " raise NotImplementedError(\"Complete approximate_function!\")\n", " ####################################################################\n", "\n", " # Number of relus\n", " n_relus = x_train.shape[0] - 1\n", "\n", " # x axis points (many more than in x_train)\n", " x = torch.linspace(torch.min(x_train), torch.max(x_train), 1000)\n", "\n", " ## COMPUTE RELU ACTIVATIONS\n", "\n", " # First determine what bias terms should be for each of `n_relus` ReLUs\n", " b = ...\n", "\n", " # Compute ReLU activations for each point along the x axis (x)\n", " relu_acts = torch.zeros((n_relus, x.shape[0]))\n", "\n", " for i_relu in range(n_relus):\n", " relu_acts[i_relu, :] = torch.relu(x + b[i_relu])\n", "\n", " ## COMBINE RELU ACTIVATIONS\n", "\n", " # Set up weights for weighted sum of ReLUs\n", " combination_weights = torch.zeros((n_relus, ))\n", "\n", " # Figure out weights on each ReLU\n", " prev_slope = 0\n", " for i in range(n_relus):\n", " delta_x = x_train[i+1] - x_train[i]\n", " slope = (y_train[i+1] - y_train[i]) / delta_x\n", " combination_weights[i] = ...\n", " prev_slope = slope\n", "\n", " # Get output of weighted
sum of ReLU activations for every point along x axis\n", " y_hat = ...\n", "\n", " return y_hat, relu_acts, x\n", "\n", "\n", "\n", "# Make training data from sine function\n", "N_train = 10\n", "x_train = torch.linspace(0, 2*np.pi, N_train).view(-1, 1)\n", "y_train = torch.sin(x_train)\n", "\n", "## Uncomment the lines below to test your function approximation\n", "# y_hat, relu_acts, x = approximate_function(x_train, y_train)\n", "# plot_function_approximation(x, relu_acts, y_hat)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W1D3_MultiLayerPerceptrons/solutions/W1D3_Tutorial1_Solution_d38a6c69.py)\n", "\n", "*Example output:*\n", "\n", "[Figure: top panel, ReLU activations (basis functions); bottom panel, true vs. estimated sine function]\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "As you can see in the top panel, we obtain 9 shifted ReLUs, all with the same slope. These are the basis functions that a one-hidden-layer MLP uses to span the function space: the MLP's output is a linear combination of these ReLUs."
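, "\n",
"\n",
"To make the link to MLPs concrete, here is a minimal sketch (our own illustration, not the official exercise solution) that writes the same kind of construction as a one-hidden-layer `nn.Sequential` MLP. It assumes the hidden ReLUs have input weight 1 and bias $-x_i$, which puts a kink at each of the first 9 training points. It also assumes each output weight is the change in slope of the piecewise-linear interpolant at that kink. The output bias `y_train[0]` is a further assumption that makes the curve start at the first sample (it is 0 for the sine).\n",
"\n",
"```python\n",
"import numpy as np\n",
"import torch\n",
"import torch.nn as nn\n",
"\n",
"# Same training points as in the exercise: 10 samples of sin on [0, 2*pi]\n",
"N_train = 10\n",
"x_train = torch.linspace(0, 2 * np.pi, N_train)\n",
"y_train = torch.sin(x_train)\n",
"n_relus = N_train - 1\n",
"\n",
"# Slope of each segment, then the change in slope at each kink (assumed output weights)\n",
"slopes = (y_train[1:] - y_train[:-1]) / (x_train[1:] - x_train[:-1])\n",
"out_weights = slopes - torch.cat([torch.zeros(1), slopes[:-1]])\n",
"\n",
"# One hidden layer of n_relus ReLUs followed by a linear read-out\n",
"mlp = nn.Sequential(nn.Linear(1, n_relus), nn.ReLU(), nn.Linear(n_relus, 1))\n",
"with torch.no_grad():\n",
"    mlp[0].weight.fill_(1.0)                      # every hidden ReLU has slope 1\n",
"    mlp[0].bias.copy_(-x_train[:-1])              # kink of ReLU i at x_train[i]\n",
"    mlp[2].weight.copy_(out_weights.view(1, -1))  # change in slope at each kink\n",
"    mlp[2].bias.fill_(y_train[0].item())          # start the curve at y_train[0]\n",
"    y_fit = mlp(x_train.view(-1, 1)).squeeze()\n",
"\n",
"# The network passes through every training point (up to float32 rounding)\n",
"print(torch.allclose(y_fit, y_train, atol=1e-4))  # True\n",
"```"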
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Function_approximation_with_ReLU_Exercise\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Section 2: MLPs in PyTorch\n", "\n", "*Time estimate: ~1hr and 20 mins*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 2: Building MLPs in PyTorch\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 2: Building MLPs in PyTorch\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " 
print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'XtwLnaYJ7uc'), ('Bilibili', 'BV1zh411z7LY')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Building_MLPs_in_PyTorch_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "In the previous segment, we approximated a smooth function (a sine) with a weighted sum of ReLUs, which is exactly what an MLP with one hidden layer computes. We also saw that, using Lipschitz continuity, we can prove that this approximation is mathematically correct, i.e., that its error can be made arbitrarily small by using enough ReLUs. MLPs are fascinating, but before we get into the details of designing them, let's familiarize ourselves with some basic MLP terminology: layer, neuron, depth, width, weight, bias, and activation function. Armed with these ideas, we can design an MLP given its input size, hidden layer sizes, and output size."
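, "\n",
"\n",
"As a small illustration of this vocabulary, the sketch below builds a fixed MLP with `nn.Sequential`; the sizes are arbitrary choices for illustration. It has 2 input features and a depth of 2 hidden layers with widths 100 and 10, each followed by a Leaky ReLU activation. An output layer with 1 neuron comes last. Each `nn.Linear` layer stores a weight matrix of shape `(out_features, in_features)` and a bias vector of length `out_features`.\n",
"\n",
"```python\n",
"import torch\n",
"import torch.nn as nn\n",
"\n",
"model = nn.Sequential(\n",
"    nn.Linear(2, 100),   # hidden layer 1: width 100\n",
"    nn.LeakyReLU(0.1),   # activation function\n",
"    nn.Linear(100, 10),  # hidden layer 2: width 10\n",
"    nn.LeakyReLU(0.1),   # activation function\n",
"    nn.Linear(10, 1),    # output layer: 1 neuron\n",
")\n",
"\n",
"# Weights and biases of every layer, e.g. '0.weight (100, 2)' and '0.bias (100,)'\n",
"for name, param in model.named_parameters():\n",
"    print(name, tuple(param.shape))\n",
"\n",
"# (2*100 + 100) + (100*10 + 10) + (10*1 + 1) = 1321 trainable parameters\n",
"print(sum(p.numel() for p in model.parameters()))  # 1321\n",
"\n",
"print(model(torch.zeros(5, 2)).shape)  # torch.Size([5, 1])\n",
"```"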
] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Coding Exercise 2: Implement a general-purpose MLP in PyTorch\n", "The objective is to design an MLP with these properties:\n", "* Works with inputs of any shape (1D, 2D, etc.), which are flattened to `input_feature_num` features\n", "* Constructs any given number of hidden layers using `nn.Sequential()` and the `add_module()` function\n", "* Uses the same given activation function (e.g., [Leaky ReLU](https://pytorch.org/docs/stable/generated/torch.nn.LeakyReLU.html)) in all hidden layers\n", "\n", "**Leaky ReLU** is described by the following mathematical formula:\n", "\n", "\\begin{align}\n", "\\text{LeakyReLU}(x) &= \\text{max}(0,x) + \\text{negative_slope} \\cdot \\text{min}(0, x) \\\\\n", "&=\n", "\\left\\{\n", " \\begin{array}{ll}\n", " x & ,\\; \\text{if} \\; x \\ge 0 \\\\\n", " \\text{negative_slope} \\cdot x & ,\\; \\text{otherwise}\n", " \\end{array}\n", "\\right.\n", "\\end{align}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "class Net(nn.Module):\n", " \"\"\"\n", " Initialize MLP Network\n", " \"\"\"\n", "\n", " def __init__(self, actv, input_feature_num, hidden_unit_nums, output_feature_num):\n", " \"\"\"\n", " Initialize MLP Network parameters\n", "\n", " Args:\n", " actv: string\n", " Activation function, e.g., 'LeakyReLU(0.1)'\n", " input_feature_num: int\n", " Number of input features\n", " hidden_unit_nums: list\n", " Number of units per hidden layer, list of integers\n", " output_feature_num: int\n", " Number of output features\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " super(Net, self).__init__()\n", " self.input_feature_num = input_feature_num # Save the input size for reshaping later\n", " self.mlp = nn.Sequential() # Initialize layers of MLP\n", "\n", " in_num = input_feature_num # Initialize the temporary input feature to each layer\n", " for i in range(len(hidden_unit_nums)): # Loop over layers and create each one\n", "\n", "
####################################################################\n", " # Fill in missing code below (...),\n", " # then remove or comment the line below to test your function\n", " raise NotImplementedError(\"Create MLP Layer\")\n", " ####################################################################\n", "\n", " out_num = hidden_unit_nums[i] # Number of hidden units in the current layer\n", " layer = ... # Use nn.Linear to define the layer\n", " in_num = out_num # Assign next layer input using current layer output\n", " self.mlp.add_module('Linear_%d'%i, layer) # Append layer to the model with a name\n", "\n", " actv_layer = eval('nn.%s'%actv) # Assign activation function (eval allows us to instantiate an object from a string)\n", " self.mlp.add_module('Activation_%d'%i, actv_layer) # Append activation to the model with a name\n", "\n", " out_layer = nn.Linear(in_num, output_feature_num) # Create final layer\n", " self.mlp.add_module('Output_Linear', out_layer) # Append the final layer\n", "\n", " def forward(self, x):\n", " \"\"\"\n", " Simulate forward pass of MLP Network\n", "\n", " Args:\n", " x: torch.tensor\n", " Input data\n", "\n", " Returns:\n", " logits: torch.tensor\n", " Output of the forward pass of the MLP\n", " \"\"\"\n", " # Reshape inputs to (batch_size, input_feature_num)\n", " # Just in case the input vector is not 2D, like an image!\n", " x = x.view(-1, self.input_feature_num)\n", "\n", " ####################################################################\n", " # Fill in missing code below (...),\n", " # then remove or comment the line below to test your function\n", " raise NotImplementedError(\"Run MLP model\")\n", " ####################################################################\n", "\n", " logits = ...
# Forward pass of MLP\n", " return logits\n", "\n", "\n", "\n", "input = torch.zeros((100, 2))\n", "## Uncomment below to create network and test it on input\n", "# net = Net(actv='LeakyReLU(0.1)', input_feature_num=2, hidden_unit_nums=[100, 10, 5], output_feature_num=1).to(DEVICE)\n", "# y = net(input.to(DEVICE))\n", "# print(f'The output shape is {y.shape} for an input of shape {input.shape}')" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W1D3_MultiLayerPerceptrons/solutions/W1D3_Tutorial1_Solution_a1ac91af.py)\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "```\n", "The output shape is torch.Size([100, 1]) for an input of shape torch.Size([100, 2])\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Implement_a_general_purpose_MLP_in_PyTorch_Exercise\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 2.1: Classification with MLPs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Video 3: Cross Entropy\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 3: Cross Entropy\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = 
f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'N8pVCbTlves'), ('Bilibili', 'BV1Ag41177mB')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Cross_Entropy_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "The standard loss function that PyTorch provides out of the box for multi-class classification, given `N` samples and `C` classes, is:\n", "\n", "* **CrossEntropyLoss**:\n", "This criterion expects a batch of
predictions `x` with shape `(N, C)` (raw, unnormalized scores, i.e., logits) and, as the target (label) for each of the `N` samples, a class index in the range $[0, C-1]$; hence a batch of `labels` with shape `(N, )`. There are other optional parameters, such as per-class weights (`weight`) and a target value to ignore (`ignore_index`). Feel free to check the PyTorch documentation [here](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html) for more details. Additionally, [here](https://sparrow.dev/cross-entropy-loss-in-pytorch/) you can learn when it is appropriate to use CrossEntropyLoss.\n", "\n", "To get the CrossEntropyLoss of a sample $i$, we could first calculate $-\\log(\\text{softmax}(x_i))$ and then take the element corresponding to $\\text{labels}_i$ as the loss. However, for numerical stability, we use the equivalent expression on the right-hand side below:\n", "\n", "\\begin{equation}\n", "\\operatorname{loss}(x_i, \\text{labels}_i)=-\\log \\left(\\frac{\\exp (x_i[\\text{labels}_i])}{\\sum_{j=1}^C \\exp (x_i[j])}\\right)=-x_i[\\text{labels}_i]+\\log \\left(\\sum_{j=1}^C \\exp (x_i[j])\\right)\n", "\\end{equation}" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Coding Exercise 2.1: Implement Batch Cross Entropy Loss\n", "\n", "To recap, since we will be doing batch learning, we'd like a loss function that, given:\n", "* A batch of predictions `x` with shape `(N, C)`\n", "* A batch of `labels` with shape `(N, )` with values ranging from `0` to `C-1`\n", "\n", "returns the average loss $L$ calculated according to:\n", "\n", "\\begin{align}\n", "\\text{loss}(x_i, \\text{labels}_i) &= -x_i[\\text{labels}_i]+\\log \\left(\\sum_{j=1}^C \\exp (x_i[j])\\right) \\\\\n", "L &= \\frac{1}{N} \\sum_{i=1}^{N}{\\text{loss}(x_i, \\text{labels}_i)}\n", "\\end{align}\n", "\n", "Steps:\n", "\n", "1. Use an indexing operation to get the prediction for the class given by each label (i.e., $x_i[\\text{labels}_i]$)\n", "2. Compute the vector of $\\text{loss}(x_i, \\text{labels}_i)$ values (`losses`) using `torch.log()` and `torch.exp()`, without loops!\n", "3.
Return the average of the loss vector" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "def cross_entropy_loss(x, labels):\n", " \"\"\"\n", " Helper function to compute cross entropy loss\n", "\n", " Args:\n", " x: torch.tensor\n", " Model predictions we'd like to evaluate using labels\n", " labels: torch.tensor\n", " Ground truth\n", "\n", " Returns:\n", " avg_loss: float\n", " Average of the loss vector\n", " \"\"\"\n", " x_of_labels = torch.zeros(len(labels))\n", " ####################################################################\n", " # Fill in missing code below (...),\n", " # then remove or comment the line below to test your function\n", " raise NotImplementedError(\"Cross Entropy Loss\")\n", " ####################################################################\n", " # 1. Prediction for each class corresponding to the label\n", " for i, label in enumerate(labels):\n", " x_of_labels[i] = x[i, label]\n", " # 2. Loss vector for the batch\n", " losses = ...\n", " # 3. 
Return the average of the loss vector\n", " avg_loss = ...\n", "\n", " return avg_loss\n", "\n", "\n", "\n", "labels = torch.tensor([0, 1])\n", "x = torch.tensor([[10.0, 1.0, -1.0, -20.0], # Correctly classified\n", " [10.0, 10.0, 2.0, -10.0]]) # Not correctly classified\n", "CE = nn.CrossEntropyLoss()\n", "pytorch_loss = CE(x, labels).item()\n", "## Uncomment below to test your function\n", "# our_loss = cross_entropy_loss(x, labels).item()\n", "# print(f'Our CE loss: {our_loss:0.8f}, Pytorch CE loss: {pytorch_loss:0.8f}')\n", "# print(f'Difference: {np.abs(our_loss - pytorch_loss):0.8f}')" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W1D3_MultiLayerPerceptrons/solutions/W1D3_Tutorial1_Solution_4049041f.py)\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "```\n", "Our CE loss: 0.34672737, Pytorch CE loss: 0.34672749\n", "Difference: 0.00000012\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Implement_Batch_Cross_Entropy_Loss_Exercise\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 2.2: Spiral Classification Dataset\n", "Before we can start optimizing these loss functions, we need a dataset!\n", "\n", "Let's turn this fancy-looking equation into a classification dataset:" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "\\begin{equation}\n", "\\begin{array}{c}\n", "X_{k}(t)=t\\left(\\begin{array}{c}\n", "\\sin \\left[\\frac{2 \\pi}{K}\\left(2 t+k-1\\right)\\right]+\\mathcal{N}\\left(0, \\sigma\\right) \\\\\n", "\\cos 
\\left[\\frac{2 \\pi}{K}\\left(2 t+k-1\\right)\\right]+\\mathcal{N}\\left(0, \\sigma\\right)\n", "\\end{array}\\right)\n", "\\end{array}, \\quad 0 \\leq t \\leq 1, \\quad k=1, \\ldots, K\n", "\\end{equation}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "def create_spiral_dataset(K, sigma, N):\n", " \"\"\"\n", " Function to simulate spiral dataset\n", "\n", " Args:\n", " K: int\n", " Number of classes\n", " sigma: float\n", " Standard deviation of the Gaussian noise\n", " N: int\n", " Number of data points per class\n", "\n", " Returns:\n", " X: torch.tensor\n", " Spiral data of shape (K*N, 2)\n", " y: torch.tensor\n", " Corresponding class labels of shape (K*N,)\n", " \"\"\"\n", "\n", " # Initialize t, X, y\n", " t = torch.linspace(0, 1, N)\n", " X = torch.zeros(K*N, 2)\n", " y = torch.zeros(K*N)\n", "\n", " # Create data\n", " for k in range(K):\n", " X[k*N:(k+1)*N, 0] = t*(torch.sin(2*np.pi/K*(2*t+k)) + sigma*torch.randn(N))\n", " X[k*N:(k+1)*N, 1] = t*(torch.cos(2*np.pi/K*(2*t+k)) + sigma*torch.randn(N))\n", " y[k*N:(k+1)*N] = k\n", "\n", " return X, y\n", "\n", "\n", "# Set parameters\n", "K = 4\n", "sigma = 0.16\n", "N = 1000\n", "\n", "set_seed(seed=SEED)\n", "X, y = create_spiral_dataset(K, sigma, N)\n", "plt.scatter(X[:, 0], X[:, 1], c=y)\n", "plt.xlabel('x1')\n", "plt.ylabel('x2')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 2.3: Training and Evaluation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Video 4: Training and Evaluating an MLP\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 4: Training and Evaluating an MLP\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def 
__init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'DfXZhRfBEqQ'), ('Bilibili', 'BV1QV411p7mF')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Training_and_Evaluating_an_MLP_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Coding Exercise 2.3: Implement it for a 
classification task\n", "Now that we have the spiral dataset and a loss function, it's your turn to implement a simple train/test split for training and validation.\n", "\n", "Steps to follow:\n", " * Dataset shuffle\n", " * Train/Test split (20% for test)\n", " * DataLoader definition\n", " * Training and Evaluation" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "def shuffle_and_split_data(X, y, seed):\n", " \"\"\"\n", " Helper function to shuffle and split incoming data\n", "\n", " Args:\n", " X: torch.tensor\n", " Input data\n", " y: torch.tensor\n", " Corresponding target variables\n", " seed: int\n", " Set seed for reproducibility\n", "\n", " Returns:\n", " X_test: torch.tensor\n", " Test data [20% of X]\n", " y_test: torch.tensor\n", " Labels corresponding to the test data above\n", " X_train: torch.tensor\n", " Train data [80% of X]\n", " y_train: torch.tensor\n", " Labels corresponding to the train data above\n", " \"\"\"\n", " torch.manual_seed(seed)\n", " # Number of samples\n", " N = X.shape[0]\n", " ####################################################################\n", " # Fill in missing code below (...),\n", " # then remove or comment the line below to test your function\n", " raise NotImplementedError(\"Shuffle & split data\")\n", " ####################################################################\n", " # Shuffle data\n", " shuffled_indices = ... # Get indices to shuffle data, could use torch.randperm\n", " X = X[shuffled_indices]\n", " y = y[shuffled_indices]\n", "\n", " # Split data into train/test\n", " test_size = ... 
# Assign test dataset size using 20% of samples\n", " X_test = X[:test_size]\n", " y_test = y[:test_size]\n", " X_train = X[test_size:]\n", " y_train = y[test_size:]\n", "\n", " return X_test, y_test, X_train, y_train\n", "\n", "\n", "\n", "## Uncomment below to test your function\n", "# X_test, y_test, X_train, y_train = shuffle_and_split_data(X, y, seed=SEED)\n", "# plt.figure()\n", "# plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test)\n", "# plt.xlabel('x1')\n", "# plt.ylabel('x2')\n", "# plt.title('Test data')\n", "# plt.show()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W1D3_MultiLayerPerceptrons/solutions/W1D3_Tutorial1_Solution_61854a92.py)\n", "\n", "*Example output:*\n", "\n", "Solution hint\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Implement_it_for_a_classification_task_Exercise\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Next, we need to make a PyTorch data loader out of it. Data loading in PyTorch can be separated into two parts:\n", "* The data must be wrapped in a `Dataset` subclass that overrides the `__getitem__` and `__len__` methods. In general, a `Dataset` does not need to hold all of the data in memory: it can load each sample only when it is requested. Here, `TensorDataset` wraps tensors that are already in memory, so it does this for us directly.\n", "* A `DataLoader` then reads the data in batches and collates each batch into tensors. Setting `num_workers > 0` loads batches in parallel worker processes (multiprocessing), which prepares multiple batches in a queue to speed things up."
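, "\n",
"\n",
"To make these two parts concrete, here is a minimal sketch of a hand-written `Dataset` that behaves like `TensorDataset` for an `(X, y)` pair. The class name `MyTensorDataset` and the toy tensors `X_demo` and `y_demo` are hypothetical and are not used elsewhere in this tutorial. `__len__` tells the `DataLoader` how many samples there are, and `__getitem__` returns a single `(input, label)` pair, which the `DataLoader` then stacks into batches:\n",
"\n",
"```python\n",
"import torch\n",
"from torch.utils.data import Dataset, DataLoader\n",
"\n",
"\n",
"class MyTensorDataset(Dataset):\n",
"    # Hypothetical minimal Dataset that mimics TensorDataset for (X, y) pairs\n",
"\n",
"    def __init__(self, X, y):\n",
"        assert len(X) == len(y), 'X and y must have the same number of samples'\n",
"        self.X = X\n",
"        self.y = y\n",
"\n",
"    def __len__(self):\n",
"        # Number of samples; the DataLoader uses it to generate indices\n",
"        return len(self.X)\n",
"\n",
"    def __getitem__(self, idx):\n",
"        # One (input, label) pair; the DataLoader collates these into a batch\n",
"        return self.X[idx], self.y[idx]\n",
"\n",
"\n",
"X_demo = torch.randn(10, 2)\n",
"y_demo = torch.arange(10)\n",
"loader = DataLoader(MyTensorDataset(X_demo, y_demo), batch_size=4, shuffle=False)\n",
"batch_shapes = [tuple(xb.shape) for xb, yb in loader]\n",
"print(batch_shapes)  # [(4, 2), (4, 2), (2, 2)]\n",
"```\n",
"\n",
"Because all ten samples already live in memory, this behaves like `TensorDataset(X_demo, y_demo)`; loading on demand only pays off when `__getitem__` reads each sample from disk (or computes it) instead."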
] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "g_seed = torch.Generator()\n", "g_seed.manual_seed(SEED)\n", "\n", "batch_size = 128\n", "test_data = TensorDataset(X_test, y_test)\n", "test_loader = DataLoader(test_data, batch_size=batch_size,\n", " shuffle=False, num_workers=2,\n", " worker_init_fn=seed_worker,\n", " generator=g_seed)\n", "\n", "train_data = TensorDataset(X_train, y_train)\n", "train_loader = DataLoader(train_data, batch_size=batch_size, drop_last=True,\n", " shuffle=True, num_workers=2,\n", " worker_init_fn=seed_worker,\n", " generator=g_seed)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Let's write general-purpose training and evaluation code and keep it in our pocket for the next tutorial as well, so make sure you review it to see what it does.\n", "\n", "Note that `model.train()` puts your model in training mode, so that layers such as dropout and batch norm, which behave differently during training and testing, can behave accordingly. To turn off training mode, we call `model.eval()`."
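, "\n",
"\n",
"As a quick illustration of what these modes change, here is a small sketch using a hypothetical toy model (`nn.Linear` followed by `nn.Dropout(p=0.5)`), which is not the MLP used in this tutorial. In evaluation mode dropout is the identity, so repeated calls give the same output; in training mode each unit is either zeroed or scaled by $1/(1-p)$:\n",
"\n",
"```python\n",
"import torch\n",
"import torch.nn as nn\n",
"\n",
"torch.manual_seed(0)\n",
"# Hypothetical toy model: a linear layer followed by dropout with p=0.5\n",
"model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5))\n",
"x = torch.ones(1, 4)\n",
"\n",
"model.eval()  # evaluation mode: dropout does nothing\n",
"eval_out1 = model(x)\n",
"eval_out2 = model(x)\n",
"\n",
"model.train()  # training mode: dropout zeroes units at random\n",
"train_out = model(x)\n",
"\n",
"# In eval mode, repeated calls give identical outputs\n",
"print(torch.equal(eval_out1, eval_out2))  # True\n",
"# In train mode, each unit is either dropped (0) or rescaled by 1 / (1 - p) = 2\n",
"kept_or_dropped = (train_out == 0) | torch.isclose(train_out, 2 * eval_out1)\n",
"print(bool(kept_or_dropped.all()))  # True\n",
"```\n",
"\n",
"Batch norm layers show a similar switch: in training mode they normalize with the statistics of the current batch, while in evaluation mode they use running averages collected during training."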
] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "def train_test_classification(net, criterion, optimizer, train_loader,\n", " test_loader, num_epochs=1, verbose=True,\n", " training_plot=False, device='cpu'):\n", " \"\"\"\n", " Accumulate training loss/Evaluate performance\n", "\n", " Args:\n", " net: instance of Net class\n", " Describes the model with ReLU activation, batch size 128\n", " criterion: torch.nn type\n", " Criterion combines LogSoftmax and NLLLoss in one single class.\n", " optimizer: torch.optim type\n", " Optimizer used to update the network parameters (e.g., Adam).\n", " train_loader: torch.utils.data type\n", " Combines the train dataset and sampler, and provides an iterable over the given dataset.\n", " test_loader: torch.utils.data type\n", " Combines the test dataset and sampler, and provides an iterable over the given dataset.\n", " num_epochs: int\n", " Number of epochs [default: 1]\n", " verbose: boolean\n", " If True, record batch losses and print accuracies\n", " training_plot: boolean\n", " If True, display training plot (requires verbose=True)\n", " device: string\n", " CUDA/GPU if available, CPU otherwise\n", "\n", " Returns:\n", " train_acc: float\n", " Accuracy (%) on the training data\n", " test_acc: float\n", " Accuracy (%) on the test data\n", " \"\"\"\n", " net.train()\n", " training_losses = []\n", " for epoch in tqdm(range(num_epochs)): # Loop over the dataset multiple times\n", " running_loss = 0.0\n", " for i, data in enumerate(train_loader, 0):\n", " # Get the inputs; data is a list of [inputs, labels]\n", " inputs, labels = data\n", " inputs = inputs.to(device).float()\n", " labels = labels.to(device).long()\n", "\n", " # Zero the parameter gradients\n", " optimizer.zero_grad()\n", "\n", " # forward + backward + optimize\n", " outputs = net(inputs)\n", "\n", " loss = criterion(outputs, labels)\n", " loss.backward()\n", " optimizer.step()\n", "\n", " # Record the batch loss for the training plot\n", " if verbose:\n", " training_losses += [loss.item()]\n", "\n", " net.eval()\n", "\n", " def test(data_loader):\n", " \"\"\"\n", " Function to gauge network performance\n", "\n", " 
Args:\n", " data_loader: torch.utils.data type\n", " Combines a dataset and sampler, and provides an iterable over the given dataset.\n", "\n", " Returns:\n", " total: int\n", " Number of datapoints in the dataloader\n", " acc: float\n", " Accuracy (%) of the network\n", " \"\"\"\n", " correct = 0\n", " total = 0\n", " for data in data_loader:\n", " inputs, labels = data\n", " inputs = inputs.to(device).float()\n", " labels = labels.to(device).long()\n", "\n", " outputs = net(inputs)\n", " _, predicted = torch.max(outputs, 1)\n", " total += labels.size(0)\n", " correct += (predicted == labels).sum().item()\n", "\n", " acc = 100 * correct / total\n", " return total, acc\n", "\n", " train_total, train_acc = test(train_loader)\n", " test_total, test_acc = test(test_loader)\n", "\n", " if verbose:\n", " print(f\"Accuracy on the {train_total} training samples: {train_acc:0.2f}\")\n", " print(f\"Accuracy on the {test_total} testing samples: {test_acc:0.2f}\")\n", "\n", " if training_plot:\n", " plt.plot(training_losses)\n", " plt.xlabel('Batch')\n", " plt.ylabel('Training loss')\n", " plt.show()\n", "\n", " return train_acc, test_acc" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Think! 2.3.1: What's the point of `.eval()` and `.train()`?\n", "\n", "Is it necessary to use `net.train()` and `net.eval()` for our MLP model? Why?"
] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W1D3_MultiLayerPerceptrons/solutions/W1D3_Tutorial1_Solution_70e48a17.py)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Whats_the_point_of_eval()_and_train()_Discussion\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Now let's put everything together and train your first deep-ish model!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "set_seed(SEED)\n", "net = Net('ReLU()', X_train.shape[1], [128], K).to(DEVICE)\n", "criterion = nn.CrossEntropyLoss()\n", "optimizer = optim.Adam(net.parameters(), lr=1e-3)\n", "num_epochs = 100\n", "\n", "_, _ = train_test_classification(net, criterion, optimizer, train_loader,\n", " test_loader, num_epochs=num_epochs,\n", " training_plot=True, device=DEVICE)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "And finally, let's visualize the learned decision-map. We know you're probably running out of time, so we won't make you write code now! But make sure you have reviewed it since we'll start with another visualization technique next time." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "def sample_grid(M=500, x_max=2.0):\n", " \"\"\"\n", " Helper function to sample points on an M x M grid\n", "\n", " Args:\n", " M: int\n", " Number of grid points along each axis\n", " x_max: float\n", " The grid spans [-x_max, x_max] along each axis\n", "\n", " Returns:\n", " X_all: torch.tensor\n", " Grid points as a tensor of shape (M*M, 2)\n", " \"\"\"\n", " ii, jj = torch.meshgrid(torch.linspace(-x_max, x_max, M),\n", " torch.linspace(-x_max, x_max, M),\n", " indexing=\"ij\")\n", " X_all = torch.cat([ii.unsqueeze(-1),\n", " jj.unsqueeze(-1)],\n", " dim=-1).view(-1, 2)\n", " return X_all\n", "\n", "\n", "def plot_decision_map(X_all, y_pred, X_test, y_test,\n", " M=500, x_max=2.0, eps=1e-3):\n", " \"\"\"\n", " Helper function to plot decision map\n", "\n", " Args:\n", " X_all: torch.tensor\n", " Grid points of shape (M*M, 2)\n", " y_pred: torch.tensor\n", " Network outputs (class scores) for X_all\n", " X_test: torch.tensor\n", " Test data\n", " y_test: torch.tensor\n", " Labels of the test data\n", " M: int\n", " Number of grid points along each axis\n", " x_max: float\n", " The grid spans [-x_max, x_max] along each axis\n", " eps: float\n", " Squared-distance threshold for marking test points on the map\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " decision_map = torch.argmax(y_pred, dim=1)\n", "\n", " for i in range(len(X_test)):\n", " indices = (X_all[:, 0] - X_test[i, 0])**2 + (X_all[:, 1] - X_test[i, 1])**2 < eps\n", " decision_map[indices] = (K + y_test[i]).long()\n", "\n", " decision_map = decision_map.view(M, M)\n", " plt.imshow(decision_map, extent=[-x_max, x_max, -x_max, x_max], cmap='jet')\n", " plt.xlabel('x1')\n", " plt.ylabel('x2')\n", " plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "X_all = sample_grid()\n", "y_pred = net(X_all.to(DEVICE)).cpu()\n", "plot_decision_map(X_all, y_pred, X_test, y_test)" ] }, { "cell_type": 
"markdown", "metadata": { "execution": {} }, "source": [ "### Think! 2.3.2: Does it generalize well?\n", "Do you think this model is performing well outside its training distribution? Why?" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W1D3_MultiLayerPerceptrons/solutions/W1D3_Tutorial1_Solution_47ad29c3.py)\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "What would you suggest to increase the model's ability to generalize? Think about it and discuss with your pod." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W1D3_MultiLayerPerceptrons/solutions/W1D3_Tutorial1_Solution_c71914b5.py)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Does_it_generalize_well_Discussion\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Summary\n", "\n", "In this tutorial, we have explored Multi-layer Perceptrons (MLPs). More specifically, we have discussed the similarities between artificial and biological neural networks (for more information, see the Bonus section). We have also learned about the Universal Approximation Theorem and implemented MLPs in PyTorch."
] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Bonus: Neuron Physiology and Motivation to Deep Learning\n", "\n", "This section will motivate one of the most popular nonlinearities in deep learning, the ReLU nonlinearity, by starting from the biophysics of neurons and obtaining the ReLU nonlinearity through a sequence of approximations. We will also show that neuronal biophysics sets a time scale for signal propagation speed through the brain. This time scale implies that neural circuits underlying fast perceptual and motor processing in the brain may not be excessively deep." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 5: Biological to Artificial Neurons\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 5: Biological to Artificial Neurons\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], 
source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'ELAbflymSLo'), ('Bilibili', 'BV1mf4y157vf')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Biological_to_Artificial_Neurons_Bonus_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Leaky Integrate-and-fire (LIF) neuronal model\n", "\n", "The basic idea of the LIF neuron was proposed in 1907 by Louis Édouard Lapicque, long before the electrophysiology of neurons was understood (see a translation of [Lapicque's paper](https://pubmed.ncbi.nlm.nih.gov/17968583/)). More details about the model can be found in the book [**Theoretical Neuroscience**](http://www.gatsby.ucl.ac.uk/~dayan/book/) by Peter Dayan and Laurence F. 
Abbott.\n", "\n", "The model dynamics are defined by the following equation:\n", "\n", "\\begin{equation}\n", "\\frac{d V_m}{d t}=\\left\\{\\begin{array}{cc}\n", "\\frac{1}{C_m}\\left(-\\frac{V_m}{R_m} + I \\right) & t>t_{rest} \\\\\n", "0 & \\text { otherwise }\n", "\\end{array}\\right.\n", "\\end{equation}\n", "\n", "\n", "Note that $V_{m}$, $C_{m}$, and $R_{m}$ are the membrane voltage, capacitance, and resistance of the neuron, respectively, so $-\\frac{V_{m}}{R_{m}}$ denotes the leakage current. When $I$ is strong enough that $V_{m}$ reaches a threshold value $V_{\\rm th}$, the neuron momentarily spikes; $V_{m}$ is then reset to $V_{\\rm reset}< V_{\\rm th}$ and stays at $V_{\\rm reset}$ for $\\tau_{\\rm ref}$ ms, mimicking the refractoriness of the neuron during an action potential (note that $V_{\\rm reset}$ and $\\tau_{\\rm ref}$ are assumed to be zero in the lecture):\n", "\n", "\n", "\\begin{eqnarray}\n", "V_{m}(t)=V_{\\rm reset} \\text{ for } t\\in(t_{\\text{sp}}, t_{\\text{sp}} + \\tau_{\\text{ref}}]\n", "\\end{eqnarray}\n", "\n", "\n", "where $t_{\\rm sp}$ is the spike time, at which $V_{m}(t)$ just exceeded $V_{\\rm th}$.\n", "\n", "Thus, the LIF model captures the fact that a neuron:\n", "- Performs spatial and temporal integration of synaptic inputs\n", "- Generates a spike when the voltage reaches a certain threshold\n", "- Goes refractory during the action potential\n", "- Has a leaky membrane\n", "\n", "For in-depth content on computational models of neurons, see Tutorial 1 of *Biological Neuron Models* from [NMA](https://www.neuromatchacademy.org/). Specifically, for NMA-CN 2021, follow this [Tutorial](https://github.com/NeuromatchAcademy/course-content/blob/master/tutorials/W2D3_BiologicalNeuronModels/W2D3_Tutorial1.ipynb)."
] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Simulating an LIF Neuron\n", "\n", "The cell below defines a function that simulates an LIF neuron; its arguments are described in the docstring.\n", "\n", "Note that we will use the (forward) Euler method to approximate the derivative numerically, i.e., $\\frac{d V_m}{d t} \\approx \\frac{V_m^{[n]} - V_m^{[n-1]}}{\\Delta t}$. Hence we will use the following discrete-time implementation of the model dynamics:\n", "\n", "\\begin{equation}\n", "V_m^{[n]}=\\left\\{\\begin{array}{cc}\n", "V_m^{[n-1]} + \\frac{1}{C_m}\\left(-\\frac{V_m^{[n-1]}}{R_m}+I \\right) \\Delta t & t>t_{rest} \\\\\n", "0 & \\text { otherwise }\n", "\\end{array}\\right.\n", "\\end{equation}\n", "\n", "where the superscript $[n]$ indexes the time step and $\\Delta t$ is the simulation time step." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "def run_LIF(I, T=50, dt=0.1, tau_ref=10,\n", " Rm=1, Cm=10, Vth=1, V_spike=0.5):\n", " \"\"\"\n", " Simulate the LIF dynamics with external input current\n", "\n", " Args:\n", " I : float\n", " Input current (mA)\n", " T : int\n", " Total time to simulate (msec)\n", " dt : float\n", " Simulation time step (msec)\n", " tau_ref : int\n", " Refractory period (msec)\n", " Rm : int\n", " Resistance (kOhm)\n", " Cm : int\n", " Capacitance (uF)\n", " Vth : int\n", " Spike threshold (V)\n", " V_spike : float\n", " Voltage increment added at each spike (V)\n", "\n", " Returns:\n", " time : torch.tensor\n", " Time points\n", " Vm : torch.tensor\n", " Membrane potential at each time point\n", " \"\"\"\n", "\n", " # Set up array of time steps\n", " time = torch.arange(0, T + dt, dt)\n", "\n", " # Set up array for tracking Vm\n", " Vm = torch.zeros(len(time))\n", "\n", " # Iterate over each time step\n", " t_rest = 0\n", " for i, t in enumerate(time):\n", "\n", " # If t is after refractory period\n", " if t > t_rest:\n", " Vm[i] = Vm[i-1] + 1/Cm*(-Vm[i-1]/Rm + I) * dt\n", "\n", " # If Vm is over the threshold\n", " if Vm[i] >= Vth:\n", "\n", " # Increase voltage by change due to spike\n", " Vm[i] += V_spike\n", "\n", " # Set 
up new refractory period\n", " t_rest = t + tau_ref\n", "\n", " return time, Vm\n", "\n", "\n", "sim_time, Vm = run_LIF(1.5)\n", "# Plot the membrane voltage across time\n", "plt.plot(sim_time, Vm)\n", "plt.title('LIF Neuron Output')\n", "plt.ylabel('Membrane Potential (V)')\n", "plt.xlabel('Time (msec)')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Interactive Demo: Neuron's transfer function explorer for different $R_m$ and $\\tau_{ref}$\n", "Real neurons can convey information by modulating their spike count: more input current causes a neuron to spike more often. Therefore, to characterize a neuron's input-output relationship, it makes sense to measure its spike count as a function of the input current. This is called the neuron's input-output transfer function. Let's plot the neuron's transfer function and see how it changes with the **membrane resistance** and the **refractory time**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Make sure you execute this cell to enable the widget!\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title\n", "\n", "# @markdown Make sure you execute this cell to enable the widget!\n", "my_layout = widgets.Layout()\n", "\n", "@widgets.interact(Rm=widgets.FloatSlider(1., min=1, max=100.,\n", " step=0.1, layout=my_layout),\n", " tau_ref=widgets.FloatSlider(1., min=1, max=100.,\n", " step=0.1, layout=my_layout)\n", " )\n", "\n", "\n", "def plot_IF_curve(Rm, tau_ref):\n", " \"\"\"\n", " Helper function to plot the spike count vs. input current curve\n", "\n", " Args:\n", " Rm : float\n", " Resistance (kOhm)\n", " tau_ref : float\n", " Refractory period (msec)\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " T = 1000 # Total time to simulate (msec)\n", " dt = 1 # Simulation time step 
(msec)\n", " Vth = 1 # Spike threshold (V)\n", " Is_max = 2\n", " Is = torch.linspace(0, Is_max, 10)\n", " spike_counts = []\n", " for I in Is:\n", " _, Vm = run_LIF(I, T=T, dt=dt, Vth=Vth, Rm=Rm, tau_ref=tau_ref)\n", " spike_counts += [torch.sum(Vm > Vth)]\n", "\n", " plt.plot(Is, spike_counts)\n", " plt.title('LIF Neuron: Transfer Function')\n", " plt.ylabel('Spike count')\n", " plt.xlabel('I (mA)')\n", " plt.xlim(0, Is_max)\n", " plt.ylim(0, 80)\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Think!: Real and Artificial neuron similarities\n", "\n", "What happens at infinite membrane resistance ($R_m$) and small refractory time ($\\tau_{ref}$)? Why?\n", "\n", "Take 10 minutes to discuss with your pod the similarities between a real neuron and an artificial one." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W1D3_MultiLayerPerceptrons/solutions/W1D3_Tutorial1_Solution_23ab5734.py)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Real_and_Artificial_neuron_similarities_Bonus_Discussion\")" ] } ], "metadata": { "colab": { "collapsed_sections": [], "gpuType": "T4", "include_colab_link": true, "name": "W1D3_Tutorial1", "provenance": [], "toc_visible": true }, "kernel": { "display_name": "Python 3", "language": "python", "name": "python3" }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": 
"ipython3", "version": "3.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }