{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {}, "id": "view-in-github" }, "source": [ "\"Open   \"Open" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Tutorial 2: Learning Hyperparameters\n", "\n", "**Week 1, Day 2: Linear Deep Learning**\n", "\n", "**By Neuromatch Academy**\n", "\n", "__Content creators:__ Saeed Salehi, Andrew Saxe\n", "\n", "__Content reviewers:__ Polina Turishcheva, Antoine De Comite, Kelson Shilling-Scrivo\n", "\n", "__Content editors:__ Anoop Kulkarni\n", "\n", "__Production editors:__ Khalid Almubarak, Gagana B, Spiros Chavlis" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Tutorial Objectives\n", "\n", "* Training landscape\n", "* The effect of depth\n", "* Choosing a learning rate\n", "* Initialization matters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @markdown\n", "from IPython.display import IFrame\n", "from ipywidgets import widgets\n", "out = widgets.Output()\n", "with out:\n", " print(f\"If you want to download the slides: https://osf.io/download/sne2m/\")\n", " display(IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/sne2m/?direct%26mode=render%26action=download%26mode=render\", width=730, height=410))\n", "display(out)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Setup\n", "\n", "This is a GPU-free tutorial!" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install and import feedback gadget\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Install and import feedback gadget\n", "\n", "!pip3 install vibecheck datatops --quiet\n", "\n", "from vibecheck import DatatopsContentReviewContainer\n", "def content_review(notebook_section: str):\n", " return DatatopsContentReviewContainer(\n", " \"\", # No text prompt\n", " notebook_section,\n", " {\n", " \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n", " \"name\": \"neuromatch_dl\",\n", " \"user_key\": \"f379rz8y\",\n", " },\n", " ).render()\n", "\n", "\n", "feedback_prefix = \"W1D2_T2\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# Imports\n", "import time\n", "import numpy as np\n", "import matplotlib\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Figure settings\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Figure settings\n", "import logging\n", "logging.getLogger('matplotlib.font_manager').disabled = True\n", "\n", "from ipywidgets import interact, IntSlider, FloatSlider, fixed\n", "from ipywidgets import HBox, interactive_output, ToggleButton, Layout\n", "from mpl_toolkits.axes_grid1 import make_axes_locatable\n", "\n", "%config InlineBackend.figure_format = 'retina'\n", "plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/content-creation/main/nma.mplstyle\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting functions\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], 
"source": [ "# @title Plotting functions\n", "\n", "def plot_x_y_(x_t_, y_t_, x_ev_, y_ev_, loss_log_, weight_log_):\n", " \"\"\"\n", " Plot train data and test results\n", "\n", " Args:\n", " x_t_: np.ndarray\n", " Training dataset\n", " y_t_: np.ndarray\n", " Ground truth corresponding to training dataset\n", " x_ev_: np.ndarray\n", " Evaluation set\n", " y_ev_: np.ndarray\n", " ShallowNarrowNet predictions\n", " loss_log_: list\n", " Training loss records\n", " weight_log_: list\n", " Training weight records (evolution of weights)\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " plt.figure(figsize=(12, 4))\n", " plt.subplot(1, 3, 1)\n", " plt.scatter(x_t_, y_t_, c='r', label='training data')\n", " plt.plot(x_ev_, y_ev_, c='b', label='test results', linewidth=2)\n", " plt.xlabel('x')\n", " plt.ylabel('y')\n", " plt.legend()\n", " plt.subplot(1, 3, 2)\n", " plt.plot(loss_log_, c='r')\n", " plt.xlabel('epochs')\n", " plt.ylabel('mean squared error')\n", " plt.subplot(1, 3, 3)\n", " plt.plot(weight_log_)\n", " plt.xlabel('epochs')\n", " plt.ylabel('weights')\n", " plt.show()\n", "\n", "\n", "def plot_vector_field(what, init_weights=None):\n", " \"\"\"\n", " Helper function to plot vector fields\n", "\n", " Args:\n", " what: string\n", " If \"all\", plot vectors, trajectories and loss function\n", " If \"vectors\", plot vectors\n", " If \"trajectory\", plot trajectories\n", " If \"loss\", plot loss function\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " n_epochs=40\n", " lr=0.15\n", " x_pos = np.linspace(2.0, 0.5, 100, endpoint=True)\n", " y_pos = 1. 
/ x_pos\n", " xx, yy = np.mgrid[-1.9:2.0:0.2, -1.9:2.0:0.2]\n", " zz = np.empty_like(xx)\n", " x, y = xx[:, 0], yy[0]\n", "\n", " x_temp, y_temp = gen_samples(10, 1.0, 0.0)\n", "\n", " cmap = matplotlib.cm.plasma\n", " plt.figure(figsize=(8, 7))\n", " ax = plt.gca()\n", "\n", " if what == 'all' or what == 'vectors':\n", " for i, a in enumerate(x):\n", " for j, b in enumerate(y):\n", " temp_model = ShallowNarrowLNN([a, b])\n", " da, db = temp_model.dloss_dw(x_temp, y_temp)\n", " zz[i, j] = temp_model.loss(temp_model.forward(x_temp), y_temp)\n", " scale = min(40 * np.sqrt(da**2 + db**2), 50)\n", " ax.quiver(a, b, - da, - db, scale=scale, color=cmap(np.sqrt(da**2 + db**2)))\n", "\n", " if what == 'all' or what == 'trajectory':\n", " if init_weights is None:\n", " for init_weights in [[0.5, -0.5], [0.55, -0.45], [-1.8, 1.7]]:\n", " temp_model = ShallowNarrowLNN(init_weights)\n", " _, temp_records = temp_model.train(x_temp, y_temp, lr, n_epochs)\n", " ax.scatter(temp_records[:, 0], temp_records[:, 1],\n", " c=np.arange(len(temp_records)), cmap='Greys')\n", " ax.scatter(temp_records[0, 0], temp_records[0, 1], c='blue', zorder=9)\n", " ax.scatter(temp_records[-1, 0], temp_records[-1, 1], c='red', marker='X', s=100, zorder=9)\n", " else:\n", " temp_model = ShallowNarrowLNN(init_weights)\n", " _, temp_records = temp_model.train(x_temp, y_temp, lr, n_epochs)\n", " ax.scatter(temp_records[:, 0], temp_records[:, 1],\n", " c=np.arange(len(temp_records)), cmap='Greys')\n", " ax.scatter(temp_records[0, 0], temp_records[0, 1], c='blue', zorder=9)\n", " ax.scatter(temp_records[-1, 0], temp_records[-1, 1], c='red', marker='X', s=100, zorder=9)\n", "\n", " if what == 'all' or what == 'loss':\n", " contplt = ax.contourf(x, y, np.log(zz+0.001), zorder=-1, cmap='coolwarm', levels=100)\n", " divider = make_axes_locatable(ax)\n", " cax = divider.append_axes(\"right\", size=\"5%\", pad=0.05)\n", " cbar = plt.colorbar(contplt, cax=cax)\n", " cbar.set_label('log (Loss)')\n", "\n", " 
ax.set_xlabel(\"$w_1$\")\n", " ax.set_ylabel(\"$w_2$\")\n", " ax.set_xlim(-1.9, 1.9)\n", " ax.set_ylim(-1.9, 1.9)\n", "\n", " plt.show()\n", "\n", "\n", "def plot_loss_landscape():\n", " \"\"\"\n", " Helper function to plot loss landscapes\n", "\n", " Args:\n", " None\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " x_temp, y_temp = gen_samples(10, 1.0, 0.0)\n", "\n", " xx, yy = np.mgrid[-1.9:2.0:0.2, -1.9:2.0:0.2]\n", " zz = np.empty_like(xx)\n", " x, y = xx[:, 0], yy[0]\n", "\n", " for i, a in enumerate(x):\n", " for j, b in enumerate(y):\n", " temp_model = ShallowNarrowLNN([a, b])\n", " zz[i, j] = temp_model.loss(temp_model.forward(x_temp), y_temp)\n", "\n", " temp_model = ShallowNarrowLNN([-1.8, 1.7])\n", " loss_rec_1, w_rec_1 = temp_model.train(x_temp, y_temp, 0.02, 240)\n", "\n", " temp_model = ShallowNarrowLNN([1.5, -1.5])\n", " loss_rec_2, w_rec_2 = temp_model.train(x_temp, y_temp, 0.02, 240)\n", "\n", " plt.figure(figsize=(12, 8))\n", " ax = plt.subplot(1, 1, 1, projection='3d')\n", " ax.plot_surface(xx, yy, np.log(zz+0.5), cmap='coolwarm', alpha=0.5)\n", " ax.scatter3D(w_rec_1[:, 0], w_rec_1[:, 1], np.log(loss_rec_1+0.5),\n", " c='k', s=50, zorder=9)\n", " ax.scatter3D(w_rec_2[:, 0], w_rec_2[:, 1], np.log(loss_rec_2+0.5),\n", " c='k', s=50, zorder=9)\n", " plt.axis(\"off\")\n", " ax.view_init(45, 260)\n", "\n", " plt.show()\n", "\n", "\n", "def depth_widget(depth):\n", " \"\"\"\n", " Simulate parameter in widget\n", " exploring impact of depth on the training curve\n", " (loss evolution) of a deep but narrow neural network.\n", "\n", " Args:\n", " depth: int\n", " Specifies depth of network\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " if depth == 0:\n", " depth_lr_init_interplay(depth, 0.02, 0.9)\n", " else:\n", " depth_lr_init_interplay(depth, 0.01, 0.9)\n", "\n", "\n", "def lr_widget(lr):\n", " \"\"\"\n", " Simulate parameters in widget\n", " exploring impact of depth on the training curve\n", " (loss evolution) of a deep but narrow neural 
network.\n", "\n", " Args:\n", " lr: float\n", " Specifies learning rate within network\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " depth_lr_init_interplay(50, lr, 0.9)\n", "\n", "\n", "def depth_lr_interplay(depth, lr):\n", " \"\"\"\n", " Simulate parameters in widget\n", " exploring impact of depth on the training curve\n", " (loss evolution) of a deep but narrow neural network.\n", "\n", " Args:\n", " depth: int\n", " Specifies depth of network\n", " lr: float\n", " Specifies learning rate within network\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " depth_lr_init_interplay(depth, lr, 0.9)\n", "\n", "\n", "def depth_lr_init_interplay(depth, lr, init_weights):\n", " \"\"\"\n", " Simulate parameters in widget\n", " exploring impact of depth on the training curve\n", " (loss evolution) of a deep but narrow neural network.\n", "\n", " Args:\n", " depth: int\n", " Specifies depth of network\n", " lr: float\n", " Specifies learning rate within network\n", " init_weights: list\n", " Specifies initial weights of the network\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " n_epochs = 600\n", "\n", " x_train, y_train = gen_samples(100, 2.0, 0.1)\n", " model = DeepNarrowLNN(np.full((1, depth+1), init_weights))\n", "\n", " plt.figure(figsize=(10, 5))\n", " plt.plot(model.train(x_train, y_train, lr, n_epochs),\n", " linewidth=3.0, c='m')\n", "\n", " plt.title(\"Training a {}-layer LNN with\"\n", " \" $\\eta=${} initialized with $w_i=${}\".format(depth, lr, init_weights), pad=15)\n", " plt.yscale('log')\n", " plt.xlabel('epochs')\n", " plt.ylabel('Log mean squared error')\n", " plt.ylim(0.001, 1.0)\n", " plt.show()\n", "\n", "\n", "def plot_init_effect():\n", " \"\"\"\n", " Helper function to plot evolution of log mean\n", " squared error over epochs\n", "\n", " Args:\n", " None\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " depth = 15\n", " n_epochs = 250\n", " lr = 0.02\n", "\n", " x_train, y_train = gen_samples(100, 2.0, 0.1)\n", "\n", " 
plt.figure(figsize=(12, 6))\n", " for init_w in np.arange(0.7, 1.09, 0.05):\n", " model = DeepNarrowLNN(np.full((1, depth), init_w))\n", " plt.plot(model.train(x_train, y_train, lr, n_epochs),\n", " linewidth=3.0, label=\"initial weights {:.2f}\".format(init_w))\n", " plt.title(\"Training a {}-layer narrow LNN with $\\eta=${}\".format(depth, lr), pad=15)\n", " plt.yscale('log')\n", " plt.xlabel('epochs')\n", " plt.ylabel('Log mean squared error')\n", " plt.legend(loc='lower left', ncol=4)\n", " plt.ylim(0.001, 1.0)\n", " plt.show()\n", "\n", "\n", "class InterPlay:\n", " \"\"\"\n", " Class specifying parameters for widget\n", " exploring relationship between the depth\n", " and optimal learning rate\n", " \"\"\"\n", "\n", " def __init__(self):\n", " \"\"\"\n", " Initialize parameters for InterPlay\n", "\n", " Args:\n", " None\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " self.lr = [None]\n", " self.depth = [None]\n", " self.success = [None]\n", " self.min_depth, self.max_depth = 5, 65\n", " self.depth_list = np.arange(10, 61, 10)\n", " self.i_depth = 0\n", " self.min_lr, self.max_lr = 0.001, 0.105\n", " self.n_epochs = 600\n", " self.x_train, self.y_train = gen_samples(100, 2.0, 0.1)\n", " self.converged = False\n", " self.button = None\n", " self.slider = None\n", "\n", " def train(self, lr, update=False, init_weights=0.9):\n", " \"\"\"\n", " Train network associated with InterPlay\n", "\n", " Args:\n", " lr: float\n", " Specifies learning rate within network\n", " init_weights: float\n", " Specifies initial weights of the network [default: 0.9]\n", " update: boolean\n", " If true, show updates on widget\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " if update and self.converged and self.i_depth < len(self.depth_list):\n", " depth = self.depth_list[self.i_depth]\n", " self.plot(depth, lr)\n", " self.i_depth += 1\n", " self.lr.append(None)\n", " self.depth.append(None)\n", " self.success.append(None)\n", " self.converged = False\n", " 
self.slider.value = 0.005\n", " if self.i_depth < len(self.depth_list):\n", " self.button.value = False\n", " self.button.description = 'Explore!'\n", " self.button.disabled = True\n", " self.button.button_style = 'Danger'\n", " else:\n", " self.button.value = False\n", " self.button.button_style = ''\n", " self.button.disabled = True\n", " self.button.description = 'Done!'\n", " time.sleep(1.0)\n", "\n", " elif self.i_depth < len(self.depth_list):\n", " depth = self.depth_list[self.i_depth]\n", " # Additional assert: self.min_depth <= depth <= self.max_depth\n", " assert self.min_lr <= lr <= self.max_lr\n", " self.converged = False\n", "\n", " model = DeepNarrowLNN(np.full((1, depth), init_weights))\n", " self.losses = np.array(model.train(self.x_train, self.y_train, lr, self.n_epochs))\n", " if np.any(self.losses < 1e-2):\n", " success = np.argwhere(self.losses < 1e-2)[0][0]\n", " if np.all((self.losses[success:] < 1e-2)):\n", " self.converged = True\n", " self.success[-1] = success\n", " self.lr[-1] = lr\n", " self.depth[-1] = depth\n", " self.button.disabled = False\n", " self.button.button_style = 'Success'\n", " self.button.description = 'Register!'\n", " else:\n", " self.button.disabled = True\n", " self.button.button_style = 'Danger'\n", " self.button.description = 'Explore!'\n", " else:\n", " self.button.disabled = True\n", " self.button.button_style = 'Danger'\n", " self.button.description = 'Explore!'\n", " self.plot(depth, lr)\n", "\n", " def plot(self, depth, lr):\n", " \"\"\"\n", " Plot following subplots:\n", " a. Log mean squared error vs Epochs\n", " b. Learning time vs Depth\n", " c. 
Optimal learning rate vs Depth\n", "\n", " Args:\n", " depth: int\n", " Specifies depth of network\n", " lr: float\n", " Specifies learning rate of network\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " fig = plt.figure(constrained_layout=False, figsize=(10, 8))\n", " gs = fig.add_gridspec(2, 2)\n", " ax1 = fig.add_subplot(gs[0, :])\n", " ax2 = fig.add_subplot(gs[1, 0])\n", " ax3 = fig.add_subplot(gs[1, 1])\n", "\n", " ax1.plot(self.losses, linewidth=3.0, c='m')\n", " ax1.set_title(\"Training a {}-layer LNN with\"\n", " \" $\\eta=${}\".format(depth, lr), pad=15, fontsize=16)\n", " ax1.set_yscale('log')\n", " ax1.set_xlabel('epochs')\n", " ax1.set_ylabel('Log mean squared error')\n", " ax1.set_ylim(0.001, 1.0)\n", "\n", " ax2.set_xlim(self.min_depth, self.max_depth)\n", " ax2.set_ylim(-10, self.n_epochs)\n", " ax2.set_xlabel('Depth')\n", " ax2.set_ylabel('Learning time (Epochs)')\n", " ax2.set_title(\"Learning time vs depth\", fontsize=14)\n", " ax2.scatter(np.array(self.depth), np.array(self.success), c='r')\n", "\n", " ax3.set_xlim(self.min_depth, self.max_depth)\n", " ax3.set_ylim(self.min_lr, self.max_lr)\n", " ax3.set_xlabel('Depth')\n", " ax3.set_ylabel('Optimal learning rate')\n", " ax3.set_title(\"Empirically optimal $\\eta$ vs depth\", fontsize=14)\n", " ax3.scatter(np.array(self.depth), np.array(self.lr), c='r')\n", "\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Helper functions\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Helper functions\n", "\n", "def gen_samples(n, a, sigma):\n", " \"\"\"\n", " Generates n samples with\n", " `y = a * x + noise(sigma)` linear relation.\n", "\n", " Args:\n", " n : int\n", " Number of data points to generate\n", " a : float\n", " Slope of the linear relation\n", " sigma : float\n", " Standard deviation of the Gaussian noise\n", "\n", " Returns:\n", " x : np.array\n", " if 
sigma > 0, x = uniform random values in [0, 1)\n", " else, x = n evenly spaced values over [0, 1].\n", " y : np.array\n", " y = a * x + noise(sigma)\n", " \"\"\"\n", " assert n > 0\n", " assert sigma >= 0\n", "\n", " if sigma > 0:\n", " x = np.random.rand(n)\n", " noise = np.random.normal(scale=sigma, size=(n))\n", " y = a * x + noise\n", " else:\n", " x = np.linspace(0.0, 1.0, n, endpoint=True)\n", " y = a * x\n", " return x, y\n", "\n", "\n", "class ShallowNarrowLNN:\n", " \"\"\"\n", " Shallow and narrow (one neuron per layer)\n", " linear neural network\n", " \"\"\"\n", "\n", " def __init__(self, init_ws):\n", " \"\"\"\n", " Initialize parameters of ShallowNarrowLNN\n", "\n", " Args:\n", " init_ws: initial weights as a list\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " assert isinstance(init_ws, list)\n", " assert len(init_ws) == 2\n", " self.w1 = init_ws[0]\n", " self.w2 = init_ws[1]\n", "\n", " def forward(self, x):\n", " \"\"\"\n", " The forward pass through network y = x * w1 * w2\n", "\n", " Args:\n", " x: np.ndarray\n", " Input data\n", "\n", " Returns:\n", " y: np.ndarray\n", " y = x * w1 * w2\n", " \"\"\"\n", " y = x * self.w1 * self.w2\n", " return y\n", "\n", " def loss(self, y_p, y_t):\n", " \"\"\"\n", " Mean squared error (L2 loss)\n", " averaged over samples (no 1/2 factor)\n", "\n", " Args:\n", " y_p: np.ndarray\n", " Network Predictions\n", " y_t: np.ndarray\n", " Targets\n", "\n", " Returns:\n", " mse: float\n", " Mean squared error\n", " \"\"\"\n", " assert y_p.shape == y_t.shape\n", " mse = ((y_t - y_p)**2).mean()\n", " return mse\n", "\n", " def dloss_dw(self, x, y_t):\n", " \"\"\"\n", " Partial derivative of loss with respect to weights\n", "\n", " Args:\n", " x : np.array\n", " Input Dataset\n", " y_t : np.array\n", " Corresponding Ground Truth\n", "\n", " Returns:\n", " dloss_dw1: float\n", " -mean(2 * self.w2 * x * Error)\n", " dloss_dw2: float\n", " -mean(2 * self.w1 * x * Error)\n", " \"\"\"\n", " assert x.shape == y_t.shape\n", " Error = y_t - 
self.w1 * self.w2 * x\n", " dloss_dw1 = - (2 * self.w2 * x * Error).mean()\n", " dloss_dw2 = - (2 * self.w1 * x * Error).mean()\n", " return dloss_dw1, dloss_dw2\n", "\n", " def train(self, x, y_t, eta, n_ep):\n", " \"\"\"\n", " Gradient descent algorithm\n", "\n", " Args:\n", " x : np.array\n", " Input Dataset\n", " y_t : np.array\n", " Corresponding target\n", " eta: float\n", " Learning rate\n", " n_ep : int\n", " Number of epochs\n", "\n", " Returns:\n", " loss_records: np.ndarray\n", " Log of loss per epoch\n", " weight_records: np.ndarray\n", " Log of weights per epoch\n", " \"\"\"\n", " assert x.shape == y_t.shape\n", "\n", " loss_records = np.empty(n_ep) # Pre allocation of loss records\n", " weight_records = np.empty((n_ep, 2)) # Pre allocation of weight records\n", "\n", " for i in range(n_ep):\n", " y_p = self.forward(x)\n", " loss_records[i] = self.loss(y_p, y_t)\n", " dloss_dw1, dloss_dw2 = self.dloss_dw(x, y_t)\n", " self.w1 -= eta * dloss_dw1\n", " self.w2 -= eta * dloss_dw2\n", " weight_records[i] = [self.w1, self.w2]\n", "\n", " return loss_records, weight_records\n", "\n", "\n", "class DeepNarrowLNN:\n", " \"\"\"\n", " Deep but thin (one neuron per layer)\n", " linear neural network\n", " \"\"\"\n", "\n", " def __init__(self, init_ws):\n", " \"\"\"\n", " Initialize parameters of DeepNarrowLNN\n", "\n", " Args:\n", " init_ws: np.ndarray\n", " Initial weights as a numpy array\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " self.n = init_ws.size\n", " self.W = init_ws.reshape(1, -1)\n", "\n", " def forward(self, x):\n", " \"\"\"\n", " Forward pass of DeepNarrowLNN\n", "\n", " Args:\n", " x : np.array\n", " Input features\n", "\n", " Returns:\n", " y: np.array\n", " Product of weights over input features\n", " \"\"\"\n", " y = np.prod(self.W) * x\n", " return y\n", "\n", " def loss(self, y_t, y_p):\n", " \"\"\"\n", " Mean squared error (L2 loss)\n", "\n", " Args:\n", " y_t : np.array\n", " Targets\n", " y_p : np.array\n", " Network's 
predictions\n", "\n", " Returns:\n", " mse: float\n", " Half the mean squared error\n", " \"\"\"\n", " assert y_p.shape == y_t.shape\n", " mse = ((y_t - y_p)**2 / 2).mean()\n", " return mse\n", "\n", " def dloss_dw(self, x, y_t, y_p):\n", " \"\"\"\n", " Analytical gradient of the loss with respect to the weights\n", "\n", " Args:\n", " x : np.array\n", " Input features\n", " y_t : np.array\n", " Targets\n", " y_p : np.array\n", " Network Predictions\n", "\n", " Returns:\n", " dW: np.ndarray\n", " Analytical gradient of the loss with respect to the weights\n", " \"\"\"\n", " E = y_t - y_p # i.e., y_t - x * np.prod(self.W)\n", " Ex = np.multiply(x, E).mean()\n", " Wp = np.prod(self.W) / (self.W + 1e-9)\n", " dW = - Ex * Wp\n", " return dW\n", "\n", " def train(self, x, y_t, eta, n_epochs):\n", " \"\"\"\n", " Training using gradient descent\n", "\n", " Args:\n", " x : np.array\n", " Input Features\n", " y_t : np.array\n", " Targets\n", " eta: float\n", " Learning rate\n", " n_epochs : int\n", " Number of epochs\n", "\n", " Returns:\n", " loss_records: np.ndarray\n", " Log of loss over epochs\n", " \"\"\"\n", " loss_records = np.empty(n_epochs)\n", " loss_records[:] = np.nan\n", " for i in range(n_epochs):\n", " y_p = self.forward(x)\n", " loss_records[i] = self.loss(y_t, y_p).mean()\n", " dloss_dw = self.dloss_dw(x, y_t, y_p)\n", " if np.isnan(dloss_dw).any() or np.isinf(dloss_dw).any():\n", " return loss_records\n", " self.W -= eta * dloss_dw\n", " return loss_records" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set random seed\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Executing `set_seed(seed=seed)` you are setting the seed\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "#@title Set random seed\n", "\n", "#@markdown Executing `set_seed(seed=seed)` you are setting the seed\n", "\n", "# For DL it's critical to set the random seed so that students can have a\n", "# baseline to 
compare their results to expected results.\n", "# Read more here: https://pytorch.org/docs/stable/notes/randomness.html\n", "\n", "# Call `set_seed` function in the exercises to ensure reproducibility.\n", "import random\n", "import torch\n", "\n", "def set_seed(seed=None, seed_torch=True):\n", " \"\"\"\n", " Function that controls randomness. NumPy and random modules must be imported.\n", "\n", " Args:\n", " seed : Integer\n", " A non-negative integer that defines the random state. Default is `None`.\n", " seed_torch : Boolean\n", " If `True` sets the random seed for pytorch tensors, so pytorch module\n", " must be imported. Default is `True`.\n", "\n", " Returns:\n", " Nothing.\n", " \"\"\"\n", " if seed is None:\n", " seed = np.random.choice(2 ** 32)\n", " random.seed(seed)\n", " np.random.seed(seed)\n", " if seed_torch:\n", " torch.manual_seed(seed)\n", " torch.cuda.manual_seed_all(seed)\n", " torch.cuda.manual_seed(seed)\n", " torch.backends.cudnn.benchmark = False\n", " torch.backends.cudnn.deterministic = True\n", "\n", " print(f'Random seed {seed} has been set.')\n", "\n", "\n", "# In case that `DataLoader` is used\n", "def seed_worker(worker_id):\n", " \"\"\"\n", " DataLoader will reseed workers following randomness in\n", " multi-process data loading algorithm.\n", "\n", " Args:\n", " worker_id: integer\n", " ID of subprocess to seed. 0 means that\n", " the data will be loaded in the main process\n", " Refer: https://pytorch.org/docs/stable/data.html#data-loading-randomness for more details\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " worker_seed = torch.initial_seed() % 2**32\n", " np.random.seed(worker_seed)\n", " random.seed(worker_seed)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set device (GPU or CPU). Execute `set_device()`\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "#@title Set device (GPU or CPU). 
Execute `set_device()`\n", "# especially if torch modules used.\n", "\n", "# Inform the user if the notebook uses GPU or CPU.\n", "\n", "def set_device():\n", " \"\"\"\n", " Set the device. CUDA if available, CPU otherwise\n", "\n", " Args:\n", " None\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n", " if device != \"cuda\":\n", " print(\"GPU is not enabled in this notebook. \\n\"\n", " \"If you want to enable it, in the menu under `Runtime` -> \\n\"\n", " \"`Hardware accelerator.` and select `GPU` from the dropdown menu\")\n", " else:\n", " print(\"GPU is enabled in this notebook. \\n\"\n", " \"If you want to disable it, in the menu under `Runtime` -> \\n\"\n", " \"`Hardware accelerator.` and select `None` from the dropdown menu\")\n", "\n", " return device" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "SEED = 2021\n", "set_seed(seed=SEED)\n", "DEVICE = set_device()" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Section 1: A Shallow Narrow Linear Neural Network\n", "\n", "*Time estimate: ~30 mins*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 1: Shallow Narrow Linear Net\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 1: Shallow Narrow Linear Net\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = 
f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', '6e5JIYsqVvU'), ('Bilibili', 'BV1F44y117ot')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Shallow_Narrow_Linear_Net_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 1.1: A Shallow Narrow Linear Net" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "To better understand the behavior of neural network training with gradient descent, we start with the incredibly simple case of a shallow narrow linear neural net, since 
state-of-the-art models are impossible to dissect and comprehend with our current mathematical tools.\n", "\n", "The model we use has one hidden layer, with only one neuron, and two weights. We consider the squared error (or L2 loss) as the cost function. As you may have already guessed, we can visualize the model as a neural network:\n", "\n", "
\n", "\n", "
\n", "\n", "or by its computation graph:\n", "\n", "[Figure: computation graph of the shallow narrow linear network]
\n", "\n", "or on a rare occasion, even as a reasonably compact mapping:\n", "\n", "$$ loss = (y - w_1 \\cdot w_2 \\cdot x)^2 $$\n", "\n", "
\n", "\n", "Implementing a neural network from scratch, without an automatic differentiation tool, is rarely necessary. The following two exercises are therefore **Bonus** (optional) exercises. If you are short on time, skip them and continue to Section 1.2." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Analytical Exercise 1.1: Loss Gradients (Optional)\n", "\n", "Once again, we ask you to calculate the network gradients analytically, since you will need them for the next exercise. We understand how annoying this is.\n", "\n", "$\\dfrac{\\partial{loss}}{\\partial{w_1}} = ?$\n", "\n", "$\\dfrac{\\partial{loss}}{\\partial{w_2}} = ?$\n", "\n", "
\n", "\n", "---\n", "#### Solution\n", "\n", "Both gradients follow from the chain rule, with the error $(y - w_1 \\cdot w_2 \\cdot x)$ as the inner function:\n", "\n", "$\\dfrac{\\partial{loss}}{\\partial{w_1}} = -2 \\cdot w_2 \\cdot x \\cdot (y - w_1 \\cdot w_2 \\cdot x)$\n", "\n", "$\\dfrac{\\partial{loss}}{\\partial{w_2}} = -2 \\cdot w_1 \\cdot x \\cdot (y - w_1 \\cdot w_2 \\cdot x)$\n", "\n", "---\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Loss_Gradients_Analytical_Exercise\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Coding Exercise 1.1: Implement simple narrow LNN (Optional)\n", "\n", "Next, we ask you to implement the `forward` pass for our model from scratch without using PyTorch.\n", "\n", "Also, although our model takes a single input feature and outputs a single prediction, we can compute the loss and train on multiple samples at once. This is common practice for neural networks, since computers are much faster at matrix (or tensor) operations on batches of data than at processing samples one at a time through `for` loops. 
Therefore, for the `loss` function, please implement the **mean** squared error (MSE), and adjust your analytical gradients accordingly when implementing the `dloss_dw` function.\n", "\n", "Finally, complete the `train` function for the gradient descent algorithm:\n", "\n", "\\begin{equation}\n", "\\mathbf{w}^{(t+1)} = \\mathbf{w}^{(t)} - \\eta \\nabla loss (\\mathbf{w}^{(t)})\n", "\\end{equation}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "class ShallowNarrowExercise:\n", " \"\"\"\n", " Shallow and narrow (one neuron per layer) linear neural network\n", " \"\"\"\n", "\n", " def __init__(self, init_weights):\n", " \"\"\"\n", " Initialize parameters of ShallowNarrow Net\n", "\n", " Args:\n", " init_weights: list, tuple or np.ndarray\n", " Initial weights\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " assert isinstance(init_weights, (list, np.ndarray, tuple))\n", " assert len(init_weights) == 2\n", " self.w1 = init_weights[0]\n", " self.w2 = init_weights[1]\n", "\n", "\n", " def forward(self, x):\n", " \"\"\"\n", " The forward pass through network y = x * w1 * w2\n", "\n", " Args:\n", " x: np.ndarray\n", " Features (inputs) to neural net\n", "\n", " Returns:\n", " y: np.ndarray\n", " Neural network output (predictions)\n", " \"\"\"\n", " #################################################\n", " ## Implement the forward pass to calculate prediction\n", " ## Note that prediction is not the loss\n", " # Complete the function and remove or comment the line below\n", " raise NotImplementedError(\"Forward Pass `forward`\")\n", " #################################################\n", " y = ...\n", " return y\n", "\n", "\n", " def dloss_dw(self, x, y_true):\n", " \"\"\"\n", " Gradient of loss with respect to weights\n", "\n", " Args:\n", " x: np.ndarray\n", " Features (inputs) to neural net\n", " y_true: np.ndarray\n", " True labels\n", "\n", " Returns:\n", " dloss_dw1: float\n", " Mean gradient of loss with respect to 
w1\n", " dloss_dw2: float\n", " Mean gradient of loss with respect to w2\n", " \"\"\"\n", " assert x.shape == y_true.shape\n", " #################################################\n", " ## Implement the gradient computation function\n", " # Complete the function and remove or comment the line below\n", " raise NotImplementedError(\"Gradient of Loss `dloss_dw`\")\n", " #################################################\n", " dloss_dw1 = ...\n", " dloss_dw2 = ...\n", " return dloss_dw1, dloss_dw2\n", "\n", "\n", " def train(self, x, y_true, lr, n_ep):\n", " \"\"\"\n", " Training with Gradient descent algorithm\n", "\n", " Args:\n", " x: np.ndarray\n", " Features (inputs) to neural net\n", " y_true: np.ndarray\n", " True labels\n", " lr: float\n", " Learning rate\n", " n_ep: int\n", " Number of epochs (training iterations)\n", "\n", " Returns:\n", " loss_records: list\n", " Training loss records\n", " weight_records: list\n", " Training weight records (evolution of weights)\n", " \"\"\"\n", " assert x.shape == y_true.shape\n", "\n", " loss_records = np.empty(n_ep) # Pre allocation of loss records\n", " weight_records = np.empty((n_ep, 2)) # Pre allocation of weight records\n", "\n", " for i in range(n_ep):\n", " y_prediction = self.forward(x)\n", " loss_records[i] = loss(y_prediction, y_true)\n", " dloss_dw1, dloss_dw2 = self.dloss_dw(x, y_true)\n", " #################################################\n", " ## Implement the gradient descent step\n", " # Complete the function and remove or comment the line below\n", " raise NotImplementedError(\"Training loop `train`\")\n", " #################################################\n", " self.w1 -= ...\n", " self.w2 -= ...\n", " weight_records[i] = [self.w1, self.w2]\n", "\n", " return loss_records, weight_records\n", "\n", "\n", "def loss(y_prediction, y_true):\n", " \"\"\"\n", " Mean squared error\n", "\n", " Args:\n", " y_prediction: np.ndarray\n", " Model output (prediction)\n", " y_true: np.ndarray\n", " True label\n", 
"\n", " Returns:\n", " mse: np.ndarray\n", " Mean squared error loss\n", " \"\"\"\n", " assert y_prediction.shape == y_true.shape\n", " #################################################\n", " ## Implement the MEAN squared error\n", " # Complete the function and remove or comment the line below\n", " raise NotImplementedError(\"Loss function `loss`\")\n", " #################################################\n", " mse = ...\n", " return mse\n", "\n", "\n", "\n", "set_seed(seed=SEED)\n", "n_epochs = 211\n", "learning_rate = 0.02\n", "initial_weights = [1.4, -1.6]\n", "x_train, y_train = gen_samples(n=73, a=2.0, sigma=0.2)\n", "x_eval = np.linspace(0.0, 1.0, 37, endpoint=True)\n", "## Uncomment to run\n", "# sn_model = ShallowNarrowExercise(initial_weights)\n", "# loss_log, weight_log = sn_model.train(x_train, y_train, learning_rate, n_epochs)\n", "# y_eval = sn_model.forward(x_eval)\n", "# plot_x_y_(x_train, y_train, x_eval, y_eval, loss_log, weight_log)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W1D2_LinearDeepLearning/solutions/W1D2_Tutorial2_Solution_61f7913e.py)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Simple_Narrow_LNN_Exercise\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 1.2: Learning landscapes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Video 2: Training Landscape\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 2: Training 
Landscape\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'k28bnNAcOEg'), ('Bilibili', 'BV1Nv411J71X')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit 
your feedback\n", "content_review(f\"{feedback_prefix}_Training_Landscape_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "As you may have already asked yourself, we can analytically find $w_1$ and $w_2$ without using gradient descent:\n", "\n", "\\begin{equation}\n", "w_1 \\cdot w_2 = \\dfrac{y}{x}\n", "\\end{equation}\n", "\n", "In fact, we can plot the gradients, the loss function and all the possible solutions in one figure. In this example, we use the $y = 1x$ mapping:\n", "\n", "**Blue ribbon**: shows all possible solutions: $~ w_1 w_2 = \\dfrac{y}{x} = \\dfrac{x}{x} = 1 \\Rightarrow w_1 = \\dfrac{1}{w_2}$\n", "\n", "**Contour background**: Shows the loss values, red being higher loss\n", "\n", "**Vector field (arrows)**: shows the gradient vector field. The larger yellow arrows show larger gradients, which correspond to bigger steps by gradient descent.\n", "\n", "**Scatter circles**: the trajectory (evolution) of weights during training for three different initializations, with blue dots marking the start of training and red crosses ( **x** ) marking the end of training. You can also try your own initializations (keep the initial values between `-2.0` and `2.0`) as shown here:\n", "```python\n", "plot_vector_field('all', [1.0, -1.0])\n", "```\n", "\n", "Finally, if the plot is too crowded, feel free to pass one of the following strings as argument:\n", "\n", "```python\n", "plot_vector_field('vectors') # For vector field\n", "plot_vector_field('trajectory') # For training trajectory\n", "plot_vector_field('loss') # For loss contour\n", "```\n", "\n", "**Think!**\n", "\n", "Explore the next two plots. Try different initial values. Can you find the saddle point? Why does training slow down near the minima?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "plot_vector_field('all')" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Here, we also visualize the loss landscape in a 3-D plot, with two training trajectories for different initial conditions.\n", "Note: the trajectories from the 3D plot and the previous plot are independent and different." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "plot_loss_landscape()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Video 3: Training Landscape - Discussion\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 3: Training Landscape - Discussion\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, 
fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', '0EcUGgxOdkI'), ('Bilibili', 'BV1py4y1j7cv')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Training_Landscape_Discussion_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Section 2: Depth, Learning rate, and initialization\n", "*Time estimate: ~45 mins*" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Successful deep learning models are often developed by a team of very clever people, spending many many hours \"tuning\" learning hyperparameters, and finding effective initializations. In this section, we look at three basic (but often not simple) hyperparameters: depth, learning rate, and initialization." 
] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 2.1: The effect of depth" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Video 4: Effect of Depth\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 4: Effect of Depth\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'Ii_As9cRR5Q'), ('Bilibili', 'BV1z341167di')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", 
"tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Effect_of_Depth_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Why might depth be useful? What makes a network or learning system \"deep\"? The reality is that shallow neural nets are often incapable of learning complex functions due to data limitations. On the other hand, depth seems like magic. Depth can change the functions a network can represent, the way a network learns, and how a network generalizes to unseen data.\n", "\n", "So let's look at the challenges that depth poses in training a neural network. Imagine a single input, single output linear network with 50 hidden layers and only one neuron per layer (i.e. a narrow deep neural network). The output of the network is easy to calculate:\n", "\n", "$$ prediction = x \\cdot w_1 \\cdot w_2 \\cdot \\cdot \\cdot w_{50} $$\n", "\n", "If the initial value for all the weights is $w_i = 2$, the prediction for $x=1$ would be **exploding**: $y_p = 2^{50} \\approx 1.1256 \\times 10^{15}$. On the other hand, for weights initialized to $w_i = 0.5$, the output is **vanishing**: $y_p = 0.5^{50} \\approx 8.88 \\times 10^{-16}$. Similarly, if we recall the chain rule, as the graph gets deeper, the number of elements in the chain multiplication increases, which could lead to exploding or vanishing gradients. 
To avoid such numerical vulnerabilities that could impair our training algorithm, we need to understand the effect of depth.\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Interactive Demo 2.1: Depth widget\n", "\n", "Use the widget to explore the impact of depth on the training curve (loss evolution) of a deep but narrow neural network.\n", "\n", "**Think!**\n", "\n", "Which networks trained the fastest? Did all networks eventually \"work\" (converge)? What is the shape of their learning trajectory?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Make sure you execute this cell to enable the widget!\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Make sure you execute this cell to enable the widget!\n", "\n", "_ = interact(depth_widget,\n", " depth = IntSlider(min=0, max=51,\n", " step=5, value=0,\n", " continuous_update=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Video 5: Effect of Depth - Discussion\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 5: Effect of Depth - Discussion\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, 
H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'EqSDkwmSruk'), ('Bilibili', 'BV1Qq4y1H7uk')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Effect_of_Depth_Discussion_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 2.2: Choosing a learning rate" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "The learning rate is a common hyperparameter for most optimization algorithms. How should we set it? Sometimes the only option is to try all the possibilities, but sometimes knowing some key trade-offs will help guide our search for good hyperparameters." 
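, "\n", "\n", "One such trade-off can be worked out by hand. The following is a sketch under simplifying assumptions that are not part of the tutorial's model: a single weight $w$, a single training pair $(x, y)$ with $x \\neq 0$, and the squared error loss $(y - w \\cdot x)^2$. Writing $w^{*} = y / x$ for the solution, one gradient descent step gives\n", "\n", "\\begin{equation}\n", "w^{(t+1)} - w^{*} = \\left(1 - 2 \\eta x^2\\right) \\left(w^{(t)} - w^{*}\\right)\n", "\\end{equation}\n", "\n", "so the error shrinks only if $|1 - 2 \\eta x^2| < 1$, that is $0 < \\eta < 1 / x^2$. In this toy case, a very small $\\eta$ converges slowly, $\\eta = 1 / (2 x^2)$ reaches the solution in one step, and $\\eta > 1 / x^2$ diverges."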
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Video 6: Learning Rate\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 6: Learning Rate\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'w_GrCVM-_Qo'), ('Bilibili', 'BV11f4y157MT')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", 
"display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Learning_Rate_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Interactive Demo 2.2: Learning rate widget\n", "\n", "Here, we fix the network depth to 50 layers. Use the widget to explore the impact of learning rate $\\eta$ on the training curve (loss evolution) of a deep but narrow neural network.\n", "\n", "**Think!**\n", "\n", "Can we say that larger learning rates always lead to faster learning? Why not?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Make sure you execute this cell to enable the widget!\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Make sure you execute this cell to enable the widget!\n", "\n", "_ = interact(lr_widget,\n", " lr = FloatSlider(min=0.005, max=0.045, step=0.005, value=0.005,\n", " continuous_update=False, readout_format='.3f',\n", " description='eta'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Video 7: Learning Rate - Discussion\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 7: Learning Rate - Discussion\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = 
f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'cmS0yqImz2E'), ('Bilibili', 'BV1Aq4y1p7bh')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Learning_Rate_Discussion_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 2.3: Depth vs Learning Rate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Video 8: Depth and Learning Rate\n" ] }, { "cell_type": "code", "execution_count": 
null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 8: Depth and Learning Rate\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'J30phrux_3k'), ('Bilibili', 'BV1V44y1177e')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Depth_and_Learning_Rate_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Interactive Demo 2.3: Depth and Learning Rate\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "**Important instruction**\n", "The exercise starts with 10 hidden layers. Your task is to find the learning rate that delivers fast but robust convergence (learning). When you are confident about the learning rate, you can **Register** the optimal learning rate for the given depth. Once you press register, a deeper model is instantiated, so you can find the next optimal learning rate. The Register button turns green only when the training converges, but does not imply the fastest convergence. Finally, be patient :) the widgets are slow.\n", "\n", "\n", "**Think!**\n", "\n", "Can you explain the relationship between the depth and optimal learning rate?" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Make sure you execute this cell to enable the widget!\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Make sure you execute this cell to enable the widget!\n", "intpl_obj = InterPlay()\n", "\n", "intpl_obj.slider = FloatSlider(min=0.005, max=0.105, step=0.005, value=0.005,\n", " layout=Layout(width='500px'),\n", " continuous_update=False,\n", " readout_format='.3f',\n", " description='eta')\n", "\n", "intpl_obj.button = ToggleButton(value=intpl_obj.converged, description='Register')\n", "\n", "widgets_ui = HBox([intpl_obj.slider, intpl_obj.button])\n", "widgets_out = interactive_output(intpl_obj.train,\n", " {'lr': intpl_obj.slider,\n", " 'update': intpl_obj.button,\n", " 'init_weights': fixed(0.9)})\n", "\n", "display(widgets_ui, widgets_out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Depth_and_Learning_Rate_Interactive_Demo\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Video 9: Depth and Learning Rate - Discussion\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 9: Depth and Learning Rate - Discussion\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = 
f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', '7Fl8vH7cgco'), ('Bilibili', 'BV15q4y1p7Uq')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Depth_and_Learning_Rate_Discussion_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 2.4: Why initialization is important" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Video 10: Initialization Matters\n" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 10: Initialization Matters\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'KmqCz95AMzY'), ('Bilibili', 'BV1UL411J7vu')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { 
"cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Initialization_Matters\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "We’ve seen, even in the simplest of cases, that depth can slow learning. Why? From the chain rule, the gradient at each layer is multiplied by the current weights of the other layers, so the product can vanish or explode. Therefore, weight initialization is a fundamentally important hyperparameter.\n", "\n", "Although in practice the initial values of learnable parameters are often sampled from various $\\mathcal{Uniform}$ or $\\mathcal{Normal}$ probability distributions, here we use a single value for all the parameters.\n", "\n", "The figure below shows the effect of initialization on the speed of learning for the deep but narrow LNN. We have excluded initializations that lead to numerical errors such as `nan` or `inf`, which result from initial values that are too small or too large."
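, "\n", "\n", "To make the chain-rule argument above concrete, here is a minimal worked sketch. It assumes (this form is not stated in this section) a depth-$n$ narrow linear network $\\hat{y} = x \\prod_{i=1}^{n} w_i$ trained with a squared-error loss $L = (y - \\hat{y})^2$. Under these assumptions, the gradient with respect to the weight of layer $k$ is\n", "\n", "\\begin{equation}\n", "\\frac{\\partial L}{\\partial w_k} = -2 \\, x \\, (y - \\hat{y}) \\prod_{i \\neq k} w_i\n", "\\end{equation}\n", "\n", "If every weight starts at the same value $w_0$, the product equals $w_0^{\\,n-1}$. For an illustrative depth of $n = 50$, $w_0 = 0.9$ gives $0.9^{49} \\approx 0.006$, so the gradient nearly vanishes, whereas $w_0 = 1.1$ gives $1.1^{49} \\approx 107$, so it explodes."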
] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Make sure you execute this cell to see the figure!\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Make sure you execute this cell to see the figure!\n", "\n", "plot_init_effect()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Video 11: Initialization Matters - Discussion\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 11: Initialization Matters - Discussion\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video 
available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'vKktGdiQDsE'), ('Bilibili', 'BV1hM4y1T7gJ')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Initialization_Matters_Discussion_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Summary\n", "\n", "In this second tutorial, we learned what the training landscape is and examined the effects of network depth and learning rate, as well as their interplay. Finally, we saw that initialization matters and why we need smart initialization schemes."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 12: Tutorial 2 Wrap-up\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 12: Tutorial 2 Wrap-up\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'r3K8gtak3wA'), ('Bilibili', 'BV1P44y117Pd')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, 
video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_WrapUp_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Bonus" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Hyperparameter interaction\n", "\n", "Finally, let's put together everything we have learned and find the best initial weights and learning rate for a given depth. By now you should understand how these hyperparameters interact and know how to find good values quickly. If you get `numerical overflow` warnings, don't be discouraged! They are often caused by \"exploding\" or \"vanishing\" gradients.\n", "\n", "**Think!**\n", "\n", "Did you experience any surprising behaviour\n", "or difficulty finding the optimal parameters?"
] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Make sure you execute this cell to enable the widget!\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Make sure you execute this cell to enable the widget!\n", "\n", "_ = interact(depth_lr_init_interplay,\n", " depth = IntSlider(min=10, max=51, step=5, value=25,\n", " continuous_update=False),\n", " lr = FloatSlider(min=0.001, max=0.1,\n", " step=0.005, value=0.005,\n", " continuous_update=False,\n", " readout_format='.3f',\n", " description='eta'),\n", " init_weights = FloatSlider(min=0.1, max=3.0,\n", " step=0.1, value=0.9,\n", " continuous_update=False,\n", " readout_format='.3f',\n", " description='initial weights'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Hyperparameter_interaction_Bonus_Discussion\")" ] } ], "metadata": { "colab": { "collapsed_sections": [], "include_colab_link": true, "name": "W1D2_Tutorial2", "provenance": [], "toc_visible": true }, "kernel": { "display_name": "Python 3", "language": "python", "name": "python3" }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }