{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {},
"id": "view-in-github"
},
"source": [
"
"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"# Tutorial 2: Learning Hyperparameters\n",
"\n",
"**Week 1, Day 2: Linear Deep Learning**\n",
"\n",
"**By Neuromatch Academy**\n",
"\n",
"__Content creators:__ Saeed Salehi, Andrew Saxe\n",
"\n",
"__Content reviewers:__ Polina Turishcheva, Antoine De Comite, Kelson Shilling-Scrivo\n",
"\n",
"__Content editors:__ Anoop Kulkarni\n",
"\n",
"__Production editors:__ Khalid Almubarak, Gagana B, Spiros Chavlis"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Tutorial Objectives\n",
"\n",
"* Training landscape\n",
"* The effect of depth\n",
"* Choosing a learning rate\n",
"* Initialization matters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @markdown\n",
"from IPython.display import IFrame\n",
"from ipywidgets import widgets\n",
"out = widgets.Output()\n",
"with out:\n",
" print(f\"If you want to download the slides: https://osf.io/download/sne2m/\")\n",
" display(IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/sne2m/?direct%26mode=render%26action=download%26mode=render\", width=730, height=410))\n",
"display(out)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Setup\n",
"\n",
"This a GPU-Free tutorial!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install and import feedback gadget\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Install and import feedback gadget\n",
"\n",
"!pip3 install vibecheck datatops --quiet\n",
"\n",
"from vibecheck import DatatopsContentReviewContainer\n",
"def content_review(notebook_section: str):\n",
" return DatatopsContentReviewContainer(\n",
" \"\", # No text prompt\n",
" notebook_section,\n",
" {\n",
" \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n",
" \"name\": \"neuromatch_dl\",\n",
" \"user_key\": \"f379rz8y\",\n",
" },\n",
" ).render()\n",
"\n",
"\n",
"feedback_prefix = \"W1D2_T2\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Imports\n",
"import time\n",
"import numpy as np\n",
"import matplotlib\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Figure settings\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Figure settings\n",
"import logging\n",
"logging.getLogger('matplotlib.font_manager').disabled = True\n",
"\n",
"from ipywidgets import interact, IntSlider, FloatSlider, fixed\n",
"from ipywidgets import HBox, interactive_output, ToggleButton, Layout\n",
"from mpl_toolkits.axes_grid1 import make_axes_locatable\n",
"\n",
"%config InlineBackend.figure_format = 'retina'\n",
"plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/content-creation/main/nma.mplstyle\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting functions\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Plotting functions\n",
"\n",
"def plot_x_y_(x_t_, y_t_, x_ev_, y_ev_, loss_log_, weight_log_):\n",
" \"\"\"\n",
" Plot train data and test results\n",
"\n",
" Args:\n",
" x_t_: np.ndarray\n",
" Training dataset\n",
" y_t_: np.ndarray\n",
" Ground truth corresponding to training dataset\n",
" x_ev_: np.ndarray\n",
" Evaluation set\n",
" y_ev_: np.ndarray\n",
" ShallowNarrowNet predictions\n",
" loss_log_: list\n",
" Training loss records\n",
" weight_log_: list\n",
" Training weight records (evolution of weights)\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" plt.figure(figsize=(12, 4))\n",
" plt.subplot(1, 3, 1)\n",
" plt.scatter(x_t_, y_t_, c='r', label='training data')\n",
" plt.plot(x_ev_, y_ev_, c='b', label='test results', linewidth=2)\n",
" plt.xlabel('x')\n",
" plt.ylabel('y')\n",
" plt.legend()\n",
" plt.subplot(1, 3, 2)\n",
" plt.plot(loss_log_, c='r')\n",
" plt.xlabel('epochs')\n",
" plt.ylabel('mean squared error')\n",
" plt.subplot(1, 3, 3)\n",
" plt.plot(weight_log_)\n",
" plt.xlabel('epochs')\n",
" plt.ylabel('weights')\n",
" plt.show()\n",
"\n",
"\n",
"def plot_vector_field(what, init_weights=None):\n",
" \"\"\"\n",
" Helper function to plot vector fields\n",
"\n",
" Args:\n",
" what: string\n",
" If \"all\", plot vectors, trajectories and loss function\n",
" If \"vectors\", plot vectors\n",
" If \"trajectory\", plot trajectories\n",
" If \"loss\", plot loss function\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" n_epochs=40\n",
" lr=0.15\n",
" x_pos = np.linspace(2.0, 0.5, 100, endpoint=True)\n",
" y_pos = 1. / x_pos\n",
" xx, yy = np.mgrid[-1.9:2.0:0.2, -1.9:2.0:0.2]\n",
" zz = np.empty_like(xx)\n",
" x, y = xx[:, 0], yy[0]\n",
"\n",
" x_temp, y_temp = gen_samples(10, 1.0, 0.0)\n",
"\n",
" cmap = matplotlib.cm.plasma\n",
" plt.figure(figsize=(8, 7))\n",
" ax = plt.gca()\n",
"\n",
" if what == 'all' or what == 'vectors':\n",
" for i, a in enumerate(x):\n",
" for j, b in enumerate(y):\n",
" temp_model = ShallowNarrowLNN([a, b])\n",
" da, db = temp_model.dloss_dw(x_temp, y_temp)\n",
" zz[i, j] = temp_model.loss(temp_model.forward(x_temp), y_temp)\n",
" scale = min(40 * np.sqrt(da**2 + db**2), 50)\n",
" ax.quiver(a, b, - da, - db, scale=scale, color=cmap(np.sqrt(da**2 + db**2)))\n",
"\n",
" if what == 'all' or what == 'trajectory':\n",
" if init_weights is None:\n",
" for init_weights in [[0.5, -0.5], [0.55, -0.45], [-1.8, 1.7]]:\n",
" temp_model = ShallowNarrowLNN(init_weights)\n",
" _, temp_records = temp_model.train(x_temp, y_temp, lr, n_epochs)\n",
" ax.scatter(temp_records[:, 0], temp_records[:, 1],\n",
" c=np.arange(len(temp_records)), cmap='Greys')\n",
" ax.scatter(temp_records[0, 0], temp_records[0, 1], c='blue', zorder=9)\n",
" ax.scatter(temp_records[-1, 0], temp_records[-1, 1], c='red', marker='X', s=100, zorder=9)\n",
" else:\n",
" temp_model = ShallowNarrowLNN(init_weights)\n",
" _, temp_records = temp_model.train(x_temp, y_temp, lr, n_epochs)\n",
" ax.scatter(temp_records[:, 0], temp_records[:, 1],\n",
" c=np.arange(len(temp_records)), cmap='Greys')\n",
" ax.scatter(temp_records[0, 0], temp_records[0, 1], c='blue', zorder=9)\n",
" ax.scatter(temp_records[-1, 0], temp_records[-1, 1], c='red', marker='X', s=100, zorder=9)\n",
"\n",
" if what == 'all' or what == 'loss':\n",
" contplt = ax.contourf(x, y, np.log(zz+0.001), zorder=-1, cmap='coolwarm', levels=100)\n",
" divider = make_axes_locatable(ax)\n",
" cax = divider.append_axes(\"right\", size=\"5%\", pad=0.05)\n",
" cbar = plt.colorbar(contplt, cax=cax)\n",
" cbar.set_label('log (Loss)')\n",
"\n",
" ax.set_xlabel(\"$w_1$\")\n",
" ax.set_ylabel(\"$w_2$\")\n",
" ax.set_xlim(-1.9, 1.9)\n",
" ax.set_ylim(-1.9, 1.9)\n",
"\n",
" plt.show()\n",
"\n",
"\n",
"def plot_loss_landscape():\n",
" \"\"\"\n",
" Helper function to plot loss landscapes\n",
"\n",
" Args:\n",
" None\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" x_temp, y_temp = gen_samples(10, 1.0, 0.0)\n",
"\n",
" xx, yy = np.mgrid[-1.9:2.0:0.2, -1.9:2.0:0.2]\n",
" zz = np.empty_like(xx)\n",
" x, y = xx[:, 0], yy[0]\n",
"\n",
" for i, a in enumerate(x):\n",
" for j, b in enumerate(y):\n",
" temp_model = ShallowNarrowLNN([a, b])\n",
" zz[i, j] = temp_model.loss(temp_model.forward(x_temp), y_temp)\n",
"\n",
" temp_model = ShallowNarrowLNN([-1.8, 1.7])\n",
" loss_rec_1, w_rec_1 = temp_model.train(x_temp, y_temp, 0.02, 240)\n",
"\n",
" temp_model = ShallowNarrowLNN([1.5, -1.5])\n",
" loss_rec_2, w_rec_2 = temp_model.train(x_temp, y_temp, 0.02, 240)\n",
"\n",
" plt.figure(figsize=(12, 8))\n",
" ax = plt.subplot(1, 1, 1, projection='3d')\n",
" ax.plot_surface(xx, yy, np.log(zz+0.5), cmap='coolwarm', alpha=0.5)\n",
" ax.scatter3D(w_rec_1[:, 0], w_rec_1[:, 1], np.log(loss_rec_1+0.5),\n",
" c='k', s=50, zorder=9)\n",
" ax.scatter3D(w_rec_2[:, 0], w_rec_2[:, 1], np.log(loss_rec_2+0.5),\n",
" c='k', s=50, zorder=9)\n",
" plt.axis(\"off\")\n",
" ax.view_init(45, 260)\n",
"\n",
" plt.show()\n",
"\n",
"\n",
"def depth_widget(depth):\n",
" \"\"\"\n",
" Simulate parameter in widget\n",
" exploring impact of depth on the training curve\n",
" (loss evolution) of a deep but narrow neural network.\n",
"\n",
" Args:\n",
" depth: int\n",
" Specifies depth of network\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" if depth == 0:\n",
" depth_lr_init_interplay(depth, 0.02, 0.9)\n",
" else:\n",
" depth_lr_init_interplay(depth, 0.01, 0.9)\n",
"\n",
"\n",
"def lr_widget(lr):\n",
" \"\"\"\n",
" Simulate parameters in widget\n",
" exploring impact of depth on the training curve\n",
" (loss evolution) of a deep but narrow neural network.\n",
"\n",
" Args:\n",
" lr: float\n",
" Specifies learning rate within network\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" depth_lr_init_interplay(50, lr, 0.9)\n",
"\n",
"\n",
"def depth_lr_interplay(depth, lr):\n",
" \"\"\"\n",
" Simulate parameters in widget\n",
" exploring impact of depth on the training curve\n",
" (loss evolution) of a deep but narrow neural network.\n",
"\n",
" Args:\n",
" depth: int\n",
" Specifies depth of network\n",
" lr: float\n",
" Specifies learning rate within network\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" depth_lr_init_interplay(depth, lr, 0.9)\n",
"\n",
"\n",
"def depth_lr_init_interplay(depth, lr, init_weights):\n",
" \"\"\"\n",
" Simulate parameters in widget\n",
" exploring impact of depth on the training curve\n",
" (loss evolution) of a deep but narrow neural network.\n",
"\n",
" Args:\n",
" depth: int\n",
" Specifies depth of network\n",
" lr: float\n",
" Specifies learning rate within network\n",
" init_weights: list\n",
" Specifies initial weights of the network\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" n_epochs = 600\n",
"\n",
" x_train, y_train = gen_samples(100, 2.0, 0.1)\n",
" model = DeepNarrowLNN(np.full((1, depth+1), init_weights))\n",
"\n",
" plt.figure(figsize=(10, 5))\n",
" plt.plot(model.train(x_train, y_train, lr, n_epochs),\n",
" linewidth=3.0, c='m')\n",
"\n",
" plt.title(\"Training a {}-layer LNN with\"\n",
" \" $\\eta=${} initialized with $w_i=${}\".format(depth, lr, init_weights), pad=15)\n",
" plt.yscale('log')\n",
" plt.xlabel('epochs')\n",
" plt.ylabel('Log mean squared error')\n",
" plt.ylim(0.001, 1.0)\n",
" plt.show()\n",
"\n",
"\n",
"def plot_init_effect():\n",
" \"\"\"\n",
" Helper function to plot evolution of log mean\n",
" squared error over epochs\n",
"\n",
" Args:\n",
" None\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" depth = 15\n",
" n_epochs = 250\n",
" lr = 0.02\n",
"\n",
" x_train, y_train = gen_samples(100, 2.0, 0.1)\n",
"\n",
" plt.figure(figsize=(12, 6))\n",
" for init_w in np.arange(0.7, 1.09, 0.05):\n",
" model = DeepNarrowLNN(np.full((1, depth), init_w))\n",
" plt.plot(model.train(x_train, y_train, lr, n_epochs),\n",
" linewidth=3.0, label=\"initial weights {:.2f}\".format(init_w))\n",
" plt.title(\"Training a {}-layer narrow LNN with $\\eta=${}\".format(depth, lr), pad=15)\n",
" plt.yscale('log')\n",
" plt.xlabel('epochs')\n",
" plt.ylabel('Log mean squared error')\n",
" plt.legend(loc='lower left', ncol=4)\n",
" plt.ylim(0.001, 1.0)\n",
" plt.show()\n",
"\n",
"\n",
"class InterPlay:\n",
" \"\"\"\n",
" Class specifying parameters for widget\n",
" exploring relationship between the depth\n",
" and optimal learning rate\n",
" \"\"\"\n",
"\n",
" def __init__(self):\n",
" \"\"\"\n",
" Initialize parameters for InterPlay\n",
"\n",
" Args:\n",
" None\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" self.lr = [None]\n",
" self.depth = [None]\n",
" self.success = [None]\n",
" self.min_depth, self.max_depth = 5, 65\n",
" self.depth_list = np.arange(10, 61, 10)\n",
" self.i_depth = 0\n",
" self.min_lr, self.max_lr = 0.001, 0.105\n",
" self.n_epochs = 600\n",
" self.x_train, self.y_train = gen_samples(100, 2.0, 0.1)\n",
" self.converged = False\n",
" self.button = None\n",
" self.slider = None\n",
"\n",
" def train(self, lr, update=False, init_weights=0.9):\n",
" \"\"\"\n",
" Train network associated with InterPlay\n",
"\n",
" Args:\n",
" lr: float\n",
" Specifies learning rate within network\n",
" init_weights: float\n",
" Specifies initial weights of the network [default: 0.9]\n",
" update: boolean\n",
" If true, show updates on widget\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" if update and self.converged and self.i_depth < len(self.depth_list):\n",
" depth = self.depth_list[self.i_depth]\n",
" self.plot(depth, lr)\n",
" self.i_depth += 1\n",
" self.lr.append(None)\n",
" self.depth.append(None)\n",
" self.success.append(None)\n",
" self.converged = False\n",
" self.slider.value = 0.005\n",
" if self.i_depth < len(self.depth_list):\n",
" self.button.value = False\n",
" self.button.description = 'Explore!'\n",
" self.button.disabled = True\n",
" self.button.button_style = 'Danger'\n",
" else:\n",
" self.button.value = False\n",
" self.button.button_style = ''\n",
" self.button.disabled = True\n",
" self.button.description = 'Done!'\n",
" time.sleep(1.0)\n",
"\n",
" elif self.i_depth < len(self.depth_list):\n",
" depth = self.depth_list[self.i_depth]\n",
" # Additional assert: self.min_depth <= depth <= self.max_depth\n",
" assert self.min_lr <= lr <= self.max_lr\n",
" self.converged = False\n",
"\n",
" model = DeepNarrowLNN(np.full((1, depth), init_weights))\n",
" self.losses = np.array(model.train(self.x_train, self.y_train, lr, self.n_epochs))\n",
" if np.any(self.losses < 1e-2):\n",
" success = np.argwhere(self.losses < 1e-2)[0][0]\n",
" if np.all((self.losses[success:] < 1e-2)):\n",
" self.converged = True\n",
" self.success[-1] = success\n",
" self.lr[-1] = lr\n",
" self.depth[-1] = depth\n",
" self.button.disabled = False\n",
" self.button.button_style = 'Success'\n",
" self.button.description = 'Register!'\n",
" else:\n",
" self.button.disabled = True\n",
" self.button.button_style = 'Danger'\n",
" self.button.description = 'Explore!'\n",
" else:\n",
" self.button.disabled = True\n",
" self.button.button_style = 'Danger'\n",
" self.button.description = 'Explore!'\n",
" self.plot(depth, lr)\n",
"\n",
" def plot(self, depth, lr):\n",
" \"\"\"\n",
" Plot following subplots:\n",
" a. Log mean squared error vs Epochs\n",
" b. Learning time vs Depth\n",
" c. Optimal learning rate vs Depth\n",
"\n",
" Args:\n",
" depth: int\n",
" Specifies depth of network\n",
" lr: float\n",
" Specifies learning rate of network\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" fig = plt.figure(constrained_layout=False, figsize=(10, 8))\n",
" gs = fig.add_gridspec(2, 2)\n",
" ax1 = fig.add_subplot(gs[0, :])\n",
" ax2 = fig.add_subplot(gs[1, 0])\n",
" ax3 = fig.add_subplot(gs[1, 1])\n",
"\n",
" ax1.plot(self.losses, linewidth=3.0, c='m')\n",
" ax1.set_title(\"Training a {}-layer LNN with\"\n",
" \" $\\eta=${}\".format(depth, lr), pad=15, fontsize=16)\n",
" ax1.set_yscale('log')\n",
" ax1.set_xlabel('epochs')\n",
" ax1.set_ylabel('Log mean squared error')\n",
" ax1.set_ylim(0.001, 1.0)\n",
"\n",
" ax2.set_xlim(self.min_depth, self.max_depth)\n",
" ax2.set_ylim(-10, self.n_epochs)\n",
" ax2.set_xlabel('Depth')\n",
" ax2.set_ylabel('Learning time (Epochs)')\n",
" ax2.set_title(\"Learning time vs depth\", fontsize=14)\n",
" ax2.scatter(np.array(self.depth), np.array(self.success), c='r')\n",
"\n",
" ax3.set_xlim(self.min_depth, self.max_depth)\n",
" ax3.set_ylim(self.min_lr, self.max_lr)\n",
" ax3.set_xlabel('Depth')\n",
" ax3.set_ylabel('Optimal learning rate')\n",
" ax3.set_title(\"Empirically optimal $\\eta$ vs depth\", fontsize=14)\n",
" ax3.scatter(np.array(self.depth), np.array(self.lr), c='r')\n",
"\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Helper functions\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Helper functions\n",
"\n",
"def gen_samples(n, a, sigma):\n",
" \"\"\"\n",
" Generates n samples with\n",
" `y = z * x + noise(sigma)` linear relation.\n",
"\n",
" Args:\n",
" n : int\n",
" Number of datapoints within sample\n",
" a : float\n",
" Offset of x\n",
" sigma : float\n",
" Standard deviation of distribution\n",
"\n",
" Returns:\n",
" x : np.array\n",
" if sigma > 0, x = random values\n",
" else, x = evenly spaced numbers over a specified interval.\n",
" y : np.array\n",
" y = z * x + noise(sigma)\n",
" \"\"\"\n",
" assert n > 0\n",
" assert sigma >= 0\n",
"\n",
" if sigma > 0:\n",
" x = np.random.rand(n)\n",
" noise = np.random.normal(scale=sigma, size=(n))\n",
" y = a * x + noise\n",
" else:\n",
" x = np.linspace(0.0, 1.0, n, endpoint=True)\n",
" y = a * x\n",
" return x, y\n",
"\n",
"\n",
"class ShallowNarrowLNN:\n",
" \"\"\"\n",
" Shallow and narrow (one neuron per layer)\n",
" linear neural network\n",
" \"\"\"\n",
"\n",
" def __init__(self, init_ws):\n",
" \"\"\"\n",
" Initialize parameters of ShallowNarrowLNN\n",
"\n",
" Args:\n",
" init_ws: initial weights as a list\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" assert isinstance(init_ws, list)\n",
" assert len(init_ws) == 2\n",
" self.w1 = init_ws[0]\n",
" self.w2 = init_ws[1]\n",
"\n",
" def forward(self, x):\n",
" \"\"\"\n",
" The forward pass through network y = x * w1 * w2\n",
"\n",
" Args:\n",
" x: np.ndarray\n",
" Input data\n",
"\n",
" Returns:\n",
" y: np.ndarray\n",
" y = x * w1 * w2\n",
" \"\"\"\n",
" y = x * self.w1 * self.w2\n",
" return y\n",
"\n",
" def loss(self, y_p, y_t):\n",
" \"\"\"\n",
" Mean squared error (L2)\n",
" with 1/2 for convenience\n",
"\n",
" Args:\n",
" y_p: np.ndarray\n",
" Network Predictions\n",
" y_t: np.ndarray\n",
" Targets\n",
"\n",
" Returns:\n",
" mse: float\n",
" Average mean squared error\n",
" \"\"\"\n",
" assert y_p.shape == y_t.shape\n",
" mse = ((y_t - y_p)**2).mean()\n",
" return mse\n",
"\n",
" def dloss_dw(self, x, y_t):\n",
" \"\"\"\n",
" Partial derivative of loss with respect to weights\n",
"\n",
" Args:\n",
" x : np.array\n",
" Input Dataset\n",
" y_t : np.array\n",
" Corresponding Ground Truth\n",
"\n",
" Returns:\n",
" dloss_dw1: float\n",
" -mean(2 * self.w2 * x * Error)\n",
" dloss_dw2: float\n",
" -mean(2 * self.w1 * x * Error)\n",
" \"\"\"\n",
" assert x.shape == y_t.shape\n",
" Error = y_t - self.w1 * self.w2 * x\n",
" dloss_dw1 = - (2 * self.w2 * x * Error).mean()\n",
" dloss_dw2 = - (2 * self.w1 * x * Error).mean()\n",
" return dloss_dw1, dloss_dw2\n",
"\n",
" def train(self, x, y_t, eta, n_ep):\n",
" \"\"\"\n",
" Gradient descent algorithm\n",
"\n",
" Args:\n",
" x : np.array\n",
" Input Dataset\n",
" y_t : np.array\n",
" Corrsponding target\n",
" eta: float\n",
" Learning rate\n",
" n_ep : int\n",
" Number of epochs\n",
"\n",
" Returns:\n",
" loss_records: np.ndarray\n",
" Log of loss per epoch\n",
" weight_records: np.ndarray\n",
" Log of weights per epoch\n",
" \"\"\"\n",
" assert x.shape == y_t.shape\n",
"\n",
" loss_records = np.empty(n_ep) # Pre allocation of loss records\n",
" weight_records = np.empty((n_ep, 2)) # Pre allocation of weight records\n",
"\n",
" for i in range(n_ep):\n",
" y_p = self.forward(x)\n",
" loss_records[i] = self.loss(y_p, y_t)\n",
" dloss_dw1, dloss_dw2 = self.dloss_dw(x, y_t)\n",
" self.w1 -= eta * dloss_dw1\n",
" self.w2 -= eta * dloss_dw2\n",
" weight_records[i] = [self.w1, self.w2]\n",
"\n",
" return loss_records, weight_records\n",
"\n",
"\n",
"class DeepNarrowLNN:\n",
" \"\"\"\n",
" Deep but thin (one neuron per layer)\n",
" linear neural network\n",
" \"\"\"\n",
"\n",
" def __init__(self, init_ws):\n",
" \"\"\"\n",
" Initialize parameters of DeepNarrowLNN\n",
"\n",
" Args:\n",
" init_ws: np.ndarray\n",
" Initial weights as a numpy array\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" self.n = init_ws.size\n",
" self.W = init_ws.reshape(1, -1)\n",
"\n",
" def forward(self, x):\n",
" \"\"\"\n",
" Forward pass of DeepNarrowLNN\n",
"\n",
" Args:\n",
" x : np.array\n",
" Input features\n",
"\n",
" Returns:\n",
" y: np.array\n",
" Product of weights over input features\n",
" \"\"\"\n",
" y = np.prod(self.W) * x\n",
" return y\n",
"\n",
" def loss(self, y_t, y_p):\n",
" \"\"\"\n",
" Mean squared error (L2 loss)\n",
"\n",
" Args:\n",
" y_t : np.array\n",
" Targets\n",
" y_p : np.array\n",
" Network's predictions\n",
"\n",
" Returns:\n",
" mse: float\n",
" Mean squared error\n",
" \"\"\"\n",
" assert y_p.shape == y_t.shape\n",
" mse = ((y_t - y_p)**2 / 2).mean()\n",
" return mse\n",
"\n",
" def dloss_dw(self, x, y_t, y_p):\n",
" \"\"\"\n",
" Analytical gradient of weights\n",
"\n",
" Args:\n",
" x : np.array\n",
" Input features\n",
" y_t : np.array\n",
" Targets\n",
" y_p : np.array\n",
" Network Predictions\n",
"\n",
" Returns:\n",
" dW: np.ndarray\n",
" Analytical gradient of weights\n",
" \"\"\"\n",
" E = y_t - y_p # i.e., y_t - x * np.prod(self.W)\n",
" Ex = np.multiply(x, E).mean()\n",
" Wp = np.prod(self.W) / (self.W + 1e-9)\n",
" dW = - Ex * Wp\n",
" return dW\n",
"\n",
" def train(self, x, y_t, eta, n_epochs):\n",
" \"\"\"\n",
" Training using gradient descent\n",
"\n",
" Args:\n",
" x : np.array\n",
" Input Features\n",
" y_t : np.array\n",
" Targets\n",
" eta: float\n",
" Learning rate\n",
" n_epochs : int\n",
" Number of epochs\n",
"\n",
" Returns:\n",
" loss_records: np.ndarray\n",
" Log of loss over epochs\n",
" \"\"\"\n",
" loss_records = np.empty(n_epochs)\n",
" loss_records[:] = np.nan\n",
" for i in range(n_epochs):\n",
" y_p = self.forward(x)\n",
" loss_records[i] = self.loss(y_t, y_p).mean()\n",
" dloss_dw = self.dloss_dw(x, y_t, y_p)\n",
" if np.isnan(dloss_dw).any() or np.isinf(dloss_dw).any():\n",
" return loss_records\n",
" self.W -= eta * dloss_dw\n",
" return loss_records"
]
},
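{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"The helper classes above can be smoke-tested directly. The short sketch below is an illustrative addition (not part of the original exercises): it trains a `ShallowNarrowLNN` on data from `gen_samples` and checks that the learned weight product $w_1 w_2$ approaches the true slope.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Minimal usage sketch of the helpers above (illustrative addition)\n",
"x_demo, y_demo = gen_samples(50, 1.0, 0.1)  # noisy data with true slope a = 1.0\n",
"demo_model = ShallowNarrowLNN([0.5, 0.5])  # symmetric, positive initialization\n",
"demo_loss, demo_w = demo_model.train(x_demo, y_demo, eta=0.15, n_ep=200)\n",
"# The learned product w1 * w2 should approach the true slope (~1.0)\n",
"print(f\"final loss: {demo_loss[-1]:.4f}, w1 * w2: {demo_model.w1 * demo_model.w2:.3f}\")"
]
},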
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set random seed\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Executing `set_seed(seed=seed)` you are setting the seed\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"#@title Set random seed\n",
"\n",
"#@markdown Executing `set_seed(seed=seed)` you are setting the seed\n",
"\n",
"# For DL its critical to set the random seed so that students can have a\n",
"# baseline to compare their results to expected results.\n",
"# Read more here: https://pytorch.org/docs/stable/notes/randomness.html\n",
"\n",
"# Call `set_seed` function in the exercises to ensure reproducibility.\n",
"import random\n",
"import torch\n",
"\n",
"def set_seed(seed=None, seed_torch=True):\n",
" \"\"\"\n",
" Function that controls randomness. NumPy and random modules must be imported.\n",
"\n",
" Args:\n",
" seed : Integer\n",
" A non-negative integer that defines the random state. Default is `None`.\n",
" seed_torch : Boolean\n",
" If `True` sets the random seed for pytorch tensors, so pytorch module\n",
" must be imported. Default is `True`.\n",
"\n",
" Returns:\n",
" Nothing.\n",
" \"\"\"\n",
" if seed is None:\n",
" seed = np.random.choice(2 ** 32)\n",
" random.seed(seed)\n",
" np.random.seed(seed)\n",
" if seed_torch:\n",
" torch.manual_seed(seed)\n",
" torch.cuda.manual_seed_all(seed)\n",
" torch.cuda.manual_seed(seed)\n",
" torch.backends.cudnn.benchmark = False\n",
" torch.backends.cudnn.deterministic = True\n",
"\n",
" print(f'Random seed {seed} has been set.')\n",
"\n",
"\n",
"# In case that `DataLoader` is used\n",
"def seed_worker(worker_id):\n",
" \"\"\"\n",
" DataLoader will reseed workers following randomness in\n",
" multi-process data loading algorithm.\n",
"\n",
" Args:\n",
" worker_id: integer\n",
" ID of subprocess to seed. 0 means that\n",
" the data will be loaded in the main process\n",
" Refer: https://pytorch.org/docs/stable/data.html#data-loading-randomness for more details\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" worker_seed = torch.initial_seed() % 2**32\n",
" np.random.seed(worker_seed)\n",
" random.seed(worker_seed)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set device (GPU or CPU). Execute `set_device()`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"#@title Set device (GPU or CPU). Execute `set_device()`\n",
"# especially if torch modules used.\n",
"\n",
"# Inform the user if the notebook uses GPU or CPU.\n",
"\n",
"def set_device():\n",
" \"\"\"\n",
" Set the device. CUDA if available, CPU otherwise\n",
"\n",
" Args:\n",
" None\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
" if device != \"cuda\":\n",
" print(\"GPU is not enabled in this notebook. \\n\"\n",
" \"If you want to enable it, in the menu under `Runtime` -> \\n\"\n",
" \"`Hardware accelerator.` and select `GPU` from the dropdown menu\")\n",
" else:\n",
" print(\"GPU is enabled in this notebook. \\n\"\n",
" \"If you want to disable it, in the menu under `Runtime` -> \\n\"\n",
" \"`Hardware accelerator.` and select `None` from the dropdown menu\")\n",
"\n",
" return device"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"SEED = 2021\n",
"set_seed(seed=SEED)\n",
"DEVICE = set_device()"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Section 1: A Shallow Narrow Linear Neural Network\n",
"\n",
"*Time estimate: ~30 mins*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Video 1: Shallow Narrow Linear Net\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 1: Shallow Narrow Linear Net\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', '6e5JIYsqVvU'), ('Bilibili', 'BV1F44y117ot')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Shallow_Narrow_Linear_Net_Video\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 1.1: A Shallow Narrow Linear Net"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"To better understand the behavior of neural network training with gradient descent, we start with the incredibly simple case of a shallow narrow linear neural net, since state-of-the-art models are impossible to dissect and comprehend with our current mathematical tools.\n",
"\n",
"The model we use has one hidden layer, with only one neuron, and two weights. We consider the squared error (or L2 loss) as the cost function. As you may have already guessed, we can visualize the model as a neural network:\n",
"\n",
"