{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {},
"id": "view-in-github"
},
"source": [
"
"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"# Tutorial 1: Gradient Descent and AutoGrad\n",
"\n",
"**Week 1, Day 2: Linear Deep Learning**\n",
"\n",
"**By Neuromatch Academy**\n",
"\n",
"__Content creators:__ Saeed Salehi, Vladimir Haltakov, Andrew Saxe\n",
"\n",
"__Content reviewers:__ Polina Turishcheva, Antoine De Comite, Kelson Shilling-Scrivo\n",
"\n",
"__Content editors:__ Anoop Kulkarni, Spiros Chavlis\n",
"\n",
"__Production editors:__ Khalid Almubarak, Gagana B, Spiros Chavlis"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Tutorial Objectives\n",
"\n",
"Day 2 Tutorial 1 will continue on buiding PyTorch skillset and motivate its core functionality: Autograd. In this notebook, we will cover the key concepts and ideas of:\n",
"\n",
"* Gradient descent\n",
"* PyTorch Autograd\n",
"* PyTorch `nn` module"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @markdown\n",
"from IPython.display import IFrame\n",
"from ipywidgets import widgets\n",
"out = widgets.Output()\n",
"with out:\n",
" print(f\"If you want to download the slides: https://osf.io/download/3qevp/\")\n",
" display(IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/3qevp/?direct%26mode=render%26action=download%26mode=render\", width=730, height=410))\n",
"display(out)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Setup\n",
"\n",
"This a GPU-Free tutorial!\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install and import feedback gadget\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Install and import feedback gadget\n",
"\n",
"!pip3 install vibecheck datatops --quiet\n",
"\n",
"from vibecheck import DatatopsContentReviewContainer\n",
"def content_review(notebook_section: str):\n",
" return DatatopsContentReviewContainer(\n",
" \"\", # No text prompt\n",
" notebook_section,\n",
" {\n",
" \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n",
" \"name\": \"neuromatch_dl\",\n",
" \"user_key\": \"f379rz8y\",\n",
" },\n",
" ).render()\n",
"\n",
"\n",
"feedback_prefix = \"W1D2_T1\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Imports\n",
"import torch\n",
"import numpy as np\n",
"from torch import nn\n",
"from math import pi\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Figure settings\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Figure settings\n",
"import logging\n",
"logging.getLogger('matplotlib.font_manager').disabled = True\n",
"\n",
"import ipywidgets as widgets # Interactive display\n",
"%config InlineBackend.figure_format = 'retina'\n",
"plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/content-creation/main/nma.mplstyle\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting functions\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Plotting functions\n",
"\n",
"from mpl_toolkits.axes_grid1 import make_axes_locatable\n",
"\n",
"def ex3_plot(model, x, y, ep, lss):\n",
" \"\"\"\n",
" Plot training loss\n",
"\n",
" Args:\n",
" model: nn.module\n",
" Model implementing regression\n",
" x: np.ndarray\n",
" Training Data\n",
" y: np.ndarray\n",
" Targets\n",
" ep: int\n",
" Number of epochs\n",
" lss: function\n",
" Loss function\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" f, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))\n",
" ax1.set_title(\"Regression\")\n",
" ax1.plot(x, model(x).detach().numpy(), color='r', label='prediction')\n",
" ax1.scatter(x, y, c='c', label='targets')\n",
" ax1.set_xlabel('x')\n",
" ax1.set_ylabel('y')\n",
" ax1.legend()\n",
"\n",
" ax2.set_title(\"Training loss\")\n",
" ax2.plot(np.linspace(1, epochs, epochs), losses, color='y')\n",
" ax2.set_xlabel(\"Epoch\")\n",
" ax2.set_ylabel(\"MSE\")\n",
"\n",
" plt.show()\n",
"\n",
"\n",
"def ex1_plot(fun_z, fun_dz):\n",
" \"\"\"\n",
" Plots the function and gradient vectors\n",
"\n",
" Args:\n",
" fun_z: f.__name__\n",
" Function implementing sine function\n",
" fun_dz: f.__name__\n",
" Function implementing sine function as gradient vector\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" x, y = np.arange(-3, 3.01, 0.02), np.arange(-3, 3.01, 0.02)\n",
" xx, yy = np.meshgrid(x, y, sparse=True)\n",
" zz = fun_z(xx, yy)\n",
" xg, yg = np.arange(-2.5, 2.6, 0.5), np.arange(-2.5, 2.6, 0.5)\n",
" xxg, yyg = np.meshgrid(xg, yg, sparse=True)\n",
" zxg, zyg = fun_dz(xxg, yyg)\n",
"\n",
" plt.figure(figsize=(8, 7))\n",
" plt.title(\"Gradient vectors point towards steepest ascent\")\n",
" contplt = plt.contourf(x, y, zz, levels=20)\n",
" plt.quiver(xxg, yyg, zxg, zyg, scale=50, color='r', )\n",
" plt.xlabel('$x$')\n",
" plt.ylabel('$y$')\n",
" ax = plt.gca()\n",
" divider = make_axes_locatable(ax)\n",
" cax = divider.append_axes(\"right\", size=\"5%\", pad=0.05)\n",
" cbar = plt.colorbar(contplt, cax=cax)\n",
" cbar.set_label('$z = h(x, y)$')\n",
"\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set random seed\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Executing `set_seed(seed=seed)` you are setting the seed\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Set random seed\n",
"\n",
"# @markdown Executing `set_seed(seed=seed)` you are setting the seed\n",
"\n",
"# For DL its critical to set the random seed so that students can have a\n",
"# baseline to compare their results to expected results.\n",
"# Read more here: https://pytorch.org/docs/stable/notes/randomness.html\n",
"\n",
"# Call `set_seed` function in the exercises to ensure reproducibility.\n",
"import random\n",
"import torch\n",
"\n",
"def set_seed(seed=None, seed_torch=True):\n",
" \"\"\"\n",
" Function that controls randomness. NumPy and random modules must be imported.\n",
"\n",
" Args:\n",
" seed : Integer\n",
" A non-negative integer that defines the random state. Default is `None`.\n",
" seed_torch : Boolean\n",
" If `True` sets the random seed for pytorch tensors, so pytorch module\n",
" must be imported. Default is `True`.\n",
"\n",
" Returns:\n",
" Nothing.\n",
" \"\"\"\n",
" if seed is None:\n",
" seed = np.random.choice(2 ** 32)\n",
" random.seed(seed)\n",
" np.random.seed(seed)\n",
" if seed_torch:\n",
" torch.manual_seed(seed)\n",
" torch.cuda.manual_seed_all(seed)\n",
" torch.cuda.manual_seed(seed)\n",
" torch.backends.cudnn.benchmark = False\n",
" torch.backends.cudnn.deterministic = True\n",
"\n",
" print(f'Random seed {seed} has been set.')\n",
"\n",
"\n",
"# In case that `DataLoader` is used\n",
"def seed_worker(worker_id):\n",
" \"\"\"\n",
" DataLoader will reseed workers following randomness in\n",
" multi-process data loading algorithm.\n",
"\n",
" Args:\n",
" worker_id: integer\n",
" ID of subprocess to seed. 0 means that\n",
" the data will be loaded in the main process\n",
" Refer: https://pytorch.org/docs/stable/data.html#data-loading-randomness for more details\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" worker_seed = torch.initial_seed() % 2**32\n",
" np.random.seed(worker_seed)\n",
" random.seed(worker_seed)"
]
},
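{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"The `seed_worker` helper above only takes effect once it is passed to a `DataLoader`. As a minimal, illustrative sketch (not used elsewhere in this tutorial), the cell below shows the wiring described in the PyTorch data-loading randomness notes: a seeded `torch.Generator` together with `worker_init_fn=seed_worker`. The `toy_data` and `toy_loader` names are placeholders for illustration only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Illustrative sketch: wiring `seed_worker` into a DataLoader\n",
"# (`toy_data` is a placeholder dataset, not used elsewhere in this tutorial)\n",
"from torch.utils.data import DataLoader, TensorDataset\n",
"\n",
"toy_data = TensorDataset(torch.arange(8).float().reshape(-1, 1))\n",
"\n",
"g = torch.Generator()\n",
"g.manual_seed(2021)  # Same value as the notebook's SEED below\n",
"\n",
"toy_loader = DataLoader(toy_data,\n",
"                        batch_size=2,\n",
"                        shuffle=True,\n",
"                        num_workers=2,\n",
"                        worker_init_fn=seed_worker,  # Reseed each worker process\n",
"                        generator=g)  # Control the shuffling order"
]
},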
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set device (GPU or CPU). Execute `set_device()`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Set device (GPU or CPU). Execute `set_device()`\n",
"# especially if torch modules used.\n",
"\n",
"# inform the user if the notebook uses GPU or CPU.\n",
"\n",
"def set_device():\n",
" \"\"\"\n",
" Set the device. CUDA if available, CPU otherwise\n",
"\n",
" Args:\n",
" None\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
" if device != \"cuda\":\n",
" print(\"GPU is not enabled in this notebook. \\n\"\n",
" \"If you want to enable it, in the menu under `Runtime` -> \\n\"\n",
" \"`Hardware accelerator.` and select `GPU` from the dropdown menu\")\n",
" else:\n",
" print(\"GPU is enabled in this notebook. \\n\"\n",
" \"If you want to disable it, in the menu under `Runtime` -> \\n\"\n",
" \"`Hardware accelerator.` and select `None` from the dropdown menu\")\n",
"\n",
" return device"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"SEED = 2021\n",
"set_seed(seed=SEED)\n",
"DEVICE = set_device()"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Section 0: Introduction"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Today, we will go through 3 tutorials.\n",
"\n",
"- Starting with Gradient Descent, the workhorse of deep learning algorithms.\n",
"- The second tutorial will help us build a better intuition about neural networks and basic hyper-parameters.\n",
"- Finally, in tutorial 3, we learn about the learning dynamics, what the (a good) deep network is learning, and why sometimes they may perform poorly."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Video 0: Introduction\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 0: Introduction\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'i7djAv2jnzY'), ('Bilibili', 'BV1Qf4y1578t')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Introduction_Video\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Section 1: Gradient Descent Algorithm\n",
"\n",
"*Time estimate: ~30-45 mins*"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Since the goal of most learning algorithms is **minimizing the risk (also known as the cost or loss) function**, optimization is often the core of most machine learning techniques! The gradient descent algorithm, along with its variations such as stochastic gradient descent, is one of the most powerful and popular optimization methods used for deep learning. Today we will introduce the basics, but you will learn much more about Optimization in the coming days (Week 1, Day 4)."
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 1.1: Gradients & Steepest Ascent"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Video 1: Gradient Descent\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 1: Gradient Descent\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'UwgA_SgG0TM'), ('Bilibili', 'BV1Pq4y1p7em')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Gradient_Descent_Video\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Before introducing the gradient descent algorithm, let's review a very important property of gradients. The gradient of a function always points in the direction of the steepest ascent. The following exercise will help clarify this."
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Analytical Exercise 1.1: Gradient vector (Optional)\n",
"\n",
"Given the following function:\n",
"\n",
"\\begin{equation}\n",
"z = h(x, y) = \\sin(x^2 + y^2)\n",
"\\end{equation}\n",
"\n",
"find the gradient vector:\n",
"\n",
"\\begin{equation}\n",
" \\begin{bmatrix}\n",
" \\dfrac{\\partial z}{\\partial x} \\\\ \\\\ \\dfrac{\\partial z}{\\partial y}\n",
" \\end{bmatrix}\n",
"\\end{equation}\n",
"\n",
"\n",
"*Hint: Use the chain rule!*\n",
"\n",
"**Chain rule**: For a composite function $F(x) = g(h(x)) \\equiv (g \\circ h)(x)$:\n",
"\n",
"\\begin{equation}\n",
"F'(x) = g'(h(x)) \\cdot h'(x)\n",
"\\end{equation}\n",
"\n",
"or differently denoted:\n",
"\n",
"\\begin{equation}\n",
"\\frac{dF}{dx} = \\frac{dg}{dh} ~ \\frac{dh}{dx}\n",
"\\end{equation}\n",
"\n",
"\n",
" Click here for the solution
\n",
"\n",
"We can rewrite the function as a composite function:\n",
"\n",
"\\begin{equation}\n",
"z = f\\left( g(x,y) \\right), ~~ f(u) = \\sin(u), ~~ g(x, y) = x^2 + y^2\n",
"\\end{equation}\n",
"\n",
"Using the [chain rule](https://en.wikipedia.org/wiki/Chain_rule):\n",
"\n",
"\\begin{align}\n",
"\\dfrac{\\partial z}{\\partial x} &= \\dfrac{\\partial f}{\\partial g} \\dfrac{\\partial g}{\\partial x} = \\cos(g(x,y)) ~ (2x) = \\cos(x^2 + y^2) \\cdot 2x \\\\ \\\\\n",
"\\dfrac{\\partial z}{\\partial y} &= \\dfrac{\\partial f}{\\partial g} \\dfrac{\\partial g}{\\partial y} = \\cos(g(x,y)) ~ (2y) = \\cos(x^2 + y^2) \\cdot 2y\n",
"\\end{align}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Gradient_Vector_Analytical_Exercise\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Coding Exercise 1.1: Gradient Vector\n",
"\n",
"Implement (complete) the function which returns the gradient vector for $z=\\sin(x^2 + y^2)$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"def fun_z(x, y):\n",
" \"\"\"\n",
" Implements function sin(x^2 + y^2)\n",
"\n",
" Args:\n",
" x: (float, np.ndarray)\n",
" Variable x\n",
" y: (float, np.ndarray)\n",
" Variable y\n",
"\n",
" Returns:\n",
" z: (float, np.ndarray)\n",
" sin(x^2 + y^2)\n",
" \"\"\"\n",
" z = np.sin(x**2 + y**2)\n",
" return z\n",
"\n",
"\n",
"def fun_dz(x, y):\n",
" \"\"\"\n",
" Implements function sin(x^2 + y^2)\n",
"\n",
" Args:\n",
" x: (float, np.ndarray)\n",
" Variable x\n",
" y: (float, np.ndarray)\n",
" Variable y\n",
"\n",
" Returns:\n",
" Tuple of gradient vector for sin(x^2 + y^2)\n",
" \"\"\"\n",
" #################################################\n",
" ## Implement the function which returns gradient vector\n",
" ## Complete the partial derivatives dz_dx and dz_dy\n",
" # Complete the function and remove or comment the line below\n",
" raise NotImplementedError(\"Gradient function `fun_dz`\")\n",
" #################################################\n",
" dz_dx = ...\n",
" dz_dy = ...\n",
" return (dz_dx, dz_dy)\n",
"\n",
"\n",
"## Uncomment to run\n",
"# ex1_plot(fun_z, fun_dz)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {}
},
"source": [
"[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W1D2_LinearDeepLearning/solutions/W1D2_Tutorial1_Solution_25404147.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"We can see from the plot that for any given $x_0$ and $y_0$, the gradient vector $\\left[ \\dfrac{\\partial z}{\\partial x}, \\dfrac{\\partial z}{\\partial y}\\right]^{\\top}_{(x_0, y_0)}$ points in the direction of $x$ and $y$ for which $z$ increases the most. It is important to note that gradient vectors only see their local values, not the whole landscape! Also, length (size) of each vector, which indicates the steepness of the function, can be very small near local plateaus (i.e. minima or maxima).\n",
"\n",
"Thus, we can simply use the aforementioned formula to find the local minima.\n",
"\n",
"In 1847, Augustin-Louis Cauchy used **negative of gradients** to develop the Gradient Descent algorithm as an **iterative** method to **minimize** a **continuous** and (ideally) **differentiable function** of **many variables**."
]
},
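{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"As an optional numerical sanity check (a small sketch, not part of the exercise), we can compare the gradient of `fun_z` against a central finite-difference approximation at a single point. The helper `numerical_gradient` and the probe point below are our own illustrative choices."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Optional sanity check (illustrative): a central finite-difference estimate\n",
"# of the gradient of fun_z, to compare against your `fun_dz` implementation.\n",
"def numerical_gradient(f, x0, y0, eps=1e-5):\n",
"  \"\"\"Central finite-difference estimate of (df/dx, df/dy) at (x0, y0).\"\"\"\n",
"  df_dx = (f(x0 + eps, y0) - f(x0 - eps, y0)) / (2 * eps)\n",
"  df_dy = (f(x0, y0 + eps) - f(x0, y0 - eps)) / (2 * eps)\n",
"  return df_dx, df_dy\n",
"\n",
"\n",
"x0, y0 = 0.9, -1.2  # Arbitrary probe point\n",
"print(f\"finite-difference gradient at ({x0}, {y0}): {numerical_gradient(fun_z, x0, y0)}\")\n",
"## Uncomment after completing `fun_dz` in Coding Exercise 1.1\n",
"# print(f\"your fun_dz gradient at ({x0}, {y0}): {fun_dz(x0, y0)}\")"
]
},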
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Gradient_Vector_Exercise\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Video 2: Gradient Descent - Discussion\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 2: Gradient Descent - Discussion\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', '8s22ffAfGwI'), ('Bilibili', 'BV1Rf4y157bw')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Gradient_Descent_Discussion_Video\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 1.2: Gradient Descent Algorithm"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Let $f(\\mathbf{w}): \\mathbb{R}^d \\rightarrow \\mathbb{R}$ be a differentiable function. Gradient Descent is an iterative algorithm for minimizing the function $f$, starting with an initial value for variables $\\mathbf{w}$, taking steps of size $\\eta$ (learning rate) in the direction of the negative gradient at the current point to update the variables $\\mathbf{w}$.\n",
"\n",
"\\begin{equation}\n",
"\\mathbf{w}^{(t+1)} = \\mathbf{w}^{(t)} - \\eta \\nabla f \\left( \\mathbf{w}^{(t)} \\right)\n",
"\\end{equation}\n",
"\n",
"where $\\eta > 0$ and $\\nabla f (\\mathbf{w})= \\left( \\frac{\\partial f(\\mathbf{w})}{\\partial w_1}, ..., \\frac{\\partial f(\\mathbf{w})}{\\partial w_d} \\right)$. Since negative gradients always point locally in the direction of steepest descent, the algorithm makes small steps at each point **towards** the minimum.\n",
"\n",
"
\n",
"\n",
"**Vanilla Algorithm**\n",
"\n",
"---\n",
"> **Inputs:** initial guess $\\mathbf{w}^{(0)}$, step size $\\eta > 0$, number of steps $T$.\n",
"\n",
"> **For** $t = 0, 1, 2, \\dots , T-1$ **do** \\\n",
"$\\qquad$ $\\mathbf{w}^{(t+1)} = \\mathbf{w}^{(t)} - \\eta \\nabla f \\left( \\mathbf{w}^{(t)} \\right)$\\\n",
"**end**\n",
"\n",
"> **Return:** $\\mathbf{w}^{(t+1)}$\n",
"\n",
"---\n",
"\n",
"
\n",
"\n",
"Hence, all we need is to calculate the gradient of the loss function with respect to the learnable parameters (i.e., weights):\n",
"\n",
"\\begin{equation}\n",
"\\dfrac{\\partial Loss}{\\partial \\mathbf{w}} = \\left[ \\dfrac{\\partial Loss}{\\partial w_1}, \\dfrac{\\partial Loss}{\\partial w_2} , \\dots, \\dfrac{\\partial Loss}{\\partial w_d} \\right]^{\\top}\n",
"\\end{equation}"
]
},
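{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"To make the vanilla algorithm above concrete, here is a minimal sketch of gradient descent on a toy quadratic loss with a known minimizer `w_star`, whose gradient `2 * (w - w_star)` we can write by hand. The values of `w_star`, the step size `eta`, and the number of steps `T` are arbitrary choices for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Minimal sketch of the vanilla gradient descent algorithm on a toy quadratic\n",
"# loss f(w) = ||w - w_star||^2, whose gradient is grad_f(w) = 2 * (w - w_star).\n",
"# The values of w_star, eta and T are arbitrary illustration choices.\n",
"w_star = np.array([1.0, -2.0])  # Minimizer of the toy loss\n",
"\n",
"def grad_f(w):\n",
"  \"\"\"Gradient of f(w) = ||w - w_star||^2.\"\"\"\n",
"  return 2 * (w - w_star)\n",
"\n",
"\n",
"w = np.array([4.0, 3.0])  # Initial guess w^(0)\n",
"eta = 0.1  # Step size (learning rate)\n",
"T = 50  # Number of steps\n",
"\n",
"for t in range(T):\n",
"  w = w - eta * grad_f(w)  # w^(t+1) = w^(t) - eta * grad f(w^(t))\n",
"\n",
"print(f\"w after {T} steps: {w}\")\n",
"print(f\"target w_star:     {w_star}\")"
]
},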
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Analytical Exercise 1.2: Gradients\n",
"\n",
"Given $f(x, y, z) = \\tanh \\left( \\ln \\left[1 + z \\frac{2x}{sin(y)} \\right] \\right)$, how easy is it to derive $\\dfrac{\\partial f}{\\partial x}$, $\\dfrac{\\partial f}{\\partial y}$ and $\\dfrac{\\partial f}{\\partial z}$?\n",
"\n",
"**Hint:** You don't have to actually calculate them!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Gradients_Analytical_Exercise\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 1.3: Computational Graphs and Backprop\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Video 3: Computational Graph\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 3: Computational Graph\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', '2z1YX5PonV4'), ('Bilibili', 'BV1c64y1B7ZG')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Computational_Graph_Video\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"*Exercise 1.2* is an example of how overwhelming the derivation of gradients can get, as the number of variables and nested functions increases. This function is still extraordinarily simple compared to the loss functions of modern neural networks. So how can we (as well as PyTorch and similar frameworks) approach such beasts?\n",
"\n",
"Let’s look at the function again:\n",
"\n",
"\\begin{equation}\n",
"f(x, y, z) = \\tanh \\left(\\ln \\left[1 + z \\frac{2x}{sin(y)} \\right] \\right)\n",
"\\end{equation}\n",
"\n",
"We can build a so-called computational graph (shown below) to break the original function into smaller and more approachable expressions.\n",
"\n",
"
\n",
"\n",
"Starting from $x$, $y$, and $z$ and following the arrows and expressions, you would see that our graph returns the same function as $f$. It does so by calculating intermediate variables $a,b,c,d,$ and $e$. This is called the **forward pass**.\n",
"\n",
"Now, let’s start from $f$, and work our way against the arrows while calculating the gradient of each expression as we go. This is called the **backward pass**, from which the **backpropagation of errors** algorithm gets its name.\n",
"\n",
"
\n",
"\n",
"By breaking the computation into simple operations on intermediate variables, we can use the chain rule to calculate any gradient:\n",
"\n",
"\\begin{equation}\n",
"\\dfrac{\\partial f}{\\partial x} = \\dfrac{\\partial f}{\\partial e}~\\dfrac{\\partial e}{\\partial d}~\\dfrac{\\partial d}{\\partial c}~\\dfrac{\\partial c}{\\partial a}~\\dfrac{\\partial a}{\\partial x} = \\left( 1-\\tanh^2(e) \\right) \\cdot \\frac{1}{d+1}\\cdot z \\cdot \\frac{1}{b} \\cdot 2\n",
"\\end{equation}\n",
"\n",
"Conveniently, the values for $e$, $b$, and $d$ are available to us from when we did the forward pass through the graph. That is, the partial derivatives have simple expressions in terms of the intermediate variables $a,b,c,d,e$ that we calculated and stored during the forward pass."
]
},
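{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"As a small sketch (assuming the decomposition shown above), we can implement the forward pass through the intermediate variables $a, b, c, d, e$ in PyTorch and let autograd confirm the hand-derived expression for $\\partial f / \\partial x$. The probe values of $x$, $y$, and $z$ below are arbitrary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Sketch: forward pass through the intermediate variables of the computational\n",
"# graph, then compare autograd's df/dx with the hand-derived chain-rule formula.\n",
"# The probe values for x, y, z are arbitrary illustration choices.\n",
"x = torch.tensor([0.7], requires_grad=True)\n",
"y = torch.tensor([1.3], requires_grad=True)\n",
"z = torch.tensor([-0.4], requires_grad=True)\n",
"\n",
"# Forward pass (each line is one node of the graph)\n",
"a = 2 * x\n",
"b = torch.sin(y)\n",
"c = a / b\n",
"d = z * c\n",
"e = torch.log(1 + d)\n",
"f = torch.tanh(e)\n",
"\n",
"# Backward pass: autograd applies the chain rule through the graph\n",
"f.backward()\n",
"\n",
"# Hand-derived formula from the text:\n",
"# df/dx = (1 - tanh^2(e)) * 1/(d + 1) * z * (1/b) * 2\n",
"manual_df_dx = (1 - torch.tanh(e)**2) / (d + 1) * z / b * 2\n",
"print(f\"autograd df/dx: {x.grad.item():.6f}\")\n",
"print(f\"manual   df/dx: {manual_df_dx.item():.6f}\")"
]
},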
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Analytical Exercise 1.3: Chain Rule (Optional)\n",
"\n",
"For the function above, calculate the $\\dfrac{\\partial f}{\\partial y}$ using the computational graph and chain rule.\n",
"\n",
"\n",
" Click here for the solution
\n",
"\n",
"\\begin{equation}\n",
"\\dfrac{\\partial f}{\\partial y} = \\dfrac{\\partial f}{\\partial e}~\\dfrac{\\partial e}{\\partial d}~\\dfrac{\\partial d}{\\partial c}~\\dfrac{\\partial c}{\\partial b}~\\dfrac{\\partial b}{\\partial y} = \\left( 1-\\tanh^2(e) \\right) \\cdot \\frac{1}{d+1}\\cdot z \\cdot \\frac{-a}{b^2} \\cdot \\cos(y)\n",
"\\end{equation}"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"For more: [Calculus on Computational Graphs: Backpropagation](https://colah.github.io/posts/2015-08-Backprop/)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Chain_Rule_Analytical_Exercise\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Section 2: PyTorch AutoGrad\n",
"\n",
"*Time estimate: ~30-45 mins*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Video 4: Auto-Differentiation\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 4: Auto-Differentiation\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'IBYFCNyBcF8'), ('Bilibili', 'BV1UP4y1s7gv')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_AutoDifferentiation_Video\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Deep learning frameworks such as PyTorch, JAX, and TensorFlow come with a very efficient and sophisticated set of algorithms, commonly known as Automatic Differentiation. AutoGrad is PyTorch's automatic differentiation engine. Here we start by covering the essentials of AutoGrad, and you will learn more in the coming days."
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 2.1: Forward Propagation\n",
"\n",
"Everything starts with the forward propagation (pass). PyTorch tracks all the instructions, as we declare the variables and operations, and it builds the graph when we call the `.backward()` pass. PyTorch rebuilds the graph every time we iterate or change it (or simply put, PyTorch uses a dynamic graph).\n",
"\n",
"For gradient descent, it is only required to have the gradients of cost function with respect to the variables we wish to learn. These variables are often called \"learnable / trainable parameters\" or simply \"parameters\" in PyTorch. In neural nets, weights and biases are often the learnable parameters."
]
},
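{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Before the exercise, here is a tiny sketch of how `requires_grad` propagates: any tensor computed from a tracked tensor is itself tracked and becomes part of the graph. The tensors `w_demo`, `x_demo`, and `y_demo` below are illustrative placeholders."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Tiny demo: operations involving a requires_grad tensor are tracked by autograd\n",
"# (the values below are arbitrary illustration choices)\n",
"w_demo = torch.tensor([2.0], requires_grad=True)  # A \"learnable\" tensor\n",
"x_demo = torch.tensor([3.0])  # A plain data tensor (not tracked)\n",
"\n",
"y_demo = w_demo * x_demo  # Result of an op involving a tracked tensor\n",
"\n",
"print(f\"w_demo.requires_grad: {w_demo.requires_grad}\")\n",
"print(f\"x_demo.requires_grad: {x_demo.requires_grad}\")\n",
"print(f\"y_demo.requires_grad: {y_demo.requires_grad}\")  # True: part of the graph"
]
},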
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Coding Exercise 2.1: Buiding a Computational Graph\n",
"\n",
"In PyTorch, to indicate that a certain tensor contains learnable parameters, we can set the optional argument `requires_grad` to `True`. PyTorch will then track every operation using this tensor while configuring the computational graph. For this exercise, use the provided tensors to build the following graph, which implements a single neuron with scalar input and output.\n",
"\n",
"
\n",
"\n",
"
"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"class SimpleGraph:\n",
" \"\"\"\n",
" Implementing Simple Computational Graph\n",
" \"\"\"\n",
"\n",
" def __init__(self, w, b):\n",
" \"\"\"\n",
" Initializing the SimpleGraph\n",
"\n",
" Args:\n",
" w: float\n",
" Initial value for weight\n",
" b: float\n",
" Initial value for bias\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" assert isinstance(w, float)\n",
" assert isinstance(b, float)\n",
" self.w = torch.tensor([w], requires_grad=True)\n",
" self.b = torch.tensor([b], requires_grad=True)\n",
"\n",
" def forward(self, x):\n",
" \"\"\"\n",
" Forward pass\n",
"\n",
" Args:\n",
" x: torch.Tensor\n",
" 1D tensor of features\n",
"\n",
" Returns:\n",
" prediction: torch.Tensor\n",
" Model predictions\n",
" \"\"\"\n",
" assert isinstance(x, torch.Tensor)\n",
" #################################################\n",
" ## Implement the the forward pass to calculate prediction\n",
" ## Note that prediction is not the loss, but the value after `tanh`\n",
" # Complete the function and remove or comment the line below\n",
" raise NotImplementedError(\"Forward Pass `forward`\")\n",
" #################################################\n",
" prediction = ...\n",
" return prediction\n",
"\n",
"\n",
"def sq_loss(y_true, y_prediction):\n",
" \"\"\"\n",
" L2 loss function\n",
"\n",
" Args:\n",
" y_true: torch.Tensor\n",
" 1D tensor of target labels\n",
" y_prediction: torch.Tensor\n",
" 1D tensor of predictions\n",
"\n",
" Returns:\n",
" loss: torch.Tensor\n",
" L2-loss (squared error)\n",
" \"\"\"\n",
" assert isinstance(y_true, torch.Tensor)\n",
" assert isinstance(y_prediction, torch.Tensor)\n",
" #################################################\n",
" ## Implement the L2-loss (squred error) given true label and prediction\n",
" # Complete the function and remove or comment the line below\n",
" raise NotImplementedError(\"Loss function `sq_loss`\")\n",
" #################################################\n",
" loss = ...\n",
" return loss\n",
"\n",
"\n",
"\n",
"feature = torch.tensor([1]) # Input tensor\n",
"target = torch.tensor([7]) # Target tensor\n",
"## Uncomment to run\n",
"# simple_graph = SimpleGraph(-0.5, 0.5)\n",
"# print(f\"initial weight = {simple_graph.w.item()}, \"\n",
"# f\"\\ninitial bias = {simple_graph.b.item()}\")\n",
"# prediction = simple_graph.forward(feature)\n",
"# square_loss = sq_loss(target, prediction)\n",
"# print(f\"for x={feature.item()} and y={target.item()}, \"\n",
"# f\"prediction={prediction.item()}, and L2 Loss = {square_loss.item()}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {}
},
"source": [
"[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W1D2_LinearDeepLearning/solutions/W1D2_Tutorial1_Solution_8510dab4.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"It is important to appreciate the fact that PyTorch can follow our operations as we arbitrarily go through classes and functions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Building_a_computational_graph_Exercise\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 2.2: Backward Propagation\n",
"\n",
"Here is where all the magic lies. In PyTorch, `Tensor` and `Function` are interconnected and build up an acyclic graph, that encodes a complete history of computation. Each variable has a `grad_fn` attribute that references a function that has created the Tensor (except for Tensors created by the user - these have `None` as `grad_fn`). The example below shows that the tensor `c = a + b` is created by the `Add` operation and the gradient function is the object ``. Replace `+` with other single operations (e.g., `c = a * b` or `c = torch.sin(a)`) and examine the results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"a = torch.tensor([1.0], requires_grad=True)\n",
"b = torch.tensor([-1.0], requires_grad=True)\n",
"c = a + b\n",
"print(f'Gradient function = {c.grad_fn}')"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"For more complex functions, printing the `grad_fn` would only show the last operation, even though the object tracks all the operations up to that point:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"print(f'Gradient function for prediction = {prediction.grad_fn}')\n",
"print(f'Gradient function for loss = {square_loss.grad_fn}')"
]
},
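{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"As an optional peek under the hood (not needed for the exercises), each `grad_fn` exposes a `next_functions` attribute that links to the grad_fns of its inputs, i.e., the next functions autograd will call in the backward pass. The sketch below inspects the simple `c = a + b` graph from above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Optional peek: each grad_fn links to its parent nodes via `next_functions`,\n",
"# which is how autograd traverses the graph during the backward pass.\n",
"# For leaf tensors such as `a` and `b`, the parent is an AccumulateGrad node.\n",
"print(f\"c.grad_fn:                {c.grad_fn}\")\n",
"print(f\"c.grad_fn.next_functions: {c.grad_fn.next_functions}\")"
]
},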
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Now let's kick off the backward pass to calculate the gradients by calling `.backward()` on the tensor we wish to initiate the backpropagation from. Often, `.backward()` is called on the loss, which is the last node on the graph. Before doing that, let's calculate the loss gradients by hand:\n",
"\n",
"$$\\frac{\\partial{loss}}{\\partial{w}} = - 2 x (y_t - y_p)(1 - y_p^2)$$\n",
"\n",
"$$\\frac{\\partial{loss}}{\\partial{b}} = - 2 (y_t - y_p)(1 - y_p^2)$$\n",
"\n",
"Where $y_t$ is the target (true label), and $y_p$ is the prediction (model output). We can then compare it to PyTorch gradients, which can be obtained by calling `.grad` on the relevant tensors.\n",
"\n",
"**Important Notes:**\n",
"* Learnable parameters (i.e. `requires_grad` tensors) are \"contagious\". Let's look at a simple example: `Y = W @ X`, where `X` is the feature tensors and `W` is the weight tensor (learnable parameters, `requires_grad`), the newly generated output tensor `Y` will be also `requires_grad`. So any operation that is applied to `Y` will be part of the computational graph. Therefore, if we need to plot or store a tensor that is `requires_grad`, we must first `.detach()` it from the graph by calling the `.detach()` method on that tensor.\n",
"\n",
"* `.backward()` accumulates gradients in the leaf nodes (i.e., the input nodes to the node of interest). We can call `.zero_grad()` on the loss or optimizer to zero out all `.grad` attributes (see [autograd.backward](https://pytorch.org/docs/stable/autograd.html#torch.autograd.backward) for more information).\n",
"\n",
"* Recall that in python we can access variables and associated methods with `.method_name`. You can use the command `dir(my_object)` to observe all variables and associated methods to your object, e.g., `dir(simple_graph.w)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Analytical gradients (Remember detaching)\n",
"ana_dloss_dw = - 2 * feature * (target - prediction.detach())*(1 - prediction.detach()**2)\n",
"ana_dloss_db = - 2 * (target - prediction.detach())*(1 - prediction.detach()**2)\n",
"\n",
"square_loss.backward() # First we should call the backward to build the graph\n",
"autograd_dloss_dw = simple_graph.w.grad # We calculate the derivative w.r.t weights\n",
"autograd_dloss_db = simple_graph.b.grad # We calculate the derivative w.r.t bias\n",
"\n",
"print(ana_dloss_dw == autograd_dloss_dw)\n",
"print(ana_dloss_db == autograd_dloss_db)"
]
},
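{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"To see the \"gradients accumulate\" note above in action, here is a small self-contained sketch using a fresh tensor `v` (so it does not disturb `simple_graph`): calling `.backward()` twice without zeroing adds the gradients up, and resetting `.grad` clears them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Small demo of gradient accumulation: without zeroing, repeated backward\n",
"# calls add up in `.grad`. Uses a fresh tensor so `simple_graph` is untouched.\n",
"v = torch.tensor([2.0], requires_grad=True)\n",
"\n",
"(v ** 2).backward()  # d(v^2)/dv = 2v = 4\n",
"print(f\"grad after 1st backward: {v.grad}\")\n",
"\n",
"(v ** 2).backward()  # Gradients accumulate: 4 + 4 = 8\n",
"print(f\"grad after 2nd backward: {v.grad}\")\n",
"\n",
"v.grad = None  # Reset the stored gradient (in the spirit of zero_grad)\n",
"(v ** 2).backward()\n",
"print(f\"grad after resetting:    {v.grad}\")"
]
},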
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### References and more\n",
"\n",
"* [A gentle introduction to torch.autograd](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html)\n",
"\n",
"* [Automatic Differentiation package - torch.autograd](https://pytorch.org/docs/stable/autograd.html)\n",
"\n",
"* [Autograd mechanics](https://pytorch.org/docs/stable/notes/autograd.html)\n",
"\n",
"* [Automatic Differentiation with torch.autograd](https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Section 3: PyTorch's Neural Net module (`nn.Module`)\n",
"\n",
"*Time estimate: ~30 mins*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Video 5: PyTorch `nn` module\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 5: PyTorch `nn` module\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'jzTbQACq7KE'), ('Bilibili', 'BV1MU4y1H7WH')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Pytorch_nn_module_Video\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"PyTorch provides us with ready-to-use neural network building blocks, such as layers (e.g., linear, recurrent, etc.), different activation and loss functions, and much more, packed in the [`torch.nn`](https://pytorch.org/docs/stable/nn.html) module. If we build a neural network using `torch.nn` layers, the weights and biases are already in `requires_grad` mode and will be registered as model parameters.\n",
"\n",
"For training, we need three things:\n",
"\n",
"* **Model parameters:** Model parameters refer to all the learnable parameters of the model, which are accessible by calling `.parameters()` on the model. Please note that NOT all the `requires_grad` tensors are seen as model parameters. To create a custom model parameter, we can use [`nn.Parameter`](https://pytorch.org/docs/stable/generated/torch.nn.parameter.Parameter.html) (*A kind of Tensor that is to be considered a module parameter*).\n",
"\n",
"* **Loss function:** The loss that we are going to be optimizing, which is often combined with regularization terms (coming up in few days).\n",
"\n",
"* **Optimizer:** PyTorch provides us with many optimization methods (different versions of gradient descent). Optimizer holds the current state of the model and by calling the `step()` method, will update the parameters based on the computed gradients.\n",
"\n",
"You will learn more details about choosing the right model architecture, loss function, and optimizer later in the course."
]
},
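{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"As a short sketch of the note on model parameters above, the toy module below (our own illustration, not part of the tutorial's model) shows that a tensor wrapped in `nn.Parameter` is reported by `.parameters()`, while a plain `requires_grad` tensor stored on the module is not."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Sketch: only nn.Parameter tensors (and parameters of registered submodules)\n",
"# show up in model.parameters(); a plain requires_grad tensor does not.\n",
"class ToyModule(nn.Module):\n",
"  def __init__(self):\n",
"    super().__init__()\n",
"    self.registered = nn.Parameter(torch.ones(2))  # Seen by .parameters()\n",
"    self.unregistered = torch.ones(2, requires_grad=True)  # Not seen\n",
"\n",
"\n",
"toy = ToyModule()\n",
"for name, param in toy.named_parameters():\n",
"  print(f\"registered parameter: {name}, shape {tuple(param.shape)}\")"
]
},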
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 3.1: Training loop in PyTorch\n",
"\n",
"We use a regression problem to study the training loop in PyTorch.\n",
"\n",
"The task is to train a wide nonlinear (using $\\tanh$ activation function) neural net for a simple $\\sin$ regression task. Wide neural networks are thought to be really good at generalization."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" #### Generate the sample dataset\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @markdown #### Generate the sample dataset\n",
"set_seed(seed=SEED)\n",
"n_samples = 32\n",
"inputs = torch.linspace(-1.0, 1.0, n_samples).reshape(n_samples, 1)\n",
"noise = torch.randn(n_samples, 1) / 4\n",
"targets = torch.sin(pi * inputs) + noise\n",
"plt.figure(figsize=(8, 5))\n",
"plt.scatter(inputs, targets, c='c')\n",
"plt.xlabel('x (inputs)')\n",
"plt.ylabel('y (targets)')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Let's define a very wide (512 neurons) neural net with one hidden layer and `nn.Tanh()` activation function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"class WideNet(nn.Module):\n",
" \"\"\"\n",
" A Wide neural network with a single hidden layer\n",
" Structure is as follows:\n",
" nn.Sequential(\n",
" nn.Linear(1, n_cells) + nn.Tanh(), # Fully connected layer with tanh activation\n",
" nn.Linear(n_cells, 1) # Final fully connected layer\n",
" )\n",
" \"\"\"\n",
"\n",
" def __init__(self):\n",
" \"\"\"\n",
" Initializing the parameters of WideNet\n",
"\n",
" Args:\n",
" None\n",
"\n",
" Returns:\n",
" Nothing\n",
" \"\"\"\n",
" n_cells = 512\n",
" super().__init__()\n",
" self.layers = nn.Sequential(\n",
" nn.Linear(1, n_cells),\n",
" nn.Tanh(),\n",
" nn.Linear(n_cells, 1),\n",
" )\n",
"\n",
" def forward(self, x):\n",
" \"\"\"\n",
" Forward pass of WideNet\n",
"\n",
" Args:\n",
" x: torch.Tensor\n",
" 2D tensor of features\n",
"\n",
" Returns:\n",
" Torch tensor of model predictions\n",
" \"\"\"\n",
" return self.layers(x)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"We can now create an instance of our neural net and print its parameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Creating an instance\n",
"set_seed(seed=SEED)\n",
"wide_net = WideNet()\n",
"print(wide_net)"
]
},
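{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"As a quick illustration of `.parameters()` (a sanity check, not needed for training), we can count the learnable parameters of `wide_net`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Quick sanity check: count the learnable parameters of wide_net\n",
"n_params = sum(p.numel() for p in wide_net.parameters())\n",
"print(f\"WideNet has {n_params} learnable parameters\")\n",
"for name, param in wide_net.named_parameters():\n",
"  print(f\"  {name}: shape {tuple(param.shape)}\")"
]
},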
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Create a mse loss function\n",
"loss_function = nn.MSELoss()\n",
"\n",
"# Stochstic Gradient Descent optimizer (you will learn about momentum soon)\n",
"lr = 0.003 # Learning rate\n",
"sgd_optimizer = torch.optim.SGD(wide_net.parameters(), lr=lr, momentum=0.9)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"The training process in PyTorch is interactive - you can perform training iterations as you wish and inspect the results after each iteration.\n",
"\n",
"Let's perform one training iteration. You can run the cell multiple times and see how the parameters are being updated and the loss is reducing. This code block is the core of everything to come: please make sure you go line-by-line through all the commands and discuss their purpose with your pod."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Reset all gradients to zero\n",
"sgd_optimizer.zero_grad()\n",
"\n",
"# Forward pass (Compute the output of the model on the features (inputs))\n",
"prediction = wide_net(inputs)\n",
"\n",
"# Compute the loss\n",
"loss = loss_function(prediction, targets)\n",
"print(f'Loss: {loss.item()}')\n",
"\n",
"# Perform backpropagation to build the graph and compute the gradients\n",
"loss.backward()\n",
"\n",
"# Optimizer takes a tiny step in the steepest direction (negative of gradient)\n",
"# and \"updates\" the weights and biases of the network\n",
"sgd_optimizer.step()"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Coding Exercise 3.1: Training Loop\n",
"\n",
"Using everything we've learned so far, we ask you to complete the `train` function below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"def train(features, labels, model, loss_fun, optimizer, n_epochs):\n",
" \"\"\"\n",
" Training function\n",
"\n",
" Args:\n",
" features: torch.Tensor\n",
" Features (input) with shape torch.Size([n_samples, 1])\n",
" labels: torch.Tensor\n",
" Labels (targets) with shape torch.Size([n_samples, 1])\n",
" model: torch nn.Module\n",
" The neural network\n",
" loss_fun: function\n",
" Loss function\n",
" optimizer: function\n",
" Optimizer\n",
" n_epochs: int\n",
" Number of training iterations\n",
"\n",
" Returns:\n",
" loss_record: list\n",
" Record (evolution) of training losses\n",
" \"\"\"\n",
" loss_record = [] # Keeping recods of loss\n",
"\n",
" for i in range(n_epochs):\n",
" #################################################\n",
" ## Implement the missing parts of the training loop\n",
" # Complete the function and remove or comment the line below\n",
" raise NotImplementedError(\"Training loop `train`\")\n",
" #################################################\n",
" ... # Set gradients to 0\n",
" predictions = ... # Compute model prediction (output)\n",
" loss = ... # Compute the loss\n",
" ... # Compute gradients (backward pass)\n",
" ... # Update parameters (optimizer takes a step)\n",
"\n",
" loss_record.append(loss.item())\n",
" return loss_record\n",
"\n",
"\n",
"\n",
"set_seed(seed=2021)\n",
"epochs = 1847 # Cauchy, Exercices d'analyse et de physique mathematique (1847)\n",
"## Uncomment to run\n",
"# losses = train(inputs, targets, wide_net, loss_function, sgd_optimizer, epochs)\n",
"# ex3_plot(wide_net, inputs, targets, epochs, losses)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {}
},
"source": [
"[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W1D2_LinearDeepLearning/solutions/W1D2_Tutorial1_Solution_5901bdf4.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Training_loop_Exercise\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Summary"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"In this tutorial, we covered one of the most basic concepts of deep learning; the computational graph and how a network learns via gradient descent and the backpropagation algorithm. We have seen all of these using PyTorch modules and we compared the analytical solutions with the ones provided directly by the PyTorch module."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Video 6: Tutorial 1 Wrap-up\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 6: Tutorial 1 Wrap-up\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'TvZURbcnXc4'), ('Bilibili', 'BV1Pg41177VU')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Tutorial_1_WrapUp_Video\")"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"include_colab_link": true,
"name": "W1D2_Tutorial1",
"provenance": [],
"toc_visible": true
},
"kernel": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 0
}