{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {}, "id": "view-in-github" }, "source": [ "\"Open   \"Open" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Tutorial 1: Variational Autoencoders (VAEs)\n", "\n", "**Week 2, Day 4: Generative Models**\n", "\n", "**By Neuromatch Academy**\n", "\n", "__Content creators:__ Saeed Salehi, Spiros Chavlis, Vikash Gilja\n", "\n", "__Content reviewers:__ Diptodip Deb, Kelson Shilling-Scrivo\n", "\n", "__Content editor:__ Charles J Edelson, Spiros Chavlis\n", "\n", "__Production editors:__ Saeed Salehi, Gagana B, Spiros Chavlis\n", "\n", "\n", "
\n", "\n", "*Inspired from UPenn course*:\n", "__Instructor:__ Konrad Kording, __Original Content creators:__ Richard Lange, Arash Ash" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Tutorial Objectives\n", "In the first tutorial of the *Generative Models* day, we are going to\n", "\n", "- Think about unsupervised learning / Generative Models and get a bird's eye view of why it is useful\n", "- Build intuition about latent variables\n", "- See the connection between AutoEncoders and PCA\n", "- Start thinking about neural networks as generative models by contrasting AutoEncoders and Variational AutoEncoders" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @markdown\n", "from IPython.display import IFrame\n", "from ipywidgets import widgets\n", "out = widgets.Output()\n", "with out:\n", " print(f\"If you want to download the slides: https://osf.io/download/rd7ng/\")\n", " display(IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/rd7ng/?direct%26mode=render%26action=download%26mode=render\", width=730, height=410))\n", "display(out)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install dependencies\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " #### Please ignore *errors* and/or *warnings* during installation.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Install dependencies\n", "# @markdown #### Please ignore *errors* and/or *warnings* during installation.\n", "!pip install pytorch-pretrained-biggan --quiet\n", "!pip install Pillow libsixel-python --quiet" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install and import feedback gadget\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Install and import feedback gadget\n", "\n", "!pip3 install vibecheck datatops --quiet\n", "\n", "from vibecheck import DatatopsContentReviewContainer\n", "def content_review(notebook_section: str):\n", " return DatatopsContentReviewContainer(\n", " \"\", # No text prompt\n", " notebook_section,\n", " {\n", " \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n", " \"name\": \"neuromatch_dl\",\n", " \"user_key\": \"f379rz8y\",\n", " },\n", " ).render()\n", "\n", "\n", "feedback_prefix = \"W2D4_T1\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# Imports\n", "import torch\n", "import random\n", "\n", "import numpy as np\n", "import matplotlib.pylab as plt\n", "\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "from torch.utils.data import DataLoader\n", "\n", "import torchvision\n", "from torchvision import datasets, transforms\n", "\n", "from pytorch_pretrained_biggan import one_hot_from_names\n", "\n", "from tqdm.notebook import tqdm, trange" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Figure settings\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], 
"source": [ "# @title Figure settings\n", "import logging\n", "logging.getLogger('matplotlib.font_manager').disabled = True\n", "\n", "import ipywidgets as widgets\n", "from ipywidgets import FloatSlider, IntSlider, HBox, Layout, VBox\n", "from ipywidgets import interactive_output, Dropdown\n", "\n", "%config InlineBackend.figure_format = 'retina'\n", "plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/content-creation/main/nma.mplstyle\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Helper functions\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Helper functions\n", "\n", "\n", "def image_moments(image_batches, n_batches=None):\n", " \"\"\"\n", " Compute mean and covariance of all pixels\n", " from batches of images\n", "\n", " Args:\n", " Image_batches: tuple\n", " Image batches\n", " n_batches: int\n", " Number of Batch size\n", "\n", " Returns:\n", " m1: float\n", " Mean of all pixels\n", " cov: float\n", " Covariance of all pixels\n", " \"\"\"\n", " m1, m2 = torch.zeros((), device=DEVICE), torch.zeros((), device=DEVICE)\n", " n = 0\n", " for im in tqdm(image_batches, total=n_batches, leave=False,\n", " desc='Computing pixel mean and covariance...'):\n", " im = im.to(DEVICE)\n", " b = im.size()[0]\n", " im = im.view(b, -1)\n", " m1 = m1 + im.sum(dim=0)\n", " m2 = m2 + (im.view(b,-1,1) * im.view(b,1,-1)).sum(dim=0)\n", " n += b\n", " m1, m2 = m1/n, m2/n\n", " cov = m2 - m1.view(-1,1)*m1.view(1,-1)\n", " return m1.cpu(), cov.cpu()\n", "\n", "\n", "def interpolate(A, B, num_interps):\n", " \"\"\"\n", " Function to interpolate between images.\n", " It does this by linearly interpolating between the\n", " probability of each category you select and linearly\n", " interpolating between the latent vector values.\n", "\n", " Args:\n", " A: list\n", " List of categories\n", " B: list\n", " List of categories\n", " num_interps: int\n", " Quantity of pixel grids\n", "\n", " Returns:\n", " Interpolated np.ndarray\n", " \"\"\"\n", " if A.shape != B.shape:\n", " raise ValueError('A and B must have the same shape to interpolate.')\n", " alphas = np.linspace(0, 1, num_interps)\n", " return np.array([(1-a)*A + a*B for a in alphas])\n", "\n", "\n", "def kl_q_p(zs, phi):\n", " \"\"\"\n", " Given [b,n,k] samples of z drawn\n", " from q, compute estimate of KL(q||p).\n", " phi must be size [b,k+1]\n", " This uses mu_p = 0 and sigma_p = 1,\n", " which simplifies the log(p(zs)) term to\n", " just -1/2*(zs**2)\n", "\n", " Args:\n", " zs: list\n", " Samples\n", " phi: list\n", " Relative entropy\n", "\n", " Returns:\n", " Size of log_q and log_p is [b,n,k].\n", " Sum along [k] but mean along [b,n]\n", " \"\"\"\n", " b, n, k = zs.size()\n", " mu_q, log_sig_q = phi[:,:-1], phi[:,-1]\n", " log_p = -0.5*(zs**2)\n", " log_q = -0.5*(zs - mu_q.view(b,1,k))**2 / log_sig_q.exp().view(b,1,1)**2 - log_sig_q.view(b,1,-1)\n", " # Size of log_q and log_p is [b,n,k].\n", " # Sum along [k] but mean along [b,n]\n", " return (log_q - log_p).sum(dim=2).mean(dim=(0,1))\n", "\n", "\n", "def log_p_x(x, mu_xs, sig_x):\n", " \"\"\"\n", " Given [batch, ...] input x and\n", " [batch, n, ...] 
reconstructions, compute\n", " pixel-wise log Gaussian probability\n", " Sum over pixel dimensions, but mean over batch\n", " and samples.\n", "\n", " Args:\n", " x: np.ndarray\n", " Input Data\n", " mu_xs: np.ndarray\n", " Log of mean of samples\n", " sig_x: np.ndarray\n", " Log of standard deviation\n", "\n", " Returns:\n", " Mean over batch and samples.\n", " \"\"\"\n", " b, n = mu_xs.size()[:2]\n", " # Flatten out pixels and add a singleton\n", " # dimension [1] so that x will be\n", " # implicitly expanded when combined with mu_xs\n", " x = x.reshape(b, 1, -1)\n", " _, _, p = x.size()\n", " squared_error = (x - mu_xs.view(b, n, -1))**2 / (2*sig_x**2)\n", "\n", " # Size of squared_error is [b,n,p]. log prob is\n", " # by definition sum over [p].\n", " # Expected value requires mean over [n].\n", " # Handling different size batches\n", " # requires mean over [b].\n", " return -(squared_error + torch.log(sig_x)).sum(dim=2).mean(dim=(0,1))\n", "\n", "\n", "def pca_encoder_decoder(mu, cov, k):\n", " \"\"\"\n", " Compute encoder and decoder matrices\n", " for PCA dimensionality reduction\n", "\n", " Args:\n", " mu: np.ndarray\n", " Mean\n", " cov: float\n", " Covariance\n", " k: int\n", " Dimensionality\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " mu = mu.view(1,-1)\n", " u, s, v = torch.svd_lowrank(cov, q=k)\n", " W_encode = v / torch.sqrt(s)\n", " W_decode = u * torch.sqrt(s)\n", "\n", " def pca_encode(x):\n", " \"\"\"\n", " Encoder: Subtract mean image and\n", " project onto top K eigenvectors of\n", " the data covariance\n", "\n", " Args:\n", " x: torch.tensor\n", " Input data\n", "\n", " Returns:\n", " PCA Encoding\n", " \"\"\"\n", " return (x.view(-1,mu.numel()) - mu) @ W_encode\n", "\n", " def pca_decode(h):\n", " \"\"\"\n", " Decoder: un-project then add back in the mean\n", "\n", " Args:\n", " h: torch.tensor\n", " Hidden layer data\n", "\n", " Returns:\n", " PCA Decoding\n", " \"\"\"\n", " return (h @ W_decode.T) + mu\n", "\n", " return pca_encode, pca_decode\n", "\n", "\n", "def cout(x, layer):\n", " \"\"\"\n", " Unnecessarily complicated but complete way to\n", " calculate the output depth, height\n", " and width size for a Conv2D layer\n", "\n", " Args:\n", " x: tuple\n", " Input size (depth, height, width)\n", " layer: nn.Conv2d\n", " The Conv2D layer\n", "\n", " Returns:\n", " Tuple of out-depth/out-height and out-width\n", " Output shape as given in [Ref]\n", " Ref:\n", " https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html\n", " \"\"\"\n", " assert isinstance(layer, nn.Conv2d)\n", " p = layer.padding if isinstance(layer.padding, tuple) else (layer.padding,)\n", " k = layer.kernel_size if isinstance(layer.kernel_size, tuple) else (layer.kernel_size,)\n", " d = layer.dilation if isinstance(layer.dilation, tuple) else (layer.dilation,)\n", " s = layer.stride if isinstance(layer.stride, tuple) else (layer.stride,)\n", " in_depth, in_height, in_width = x\n", " out_depth = layer.out_channels\n", " out_height = 1 + (in_height + 2 * p[0] - (k[0] - 1) * d[0] - 1) // s[0]\n", " out_width = 1 + (in_width + 2 * p[-1] - (k[-1] - 1) * d[-1] - 1) // s[-1]\n", " return (out_depth, out_height, out_width)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting functions\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Plotting functions\n", "\n", "def plot_gen_samples_ppca(therm1, therm2, therm_data_sim):\n", " \"\"\"\n", " Plotting 
generated samples\n", "\n", " Args:\n", " therm1: list\n", " Thermometer 1\n", " them2: list\n", " Thermometer 2\n", " therm_data_sim: list\n", " Generated (simulate, draw) `n_samples` from pPCA model\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " plt.plot(therm1, therm2, '.', c='c', label='training data')\n", " plt.plot(therm_data_sim[0], therm_data_sim[1], '.', c='m', label='\"generated\" data')\n", " plt.axis('equal')\n", " plt.xlabel('Thermometer 1 ($^\\circ$C)')\n", " plt.ylabel('Thermometer 2 ($^\\circ$C)')\n", " plt.legend()\n", " plt.show()\n", "\n", "\n", "def plot_linear_ae(lin_losses):\n", " \"\"\"\n", " Plotting linear autoencoder\n", "\n", " Args:\n", " lin_losses: list\n", " Log of linear autoencoder MSE losses\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " plt.figure()\n", " plt.plot(lin_losses)\n", " plt.ylim([0, 2*torch.as_tensor(lin_losses).median()])\n", " plt.xlabel('Training batch')\n", " plt.ylabel('MSE Loss')\n", " plt.show()\n", "\n", "\n", "def plot_conv_ae(lin_losses, conv_losses):\n", " \"\"\"\n", " Plotting convolutional autoencoder\n", "\n", " Args:\n", " lin_losses: list\n", " Log of linear autoencoder MSE losses\n", " conv_losses: list\n", " Log of convolutional model MSe losses\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " plt.figure()\n", " plt.plot(lin_losses)\n", " plt.plot(conv_losses)\n", " plt.legend(['Lin AE', 'Conv AE'])\n", " plt.xlabel('Training batch')\n", " plt.ylabel('MSE Loss')\n", " plt.ylim([0,\n", " 2*max(torch.as_tensor(conv_losses).median(),\n", " torch.as_tensor(lin_losses).median())])\n", " plt.show()\n", "\n", "\n", "def plot_images(images, h=3, w=3, plt_title=''):\n", " \"\"\"\n", " Helper function to plot images\n", "\n", " Args:\n", " images: torch.tensor\n", " Images\n", " h: int\n", " Image height\n", " w: int\n", " Image width\n", " plt_title: string\n", " Plot title\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " plt.figure(figsize=(h*2, w*2))\n", " plt.suptitle(plt_title, y=1.03)\n", " for i in range(h*w):\n", " plt.subplot(h, w, i + 1)\n", " plot_torch_image(images[i])\n", " plt.axis('off')\n", " plt.show()\n", "\n", "def plot_phi(phi, num=4):\n", " \"\"\"\n", " Contour plot of relative entropy across samples\n", "\n", " Args:\n", " phi: list\n", " Log of relative entropu changes\n", " num: int\n", " Number of interations\n", " \"\"\"\n", " plt.figure(figsize=(12, 3))\n", " for i in range(num):\n", " plt.subplot(1, num, i + 1)\n", " plt.scatter(zs[i, :, 0], zs[i, :, 1], marker='.')\n", " th = torch.linspace(0, 6.28318, 100)\n", " x, y = torch.cos(th), torch.sin(th)\n", " # Draw 2-sigma contours\n", " plt.plot(\n", " 2*x*phi[i, 2].exp().item() + phi[i, 0].item(),\n", " 2*y*phi[i, 2].exp().item() + phi[i, 1].item()\n", " )\n", " plt.xlim(-5, 5)\n", " plt.ylim(-5, 5)\n", " plt.grid()\n", " plt.axis('equal')\n", " plt.suptitle('If rsample() is correct, then most but not all points should lie in the circles')\n", " plt.show()\n", "\n", "\n", "def plot_torch_image(image, ax=None):\n", " \"\"\"\n", " Helper function to plot torch image\n", "\n", " Args:\n", " image: torch.tensor\n", " Image\n", " ax: plt object\n", " If None, plt.gca()\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " ax = ax if ax is not None else plt.gca()\n", " c, h, w = image.size()\n", " if c==1:\n", " cm = 'gray'\n", " else:\n", " cm = None\n", "\n", " # Torch images have shape (channels, height, width)\n", " # but matplotlib expects\n", " # (height, width, channels) or just\n", " # (height,width) when grayscale\n", " im_plt = 
torch.clip(image.detach().cpu().permute(1,2,0).squeeze(), 0.0, 1.0)\n", " ax.imshow(im_plt, cmap=cm)\n", " ax.set_xticks([])\n", " ax.set_yticks([])\n", " ax.spines['right'].set_visible(False)\n", " ax.spines['top'].set_visible(False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set random seed\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Executing `set_seed(seed=seed)` you are setting the seed\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Set random seed\n", "\n", "# @markdown Executing `set_seed(seed=seed)` you are setting the seed\n", "\n", "# For DL its critical to set the random seed so that students can have a\n", "# baseline to compare their results to expected results.\n", "# Read more here: https://pytorch.org/docs/stable/notes/randomness.html\n", "\n", "# Call `set_seed` function in the exercises to ensure reproducibility.\n", "import random\n", "import torch\n", "\n", "def set_seed(seed=None, seed_torch=True):\n", " \"\"\"\n", " Function that controls randomness. NumPy and random modules must be imported.\n", "\n", " Args:\n", " seed : Integer\n", " A non-negative integer that defines the random state. Default is `None`.\n", " seed_torch : Boolean\n", " If `True` sets the random seed for pytorch tensors, so pytorch module\n", " must be imported. Default is `True`.\n", "\n", " Returns:\n", " Nothing.\n", " \"\"\"\n", " if seed is None:\n", " seed = np.random.choice(2 ** 32)\n", " random.seed(seed)\n", " np.random.seed(seed)\n", " if seed_torch:\n", " torch.manual_seed(seed)\n", " torch.cuda.manual_seed_all(seed)\n", " torch.cuda.manual_seed(seed)\n", " torch.backends.cudnn.benchmark = False\n", " torch.backends.cudnn.deterministic = True\n", "\n", " print(f'Random seed {seed} has been set.')\n", "\n", "\n", "# In case that `DataLoader` is used\n", "def seed_worker(worker_id):\n", " \"\"\"\n", " DataLoader will reseed workers following randomness in\n", " multi-process data loading algorithm.\n", "\n", " Args:\n", " worker_id: integer\n", " ID of subprocess to seed. 0 means that\n", " the data will be loaded in the main process\n", " Refer: https://pytorch.org/docs/stable/data.html#data-loading-randomness for more details\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " worker_seed = torch.initial_seed() % 2**32\n", " np.random.seed(worker_seed)\n", " random.seed(worker_seed)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set device (GPU or CPU). Execute `set_device()`\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Set device (GPU or CPU). Execute `set_device()`\n", "# especially if torch modules used.\n", "\n", "# Inform the user if the notebook uses GPU or CPU.\n", "\n", "def set_device():\n", " \"\"\"\n", " Set the device. 
CUDA if available, CPU otherwise\n", "\n", " Args:\n", " None\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n", " if device != \"cuda\":\n", " print(\"WARNING: For this notebook to perform best, \"\n", " \"if possible, in the menu under `Runtime` -> \"\n", " \"`Change runtime type.` select `GPU` \")\n", " else:\n", " print(\"GPU is enabled in this notebook.\")\n", "\n", " return device" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "SEED = 2021\n", "set_seed(seed=SEED)\n", "DEVICE = set_device()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download `wordnet` dataset\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Download `wordnet` dataset\n", "\n", "\"\"\"\n", "NLTK Download:\n", "\n", "import nltk\n", "nltk.download('wordnet')\n", "\"\"\"\n", "\n", "import os, requests, zipfile\n", "\n", "os.environ['NLTK_DATA'] = 'nltk_data/'\n", "\n", "fnames = ['wordnet.zip', 'omw-1.4.zip']\n", "urls = ['https://osf.io/ekjxy/download', 'https://osf.io/kuwep/download']\n", "\n", "for fname, url in zip(fnames, urls):\n", " r = requests.get(url, allow_redirects=True)\n", "\n", " with open(fname, 'wb') as fd:\n", " fd.write(r.content)\n", "\n", " with zipfile.ZipFile(fname, 'r') as zip_ref:\n", " zip_ref.extractall('nltk_data/corpora')" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Section 1: Generative models\n", "\n", "*Time estimate: ~15mins*" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "**Please** run the cell after the video to download BigGAN (a generative model) and a few standard image datasets while the video plays." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 1: Generative Modeling\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 1: Generative Modeling\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', '5EEx0sdyR_U'), ('Bilibili', 'BV1Vy4y1j7cN')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Generative_Modeling_Video\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Download BigGAN (a generative model) and a few standard image datasets\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Download BigGAN (a generative model) and a few standard image datasets\n", "\n", "## Initially was downloaded directly\n", "# biggan_model = BigGAN.from_pretrained('biggan-deep-256')\n", "\n", "url = \"https://osf.io/3yvhw/download\"\n", "fname = \"biggan_deep_256\"\n", "r = requests.get(url, allow_redirects=True)\n", "with open(fname, 'wb') as fd:\n", " fd.write(r.content)\n", "\n", "biggan_model = torch.load(fname)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 1.1: Generating Images from BigGAN\n", "\n", "To demonstrate the power of generative models, we are giving you a sneak peek of a fully trained generative model called BigGAN. You’ll see it again (with more background under your belt) later today. For now, let’s just focus on BigGAN as a generative model. 
Specifically, BigGAN is a class conditional generative model for $128 \\times 128$ images. The classes are based on categorical labels that describe the images and images are generated based upon a vector ($z$ from the video lecture) and the probability that the image comes from a specific discrete category.\n", "\n", "For now, don’t worry about the specifics of the model other than the fact that it generates images based on the vector and the category label.\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Interactive Demo 1.1: BigGAN Generator\n", "\n", "\n", "To explore the space of generated images, we’ve provided you with a widget that allows you to select a category label, generate four different z vectors, and view generated images based on those z vectors. The z vector is a 128-D, which may seem high dimensional, but is much lower-dimensional than a $128 \\times 128$ image!\n", "\n", "There is one additional slider option below: the z vector is being generated from a truncated normal distribution, where you are choosing the truncation value. Essentially, you are controlling the magnitude of the vector. **You don't need to worry about the details for now though, we're just making a conceptual point here and you don't need to know the ins and outs of truncation values or z vectors.**\n", "\n", "Just know that each time you change the category or truncation value slider, 4 different z vectors are generated, resulting in 4 different images\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " BigGAN Image Generator (the updates may take a few seconds, please be patient)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown BigGAN Image Generator (the updates may take a few seconds, please be patient)\n", "\n", "# category = 'German shepherd' # @param ['tench', 'magpie', 'jellyfish', 'German shepherd', 'bee', 'acoustic guitar', 'coffee mug', 'minibus', 'monitor']\n", "# z_magnitude = .1 # @param {type:\"slider\", min:0, max:1, step:.1}\n", "\n", "\n", "from scipy.stats import truncnorm\n", "def truncated_noise_sample(batch_size=1, dim_z=128, truncation=1., seed=None):\n", " \"\"\" Create a truncated noise vector.\n", " Params:\n", " batch_size: batch size.\n", " dim_z: dimension of z\n", " truncation: truncation value to use\n", " seed: seed for the random generator\n", " Output:\n", " array of shape (batch_size, dim_z)\n", " \"\"\"\n", " state = None if seed is None else np.random.RandomState(seed)\n", " values = truncnorm.rvs(-2, 2, size=(batch_size, dim_z), random_state=state).astype(np.float32)\n", " return truncation * values\n", "\n", "\n", "def sample_from_biggan(category, z_magnitude):\n", " \"\"\"\n", " Sample from BigGAN Image Generator\n", "\n", " Args:\n", " category: string\n", " Category\n", " z_magnitude: int\n", " Magnitude of variation vector\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", "\n", " truncation = z_magnitude\n", " z = truncated_noise_sample(truncation=truncation, batch_size=4)\n", " y = one_hot_from_names(category, batch_size=4)\n", "\n", " z = torch.from_numpy(z)\n", " z = z.float()\n", " y = torch.from_numpy(y)\n", "\n", " # Move to GPU\n", " z = z.to(device=set_device())\n", " y = y.to(device=set_device())\n", " biggan_model.to(device=set_device())\n", "\n", "\n", " with torch.no_grad():\n", " output = biggan_model(z, y, truncation)\n", "\n", " # Back to CPU\n", " output = 
output.to('cpu')\n", "\n", " # The output layer of BigGAN has a tanh layer,\n", " # resulting the range of [-1, 1] for the output image\n", " # Therefore, we normalize the images properly to [0, 1]\n", " # range.\n", " # Clipping is only in case of numerical instability\n", " # problems\n", "\n", " output = torch.clip(((output.detach().clone() + 1) / 2.0), 0, 1)\n", "\n", " fig, axes = plt.subplots(2, 2)\n", " axes = axes.flatten()\n", " for im in range(4):\n", "\n", " axes[im].imshow(output[im].squeeze().moveaxis(0,-1))\n", " axes[im].axis('off')\n", "\n", "z_slider = FloatSlider(min=.1, max=1, step=.1, value=0.1,\n", " continuous_update=False,\n", " description='Truncation Value',\n", " style = {'description_width': '100px'},\n", " layout=Layout(width='440px'))\n", "\n", "category_dropdown = Dropdown(\n", " options=['tench', 'magpie', 'jellyfish', 'German shepherd', 'bee',\n", " 'acoustic guitar', 'coffee mug', 'minibus', 'monitor'],\n", " value=\"German shepherd\",\n", " description=\"Category: \")\n", "\n", "widgets_ui = VBox([category_dropdown, z_slider])\n", "\n", "widgets_out = interactive_output(sample_from_biggan,\n", " {\n", " 'z_magnitude': z_slider,\n", " 'category': category_dropdown\n", " }\n", " )\n", "\n", "display(widgets_ui, widgets_out)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Think! 1.1: Generated images\n", "\n", "How do the generated images look? Do they look realistic or obviously fake to you?\n", "\n", "As you increase the truncation value, what do you note about the generated images and the relationship between them?" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W2D4_GenerativeModels/solutions/W2D4_Tutorial1_Solution_f833b825.py)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Generated_Images_Discussion\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 1.2: Interpolating Images with BigGAN\n", "This next widget allows you to interpolate between two generated images. It does this by linearly interpolating between the probability of each category you select and linearly interpolating between the latent vector values." 
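, "\n", "For intuition, the snippet below is a minimal sketch of that linear interpolation: it is the same convex combination $(1-\\alpha) A + \\alpha B$ that the `interpolate` helper defined earlier applies to both the latent vectors and the one-hot class vectors. The toy `z_A` / `z_B` values are placeholders for illustration, not actual BigGAN latents.\n", "\n", "```python\n", "import numpy as np\n", "\n", "z_A = np.zeros(128, dtype=np.float32)  # latent vector for image A (toy values)\n", "z_B = np.ones(128, dtype=np.float32)   # latent vector for image B (toy values)\n", "\n", "alphas = np.linspace(0, 1, 16)         # 16 evenly spaced interpolation steps\n", "z_path = np.array([(1 - a) * z_A + a * z_B for a in alphas])\n", "print(z_path.shape)  # (16, 128): one latent vector per interpolation step\n", "```"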
] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Interactive Demo 1.2: BigGAN Interpolation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " BigGAN Interpolation Widget (the updates may take a few seconds)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown BigGAN Interpolation Widget (the updates may take a few seconds)\n", "\n", "def interpolate_biggan(category_A,\n", " category_B):\n", " \"\"\"\n", " Interpolation function with BigGan\n", "\n", " Args:\n", " category_A: string\n", " Category specification\n", " category_B: string\n", " Category specification\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " num_interps = 16\n", "\n", " # category_A = 'jellyfish' #@param ['tench', 'magpie', 'jellyfish', 'German shepherd', 'bee', 'acoustic guitar', 'coffee mug', 'minibus', 'monitor']\n", " # z_magnitude_A = 0 #@param {type:\"slider\", min:-10, max:10, step:1}\n", "\n", " # category_B = 'German shepherd' #@param ['tench', 'magpie', 'jellyfish', 'German shepherd', 'bee', 'acoustic guitar', 'coffee mug', 'minibus', 'monitor']\n", " # z_magnitude_B = 0 #@param {type:\"slider\", min:-10, max:10, step:1}\n", "\n", "\n", " def interpolate_and_shape(A, B, num_interps):\n", " \"\"\"\n", " Function to interpolate and shape images.\n", " It does this by linearly interpolating between the\n", " probability of each category you select and linearly\n", " interpolating between the latent vector values.\n", "\n", " Args:\n", " A: list\n", " List of categories\n", " B: list\n", " List of categories\n", " num_interps: int\n", " Quantity of pixel grids\n", "\n", " Returns:\n", " Interpolated np.ndarray\n", " \"\"\"\n", " interps = interpolate(A, B, num_interps)\n", " return (interps.transpose(1, 0, *range(2, len(interps.shape))).reshape(num_interps, *interps.shape[2:]))\n", "\n", " # unit_vector = np.ones((1, 128))/np.sqrt(128)\n", " # z_A = z_magnitude_A * unit_vector\n", " # z_B = z_magnitude_B * unit_vector\n", " truncation = .4\n", " z_A = truncated_noise_sample(truncation=truncation, batch_size=1)\n", " z_B = truncated_noise_sample(truncation=truncation, batch_size=1)\n", " y_A = one_hot_from_names(category_A, batch_size=1)\n", " y_B = one_hot_from_names(category_B, batch_size=1)\n", "\n", " z_interp = interpolate_and_shape(z_A, z_B, num_interps)\n", " y_interp = interpolate_and_shape(y_A, y_B, num_interps)\n", "\n", " # Convert to tensor\n", " z_interp = torch.from_numpy(z_interp)\n", " z_interp = z_interp.float()\n", " y_interp = torch.from_numpy(y_interp)\n", "\n", " # Move to GPU\n", " z_interp = z_interp.to(DEVICE)\n", " y_interp = y_interp.to(DEVICE)\n", " biggan_model.to(DEVICE)\n", "\n", " with torch.no_grad():\n", " output = biggan_model(z_interp, y_interp, 1)\n", "\n", " # Back to CPU\n", " output = output.to('cpu')\n", "\n", " # The output layer of BigGAN has a tanh layer,\n", " # resulting the range of [-1, 1] for the output image\n", " # Therefore, we normalize the images properly to\n", " # [0, 1] range.\n", " # Clipping is only in case of numerical instability\n", " # problems\n", "\n", " output = torch.clip(((output.detach().clone() + 1) / 2.0), 0, 1)\n", " output = output\n", "\n", " # Make grid and show generated samples\n", " output_grid = torchvision.utils.make_grid(output,\n", " nrow=min(4, output.shape[0]),\n", " padding=5)\n", " plt.axis('off');\n", " plt.imshow(output_grid.permute(1, 2, 0))\n", " 
plt.show()\n", "\n", "\n", "# z_A_slider = IntSlider(min=-10, max=10, step=1, value=0,\n", "# continuous_update=False, description='Z Magnitude A',\n", "# layout=Layout(width='440px'),\n", "# style={'description_width': 'initial'})\n", "\n", "# z_B_slider = IntSlider(min=-10, max=10, step=1, value=0,\n", "# continuous_update=False, description='Z Magntude B',\n", "# layout=Layout(width='440px'),\n", "# style={'description_width': 'initial'})\n", "\n", "category_A_dropdown = Dropdown(\n", " options=['tench', 'magpie', 'jellyfish', 'German shepherd', 'bee',\n", " 'acoustic guitar', 'coffee mug', 'minibus', 'monitor'],\n", " value=\"German shepherd\",\n", " description=\"Category A: \")\n", "\n", "category_B_dropdown = Dropdown(\n", " options=['tench', 'magpie', 'jellyfish', 'German shepherd', 'bee',\n", " 'acoustic guitar', 'coffee mug', 'minibus', 'monitor'],\n", " value=\"jellyfish\",\n", " description=\"Category B: \")\n", "\n", "\n", "\n", "widgets_ui = VBox([HBox([category_A_dropdown]),\n", " HBox([category_B_dropdown])])\n", "\n", "widgets_out = interactive_output(interpolate_biggan,\n", " {'category_A': category_A_dropdown,\n", " # 'z_magnitude_A': z_A_slider,\n", " 'category_B': category_B_dropdown})\n", " # 'z_magnitude_B': z_B_slider})\n", "\n", "display(widgets_ui, widgets_out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_BigGAN_Interpolation_Interactive_Demo\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Think! 1.2: Interpolating samples from the same category\n", "\n", "Try interpolating between samples from the same category, samples from similar categories, and samples from very different categories. Do you notice any trends? What does this suggest about the representations of images in the latent space?" 
] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W2D4_GenerativeModels/solutions/W2D4_Tutorial1_Solution_09759a98.py)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Samples_from_the_same_category_Discussion\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Section 2: Latent Variable Models\n", "\n", "*Time estimate: ~15mins* excluding the Bonus" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 2: Latent Variable Models\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 2: Latent Variable Models\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', '_e0nKUeBDFo'), ('Bilibili', 'BV1Db4y167Ys')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Latent_Variable_Models_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "In the video, the concept of a latent variable model was introduced. We saw how PCA (principal component analysis) can be extended into a generative model with latent variables called probabilistic PCA (pPCA). 
For pPCA the latent variables (z in the video) are the projections onto the principal component axes.\n", "\n", "The dimensionality of the principal components is typically set to be substantially lower-dimensional than the original data. Thus, the latent variables (the projection onto the principal component axes) are a lower-dimensional representation of the original data (dimensionality reduction!). With pPCA we can estimate the original distribution of the high dimensional data. This allows us to generate data with a distribution that “looks” more like the original data than if we were to only use PCA to generate data from the latent variables. Let’s see how that might look with a simple example." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## (Bonus) Coding Exercise 2: pPCA\n", "\n", "Assume we have two noisy thermometers measuring the temperature of the same room. They both make noisy measurements. The room tends to be around 25°C (that's 77°F), but can vary around that temperature. If we take lots of readings from the two thermometers over time and plot the paired readings, we might see something like the plot generated below:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Generate example datapoints from the two thermometers\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Generate example datapoints from the two thermometers\n", "\n", "def generate_data(n_samples, mean_of_temps, cov_of_temps, seed):\n", " \"\"\"\n", " Generate random data, normally distributed\n", "\n", " Args:\n", " n_samples : int\n", " The number of samples to be generated\n", " mean_of_temps : numpy.ndarray\n", " 1D array with the mean of temparatures, Kx1\n", " cov_of_temps : numpy.ndarray\n", " 2D array with the covariance, , KxK\n", " seed : int\n", " Set random seed for the psudo random generator\n", "\n", " Returns:\n", " therm1 : numpy.ndarray\n", " Thermometer 1\n", " therm2 : numpy.ndarray\n", " Thermometer 2\n", " \"\"\"\n", "\n", " np.random.seed(seed)\n", " therm1, therm2 = np.random.multivariate_normal(mean_of_temps,\n", " cov_of_temps,\n", " n_samples).T\n", " return therm1, therm2\n", "\n", "\n", "n_samples = 2000\n", "mean_of_temps = np.array([25, 25])\n", "cov_of_temps = np.array([[10, 5], [5, 10]])\n", "therm1, therm2 = generate_data(n_samples, mean_of_temps, cov_of_temps, seed=SEED)\n", "\n", "plt.plot(therm1, therm2, '.')\n", "plt.axis('equal')\n", "plt.xlabel('Thermometer 1 ($^\\circ$C)')\n", "plt.ylabel('Thermometer 2 ($^\\circ$C)')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Let’s model these data with a single principal component. Given that the thermometers are measuring the same actual temperature, the principal component axes will be the identity line. The direction of this axes can be indicated by the unit vector $[1 ~~ 1]~/~\\sqrt2$. We could estimate this axes by applying PCA. 
We can plot this axes, it tells us something about the data, but we can’t generate from it:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Add first PC axes to the plot\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Add first PC axes to the plot\n", "\n", "plt.plot(therm1, therm2, '.')\n", "plt.axis('equal')\n", "plt.xlabel('Thermometer 1 ($^\\circ$C)')\n", "plt.ylabel('Thermometer 2 ($^\\circ$C)')\n", "plt.plot([plt.axis()[0], plt.axis()[1]],\n", " [plt.axis()[0], plt.axis()[1]])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "**Step 1:** Calculate the parameters of the pPCA model\n", "\n", "This part is completed already, so you don't need to make any edits:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# Project Data onto the principal component axes.\n", "# We could have \"learned\" this from the data by applying PCA,\n", "# but we \"know\" the value from the problem definition.\n", "pc_axes = np.array([1.0, 1.0]) / np.sqrt(2.0)\n", "\n", "# Thermometers data\n", "therm_data = np.array([therm1, therm2])\n", "\n", "# Zero center the data\n", "therm_data_mean = np.mean(therm_data, 1)\n", "therm_data_center = np.outer(therm_data_mean, np.ones(therm_data.shape[1]))\n", "therm_data_zero_centered = therm_data - therm_data_center\n", "\n", "# Calculate the variance of the projection on the PC axes\n", "pc_projection = np.matmul(pc_axes, therm_data_zero_centered);\n", "pc_axes_variance = np.var(pc_projection)\n", "\n", "# Calculate the residual variance (variance not accounted for by projection on the PC axes)\n", "sensor_noise_std = np.mean(np.linalg.norm(therm_data_zero_centered - np.outer(pc_axes, pc_projection), axis=0, ord=2))\n", "sensor_noise_var = sensor_noise_std **2" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "**Step 2**: \"Generate\" from the pPCA model of the thermometer data.\n", "\n", "Complete the code so we generate data by sampling according to the pPCA model:\n", "\n", "\\begin{equation}\n", "x = \\mu + W z + \\epsilon, \\,\\text{where}\\,~~ \\epsilon \\sim \\mathcal{N}(0,~\\sigma^2 \\mathbf{I})\n", "\\end{equation}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "def gen_from_pPCA(noise_var, data_mean, pc_axes, pc_variance):\n", " \"\"\"\n", " Generate samples from pPCA\n", "\n", " Args:\n", " noise_var: np.ndarray\n", " Sensor noise variance\n", " data_mean: np.ndarray\n", " Thermometer data mean\n", " pc_axes: np.ndarray\n", " Principal component axes\n", " pc_variance: np.ndarray\n", " The variance of the projection on the PC axes\n", "\n", " Returns:\n", " therm_data_sim: np.ndarray\n", " Generated (simulate, draw) `n_samples` from pPCA model\n", " \"\"\"\n", " # We are matching this value to the thermometer data so the visualizations look similar\n", " n_samples = 1000\n", "\n", " # Randomly sample from z (latent space value)\n", " z = np.random.normal(0.0, np.sqrt(pc_variance), n_samples)\n", "\n", " # Sensor noise covariance matrix (∑)\n", " epsilon_cov = [[noise_var, 0.0], [0.0, noise_var]]\n", "\n", " # Data mean reshaped for the generation\n", " sim_mean = np.outer(data_mean, np.ones(n_samples))\n", "\n", " ####################################################################\n", " # Fill in all missing code 
below (...),\n", " # then remove or comment the line below to test your class\n", " raise NotImplementedError(\"Please complete the `gen_from_pPCA` function\")\n", " ####################################################################\n", " # Draw `n_samples` from `np.random.multivariate_normal`\n", " rand_eps = ...\n", " rand_eps = rand_eps.T\n", "\n", " # Generate (simulate, draw) `n_samples` from pPCA model\n", " therm_data_sim = ...\n", "\n", " return therm_data_sim\n", "\n", "\n", "\n", "## Uncomment to test your code\n", "# therm_data_sim = gen_from_pPCA(sensor_noise_var, therm_data_mean, pc_axes, pc_axes_variance)\n", "# plot_gen_samples_ppca(therm1, therm2, therm_data_sim)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W2D4_GenerativeModels/solutions/W2D4_Tutorial1_Solution_3b0c285b.py)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Coding_pPCA_Exercise\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Section 3: Autoencoders\n", "\n", "*Time estimate: ~30mins*" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "**Please** run the cell after the video to download MNIST and CIFAR10 image datasets while the video plays." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 3: Autoencoders\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 3: Autoencoders\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'MlyIL1PmDCA'), ('Bilibili', 'BV16b4y167Z2')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", 
"for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Autoencoders_Video\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Download MNIST and CIFAR10 datasets\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Download MNIST and CIFAR10 datasets\n", "import tarfile, requests, os\n", "\n", "fname = 'MNIST.tar.gz'\n", "name = 'mnist'\n", "url = 'https://osf.io/y2fj6/download'\n", "\n", "if not os.path.exists(name):\n", " print('\\nDownloading MNIST dataset...')\n", " r = requests.get(url, allow_redirects=True)\n", " with open(fname, 'wb') as fh:\n", " fh.write(r.content)\n", " print('\\nDownloading MNIST completed!\\n')\n", "\n", "if not os.path.exists(name):\n", " with tarfile.open(fname) as tar:\n", " tar.extractall(name)\n", " os.remove(fname)\n", "else:\n", " print('MNIST dataset has been downloaded.\\n')\n", "\n", "\n", "fname = 'cifar-10-python.tar.gz'\n", "name = 'cifar10'\n", "url = 'https://osf.io/jbpme/download'\n", "\n", "if not os.path.exists(name):\n", " print('\\nDownloading CIFAR10 dataset...')\n", " r = requests.get(url, allow_redirects=True)\n", " with open(fname, 'wb') as fh:\n", " fh.write(r.content)\n", " print('\\nDownloading CIFAR10 completed!')\n", "\n", "if not os.path.exists(name):\n", " with tarfile.open(fname) as tar:\n", " tar.extractall(name)\n", " os.remove(fname)\n", "else:\n", " print('CIFAR10 dataset has been dowloaded.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Load MNIST and CIFAR10 image datasets\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Load MNIST and CIFAR10 image datasets\n", "# See https://pytorch.org/docs/stable/torchvision/datasets.html\n", "\n", "# MNIST\n", "mnist = datasets.MNIST('./mnist/',\n", " train=True,\n", " transform=transforms.ToTensor(),\n", " download=False)\n", "mnist_val = datasets.MNIST('./mnist/',\n", " train=False,\n", " transform=transforms.ToTensor(),\n", " download=False)\n", "\n", "# CIFAR 10\n", "cifar10 = datasets.CIFAR10('./cifar10/',\n", " train=True,\n", " transform=transforms.ToTensor(),\n", " download=False)\n", "cifar10_val = datasets.CIFAR10('./cifar10/',\n", " train=False,\n", " transform=transforms.ToTensor(),\n", " download=False)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Select a dataset\n", "\n", "We've built today's tutorial to be flexible. It should work more-or-less out of the box with both MNIST and CIFAR (and other image datasets). MNIST is in many ways simpler, and the results will likely look better and run a bit faster if using MNIST. But we are leaving it up to you to pick which one you want to experiment with!\n", "\n", "We encourage pods to coordinate so that some members use MNIST and others use CIFAR10. 
Keep in mind that the CIFAR dataset may require more learning epochs (longer training required).\n", "\n", "Change the variable `dataset_name` below to pick your dataset.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Execute this cell to enable helper function `get_data`\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Execute this cell to enable helper function `get_data`\n", "\n", "def get_data(name='mnist'):\n", " \"\"\"\n", " Get data\n", "\n", " Args:\n", " name: string\n", " Name of the dataset\n", "\n", " Returns:\n", " my_dataset: dataset instance\n", " Instance of dataset\n", " my_dataset_name: string\n", " Name of the dataset\n", " my_dataset_shape: tuple\n", " Shape of dataset\n", " my_dataset_size: int\n", " Size of dataset\n", " my_valset: torch.loader\n", " Validation loader\n", " \"\"\"\n", " if name == 'mnist':\n", " my_dataset_name = \"MNIST\"\n", " my_dataset = mnist\n", " my_valset = mnist_val\n", " my_dataset_shape = (1, 28, 28)\n", " my_dataset_size = 28 * 28\n", " elif name == 'cifar10':\n", " my_dataset_name = \"CIFAR10\"\n", " my_dataset = cifar10\n", " my_valset = cifar10_val\n", " my_dataset_shape = (3, 32, 32)\n", " my_dataset_size = 3 * 32 * 32\n", "\n", " return my_dataset, my_dataset_name, my_dataset_shape, my_dataset_size, my_valset" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "dataset_name = 'mnist' # This can be mnist or cifar10\n", "train_set, dataset_name, data_shape, data_size, valid_set = get_data(name=dataset_name)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 3.1: Conceptual introduction to AutoEncoders" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Now we'll create our first autoencoder. It will reduce images down to $K$ dimensions. The architecture will be quite simple: the input will be linearly mapped to a single hidden (or latent) layer $\\mathbf{h}$ with $K$ units, which will then be linearly mapped back to an output that is the same size as the input:\n", "\n", "\\begin{equation}\n", "\\mathbf{x} \\longrightarrow \\mathbf{h} \\longrightarrow \\mathbf{x'}\n", "\\end{equation}\n", "\n", "The loss function we'll use will simply be mean squared error (MSE) quantifying how well the reconstruction ($\\mathbf{x'}$) matches the original image ($\\mathbf{x}$):\n", "\n", "\\begin{equation}\n", "\\text{MSE Loss} = \\sum_{i=1}^{N} ||\\mathbf{x}_i - \\mathbf{x'}_i||^2_2\n", "\\end{equation}\n", "\n", "If all goes well, then the AutoEncoder will learn, **end to end**, a good \"encoding\" or \"compression\" of inputs to a latent representation ($\\mathbf{x \\longrightarrow h}$) as well as a good \"decoding\" of that latent representation to a reconstruction of the original input ($\\mathbf{h \\longrightarrow x'}$)." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "We first need to choose our desired dimensionality of $\\mathbf{h}$. We'll see more on this below, but for MNIST, 5 to 20 is plenty. For CIFAR, we need more like 50 to 100 dimensions.\n", "\n", "Coordinate with your pod to try a variety of values for $K$ in each dataset so you can compare results." 
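, "\n", "Before the coding exercise, here is a minimal, self-contained sketch (deliberately **not** the exercise solution) of the reconstruction loss we will optimize. The hidden `train_autoencoder` helper below uses `nn.MSELoss`, which *averages* the squared error over all elements rather than summing it as in the formula above; the two differ only by a constant factor, so they have the same minimizer. The random `x_prime` is just a stand-in for whatever a decoder would produce.\n", "\n", "```python\n", "import torch\n", "import torch.nn as nn\n", "\n", "x_dim, K = 28 * 28, 20          # flattened MNIST size and an example bottleneck\n", "x = torch.rand(8, x_dim)        # a small batch of flattened images in [0, 1]\n", "x_prime = torch.rand(8, x_dim)  # stand-in for a decoder's reconstruction of x\n", "\n", "# nn.MSELoss (default reduction='mean') averages the squared error over every element\n", "loss_fn = nn.MSELoss()\n", "print(loss_fn(x_prime, x).item())\n", "print(((x_prime - x) ** 2).mean().item())  # identical value, computed by hand\n", "\n", "# A K-dimensional bottleneck summarizes each image with only K numbers\n", "print(f'{x_dim} pixels -> {K} latent dimensions -> {x_dim} pixels')\n", "```"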
] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Coding Exercise 3.1: Linear AutoEncoder Architecture\n", "\n", "Complete the missing parts of the `LinearAutoEncoder` class. We're back to using PyTorch in this exercise.\n", "\n", "The `LinearAutoEncoder` as two stages: an `encoder` which linearly maps from inputs of size `x_dim = my_dataset_dim` to a hidden layer of size `h_dim = K` (with no nonlinearity), and a `decoder` which maps back from `K` up to the number of pixels in each image." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown #### Run to define the `train_autoencoder` function.\n", "# @markdown Feel free to inspect the training function if the time allows.\n", "\n", "# @markdown `train_autoencoder(autoencoder, dataset, device, epochs=20, batch_size=250, seed=0)`\n", "\n", "\n", "def train_autoencoder(autoencoder, dataset, device, epochs=20, batch_size=250,\n", " seed=0):\n", " \"\"\"\n", " Function to train autoencoder\n", "\n", " Args:\n", " autoencoder: nn.module\n", " Autoencoder instance\n", " dataset: function\n", " Dataset\n", " device: string\n", " GPU if available. CPU otherwise\n", " epochs: int\n", " Number of epochs [default: 20]\n", " batch_size: int\n", " Batch size\n", " seed: int\n", " Set seed for reproducibility; [default: 0]\n", "\n", " Returns:\n", " mse_loss: float\n", " MSE Loss\n", " \"\"\"\n", " autoencoder.to(device)\n", " optim = torch.optim.Adam(autoencoder.parameters(),\n", " lr=1e-3,\n", " weight_decay=1e-5)\n", " loss_fn = nn.MSELoss()\n", " g_seed = torch.Generator()\n", " g_seed.manual_seed(seed)\n", " loader = DataLoader(dataset,\n", " batch_size=batch_size,\n", " shuffle=True,\n", " pin_memory=True,\n", " num_workers=2,\n", " worker_init_fn=seed_worker,\n", " generator=g_seed)\n", "\n", " mse_loss = torch.zeros(epochs * len(dataset) // batch_size, device=device)\n", " i = 0\n", " for epoch in trange(epochs, desc='Epoch'):\n", " for im_batch, _ in loader:\n", " im_batch = im_batch.to(device)\n", " optim.zero_grad()\n", " reconstruction = autoencoder(im_batch)\n", " # Loss calculation\n", " loss = loss_fn(reconstruction.view(batch_size, -1),\n", " target=im_batch.view(batch_size, -1))\n", " loss.backward()\n", " optim.step()\n", "\n", " mse_loss[i] = loss.detach()\n", " i += 1\n", " # After training completes,\n", " # make sure the model is on CPU so we can easily\n", " # do more visualizations and demos.\n", " autoencoder.to('cpu')\n", " return mse_loss.cpu()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "class LinearAutoEncoder(nn.Module):\n", " \"\"\"\n", " Linear Autoencoder\n", " \"\"\"\n", "\n", " def __init__(self, x_dim, h_dim):\n", " \"\"\"\n", " A Linear AutoEncoder\n", "\n", " Args:\n", " x_dim: int\n", " Input dimension\n", " h_dim: int\n", " Hidden dimension, bottleneck dimension, K\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " super().__init__()\n", " ####################################################################\n", " # Fill in all missing code below (...),\n", " # then remove or comment the line below to test your class\n", " raise NotImplementedError(\"Please complete the LinearAutoEncoder class!\")\n", " ####################################################################\n", " # Encoder layer (a linear mapping from x_dim to K)\n", " self.enc_lin = ...\n", " # Decoder layer (a linear mapping from 
K to x_dim)\n", " self.dec_lin = ...\n", "\n", " def encode(self, x):\n", " \"\"\"\n", " Encoder function\n", "\n", " Args:\n", " x: torch.tensor\n", " Input features\n", "\n", " Returns:\n", " x: torch.tensor\n", " Encoded output\n", " \"\"\"\n", " ####################################################################\n", " # Fill in all missing code below (...),\n", " raise NotImplementedError(\"Please complete the `encode` function!\")\n", " ####################################################################\n", " h = ...\n", " return h\n", "\n", " def decode(self, h):\n", " \"\"\"\n", " Decoder function\n", "\n", " Args:\n", " h: torch.tensor\n", " Encoded output\n", "\n", " Returns:\n", " x_prime: torch.tensor\n", " Decoded output\n", " \"\"\"\n", " ####################################################################\n", " # Fill in all missing code below (...),\n", " raise NotImplementedError(\"Please complete the `decode` function!\")\n", " ####################################################################\n", " x_prime = ...\n", " return x_prime\n", "\n", " def forward(self, x):\n", " \"\"\"\n", " Forward pass\n", "\n", " Args:\n", " x: torch.tensor\n", " Input data\n", "\n", " Returns:\n", " Decoded output\n", " \"\"\"\n", " flat_x = x.view(x.size(0), -1)\n", " h = self.encode(flat_x)\n", " return self.decode(h).view(x.size())\n", "\n", "\n", "\n", "# Pick your own K\n", "K = 20\n", "set_seed(seed=SEED)\n", "## Uncomment to test your code\n", "# lin_ae = LinearAutoEncoder(data_size, K)\n", "# lin_losses = train_autoencoder(lin_ae, train_set, device=DEVICE, seed=SEED)\n", "# plot_linear_ae(lin_losses)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W2D4_GenerativeModels/solutions/W2D4_Tutorial1_Solution_3872c34f.py)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Linear_Autoencoder_Exercise\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Comparison to PCA\n", "\n", "One way to think about AutoEncoders is as a form of dimensionality-reduction. The dimensionality of $\\mathbf{h}$ is much smaller than the dimensionality of $\\mathbf{x}$.\n", "\n", "Another common technique for dimensionality reduction is to project data onto the top $K$ **principal components** (Principal Component Analysis or PCA). For comparison, let's also apply PCA for dimensionality reduction. The following cell will do this using the same value of K as you chose for the linear autoencoder." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# PCA requires finding the top K eigenvectors of the data covariance. 
Start by\n", "# finding the mean and covariance of the pixels in our dataset\n", "g_seed = torch.Generator()\n", "g_seed.manual_seed(SEED)\n", "\n", "loader = DataLoader(train_set,\n", " batch_size=32,\n", " pin_memory=True,\n", " num_workers=2,\n", " worker_init_fn=seed_worker,\n", " generator=g_seed)\n", "\n", "mu, cov = image_moments((im for im, _ in loader),\n", " n_batches=len(train_set) // 32)\n", "\n", "pca_encode, pca_decode = pca_encoder_decoder(mu, cov, K)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Let's visualize some of the reconstructions ($\\mathbf{x'}$) side-by-side with the input images ($\\mathbf{x}$)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Visualize the reconstructions $\\mathbf{x}'$, run this code a few times to see different examples.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Visualize the reconstructions $\\mathbf{x}'$, run this code a few times to see different examples.\n", "\n", "n_plot = 7\n", "plt.figure(figsize=(10, 4.5))\n", "for i in range(n_plot):\n", " idx = torch.randint(len(train_set), size=())\n", " image, _ = train_set[idx]\n", " # Get reconstructed image from autoencoder\n", " with torch.no_grad():\n", " reconstruction = lin_ae(image.unsqueeze(0)).reshape(image.size())\n", "\n", " # Get reconstruction from PCA dimensionality reduction\n", " h_pca = pca_encode(image)\n", " recon_pca = pca_decode(h_pca).reshape(image.size())\n", "\n", " plt.subplot(3, n_plot, i + 1)\n", " plot_torch_image(image)\n", " if i == 0:\n", " plt.ylabel('Original\\nImage')\n", "\n", " plt.subplot(3, n_plot, i + 1 + n_plot)\n", " plot_torch_image(reconstruction)\n", " if i == 0:\n", " plt.ylabel(f'Lin AE\\n(K={K})')\n", "\n", " plt.subplot(3, n_plot, i + 1 + 2*n_plot)\n", " plot_torch_image(recon_pca)\n", " if i == 0:\n", " plt.ylabel(f'PCA\\n(K={K})')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Think! 3.1: PCA vs. Linear autoencoder\n", "\n", "Compare the PCA-based reconstructions to those from the linear autoencoder. Is one better than the other? Are they equally good? Equally bad?\n", "\n", "Try out the above cells with a couple of values of $K$ if possible. How does the choice of $K$ impact reconstruction quality?" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W2D4_GenerativeModels/solutions/W2D4_Tutorial1_Solution_2cc24bef.py)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_PCA_vs_LinearAutoEncoder\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 3.2: Building a nonlinear convolutional autoencoder" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Ok so we have linear autoencoders doing about the same thing as PCA. We want to improve on that though! We can do so by adding nonlinearity and convolutions.\n", "\n", "**Nonlinear:** We'd like to apply autoencoders to learn a more flexible nonlinear mapping between the latent space and the images. 
Such a mapping can provide a more \"expressive\" model that better describes the image data than a linear mapping. This can be achieved by adding nonlinear activation functions to our encoder and decoder!\n", "\n", "**Convolutional:** As you saw on the day dedicated to RNNs and CNNs, parameter sharing is often a good idea for images! It's quite common to use convolutional layers in autoencoders to share parameters across locations in the image.\n", "\n", "**Side Note:** The `nn.Linear` layer (used in the linear autoencoder above) has a \"bias\" term, which is a learnable offset parameter separate for each output unit. Just like PCA \"centers\" the data by subtracting off the mean image (`mu`) before encoding and adds the average back in during decoding, a bias term in the decoder can effectively account for the first moment (mean) of the data (i.e. the average of all images in the training set). Convolution layers do have bias parameters, but the bias is applied per filter rather than per pixel location. If we're generating grayscale images (like those in MNIST), then `Conv2d` will learn only one bias across the entire image.\n", "\n", "For some conceptual continuity with both PCA and the `nn.Linear` layers above, the next block defines a custom `BiasLayer` for adding a learnable per-pixel offset. This custom layer will be used twice: as the first stage of the encoder and as the final stage of the decoder. Ideally, this means that the rest of the neural net can focus on fitting more interesting fine-grained structure." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "class BiasLayer(nn.Module):\n", " \"\"\"\n", " Bias Layer\n", " \"\"\"\n", "\n", " def __init__(self, shape):\n", " \"\"\"\n", " Initialise parameters of bias layer\n", "\n", " Args:\n", " shape: tuple\n", " Requisite shape of bias layer\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " super(BiasLayer, self).__init__()\n", " init_bias = torch.zeros(shape)\n", " self.bias = nn.Parameter(init_bias, requires_grad=True)\n", "\n", " def forward(self, x):\n", " \"\"\"\n", " Forward pass\n", "\n", " Args:\n", " x: torch.tensor\n", " Input features\n", "\n", " Returns:\n", " Output of bias layer\n", " \"\"\"\n", " return x + self.bias" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "With that out of the way, we will next define a **nonlinear** and **convolutional** autoencoder. Here's a quick tour of the architecture:\n", "\n", "1. The **encoder** once again maps from images to $\\mathbf{h}\\in\\mathbb{R}^K$. This will use a `BiasLayer` followed by two convolutional layers (`nn.Conv2D`), followed by flattening and linearly projecting down to $K$ dimensions. The convolutional layers will have `ReLU` nonlinearities on their outputs.\n", "1. The **decoder** inverts this process, taking in vectors of length $K$ and outputting images. Roughly speaking, its architecture is a \"mirror image\" of the encoder: the first decoder layer is linear, followed by two **deconvolution** layers ([`ConvTranspose2d`](https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html)). The `ConvTranspose2d`layers will have `ReLU` nonlinearities on their _inputs_. This \"mirror image\" between the encoder and decoder is a useful and near-ubiquitous convention. 
The idea is that the decoder can then learn to approximately invert the encoder, but it is not a strict requirement (and it does not guarantee the decoder will be an exact inverse of the encoder!).\n", "\n", "Below is a schematic of the architecture for MNIST. Notice that the width and height dimensions of the image planes reduce after each `nn.Conv2d` and increase after each `nn.ConvTranspose2d`. With CIFAR10, the architecture is the same but the exact sizes will differ.\n", "\n", "<img src=\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content-dl/main/tutorials/W2D4_GenerativeModels/static/W2D4_Tutorial1_conv_ae_schematic.png\" width=\"800.0\">" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "The [`torch.nn.ConvTranspose2d`](https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html) module can be seen as the gradient of `Conv2d` with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation). The following code demonstrates this change in sizes:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "dummy_image = torch.rand(data_shape).unsqueeze(0)\n", "in_channels = data_shape[0]\n", "out_channels = 7\n", "\n", "dummy_conv = nn.Conv2d(in_channels=in_channels,\n", " out_channels=out_channels,\n", " kernel_size=5)\n", "\n", "dummy_deconv = nn.ConvTranspose2d(in_channels=out_channels,\n", " out_channels=in_channels,\n", " kernel_size=5)\n", "\n", "print(f'Size of image is {dummy_image.shape}')\n", "print(f'Size of Conv2D(image) {dummy_conv(dummy_image).shape}')\n", "print(f'Size of ConvTranspose2D(Conv2D(image)) {dummy_deconv(dummy_conv(dummy_image)).shape}')" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Coding Exercise 3.2: Fill in code for the `ConvAutoEncoder` module\n", "\n", "Complete the `ConvAutoEncoder` class. We use the helper function `cout(torch.Tensor, nn.Conv2D)` to calculate the output shape of a [`nn.Conv2D`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html) layer given a tensor with shape (channels, height, width).\n", "\n", "It will use the value for **K** you defined in Coding Exercise 3.1, as we will eventually compare the results of the linear autoencoder that you trained there with this one. 
To play around with K, change it there and retrain both the linear autoencoder and the convolutional autoencoders.\n", "\n", "**Do you expect the convolutional autoencoder or the linear autoencoder to reach a lower value of mean squared error (MSE)?**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "class ConvAutoEncoder(nn.Module):\n", " \"\"\"\n", " A Convolutional AutoEncoder\n", " \"\"\"\n", "\n", " def __init__(self, x_dim, h_dim, n_filters=32, filter_size=5):\n", " \"\"\"\n", " Initialize parameters of ConvAutoEncoder\n", "\n", " Args:\n", " x_dim: tuple\n", " Input dimensions (channels, height, widths)\n", " h_dim: int\n", " Hidden dimension, bottleneck dimension, K\n", " n_filters: int\n", " Number of filters (number of output channels)\n", " filter_size: int\n", " Kernel size\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", " super().__init__()\n", " channels, height, widths = x_dim\n", "\n", " # Encoder input bias layer\n", " self.enc_bias = BiasLayer(x_dim)\n", "\n", " # First encoder conv2d layer\n", " self.enc_conv_1 = nn.Conv2d(channels, n_filters, filter_size)\n", "\n", " # Output shape of the first encoder conv2d layer given x_dim input\n", " conv_1_shape = cout(x_dim, self.enc_conv_1)\n", "\n", " # Second encoder conv2d layer\n", " self.enc_conv_2 = nn.Conv2d(n_filters, n_filters, filter_size)\n", "\n", " # Output shape of the second encoder conv2d layer given conv_1_shape input\n", " conv_2_shape = cout(conv_1_shape, self.enc_conv_2)\n", "\n", " # The bottleneck is a dense layer, therefore we need a flattenning layer\n", " self.enc_flatten = nn.Flatten()\n", "\n", " # Conv output shape is (depth, height, width), so the flatten size is:\n", " flat_after_conv = conv_2_shape[0] * conv_2_shape[1] * conv_2_shape[2]\n", "\n", " # Encoder Linear layer\n", " self.enc_lin = nn.Linear(flat_after_conv, h_dim)\n", "\n", " ####################################################################\n", " # Fill in all missing code below (...),\n", " # then remove or comment the line below to test your class\n", " # Remember that decoder is \"undo\"-ing what the encoder has done!\n", " raise NotImplementedError(\"Please complete the `ConvAutoEncoder` class!\")\n", " ####################################################################\n", " # Decoder Linear layer\n", " self.dec_lin = ...\n", "\n", " # Unflatten data to (depth, height, width) shape\n", " self.dec_unflatten = nn.Unflatten(dim=-1, unflattened_size=conv_2_shape)\n", "\n", " # First \"deconvolution\" layer\n", " self.dec_deconv_1 = nn.ConvTranspose2d(n_filters, n_filters, filter_size)\n", "\n", " # Second \"deconvolution\" layer\n", " self.dec_deconv_2 = ...\n", "\n", " # Decoder output bias layer\n", " self.dec_bias = BiasLayer(x_dim)\n", "\n", " def encode(self, x):\n", " \"\"\"\n", " Encoder\n", "\n", " Args:\n", " x: torch.tensor\n", " Input features\n", "\n", " Returns:\n", " h: torch.tensor\n", " Encoded output\n", " \"\"\"\n", " s = self.enc_bias(x)\n", " s = F.relu(self.enc_conv_1(s))\n", " s = F.relu(self.enc_conv_2(s))\n", " s = self.enc_flatten(s)\n", " h = self.enc_lin(s)\n", " return h\n", "\n", " def decode(self, h):\n", " \"\"\"\n", " Decoder\n", "\n", " Args:\n", " h: torch.tensor\n", " Encoded output\n", "\n", " Returns:\n", " x_prime: torch.tensor\n", " Decoded output\n", " \"\"\"\n", " s = F.relu(self.dec_lin(h))\n", " s = self.dec_unflatten(s)\n", " s = F.relu(self.dec_deconv_1(s))\n", " s = self.dec_deconv_2(s)\n", " x_prime = 
self.dec_bias(s)\n", " return x_prime\n", "\n", " def forward(self, x):\n", " \"\"\"\n", " Forward pass\n", "\n", " Args:\n", " x: torch.tensor\n", " Input features\n", "\n", " Returns:\n", " Decoded output\n", " \"\"\"\n", " return self.decode(self.encode(x))\n", "\n", "\n", "\n", "set_seed(seed=SEED)\n", "## Uncomment to test your solution\n", "# trained_conv_AE = ConvAutoEncoder(data_shape, K)\n", "# assert trained_conv_AE.encode(train_set[0][0].unsqueeze(0)).numel() == K, \"Encoder output size should be K!\"\n", "# conv_losses = train_autoencoder(trained_conv_AE, train_set, device=DEVICE, seed=SEED)\n", "# plot_conv_ae(lin_losses, conv_losses)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W2D4_GenerativeModels/solutions/W2D4_Tutorial1_Solution_71a661e8.py)\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "You should see that the `ConvAutoEncoder` achieved lower MSE loss than the linear one. If not, you may need to retrain it (or run another few training epochs from where it left off). We make fewer guarantees on this working with CIFAR10, but it should definitely work with MNIST.\n", "\n", "Now let's visually compare the reconstructed images from the linear and nonlinear autoencoders. Keep in mind that both have the same dimensionality for $\\mathbf{h}$!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Visualize the linear and nonlinear AE outputs\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Visualize the linear and nonlinear AE outputs\n", "if lin_ae.enc_lin.out_features != trained_conv_AE.enc_lin.out_features:\n", " raise ValueError('ERROR: your linear and convolutional autoencoders have different values of K')\n", "\n", "n_plot = 7\n", "plt.figure(figsize=(10, 4.5))\n", "for i in range(n_plot):\n", " idx = torch.randint(len(train_set), size=())\n", " image, _ = train_set[idx]\n", " with torch.no_grad():\n", " # Get reconstructed image from linear autoencoder\n", " lin_recon = lin_ae(image.unsqueeze(0))[0]\n", "\n", " # Get reconstruction from deep (nonlinear) autoencoder\n", " nonlin_recon = trained_conv_AE(image.unsqueeze(0))[0]\n", "\n", " plt.subplot(3, n_plot, i+1)\n", " plot_torch_image(image)\n", " if i == 0:\n", " plt.ylabel('Original\\nImage')\n", "\n", " plt.subplot(3, n_plot, i + 1 + n_plot)\n", " plot_torch_image(lin_recon)\n", " if i == 0:\n", " plt.ylabel(f'Lin AE\\n(K={K})')\n", "\n", " plt.subplot(3, n_plot, i + 1 + 2*n_plot)\n", " plot_torch_image(nonlin_recon)\n", " if i == 0:\n", " plt.ylabel(f'NonLin AE\\n(K={K})')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_NonLinear_AutoEncoder_Exercise\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Section 4: Variational Auto-Encoders (VAEs)\n", "\n", "*Time estimate: ~25mins*" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "**Please** run the cell after the video to train a VAE for MNIST while watching it." 
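] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "While the video plays, you can also skim the training cell below: it samples the latent code with the *reparameterization trick* (see the `rsample` function defined there). Instead of sampling $\\mathbf{z}$ directly from $q$, we sample noise $\\epsilon \\sim \\mathcal{N}(0, I)$ and compute $\\mathbf{z} = \\mu + \\sigma \\epsilon$, which keeps the sampling step differentiable with respect to $\\mu$ and $\\sigma$. The next cell is a minimal standalone sketch of this idea (an illustration only, not part of the exercises; the `phi_demo` values are arbitrary)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# Minimal sketch of the reparameterization trick (illustration only).\n", "# phi packs the mean (first K entries of each row) and the log standard\n", "# deviation (last entry), mirroring `rsample` in the training cell below.\n", "phi_demo = torch.tensor([[0.5, -0.5, 0.0]])  # Batch of 1, K=2, log sigma = 0\n", "mu, sig = phi_demo[:, :-1], phi_demo[:, -1].exp()\n", "eps = torch.randn(1, 4, 2)  # 4 samples of a 2-dimensional z\n", "z = eps * sig.view(1, 1, 1) + mu.view(1, 1, 2)\n", "print(z.shape)  # torch.Size([1, 4, 2])"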
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 4: Variational Autoencoders\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 4: Variational Autoencoders\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'srWb_Gp6OGA'), ('Bilibili', 'BV17v411E7ye')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Variational_AutoEncoder_Video\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Train a VAE for MNIST while watching the video. (Note: this VAE has a 2D latent space. If you are feeling ambitious, edit the code and modify the latent space dimensionality and see what happens.)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Train a VAE for MNIST while watching the video. (Note: this VAE has a 2D latent space. 
If you are feeling ambitious, edit the code and modify the latent space dimensionality and see what happens.)\n", "K_VAE = 2\n", "\n", "\n", "class ConvVAE(nn.Module):\n", " \"\"\"\n", " Convolutional Variational Autoencoder\n", " \"\"\"\n", " def __init__(self, K, num_filters=32, filter_size=5):\n", " \"\"\"\n", " Initialize parameters of ConvVAE\n", "\n", " Args:\n", " K: int\n", " Bottleneck dimensionality\n", " num_filters: int\n", " Number of filters [default: 32]\n", " filter_size: int\n", " Filter size [default: 5]\n", "\n", " Returns:\n", " Nothing\n", " \"\"\"\n", "\n", " super(ConvVAE, self).__init__()\n", "\n", " # With padding=0, the number of pixels cut off from\n", " # each side of the image\n", " # is filter_size // 2. Double it to get the amount\n", " # of pixels lost in\n", " # width and height per Conv2D layer, or added back\n", " # in per\n", " # ConvTranspose2D layer.\n", " filter_reduction = 2 * (filter_size // 2)\n", "\n", " # After passing input through two Conv2d layers,\n", " # the shape will be\n", " # 'shape_after_conv'. This is also the shape that\n", " # will go into the first\n", " # deconvolution layer in the decoder\n", " self.shape_after_conv = (num_filters,\n", " data_shape[1]-2*filter_reduction,\n", " data_shape[2]-2*filter_reduction)\n", " flat_size_after_conv = self.shape_after_conv[0] \\\n", " * self.shape_after_conv[1] \\\n", " * self.shape_after_conv[2]\n", "\n", " # Define the recognition model (encoder or q) part\n", " self.q_bias = BiasLayer(data_shape)\n", " self.q_conv_1 = nn.Conv2d(data_shape[0], num_filters, filter_size)\n", " self.q_conv_2 = nn.Conv2d(num_filters, num_filters, filter_size)\n", " self.q_flatten = nn.Flatten()\n", " self.q_fc_phi = nn.Linear(flat_size_after_conv, K+1)\n", "\n", " # Define the generative model (decoder or p) part\n", " self.p_fc_upsample = nn.Linear(K, flat_size_after_conv)\n", " self.p_unflatten = nn.Unflatten(-1, self.shape_after_conv)\n", " self.p_deconv_1 = nn.ConvTranspose2d(num_filters, num_filters, filter_size)\n", " self.p_deconv_2 = nn.ConvTranspose2d(num_filters, data_shape[0], filter_size)\n", " self.p_bias = BiasLayer(data_shape)\n", "\n", " # Define a special extra parameter to learn\n", " # scalar sig_x for all pixels\n", " self.log_sig_x = nn.Parameter(torch.zeros(()))\n", "\n", " def infer(self, x):\n", " \"\"\"\n", " Map (batch of) x to (batch of) phi which\n", " can then be passed to\n", " rsample to get z\n", "\n", " Args:\n", " x: torch.tensor\n", " Input features\n", "\n", " Returns:\n", " phi: torch.tensor\n", " Parameters of q(z|x): mean (first K entries) and log standard deviation (last entry)\n", " \"\"\"\n", " s = self.q_bias(x)\n", " s = F.relu(self.q_conv_1(s))\n", " s = F.relu(self.q_conv_2(s))\n", " flat_s = s.view(s.size()[0], -1)\n", " phi = self.q_fc_phi(flat_s)\n", " return phi\n", "\n", " def generate(self, zs):\n", " \"\"\"\n", " Map [b,n,k] sized samples of z to\n", " [b,n,p] sized images\n", "\n", " Args:\n", " zs: torch.tensor\n", " Latent samples, size [b,n,k]\n", "\n", " Returns:\n", " mu_xs: torch.tensor\n", " Mean of p(x|z) for each latent sample, size [b,n,p]\n", " \"\"\"\n", " # Note that for the purposes of passing\n", " # through the generator, we need\n", " # to reshape zs to be size [b*n,k]\n", " b, n, k = zs.size()\n", " s = zs.view(b*n, -1)\n", " s = F.relu(self.p_fc_upsample(s)).view((b*n,) + self.shape_after_conv)\n", " s = F.relu(self.p_deconv_1(s))\n", " s = self.p_deconv_2(s)\n", " s = self.p_bias(s)\n", " mu_xs = s.view(b, n, -1)\n", " return mu_xs\n", "\n", " def decode(self, zs):\n", " \"\"\"\n", " Decoder\n", "\n", " Args:\n", " zs: torch.tensor\n", " Latent samples\n", "\n", " Returns:\n", " Generated images\n", " \"\"\"\n", " # Included for compatibility with conv-AE code\n", " return self.generate(zs.unsqueeze(0))\n", "\n", " def forward(self, x):\n", " \"\"\"\n", " Forward pass\n", "\n", " Args:\n", " x: torch.tensor\n", " Input image\n", "\n", " Returns:\n", " Generated images\n", " \"\"\"\n", " # VAE.forward() is not used for training,\n", " # but we'll treat it like a\n", " # classic autoencoder by taking a single\n", " # sample of z ~ q\n", " phi = self.infer(x)\n", " zs = rsample(phi, 1)\n", " return self.generate(zs).view(x.size())\n", "\n", " def elbo(self, x, n=1):\n", " \"\"\"\n", " Run input end to end through the VAE\n", " and compute the ELBO using n\n", " samples of z\n", "\n", " Args:\n", " x: torch.tensor\n", " Input image\n", " n: int\n", " Number of samples of z\n", "\n", " Returns:\n", " The evidence lower bound (ELBO), estimated using n samples of z\n", " \"\"\"\n", " phi = self.infer(x)\n", " zs = rsample(phi, n)\n", " mu_xs = self.generate(zs)\n", " return log_p_x(x, mu_xs, self.log_sig_x.exp()) - kl_q_p(zs, phi)\n", "\n", "\n", "def expected_z(phi):\n", " \"\"\"\n", " Expected value of z under q(z; phi)\n", "\n", " Args:\n", " phi: torch.tensor\n", " Parameters of q (mean and log standard deviation)\n", "\n", " Returns:\n", " The mean of q, i.e., the first K entries of each row of phi\n", " \"\"\"\n", " return phi[:, :-1]\n", "\n", "\n", "def rsample(phi, n_samples):\n", " \"\"\"\n", " Sample z ~ q(z;phi)\n", " Output z is size [b,n_samples,K] given\n", " phi with shape [b,K+1]. The first K\n", " entries of each row of phi are the mean of q,\n", " and phi[:,-1] is the log\n", " standard deviation\n", "\n", " Args:\n", " phi: torch.tensor\n", " Parameters of q (mean and log standard deviation)\n", " n_samples: int\n", " Number of samples\n", "\n", " Returns:\n", " Samples z of size [b,n_samples,K]\n", " \"\"\"\n", " b, kplus1 = phi.size()\n", " k = kplus1-1\n", " mu, sig = phi[:, :-1], phi[:,-1].exp()\n", " eps = torch.randn(b, n_samples, k, device=phi.device)\n", " return eps*sig.view(b,1,1) + mu.view(b,1,k)\n", "\n", "\n", "def train_vae(vae, dataset, epochs=10, n_samples=1000):\n", " \"\"\"\n", " Train VAE\n", "\n", " Args:\n", " vae: nn.module\n", " Model\n", " dataset: Dataset\n", " Training dataset\n", " epochs: int\n", " Epochs\n", " n_samples: int\n", " Number of samples\n", "\n", " Returns:\n", " elbo_vals: list\n", " List of values obtained from ELBO\n", " \"\"\"\n", " opt = torch.optim.Adam(vae.parameters(), lr=1e-3, weight_decay=0)\n", " elbo_vals = []\n", " vae.to(DEVICE)\n", " vae.train()\n", " loader = DataLoader(dataset, batch_size=250, shuffle=True, pin_memory=True)\n", " for epoch in trange(epochs, desc='Epochs'):\n", " for im, _ in tqdm(loader, total=len(dataset) // 250, desc='Batches', leave=False):\n", " im = im.to(DEVICE)\n", " opt.zero_grad()\n", " loss = -vae.elbo(im)\n", " loss.backward()\n", " opt.step()\n", "\n", " elbo_vals.append(-loss.item())\n", " vae.to('cpu')\n", " vae.eval()\n", " return elbo_vals\n", "\n", "\n", "trained_conv_VarAE = ConvVAE(K=K_VAE)\n", "elbo_vals = train_vae(trained_conv_VarAE, train_set, n_samples=10000)\n", "\n", "print(f'Learned sigma_x is {torch.exp(trained_conv_VarAE.log_sig_x)}')\n", "\n", "# Plot the training curve of the evaluated ELBO\n", "# ELBO is the loss function used to train VAEs\n", "# (see lecture!)\n", "plt.figure()\n", "plt.plot(elbo_vals)\n", "plt.xlabel('Batch #')\n", "plt.ylabel('ELBO')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { 
"execution": {} }, "source": [ "ELBO is the loss function used to train VAEs - note that we are maximizing ELBO (higher ELBO is better). We implement this in PyTorch code set up to minimize things by making the loss equal to negative ELBO." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 4.1: Components of a VAE\n", "\n", "*Recognition models and density networks*\n", "\n", "\n", "Variational AutoEncoders (VAEs) are a lot like the classic AutoEncoders (AEs), but where we explicitly think about probability distributions. In the language of VAEs, the __encoder__ is replaced with a __recognition model__, and the __decoder__ is replaced with a __density network__.\n", "\n", "Where in a classic autoencoder the encoder maps from images to a single hidden vector,\n", "\n", "\\begin{equation}\n", "\\mathbf{x} \\overset{\\text{AE}}{\\longrightarrow} \\mathbf{h} \\, ,\n", "\\end{equation}\n", "\n", "in a VAE we would say that a recognition model maps from inputs to entire __distributions__ over hidden vectors,\n", "\n", "\\begin{equation}\n", "\\mathbf{x} \\overset{\\text{VAE}}{\\longrightarrow} q_{\\mathbf{w_e}}(\\mathbf{z}) \\, ,\n", "\\end{equation}\n", "\n", "which we will then sample from. Here $\\mathbf{w_e}$ refers to the weights of the recognition model, which parametarize our distribution generating network. We'll say more in a moment about what kind of distribution $q_{\\mathbf{w_e}}(\\mathbf{z})$ is.\n", "Part of what makes VAEs work is that the loss function will require good reconstructions of the input not just for a single $\\mathbf{z}$, but _on average_ from samples of $\\mathbf{z} \\sim q_{\\mathbf{w_e}}(\\mathbf{z})$.\n", "\n", "In the classic autoencoder, we had a decoder which maps from hidden vectors to reconstructions of the input:\n", "\n", "\\begin{equation}\n", "\\mathbf{h} \\overset{\\text{AE}}{\\longrightarrow} \\mathbf{x'} \\, .\n", "\\end{equation}\n", "\n", "In a density network, reconstructions are expressed in terms of a distribution:\n", "\n", "\\begin{equation}\n", "\\mathbf{z} \\overset{\\text{VAE}}{\\longrightarrow} p_{\\mathbf{w_d}}(\\mathbf{x}|\\mathbf{z})\n", "\\end{equation}\n", "\n", "where, as above, $p_{\\mathbf{w_d}}(\\mathbf{x}|\\mathbf{z})$ is defined by mapping $\\mathbf{z}$ through a density network then treating the resulting $f(\\mathbf{z};\\mathbf{w_d})$ as the mean of a (Gaussian) distribution over $\\mathbf{x}$. Similarly, our reconstruction distribution is parametarized by the weights of the density network." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 4.2: Generating novel images from the decoder\n", "\n", "If we isolate the decoder part of the AutoEncoder, what we have is a neural network that takes as input a vector of size $K$ and produces as output an image that looks something like our training data. Recall that in our earlier notation, we had an input $\\mathbf{x}$ that was mapped to a low-dimensional hidden representation $\\mathbf{h}$ which was then decoded into a reconstruction of the input, $\\mathbf{x'}$:\n", "\n", "\\begin{equation}\n", "\\mathbf{x} \\overset{\\text{encode}}{\\longrightarrow} \\mathbf{h} \\overset{\\text{decode}}{\\longrightarrow} \\mathbf{x'}\\, .\n", "\\end{equation}\n", "\n", "Partly as a matter of convention, and partly to distinguish where we are going next from the previous section, we're going to introduce a new variable, $\\mathbf{z} \\in \\mathbb{R}^K$, which will take the place of $\\mathbf{h}$. 
The key difference is that while $\\mathbf{h}$ is produced by the encoder for a particular $\\mathbf{x}$, $\\mathbf{z}$ will be drawn out of thin air from a prior of our choosing:\n", "\n", "\\begin{equation}\n", "\\mathbf{z} \\sim p(\\mathbf{z})\\\\ \\mathbf{z} \\overset{\\text{decode}}{\\longrightarrow} \\mathbf{x}\\, .\n", "\\end{equation}\n", "\n", "(Note that it is also common convention to drop the \"prime\" on $\\mathbf{x}$ when it is no longer being thought of as a \"reconstruction\")." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Coding Exercise 4.2: Generating images\n", "\n", "Complete the code below to generate some images from the VAE that we trained above." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "def generate_images(autoencoder, K, n_images=1):\n", " \"\"\"\n", " Generate n_images 'new' images from the decoder part of the given\n", " autoencoder.\n", "\n", " Args:\n", " autoencoder: nn.module\n", " Autoencoder model\n", " K: int\n", " Bottleneck dimension\n", " n_images: int\n", " Number of images\n", "\n", " Returns:\n", " x: torch.tensor\n", " (n_images, channels, height, width) tensor of images\n", " \"\"\"\n", " # Concatenate tuples to get (n_images, channels, height, width)\n", " output_shape = (n_images,) + data_shape\n", " with torch.no_grad():\n", " ####################################################################\n", " # Fill in all missing code below (...),\n", " # then remove or comment the line below to test your function\n", " raise NotImplementedError(\"Please complete the `generate_images` function!\")\n", " ####################################################################\n", " # Sample z from a unit gaussian, pass through autoencoder.decode()\n", " z = ...\n", " x = ...\n", "\n", " return x.reshape(output_shape)\n", "\n", "\n", "\n", "set_seed(seed=SEED)\n", "## Uncomment to test your solution\n", "# images = generate_images(trained_conv_AE, K, n_images=9)\n", "# plot_images(images, plt_title='Images Generated from the Conv-AE')\n", "# images = generate_images(trained_conv_VarAE, K_VAE, n_images=9)\n", "# plot_images(images, plt_title='Images Generated from a Conv-Variational-AE')" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W2D4_GenerativeModels/solutions/W2D4_Tutorial1_Solution_775a81ae.py)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Generating_images_Exercise\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Think! 4.2: AutoEncoders vs. Variational AutoEncoders\n", "\n", "Compare the images generated by the AutoEncoder to the images generated by the Variational AutoEncoder. You can run the code a few times to see a variety of examples.\n", "\n", "Does one set look more like the training set (handwritten digits) than the other? What is driving this difference?" 
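] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "As an optional extra before checking the solution (a sketch, not part of the original exercise): because the VAE above uses a 2-dimensional latent space, you can decode a regular grid of $\\mathbf{z}$ values to see how that space is organized. The cell below assumes `trained_conv_VarAE` finished training and that `data_shape` matches the training images; the $[-2, 2]$ grid range is an arbitrary choice, and only the first image channel is shown." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# Optional sketch: decode a grid of latent points (assumes K_VAE = 2)\n", "n_grid = 8\n", "ticks = torch.linspace(-2, 2, n_grid)\n", "zs = torch.cartesian_prod(ticks, ticks)  # All (z1, z2) pairs on the grid\n", "with torch.no_grad():\n", "  decoded = trained_conv_VarAE.decode(zs).reshape((n_grid * n_grid,) + data_shape)\n", "plt.figure(figsize=(8, 8))\n", "for i in range(n_grid * n_grid):\n", "  plt.subplot(n_grid, n_grid, i + 1)\n", "  plt.imshow(decoded[i, 0], cmap='gray')  # Show the first channel only\n", "  plt.axis('off')\n", "plt.show()"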
] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W2D4_GenerativeModels/solutions/W2D4_Tutorial1_Solution_0e74baf7.py)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_AutoEncoders_vs_Variational_AutoEncoders_Discussion\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Section 5: State of the art VAEs and Wrap-up" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 5: State-Of-The-Art VAEs\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 5: State-Of-The-Art VAEs\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'PXBl3KwRfh4'), ('Bilibili', 'BV1hg411M7KY')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_SOTA_VAEs_and_WrapUp_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Summary\n", "\n", "Through this tutorial, we have learned\n", "- What a generative model is and why we are interested in them.\n", "- How latent variable models relate to generative models with the example of pPCA.\n", "- What a basic AutoEncoder is and how they relate to other latent variable models.\n", 
"- The basics of Variational AutoEncoders and how they function as generative models.\n", "- An introduction to the broad applications of VAEs." ] } ], "metadata": { "accelerator": "GPU", "colab": { "collapsed_sections": [], "include_colab_link": true, "machine_shape": "hm", "name": "W2D4_Tutorial1", "provenance": [], "toc_visible": true }, "kernel": { "display_name": "Python 3", "language": "python", "name": "python3" }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 0 }