{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {},
    "id": "view-in-github"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/NeuromatchAcademy/course-content-dl/blob/main/tutorials/W2D1_Regularization/student/W2D1_Tutorial1.ipynb\" target=\"_blank\"><img alt=\"Open In Colab\" src=\"https://colab.research.google.com/assets/colab-badge.svg\"/></a>   <a href=\"https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/NeuromatchAcademy/course-content-dl/main/tutorials/W2D1_Regularization/student/W2D1_Tutorial1.ipynb\" target=\"_blank\"><img alt=\"Open in Kaggle\" src=\"https://kaggle.com/static/images/open-in-kaggle.svg\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Tutorial 1: Regularization techniques part 1\n",
    "\n",
    "**Week 2, Day 1: Regularization**\n",
    "\n",
    "**By Neuromatch Academy**\n",
    "\n",
    "__Content creators:__ Ravi Teja Konkimalla, Mohitrajhu Lingan Kumaraian, Kevin Machado Gamboa, Kelson Shilling-Scrivo, Lyle Ungar\n",
    "\n",
    "__Content reviewers:__ Piyush Chauhan, Siwei Bai, Kelson Shilling-Scrivo\n",
    "\n",
    "__Content editors:__ Roberto Guidotti, Spiros Chavlis\n",
    "\n",
    "__Production editors:__ Saeed Salehi, Gagana B, Spiros Chavlis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Tutorial Objectives\n",
    "\n",
    "1. Big Artificial Neural Networks (ANNs) are efficient universal approximators due to their adaptive basis functions\n",
    "2. ANNs memorize some but generalize well\n",
    "3. Regularization as shrinkage of overparameterized models: early stopping"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown\n",
    "from IPython.display import IFrame\n",
    "from ipywidgets import widgets\n",
    "out = widgets.Output()\n",
    "with out:\n",
    "    print(f\"If you want to download the slides: https://osf.io/download/mf79a/\")\n",
    "    display(IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/mf79a/?direct%26mode=render%26action=download%26mode=render\", width=730, height=410))\n",
    "display(out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Setup\n",
    "Note that some of the code for today can take up to an hour to run. We have therefore \"hidden\" the code and shown the resulting outputs.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Install dependencies\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " **WARNING**: There may be *errors* and/or *warnings* reported during the installation. However, they should be ignored.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Install dependencies\n",
    "\n",
    "# @markdown **WARNING**: There may be *errors* and/or *warnings* reported during the installation. However, they should be ignored.\n",
    "\n",
    "!pip install imageio --quiet\n",
    "!pip install imageio-ffmpeg --quiet"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Install and import feedback gadget\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Install and import feedback gadget\n",
    "\n",
    "!pip3 install vibecheck datatops --quiet\n",
    "\n",
    "from vibecheck import DatatopsContentReviewContainer\n",
    "def content_review(notebook_section: str):\n",
    "    return DatatopsContentReviewContainer(\n",
    "        \"\",  # No text prompt\n",
    "        notebook_section,\n",
    "        {\n",
    "            \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n",
    "            \"name\": \"neuromatch_dl\",\n",
    "            \"user_key\": \"f379rz8y\",\n",
    "        },\n",
    "    ).render()\n",
    "\n",
    "\n",
    "feedback_prefix = \"W2D1_T1\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# Imports\n",
    "import time\n",
    "import copy\n",
    "import torch\n",
    "import pathlib\n",
    "\n",
    "import numpy as np\n",
    "import torch.nn as nn\n",
    "import torch.optim as optim\n",
    "import torch.nn.functional as F\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.animation as animation\n",
    "\n",
    "from tqdm.auto import tqdm\n",
    "from IPython.display import HTML\n",
    "from torchvision import transforms\n",
    "from torchvision.datasets import ImageFolder"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Figure Settings\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Figure Settings\n",
    "import logging\n",
    "logging.getLogger('matplotlib.font_manager').disabled = True\n",
    "\n",
    "import ipywidgets as widgets\n",
    "%matplotlib inline\n",
    "%config InlineBackend.figure_format = 'retina'\n",
    "plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/content-creation/main/nma.mplstyle\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Loading Animal Faces data\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Loading Animal Faces data\n",
    "import requests, os\n",
    "from zipfile import ZipFile\n",
    "\n",
    "print(\"Start downloading and unzipping `AnimalFaces` dataset...\")\n",
    "name = 'afhq'\n",
    "fname = f\"{name}.zip\"\n",
    "url = f\"https://osf.io/kgfvj/download\"\n",
    "\n",
    "if not os.path.exists(fname):\n",
    "  r = requests.get(url, allow_redirects=True)\n",
    "  with open(fname, 'wb') as fh:\n",
    "    fh.write(r.content)\n",
    "\n",
    "  if os.path.exists(fname):\n",
    "    with ZipFile(fname, 'r') as zfile:\n",
    "      zfile.extractall(f\".\")\n",
    "      os.remove(fname)\n",
    "\n",
    "print(\"Download completed.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Loading Animal Faces Randomized data\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Loading Animal Faces Randomized data\n",
    "\n",
    "print(\"Start downloading and unzipping `Randomized AnimalFaces` dataset...\")\n",
    "\n",
    "names = ['afhq_random_32x32', 'afhq_10_32x32']\n",
    "urls = [\"https://osf.io/9sj7p/download\",\n",
    "        \"https://osf.io/wvgkq/download\"]\n",
    "\n",
    "\n",
    "for i, name in enumerate(names):\n",
    "  url = urls[i]\n",
    "  fname = f\"{name}.zip\"\n",
    "\n",
    "  if not os.path.exists(fname):\n",
    "    r = requests.get(url, allow_redirects=True)\n",
    "    with open(fname, 'wb') as fh:\n",
    "      fh.write(r.content)\n",
    "\n",
    "    if os.path.exists(fname):\n",
    "      with ZipFile(fname, 'r') as zfile:\n",
    "        zfile.extractall(f\".\")\n",
    "        os.remove(fname)\n",
    "\n",
    "print(\"Download completed.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Plotting functions\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Plotting functions\n",
    "def imshow(img):\n",
    "  \"\"\"\n",
    "  Display unnormalized image\n",
    "\n",
    "  Args:\n",
    "    img: np.ndarray\n",
    "      Datapoint to visualize\n",
    "\n",
    "  Returns:\n",
    "    Nothing\n",
    "  \"\"\"\n",
    "  img = img / 2 + 0.5     # Unnormalize\n",
    "  npimg = img.numpy()\n",
    "  plt.imshow(np.transpose(npimg, (1, 2, 0)))\n",
    "  plt.axis(False)\n",
    "  plt.show()\n",
    "\n",
    "\n",
    "def plot_weights(norm, labels, ws,\n",
    "                 title='Weight Size Measurement'):\n",
    "  \"\"\"\n",
    "  Plot of weight size measurement [norm value vs layer]\n",
    "\n",
    "  Args:\n",
    "    norm: float\n",
    "      Norm values\n",
    "    labels: list\n",
    "      Targets\n",
    "    ws: list\n",
    "      Weights\n",
    "    title: string\n",
    "      Title of plot\n",
    "\n",
    "  Returns:\n",
    "    Nothing\n",
    "  \"\"\"\n",
    "  plt.figure(figsize=[8, 6])\n",
    "  plt.title(title)\n",
    "  plt.ylabel('Frobenius Norm Value')\n",
    "  plt.xlabel('Model Layers')\n",
    "  plt.bar(labels, ws)\n",
    "  plt.axhline(y=norm,\n",
    "              linewidth=1,\n",
    "              color='r',\n",
    "              ls='--',\n",
    "              label='Total Model F-Norm')\n",
    "  plt.legend()\n",
    "  plt.show()\n",
    "\n",
    "\n",
    "def early_stop_plot(train_acc_earlystop,\n",
    "                    val_acc_earlystop, best_epoch):\n",
    "  \"\"\"\n",
    "  Plot of early stopping\n",
    "\n",
    "  Args:\n",
    "    train_acc_earlystop: np.ndarray\n",
    "      Training accuracy log until early stop point\n",
    "    val_acc_earlystop: np.ndarray\n",
    "      Val accuracy log until early stop point\n",
    "    best_epoch: int\n",
    "      Epoch at which early stopping occurs\n",
    "\n",
    "  Returns:\n",
    "    Nothing\n",
    "  \"\"\"\n",
    "  plt.figure(figsize=(8, 6))\n",
    "  plt.plot(val_acc_earlystop,label='Val - Early',c='red',ls = 'dashed')\n",
    "  plt.plot(train_acc_earlystop,label='Train - Early',c='red',ls = 'solid')\n",
    "  plt.axvline(x=best_epoch, c='green', ls='dashed',\n",
    "              label='Epoch for Max Val Accuracy')\n",
    "  plt.title('Early Stopping')\n",
    "  plt.ylabel('Accuracy (%)')\n",
    "  plt.xlabel('Epoch')\n",
    "  plt.legend()\n",
    "  plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Set random seed\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Executing `set_seed(seed=seed)` you are setting the seed\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Set random seed\n",
    "\n",
    "# @markdown Executing `set_seed(seed=seed)` you are setting the seed\n",
    "\n",
    "# For DL its critical to set the random seed so that students can have a\n",
    "# baseline to compare their results to expected results.\n",
    "# Read more here: https://pytorch.org/docs/stable/notes/randomness.html\n",
    "\n",
    "# Call `set_seed` function in the exercises to ensure reproducibility.\n",
    "import random\n",
    "import torch\n",
    "\n",
    "def set_seed(seed=None, seed_torch=True):\n",
    "  \"\"\"\n",
    "  Function that controls randomness. NumPy and random modules must be imported.\n",
    "\n",
    "  Args:\n",
    "    seed : Integer\n",
    "      A non-negative integer that defines the random state. Default is `None`.\n",
    "    seed_torch : Boolean\n",
    "      If `True` sets the random seed for pytorch tensors, so pytorch module\n",
    "      must be imported. Default is `True`.\n",
    "\n",
    "  Returns:\n",
    "    Nothing.\n",
    "  \"\"\"\n",
    "  if seed is None:\n",
    "    seed = np.random.choice(2 ** 32)\n",
    "  random.seed(seed)\n",
    "  np.random.seed(seed)\n",
    "  if seed_torch:\n",
    "    torch.manual_seed(seed)\n",
    "    torch.cuda.manual_seed_all(seed)\n",
    "    torch.cuda.manual_seed(seed)\n",
    "    torch.backends.cudnn.benchmark = False\n",
    "    torch.backends.cudnn.deterministic = True\n",
    "\n",
    "  print(f'Random seed {seed} has been set.')\n",
    "\n",
    "\n",
    "# In case that `DataLoader` is used\n",
    "def seed_worker(worker_id):\n",
    "  \"\"\"\n",
    "  DataLoader will reseed workers following randomness in\n",
    "  multi-process data loading algorithm.\n",
    "\n",
    "  Args:\n",
    "    worker_id: integer\n",
    "      ID of subprocess to seed. 0 means that\n",
    "      the data will be loaded in the main process\n",
    "      Refer: https://pytorch.org/docs/stable/data.html#data-loading-randomness for more details\n",
    "\n",
    "  Returns:\n",
    "    Nothing\n",
    "  \"\"\"\n",
    "  worker_seed = torch.initial_seed() % 2**32\n",
    "  np.random.seed(worker_seed)\n",
    "  random.seed(worker_seed)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Set device (GPU or CPU). Execute `set_device()`\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Set device (GPU or CPU). Execute `set_device()`\n",
    "# especially if torch modules used.\n",
    "\n",
    "# Inform the user if the notebook uses GPU or CPU.\n",
    "\n",
    "def set_device():\n",
    "  \"\"\"\n",
    "  Set the device. CUDA if available, CPU otherwise\n",
    "\n",
    "  Args:\n",
    "    None\n",
    "\n",
    "  Returns:\n",
    "    Nothing\n",
    "  \"\"\"\n",
    "  device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
    "  if device != \"cuda\":\n",
    "    print(\"WARNING: For this notebook to perform best, \"\n",
    "        \"if possible, in the menu under `Runtime` -> \"\n",
    "        \"`Change runtime type.`  select `GPU` \")\n",
    "  else:\n",
    "    print(\"GPU is enabled in this notebook.\")\n",
    "\n",
    "  return device"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "SEED = 2021\n",
    "set_seed(seed=SEED)\n",
    "DEVICE = set_device()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Section 0: Defining useful functions\n",
    "Let's start the tutorial by defining some functions which we will use frequently today, such as: `AnimalNet`, `train`, `test` and `main`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "class AnimalNet(nn.Module):\n",
    "  \"\"\"\n",
    "  Network Class - Animal Faces\n",
    "  \"\"\"\n",
    "\n",
    "  def __init__(self):\n",
    "    \"\"\"\n",
    "    Initialize parameters of AnimalNet\n",
    "\n",
    "    Args:\n",
    "      None\n",
    "\n",
    "    Returns:\n",
    "      Nothing\n",
    "    \"\"\"\n",
    "    super(AnimalNet, self).__init__()\n",
    "    self.fc1 = nn.Linear(3 * 32 * 32, 128)\n",
    "    self.fc2 = nn.Linear(128, 32)\n",
    "    self.fc3 = nn.Linear(32, 3)\n",
    "\n",
    "  def forward(self, x):\n",
    "    \"\"\"\n",
    "    Forward Pass of AnimalNet\n",
    "\n",
    "    Args:\n",
    "      x: torch.tensor\n",
    "        Input features\n",
    "\n",
    "    Returns:\n",
    "      output: torch.tensor\n",
    "        Outputs/Predictions\n",
    "    \"\"\"\n",
    "    x = x.view(x.shape[0],-1)\n",
    "    x = F.relu(self.fc1(x))\n",
    "    x = F.relu(self.fc2(x))\n",
    "    x = self.fc3(x)\n",
    "    output = F.log_softmax(x, dim=1)\n",
    "    return output"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "The train function takes in the current model, along with the train_loader and loss function, and updates the parameters for a single pass of the entire dataset. The test function takes in the current model after every epoch and calculates the accuracy on the test dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "def train(args, model, train_loader, optimizer,\n",
    "          reg_function1=None, reg_function2=None, criterion=F.nll_loss):\n",
    "  \"\"\"\n",
    "  Trains the current input model using the data\n",
    "  from Train_loader and Updates parameters for a single pass\n",
    "\n",
    "  Args:\n",
    "    args: dictionary\n",
    "      Dictionary with epochs: 200, lr: 5e-3, momentum: 0.9, device: DEVICE\n",
    "    model: nn.module\n",
    "      Neural network instance\n",
    "    train_loader: torch.loader\n",
    "      Input dataset\n",
    "    optimizer: function\n",
    "      Optimizer\n",
    "    reg_function1: function\n",
    "      Regularisation function [default: None]\n",
    "    reg_function2: function\n",
    "      Regularisation function [default: None]\n",
    "    criterion: function\n",
    "      Specifies loss function [default: nll_loss]\n",
    "\n",
    "  Returns:\n",
    "    model: nn.module\n",
    "      Neural network instance post training\n",
    "  \"\"\"\n",
    "  device = args['device']\n",
    "  model.train()\n",
    "  for batch_idx, (data, target) in enumerate(train_loader):\n",
    "    data, target = data.to(device), target.to(device)\n",
    "    optimizer.zero_grad()\n",
    "    output = model(data)\n",
    "    if reg_function1 is None:\n",
    "      loss = criterion(output, target)\n",
    "    elif reg_function2 is None:\n",
    "      loss = criterion(output, target)+args['lambda']*reg_function1(model)\n",
    "    else:\n",
    "      loss = criterion(output, target) + args['lambda1']*reg_function1(model) + args['lambda2']*reg_function2(model)\n",
    "    loss.backward()\n",
    "    optimizer.step()\n",
    "\n",
    "  return model\n",
    "\n",
    "\n",
    "def test(model, test_loader, criterion=F.nll_loss, device='cpu'):\n",
    "  \"\"\"\n",
    "  Tests the current model\n",
    "\n",
    "  Args:\n",
    "    model: nn.module\n",
    "      Neural network instance\n",
    "    device: string\n",
    "      GPU/CUDA if available, CPU otherwise\n",
    "    test_loader: torch.loader\n",
    "      Test dataset\n",
    "    criterion: function\n",
    "      Specifies loss function [default: nll_loss]\n",
    "\n",
    "  Returns:\n",
    "    test_loss: float\n",
    "      Test loss\n",
    "  \"\"\"\n",
    "  model.eval()\n",
    "  test_loss = 0\n",
    "  correct = 0\n",
    "  with torch.no_grad():\n",
    "    for data, target in test_loader:\n",
    "      data, target = data.to(device), target.to(device)\n",
    "      output = model(data)\n",
    "      test_loss += criterion(output, target, reduction='sum').item()  # Sum up batch loss\n",
    "      pred = output.argmax(dim=1, keepdim=True)  # Get the index of the max log-probability\n",
    "      correct += pred.eq(target.view_as(pred)).sum().item()\n",
    "\n",
    "  test_loss /= len(test_loader.dataset)\n",
    "  return 100. * correct / len(test_loader.dataset)\n",
    "\n",
    "\n",
    "def main(args, model, train_loader, val_loader,\n",
    "         reg_function1=None, reg_function2=None):\n",
    "  \"\"\"\n",
    "  Trains the model with train_loader and\n",
    "  tests the learned model using val_loader\n",
    "\n",
    "  Args:\n",
    "    args: dictionary\n",
    "      Dictionary with epochs: 200, lr: 5e-3, momentum: 0.9, device: DEVICE\n",
    "    model: nn.module\n",
    "      Neural network instance\n",
    "    train_loader: torch.loader\n",
    "      Train dataset\n",
    "    val_loader: torch.loader\n",
    "      Validation set\n",
    "    reg_function1: function\n",
    "      Regularisation function [default: None]\n",
    "    reg_function2: function\n",
    "      Regularisation function [default: None]\n",
    "\n",
    "  Returns:\n",
    "    val_acc_list: list\n",
    "      Log of validation accuracy\n",
    "    train_acc_list: list\n",
    "      Log of training accuracy\n",
    "    param_norm_list: list\n",
    "      Log of frobenius norm\n",
    "    trained_model: nn.module\n",
    "      Trained model/model post training\n",
    "  \"\"\"\n",
    "\n",
    "  device = args['device']\n",
    "\n",
    "  model = model.to(device)\n",
    "  optimizer = optim.SGD(model.parameters(), lr=args['lr'],\n",
    "                        momentum=args['momentum'])\n",
    "\n",
    "  val_acc_list, train_acc_list,param_norm_list = [], [], []\n",
    "  for epoch in tqdm(range(args['epochs'])):\n",
    "    trained_model = train(args, model, train_loader, optimizer,\n",
    "                          reg_function1=reg_function1,\n",
    "                          reg_function2=reg_function2)\n",
    "    train_acc = test(trained_model, train_loader, device=device)\n",
    "    val_acc = test(trained_model, val_loader, device=device)\n",
    "    param_norm = calculate_frobenius_norm(trained_model)\n",
    "    train_acc_list.append(train_acc)\n",
    "    val_acc_list.append(val_acc)\n",
    "    param_norm_list.append(param_norm)\n",
    "\n",
    "  return val_acc_list, train_acc_list, param_norm_list, trained_model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Section 1: Regularization is Shrinkage\n",
    "\n",
    "*Time estimate: ~20 mins*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Video 1: Introduction to Regularization\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Video 1: Introduction to Regularization\n",
    "from ipywidgets import widgets\n",
    "from IPython.display import YouTubeVideo\n",
    "from IPython.display import IFrame\n",
    "from IPython.display import display\n",
    "\n",
    "\n",
    "class PlayVideo(IFrame):\n",
    "  def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
    "    self.id = id\n",
    "    if source == 'Bilibili':\n",
    "      src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
    "    elif source == 'Osf':\n",
    "      src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
    "    super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
    "\n",
    "\n",
    "def display_videos(video_ids, W=400, H=300, fs=1):\n",
    "  tab_contents = []\n",
    "  for i, video_id in enumerate(video_ids):\n",
    "    out = widgets.Output()\n",
    "    with out:\n",
    "      if video_ids[i][0] == 'Youtube':\n",
    "        video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
    "                             height=H, fs=fs, rel=0)\n",
    "        print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
    "      else:\n",
    "        video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
    "                          height=H, fs=fs, autoplay=False)\n",
    "        if video_ids[i][0] == 'Bilibili':\n",
    "          print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
    "        elif video_ids[i][0] == 'Osf':\n",
    "          print(f'Video available at https://osf.io/{video.id}')\n",
    "      display(video)\n",
    "    tab_contents.append(out)\n",
    "  return tab_contents\n",
    "\n",
    "\n",
    "video_ids = [('Youtube', 'jhQAnIHTR6A'), ('Bilibili', 'BV1mo4y1X76E')]\n",
    "tab_contents = display_videos(video_ids, W=730, H=410)\n",
    "tabs = widgets.Tab()\n",
    "tabs.children = tab_contents\n",
    "for i in range(len(tab_contents)):\n",
    "  tabs.set_title(i, video_ids[i][0])\n",
    "display(tabs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Introduction_to_Regularization_Video\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "A key idea of neural nets is that they use models that are \"too complex\" - complex enough to fit all the noise in the data. One then needs to \"regularize\" them to make the models fit complex enough, but not too complex. The more complex the model, the better it fits the training data, but if it is too complex, it generalizes less well; it memorizes the training data but is less accurate on future test data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Video 2: Regularization as Shrinkage\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Video 2: Regularization as Shrinkage\n",
    "from ipywidgets import widgets\n",
    "from IPython.display import YouTubeVideo\n",
    "from IPython.display import IFrame\n",
    "from IPython.display import display\n",
    "\n",
    "\n",
    "class PlayVideo(IFrame):\n",
    "  def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
    "    self.id = id\n",
    "    if source == 'Bilibili':\n",
    "      src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
    "    elif source == 'Osf':\n",
    "      src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
    "    super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
    "\n",
    "\n",
    "def display_videos(video_ids, W=400, H=300, fs=1):\n",
    "  tab_contents = []\n",
    "  for i, video_id in enumerate(video_ids):\n",
    "    out = widgets.Output()\n",
    "    with out:\n",
    "      if video_ids[i][0] == 'Youtube':\n",
    "        video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
    "                             height=H, fs=fs, rel=0)\n",
    "        print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
    "      else:\n",
    "        video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
    "                          height=H, fs=fs, autoplay=False)\n",
    "        if video_ids[i][0] == 'Bilibili':\n",
    "          print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
    "        elif video_ids[i][0] == 'Osf':\n",
    "          print(f'Video available at https://osf.io/{video.id}')\n",
    "      display(video)\n",
    "    tab_contents.append(out)\n",
    "  return tab_contents\n",
    "\n",
    "\n",
    "video_ids = [('Youtube', 'mhVbJ74upnQ'), ('Bilibili', 'BV1YL411H7Dv')]\n",
    "tab_contents = display_videos(video_ids, W=730, H=410)\n",
    "tabs = widgets.Tab()\n",
    "tabs.children = tab_contents\n",
    "for i in range(len(tab_contents)):\n",
    "  tabs.set_title(i, video_ids[i][0])\n",
    "display(tabs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Regularization_as_Shrinkage_Video\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "One way to think about Regularization is to think in terms of the magnitude of the overall weights of the model. A model with big weights can fit more data perfectly, whereas a model with smaller weights tends to underperform on the train set but can surprisingly do very well on the test set. Having the weights too small can also be an issue as it can then underfit the model.\n",
    "\n",
    "In these tutorials, we use the sum of the Frobenius norm of all the tensors in the model as a measure of the \"size of the model\"."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Coding Exercise 1: Frobenius Norm\n",
    "\n",
    "Before we start, let's define the Frobenius norm, sometimes also called the Euclidean norm of an $m×n$ matrix $A$  as the square root of the sum of the absolute squares of its elements.\n",
    "\n",
    "<br>\n",
    "\n",
    "\\begin{equation}\n",
    "  ||A||_F= \\sqrt{\\sum_{i=1}^m\\sum_{j=1}^n|a_{ij}|^2}\n",
    "\\end{equation}\n",
    "\n",
    "<br>\n",
    "\n",
    "This is just a measure of how big the matrix is, analogous to how big a vector is."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "**Hint:** Use the functions `model.parameters()` or `model.named_parameters()`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "def calculate_frobenius_norm(model):\n",
    "  \"\"\"\n",
    "  Function to calculate frobenius norm\n",
    "\n",
    "  Args:\n",
    "    model: nn.module\n",
    "      Neural network instance\n",
    "\n",
    "  Returns:\n",
    "    norm: float\n",
    "      Frobenius norm\n",
    "  \"\"\"\n",
    "  ####################################################################\n",
    "  # Fill in all missing code below (...),\n",
    "  # then remove or comment the line below to test your function\n",
    "  raise NotImplementedError(\"Define `calculate_frobenius_norm` function\")\n",
    "  ####################################################################\n",
    "  norm = 0.0\n",
    "  # Sum the square of all parameters\n",
    "  for param in model.parameters():\n",
    "    norm += ...\n",
    "\n",
    "  # Take a square root of the sum of squares of all the parameters\n",
    "  norm = ...\n",
    "  return norm\n",
    "\n",
    "\n",
    "\n",
    "# Seed added for reproducibility\n",
    "set_seed(seed=SEED)\n",
    "\n",
    "## uncomment below to test your code\n",
    "# net = nn.Linear(10, 1)\n",
    "# print(f'Frobenius norm of Single Linear Layer: {calculate_frobenius_norm(net)}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "```\n",
    "Random seed 2021 has been set.\n",
    "Frobenius Norm of Single Linear Layer: 0.6572162508964539\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W2D1_Regularization/solutions/W2D1_Tutorial1_Solution_9a1dd7ea.py)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Forbenius_norm_Exercise\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Apart from calculating the weight size for an entire model, we could also determine the weight size in every layer. For this, we can modify our `calculate_frobenius_norm` function as shown below.\n",
    "\n",
    "**Have a look how it works!!**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "def calculate_frobenius_norm(model):\n",
    "  \"\"\"\n",
    "  Calculate Frobenius Norm per Layer\n",
    "\n",
    "  Args:\n",
    "    model: nn.module\n",
    "      Neural network instance\n",
    "\n",
    "  Returns:\n",
    "    norm: float\n",
    "      Norm value\n",
    "    labels: list\n",
    "      Targets\n",
    "    ws: list\n",
    "      Weights\n",
    "  \"\"\"\n",
    "\n",
    "  # Initialization of variables\n",
    "  norm, ws, labels = 0.0, [], []\n",
    "\n",
    "  # Sum all the parameters\n",
    "  for name, parameters in model.named_parameters():\n",
    "    p = torch.sum(parameters**2)\n",
    "    norm += p\n",
    "\n",
    "    ws.append((p**0.5).cpu().detach().numpy())\n",
    "    labels.append(name)\n",
    "\n",
    "  # Take a square root of the sum of squares of all the parameters\n",
    "  norm = (norm**0.5).cpu().detach().numpy()\n",
    "\n",
    "  return norm, ws, labels\n",
    "\n",
    "\n",
    "set_seed(SEED)\n",
    "net = nn.Linear(10,1)\n",
    "norm, ws, labels = calculate_frobenius_norm(net)\n",
    "print(f'Frobenius norm of Single Linear Layer: {norm:.4f}')\n",
    "# Plots the weights\n",
    "plot_weights(norm, labels, ws)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Using the last function, `calculate_frobenius_norm`, we can also obtain the Frobenius norm per layer for a whole ANN model and use the `plot_weigts` function to visualize them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "set_seed(seed=SEED)\n",
    "\n",
    "# Creates a new model\n",
    "model = AnimalNet()\n",
    "\n",
    "# Calculates the forbenius norm per layer\n",
    "norm, ws, labels = calculate_frobenius_norm(model)\n",
    "print(f'Frobenius norm of Models weights: {norm:.4f}')\n",
    "\n",
    "# Plots the weights\n",
    "plot_weights(norm, labels, ws)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Section 2: Overfitting\n",
    "\n",
    "*Time estimate: ~15 mins*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Video 3: Overparameterization and Overfitting\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Video 3: Overparameterization and Overfitting\n",
    "from ipywidgets import widgets\n",
    "from IPython.display import YouTubeVideo\n",
    "from IPython.display import IFrame\n",
    "from IPython.display import display\n",
    "\n",
    "\n",
    "class PlayVideo(IFrame):\n",
    "  def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
    "    self.id = id\n",
    "    if source == 'Bilibili':\n",
    "      src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
    "    elif source == 'Osf':\n",
    "      src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
    "    super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
    "\n",
    "\n",
    "def display_videos(video_ids, W=400, H=300, fs=1):\n",
    "  tab_contents = []\n",
    "  for i, video_id in enumerate(video_ids):\n",
    "    out = widgets.Output()\n",
    "    with out:\n",
    "      if video_ids[i][0] == 'Youtube':\n",
    "        video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
    "                             height=H, fs=fs, rel=0)\n",
    "        print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
    "      else:\n",
    "        video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
    "                          height=H, fs=fs, autoplay=False)\n",
    "        if video_ids[i][0] == 'Bilibili':\n",
    "          print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
    "        elif video_ids[i][0] == 'Osf':\n",
    "          print(f'Video available at https://osf.io/{video.id}')\n",
    "      display(video)\n",
    "    tab_contents.append(out)\n",
    "  return tab_contents\n",
    "\n",
    "\n",
    "video_ids = [('Youtube', '-HJ_9HxY38g'), ('Bilibili', 'BV1NX4y1A73i')]\n",
    "tab_contents = display_videos(video_ids, W=730, H=410)\n",
    "tabs = widgets.Tab()\n",
    "tabs.children = tab_contents\n",
    "for i in range(len(tab_contents)):\n",
    "  tabs.set_title(i, video_ids[i][0])\n",
    "display(tabs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Overparameterization_and_Overfitting_Video\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 2.1: Visualizing Overfitting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Let's create some synthetic dataset that we will use to illustrate overfitting in neural networks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "set_seed(seed=SEED)\n",
    "\n",
    "# Creating train data\n",
    "# Input\n",
    "X = torch.rand((10, 1))\n",
    "# Output\n",
    "Y = 2*X + 2*torch.empty((X.shape[0], 1)).normal_(mean=0, std=1)  # Adding small error in the data\n",
    "\n",
    "# Visualizing train data\n",
    "plt.figure(figsize=(8, 6))\n",
    "plt.scatter(X.numpy(),Y.numpy())\n",
    "plt.xlabel('input (x)')\n",
    "plt.ylabel('output(y)')\n",
    "plt.title('toy dataset')\n",
    "plt.show()\n",
    "\n",
    "# Creating test dataset\n",
    "X_test = torch.linspace(0, 1, 40)\n",
    "X_test = X_test.reshape((40, 1, 1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Let's create an overparametrized Neural Network that can fit on the dataset that we just created and train it.\n",
    "\n",
    "First, let's build the model architecture:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "class Net(nn.Module):\n",
    "  \"\"\"\n",
    "  Network Class - 2D with following structure\n",
    "  nn.Linear(1, 300) + leaky_relu(self.fc1(x)) # First fully connected layer\n",
    "  nn.Linear(300, 500) + leaky_relu(self.fc2(x)) # Second fully connected layer\n",
    "  nn.Linear(500, 1) # Final fully connected layer\n",
    "  \"\"\"\n",
    "\n",
    "  def __init__(self):\n",
    "    \"\"\"\n",
    "    Initialize parameters of Net\n",
    "\n",
    "    Args:\n",
    "      None\n",
    "\n",
    "    Returns:\n",
    "      Nothing\n",
    "    \"\"\"\n",
    "    super(Net, self).__init__()\n",
    "\n",
    "    self.fc1 = nn.Linear(1, 300)\n",
    "    self.fc2 = nn.Linear(300, 500)\n",
    "    self.fc3 = nn.Linear(500, 1)\n",
    "\n",
    "  def forward(self, x):\n",
    "    \"\"\"\n",
    "    Forward pass of Net\n",
    "\n",
    "    Args:\n",
    "      x: torch.tensor\n",
    "        Input features\n",
    "\n",
    "    Returns:\n",
    "      x: torch.tensor\n",
    "        Output/Predictions\n",
    "    \"\"\"\n",
    "    x = F.leaky_relu(self.fc1(x))\n",
    "    x = F.leaky_relu(self.fc2(x))\n",
    "    output = self.fc3(x)\n",
    "    return output"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Next, let's define the different parameters for training our model:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "set_seed(seed=SEED)\n",
    "\n",
    "# Train the network on toy dataset\n",
    "model = Net()\n",
    "\n",
    "criterion = nn.MSELoss()\n",
    "optimizer = optim.Adam(model.parameters(), lr=1e-4)\n",
    "\n",
    "iters = 0\n",
    "# Calculates frobenius before training\n",
    "normi, wsi, label = calculate_frobenius_norm(model)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "At this point, we can now train our model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "set_seed(seed=SEED)\n",
    "\n",
    "# Initializing variables\n",
    "# Losses\n",
    "train_loss = []\n",
    "test_loss = []\n",
    "\n",
    "# Model norm\n",
    "model_norm = []\n",
    "\n",
    "# Initializing variables to store weights\n",
    "norm_per_layer = []\n",
    "\n",
    "max_epochs = 10000\n",
    "\n",
    "running_predictions = np.empty((40, int(max_epochs / 500 + 1)))\n",
    "\n",
    "for epoch in tqdm(range(max_epochs)):\n",
    "  # Frobenius norm per epoch\n",
    "  norm, pl, layer_names = calculate_frobenius_norm(model)\n",
    "\n",
    "  # Training\n",
    "  model_norm.append(norm)\n",
    "  norm_per_layer.append(pl)\n",
    "  model.train()\n",
    "  optimizer.zero_grad()\n",
    "  predictions = model(X)\n",
    "  loss = criterion(predictions, Y)\n",
    "  loss.backward()\n",
    "  optimizer.step()\n",
    "\n",
    "  train_loss.append(loss.data)\n",
    "  model.eval()\n",
    "  Y_test = model(X_test)\n",
    "  loss = criterion(Y_test, 2*X_test)\n",
    "  test_loss.append(loss.data)\n",
    "\n",
    "  if (epoch % 500 == 0 or epoch == max_epochs - 1):\n",
    "    running_predictions[:, iters] = Y_test[:, 0, 0].detach().numpy()\n",
    "    iters += 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Now that we have finished training, let's see how the model has evolved over the training process."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Animation (Run Me!)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Animation (Run Me!)\n",
    "\n",
    "set_seed(seed=SEED)\n",
    "# Create a figure and axes\n",
    "fig = plt.figure(figsize=(14, 5))\n",
    "ax1 = plt.subplot(121)\n",
    "ax2 = plt.subplot(122)\n",
    "# Organizing subplots\n",
    "plot1, = ax1.plot([],[])\n",
    "plot2 = ax2.bar([], [])\n",
    "\n",
    "\n",
    "def frame(i):\n",
    "  \"\"\"\n",
    "  Load animation frame\n",
    "\n",
    "  Args:\n",
    "    i: int\n",
    "      Epoch number\n",
    "\n",
    "  Returns:\n",
    "    plot1: function\n",
    "      Subplot of test-data vs running predictions\n",
    "    plot2: function\n",
    "      Subplot of test-data vs running predictions\n",
    "  \"\"\"\n",
    "  ax1.clear()\n",
    "  title1 = ax1.set_title('')\n",
    "  ax1.set_xlabel(\"Input(x)\")\n",
    "  ax1.set_ylabel(\"Output(y)\")\n",
    "\n",
    "  ax2.clear()\n",
    "  ax2.set_xlabel('Layer names')\n",
    "  ax2.set_ylabel('Frobenius norm')\n",
    "  title2 = ax2.set_title('Weight Measurement: Forbenius Norm')\n",
    "\n",
    "  ax1.scatter(X.numpy(),Y.numpy())\n",
    "  plot1 = ax1.plot(X_test[:,0,:].detach().numpy(),\n",
    "                   running_predictions[:,i])\n",
    "  title1.set_text(f'Epochs: {i * 500}')\n",
    "  plot2 = ax2.bar(label, norm_per_layer[i*500])\n",
    "  plt.axhline(y=model_norm[i*500], linewidth=1,\n",
    "              color='r', ls='--',\n",
    "              label=f'Norm: {model_norm[i*500]:.2f}')\n",
    "  plt.legend()\n",
    "\n",
    "  return plot1, plot2\n",
    "\n",
    "\n",
    "anim = animation.FuncAnimation(fig, frame, frames=range(20),\n",
    "                               blit=False, repeat=False,\n",
    "                               repeat_delay=10000)\n",
    "html_anim = HTML(anim.to_html5_video())\n",
    "plt.close()\n",
    "\n",
    "import IPython\n",
    "IPython.display.display(html_anim)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Plot the train and test losses\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Plot the train and test losses\n",
    "plt.figure(figsize=(8, 6))\n",
    "plt.plot(train_loss,label='train_loss')\n",
    "plt.plot(test_loss,label='test_loss')\n",
    "plt.ylabel('loss')\n",
    "plt.xlabel('epochs')\n",
    "plt.title('loss vs epoch')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Think! 2.1: Interpreting losses\n",
    "\n",
    "Regarding the train and test graph above, discuss among yourselves:\n",
    "\n",
    "*   What trend do you see with respect to train and test losses (Where do you see the minimum of these losses?)\n",
    "*   What does it tell us about the model we trained?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W2D1_Regularization/solutions/W2D1_Tutorial1_Solution_d9ff50a5.py)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Interpreting_losses_Discussion\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Now let's visualize the Frobenious norm of the model as we trained. You should see that the value of weights increases over the epochs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Frobenious norm of the model\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Frobenious norm of the model\n",
    "plt.figure(figsize=(8, 6))\n",
    "plt.plot(model_norm)\n",
    "plt.ylabel('Norm of the model')\n",
    "plt.xlabel('Epochs')\n",
    "plt.title('Frobenious norm of the model')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Finally, you can compare the Frobenius norm per layer in the model, before and after training."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Frobenius norm per layer before and after training\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Frobenius norm per layer before and after training\n",
    "normf, wsf, label = calculate_frobenius_norm(model)\n",
    "\n",
    "plot_weights(float(normi), label, wsi,\n",
    "             title='Weight Size Before Training')\n",
    "plot_weights(float(normf), label, wsf,\n",
    "             title='Weight Size After Training')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 2.2: Overfitting on Test Dataset\n",
    "\n",
    "In principle, we should not touch our test set until choosing all our hyperparameters. Were we to use the test data in the model selection process, there is a risk that we might overfit the test data, and then we will be in serious trouble. If we overfit our training data, there is always an evaluation using the test data to keep us honest. But if we overfit the test data, how would we ever know?\n",
    "\n",
    "Note that there is another kind of overfitting: you do \"honest\" fitting on one set of images or posts or medical records, but it may not generalize to other images, posts, or medical records."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Validation Dataset\n",
    "\n",
    "A common practice to address this problem is to split our data in three ways, using a validation dataset (or validation set) to tune the hyperparameters. Ideally, we would only touch the test data once, to assess the very best model or to compare a small number of models to each other, real-world test data is seldom discarded after just one use."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Section 3: Memorization\n",
    "\n",
    "*Time estimate: ~20 mins*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Given sufficiently large networks and enough training, Neural Networks can achieve almost 100% train accuracy by remembering each training example. However, this is bad because it will mean that the model will fail when presented with new data.\n",
    "\n",
    "In this section, we train three MLPs; one each on:\n",
    "\n",
    "1. Animal Faces Dataset\n",
    "2. A Completely Noisy Dataset (Random shuffling of all labels)\n",
    "3. A partially Noisy Dataset (Random shuffling of 15% labels)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Now, think for a couple of minutes as to what the train and test accuracies of each of these models might be, given that you train for sufficient time and use a powerful network."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "First, let's create the required dataloaders for all three datasets. Notice how we split the data. We train on a fraction of the dataset as it will be faster to train and will overfit more clearly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# Dataloaders for the Dataset\n",
    "batch_size = 128\n",
    "classes = ('cat', 'dog', 'wild')\n",
    "\n",
    "# Defining number of examples for train, val test\n",
    "len_train, len_val, len_test = 100, 100, 14430\n",
    "\n",
    "train_transform = transforms.Compose([\n",
    "     transforms.ToTensor(),\n",
    "     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))\n",
    "     ])\n",
    "data_path = pathlib.Path('.')/'afhq'  # Using pathlib to be compatible with all OS's\n",
    "img_dataset = ImageFolder(data_path/'train', transform=train_transform)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# Dataloaders for the Original Dataset\n",
    "\n",
    "# For reproducibility\n",
    "g_seed = torch.Generator()\n",
    "g_seed.manual_seed(SEED)\n",
    "\n",
    "img_train_data, img_val_data,_ = torch.utils.data.random_split(img_dataset,\n",
    "                                                               [len_train,\n",
    "                                                                len_val,\n",
    "                                                                len_test])\n",
    "\n",
    "# Creating train_loader and Val_loader\n",
    "train_loader = torch.utils.data.DataLoader(img_train_data,\n",
    "                                           batch_size=batch_size,\n",
    "                                           num_workers=2,\n",
    "                                           worker_init_fn=seed_worker,\n",
    "                                           generator=g_seed)\n",
    "\n",
    "val_loader = torch.utils.data.DataLoader(img_val_data,\n",
    "                                         batch_size=1000,\n",
    "                                         num_workers=2,\n",
    "                                         worker_init_fn=seed_worker,\n",
    "                                         generator=g_seed)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# Dataloaders for the Random Dataset\n",
    "\n",
    "# For reproducibility\n",
    "g_seed = torch.Generator()\n",
    "g_seed.manual_seed(SEED + 1)\n",
    "\n",
    "# Splitting randomized data into training and validation data\n",
    "data_path = pathlib.Path('.')/'afhq_random_32x32/afhq_random' # Using pathlib to be compatible with all OS's\n",
    "img_dataset = ImageFolder(data_path/'train', transform=train_transform)\n",
    "random_img_train_data, random_img_val_data,_ = torch.utils.data.random_split(img_dataset, [len_train, len_val, len_test])\n",
    "\n",
    "# Randomized train and validation dataloader\n",
    "rand_train_loader = torch.utils.data.DataLoader(random_img_train_data,\n",
    "                                                batch_size=batch_size,\n",
    "                                                num_workers=2,\n",
    "                                                worker_init_fn=seed_worker,\n",
    "                                                generator=g_seed)\n",
    "\n",
    "rand_val_loader = torch.utils.data.DataLoader(random_img_val_data,\n",
    "                                              batch_size=1000,\n",
    "                                              num_workers=2,\n",
    "                                              worker_init_fn=seed_worker,\n",
    "                                              generator=g_seed)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# Dataloaders for the Partially Random Dataset\n",
    "\n",
    "# For reproducibility\n",
    "g_seed = torch.Generator()\n",
    "g_seed.manual_seed(SEED + 1)\n",
    "\n",
    "# Splitting data between training and validation dataset for partially randomized data\n",
    "data_path = pathlib.Path('.')/'afhq_10_32x32/afhq_10' # Using pathlib to be compatible with all OS's\n",
    "img_dataset = ImageFolder(data_path/'train', transform=train_transform)\n",
    "partially_random_train_data, partially_random_val_data,_ = torch.utils.data.random_split(img_dataset, [len_train, len_val, len_test])\n",
    "\n",
    "# Training and Validation loader for partially randomized data\n",
    "partial_rand_train_loader = torch.utils.data.DataLoader(partially_random_train_data,\n",
    "                                                        batch_size=batch_size,\n",
    "                                                        num_workers=2,\n",
    "                                                        worker_init_fn=seed_worker,\n",
    "                                                        generator=g_seed)\n",
    "\n",
    "partial_rand_val_loader = torch.utils.data.DataLoader(partially_random_val_data,\n",
    "                                                      batch_size=1000,\n",
    "                                                      num_workers=2,\n",
    "                                                      worker_init_fn=seed_worker,\n",
    "                                                      generator=g_seed)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Now let's define a model which has many parameters compared to the training dataset size, and train it on these datasets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "class BigAnimalNet(nn.Module):\n",
    "  \"\"\"\n",
    "  Network Class - Animal Faces with following structure:\n",
    "  nn.Linear(3*32*32, 124) + leaky_relu(self.fc1(x)) # First fully connected layer\n",
    "  nn.Linear(124, 64) + leaky_relu(self.fc2(x)) # Second fully connected layer\n",
    "  nn.Linear(64, 3) # Final fully connected layer\n",
    "  \"\"\"\n",
    "\n",
    "  def __init__(self):\n",
    "    \"\"\"\n",
    "    Initialize parameters for BigAnimalNet\n",
    "\n",
    "    Args:\n",
    "      None\n",
    "\n",
    "    Returns:\n",
    "      Nothing\n",
    "    \"\"\"\n",
    "    super(BigAnimalNet, self).__init__()\n",
    "    self.fc1 = nn.Linear(3*32*32, 124)\n",
    "    self.fc2 = nn.Linear(124, 64)\n",
    "    self.fc3 = nn.Linear(64, 3)\n",
    "\n",
    "  def forward(self, x):\n",
    "    \"\"\"\n",
    "    Forward pass of BigAnimalNet\n",
    "\n",
    "    Args:\n",
    "      x: torch.tensor\n",
    "        Input features\n",
    "\n",
    "    Returns:\n",
    "      x: torch.tensor\n",
    "        Output/Predictions\n",
    "    \"\"\"\n",
    "    x = x.view(x.shape[0], -1)\n",
    "    x = F.leaky_relu(self.fc1(x))\n",
    "    x = F.leaky_relu(self.fc2(x))\n",
    "    x = self.fc3(x)\n",
    "    output = F.log_softmax(x, dim=1)\n",
    "    return output"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Before training our `BigAnimalNet()`, calculate the Frobenius norm again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "set_seed(seed=SEED)\n",
    "normi, wsi, label = calculate_frobenius_norm(BigAnimalNet())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Now, train our `BigAnimalNet()` model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# Here we have 100 true train data.\n",
    "\n",
    "# Set the arguments\n",
    "args = {\n",
    "    'epochs': 200,\n",
    "    'lr': 5e-3,\n",
    "    'momentum': 0.9,\n",
    "    'device': DEVICE\n",
    "}\n",
    "\n",
    "\n",
    "# Initialize the network\n",
    "set_seed(seed=SEED)\n",
    "model = BigAnimalNet()\n",
    "\n",
    "start_time = time.time()\n",
    "\n",
    "# Train the network\n",
    "val_acc_pure, train_acc_pure, _, model = main(args=args,\n",
    "                                              model=model,\n",
    "                                              train_loader=train_loader,\n",
    "                                              val_loader=val_loader)\n",
    "end_time = time.time()\n",
    "\n",
    "print(f\"Time to memorize the dataset: {end_time - start_time}\")\n",
    "\n",
    "# Train and Test accuracy plot\n",
    "plt.figure(figsize=(8, 6))\n",
    "plt.plot(val_acc_pure, label='Val Accuracy Pure', c='red', ls='dashed')\n",
    "plt.plot(train_acc_pure, label='Train Accuracy Pure', c='red', ls='solid')\n",
    "plt.axhline(y=max(val_acc_pure), c='green', ls='dashed',\n",
    "            label='max Val accuracy pure')\n",
    "plt.title('Memorization')\n",
    "plt.ylabel('Accuracy (%)')\n",
    "plt.xlabel('Epoch')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " #### Frobenius norm for AnimalNet before and after training\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown #### Frobenius norm for AnimalNet before and after training\n",
    "normf, wsf, label = calculate_frobenius_norm(model)\n",
    "\n",
    "plot_weights(float(normi), label, wsi, title='Weight Size Before Training')\n",
    "plot_weights(float(normf), label, wsf, title='Weight Size After Training')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Data Visualizer\n",
    "\n",
    "Before we train the model on data with random labels, let's visualize and verify for ourselves that the data is random. Here, we have:\n",
    "```python\n",
    "classes = (\"cat\", \"dog\", \"wild\")\n",
    "```\n",
    "We use the `.permute()` method. `plt.imshow()` expects the input to be in NumPy format and in the format $(P_x, P_y, 3)$, where $P_x$ and $P_y$ are the number of pixels along $x$ and $y$ axis, respectively."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "def visualize_data(dataloader):\n",
    "  \"\"\"\n",
    "  Helper function to visualize data\n",
    "\n",
    "  Args:\n",
    "    dataloader: torch.tensor\n",
    "      Dataloader to visualize\n",
    "\n",
    "  Returns:\n",
    "    Nothing\n",
    "  \"\"\"\n",
    "\n",
    "  for idx, (data, label) in enumerate(dataloader):\n",
    "    plt.figure(idx)\n",
    "\n",
    "    # Choose the datapoint you would like to visualize\n",
    "    index = 22\n",
    "\n",
    "    # Choose that datapoint using index and permute the dimensions\n",
    "    # and bring the pixel values between [0, 1]\n",
    "    data = data[index].permute(1, 2, 0) * \\\n",
    "           torch.tensor([0.5, 0.5, 0.5]) + \\\n",
    "           torch.tensor([0.5, 0.5, 0.5])\n",
    "\n",
    "    # Convert the torch tensor into numpy\n",
    "    data = data.numpy()\n",
    "\n",
    "    plt.imshow(data)\n",
    "    plt.axis(False)\n",
    "    image_class = classes[label[index].item()]\n",
    "    print(f'The image belongs to : {image_class}')\n",
    "\n",
    "  plt.show()\n",
    "\n",
    "\n",
    "# Call the function\n",
    "visualize_data(rand_train_loader)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    " Now let's train the network on the shuffled data and see if it memorizes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# Here we have 100 completely shuffled train data.\n",
    "\n",
    "# Set the arguments\n",
    "args = {\n",
    "    'epochs': 200,\n",
    "    'lr': 5e-3,\n",
    "    'momentum': 0.9,\n",
    "    'device': DEVICE\n",
    "}\n",
    "\n",
    "# Initialize the model\n",
    "set_seed(seed=SEED)\n",
    "model = BigAnimalNet()\n",
    "\n",
    "# Train the model\n",
    "val_acc_random, train_acc_random, _, model = main(args,\n",
    "                                                  model,\n",
    "                                                  rand_train_loader,\n",
    "                                                  val_loader)\n",
    "\n",
    "# Train and Test accuracy plot\n",
    "plt.figure(figsize=(8, 6))\n",
    "plt.plot(val_acc_pure,label='Val - Pure',c='red',ls = 'dashed')\n",
    "plt.plot(train_acc_pure,label='Train - Pure',c='red',ls = 'solid')\n",
    "plt.plot(val_acc_random,label='Val - Random',c='blue',ls = 'dashed')\n",
    "plt.plot(train_acc_random,label='Train - Random',c='blue',ls = 'solid')\n",
    "\n",
    "plt.title('Memorization')\n",
    "plt.ylabel('Accuracy (%)')\n",
    "plt.xlabel('Epoch')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Isn't it surprising to see that the ANN was able to achieve 100% training accuracy on randomly shuffled labels? This is one of the reasons why training accuracy is not a good indicator of model performance."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Section 4: Early Stopping\n",
    "\n",
    "*Time estimate: ~20 mins*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Video 4: Early Stopping\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Video 4: Early Stopping\n",
    "from ipywidgets import widgets\n",
    "from IPython.display import YouTubeVideo\n",
    "from IPython.display import IFrame\n",
    "from IPython.display import display\n",
    "\n",
    "\n",
    "class PlayVideo(IFrame):\n",
    "  def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
    "    self.id = id\n",
    "    if source == 'Bilibili':\n",
    "      src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
    "    elif source == 'Osf':\n",
    "      src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
    "    super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
    "\n",
    "\n",
    "def display_videos(video_ids, W=400, H=300, fs=1):\n",
    "  tab_contents = []\n",
    "  for i, video_id in enumerate(video_ids):\n",
    "    out = widgets.Output()\n",
    "    with out:\n",
    "      if video_ids[i][0] == 'Youtube':\n",
    "        video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
    "                             height=H, fs=fs, rel=0)\n",
    "        print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
    "      else:\n",
    "        video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
    "                          height=H, fs=fs, autoplay=False)\n",
    "        if video_ids[i][0] == 'Bilibili':\n",
    "          print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
    "        elif video_ids[i][0] == 'Osf':\n",
    "          print(f'Video available at https://osf.io/{video.id}')\n",
    "      display(video)\n",
    "    tab_contents.append(out)\n",
    "  return tab_contents\n",
    "\n",
    "\n",
    "video_ids = [('Youtube', '72IG2bX5l30'), ('Bilibili', 'BV1cB4y1K777')]\n",
    "tab_contents = display_videos(video_ids, W=730, H=410)\n",
    "tabs = widgets.Tab()\n",
    "tabs.children = tab_contents\n",
    "for i in range(len(tab_contents)):\n",
    "  tabs.set_title(i, video_ids[i][0])\n",
    "display(tabs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Early_Stopping_Video\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Now that we have established that the validation accuracy reaches the peak well before the model overfits, we want to stop the training somehow early. You should have also observed from the above plots that the train/test loss on real data is not very smooth, and hence you might guess that the choice of the epoch can play a crucial role in the validation/test accuracy.\n",
    "\n",
    "Early stopping stops training when the validation accuracies stop increasing.\n",
    "\n",
    "<br>\n",
    "\n",
    "<center><img src=\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content-dl/main/tutorials/static/early-stopping-machine-learning-5422207.jpg\" alt=\"Overfitting\" width=\"600\"/></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Coding Exercise 4: Early Stopping\n",
    "\n",
    "Reimplement the main function to include early stopping as described above. Then run the code below to validate your implementation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "def early_stopping_main(args, model, train_loader, val_loader):\n",
    "  \"\"\"\n",
    "  Function to simulate early stopping\n",
    "\n",
    "  Args:\n",
    "    args: dictionary\n",
    "      Dictionary with epochs: 200, lr: 5e-3, momentum: 0.9, device: DEVICE\n",
    "    model: nn.module\n",
    "      Neural network instance\n",
    "    train_loader: torch.loader\n",
    "      Train dataset\n",
    "    val_loader: torch.loader\n",
    "      Validation set\n",
    "\n",
    "  Returns:\n",
    "    val_acc_list: list\n",
    "      Val accuracy log until early stop point\n",
    "    train_acc_list: list\n",
    "      Training accuracy log until early stop point\n",
    "    best_model: nn.module\n",
    "      Model performing best with early stopping\n",
    "    best_epoch: int\n",
    "      Epoch at which early stopping occurs\n",
    "  \"\"\"\n",
    "\n",
    "  ####################################################################\n",
    "  # Fill in all missing code below (...),\n",
    "  # then remove or comment the line below to test your function\n",
    "  raise NotImplementedError(\"Complete the early_stopping_main function\")\n",
    "  ####################################################################\n",
    "  device = args['device']\n",
    "  model = model.to(device)\n",
    "  optimizer = optim.SGD(model.parameters(),\n",
    "                        lr=args['lr'],\n",
    "                        momentum=args['momentum'])\n",
    "\n",
    "  best_acc = 0.0\n",
    "  best_epoch = 0\n",
    "\n",
    "  # Number of successive epochs that you want to wait before stopping training process\n",
    "  patience = 20\n",
    "\n",
    "  # Keeps track of number of epochs during which the val_acc was less than best_acc\n",
    "  wait = 0\n",
    "\n",
    "  val_acc_list, train_acc_list = [], []\n",
    "  for epoch in tqdm(range(args['epochs'])):\n",
    "\n",
    "    # Train the model\n",
    "    trained_model = ...\n",
    "\n",
    "    # Calculate training accuracy\n",
    "    train_acc = ...\n",
    "\n",
    "    # Calculate validation accuracy\n",
    "    val_acc = ...\n",
    "\n",
    "    if (val_acc > best_acc):\n",
    "      best_acc = val_acc\n",
    "      best_epoch = epoch\n",
    "      best_model = copy.deepcopy(trained_model)\n",
    "      wait = 0\n",
    "    else:\n",
    "      wait += 1\n",
    "\n",
    "    if (wait > patience):\n",
    "      print(f'Early stopped on epoch: {epoch}')\n",
    "      break\n",
    "\n",
    "    train_acc_list.append(train_acc)\n",
    "    val_acc_list.append(val_acc)\n",
    "\n",
    "  return val_acc_list, train_acc_list, best_model, best_epoch\n",
    "\n",
    "\n",
    "\n",
    "# Set the arguments\n",
    "args = {\n",
    "    'epochs': 200,\n",
    "    'lr': 5e-4,\n",
    "    'momentum': 0.99,\n",
    "    'device': DEVICE\n",
    "}\n",
    "\n",
    "# Initialize the model\n",
    "set_seed(seed=SEED)\n",
    "model = AnimalNet()\n",
    "\n",
    "## Uncomment to test\n",
    "# val_acc_earlystop, train_acc_earlystop, best_model, best_epoch = early_stopping_main(args, model, train_loader, val_loader)\n",
    "# print(f'Maximum Validation Accuracy is reached at epoch: {best_epoch:2d}')\n",
    "# early_stop_plot(train_acc_earlystop, val_acc_earlystop, best_epoch)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W2D1_Regularization/solutions/W2D1_Tutorial1_Solution_06645513.py)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Early_Stopping_Exercise\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Think! 4: Early Stopping\n",
    "\n",
    "Discuss among your pod why or why not:\n",
    "* Do you think early stopping can be harmful to training your network?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W2D1_Regularization/solutions/W2D1_Tutorial1_Solution_1329e07b.py)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Early_Stopping_Discussion\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Summary\n",
    "\n",
    "In this tutorial, you have been introduced to the regularization technique, described as shrinkage. We have learned about overfitting, one of the worst caveats in deep learning, and finally, we have learned a method of reducing overfitting in our models called early-stopping.\n",
    "\n",
    "If you have time left, you can learn how a model behaves when is trained with randomized labels."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Bonus: Train with randomized labels"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "In this part, let's train on a partially shuffled dataset where 15% of the labels are noisy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# Here we have 15% partially shuffled train data.\n",
    "\n",
    "# Set the arguments\n",
    "args = {\n",
    "    'epochs': 200,\n",
    "    'lr': 5e-3,\n",
    "    'momentum': 0.9,\n",
    "    'device': DEVICE\n",
    "}\n",
    "\n",
    "# Intialize the model\n",
    "set_seed(seed=SEED)\n",
    "model = BigAnimalNet()\n",
    "\n",
    "# Train the model\n",
    "val_acc_shuffle, train_acc_shuffle, _, _, = main(args,\n",
    "                                                 model,\n",
    "                                                 partial_rand_train_loader,\n",
    "                                                 val_loader)\n",
    "\n",
    "# Train and test acc plot\n",
    "plt.figure(figsize=(8, 6))\n",
    "plt.plot(val_acc_shuffle, label='Val Accuracy shuffle', c='red', ls='dashed')\n",
    "plt.plot(train_acc_shuffle, label='Train Accuracy shuffle', c='red', ls='solid')\n",
    "plt.axhline(y=max(val_acc_shuffle), c='green', ls='dashed',\n",
    "            label='Max Val Accuracy shuffle')\n",
    "plt.title('Memorization')\n",
    "plt.ylabel('Accuracy (%)')\n",
    "plt.xlabel('Epoch')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " #### Plotting them all together (Run Me!)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "#@markdown #### Plotting them all together (Run Me!)\n",
    "plt.figure(figsize=(8, 6))\n",
    "plt.plot(val_acc_pure,label='Val - Pure',c='red',ls = 'dashed')\n",
    "plt.plot(train_acc_pure,label='Train - Pure',c='red',ls = 'solid')\n",
    "plt.plot(val_acc_random,label='Val - Random',c='blue',ls = 'dashed')\n",
    "plt.plot(train_acc_random,label='Train - Random',c='blue',ls = 'solid')\n",
    "plt.plot(val_acc_shuffle, label='Val - shuffle', c='y', ls='dashed')\n",
    "plt.plot(train_acc_shuffle, label='Train - shuffle', c='y', ls='solid')\n",
    "\n",
    "plt.title('Memorization')\n",
    "plt.ylabel('Accuracy (%)')\n",
    "plt.xlabel('Epoch')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Think! Bonus: Does it Generalize?\n",
    "\n",
    "Given that the Neural Network fit/memorize the training data perfectly:\n",
    "\n",
    "* Do you think it generalizes well?\n",
    "* What makes you think it does or doesn't?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W2D1_Regularization/solutions/W2D1_Tutorial1_Solution_566afa93.py)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Early_Stopping_Generalization_Bonus_Discussion\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Also, it is interesting to note that sometimes the model trained on slightly shuffled data does slightly better than the one trained on pure data. Shuffling some of the data is a form of regularization, i.e., one of many ways of adding noise to the training data."
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "collapsed_sections": [
    "-PhV30moDGBj",
    "-ERb6jj0DGBj",
    "ZbHrMPJgDGBl",
    "ZK-TtZj2DGBy"
   ],
   "include_colab_link": true,
   "name": "W2D1_Tutorial1",
   "provenance": [],
   "toc_visible": true
  },
  "kernel": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}