# Tutorial 1: Biological vs. Artificial Neural Networks¶

Week 1, Day 3: Multi Layer Perceptrons

Content creators: Arash Ash, Surya Ganguli

Content reviewers: Saeed Salehi, Felix Bartsch, Yu-Fang Yang, Antoine De Comite, Melvin Selim Atay, Kelson Shilling-Scrivo

Content editors: Gagana B, Kelson Shilling-Scrivo, Spiros Chavlis

Production editors: Anoop Kulkarni, Kelson Shilling-Scrivo, Gagana B, Spiros Chavlis # Tutorial objectives¶

In this tutorial, we will explore the Multi-layer Perceptrons (MLPs). MLPs are arguably one of the most tractable models (due to their flexibility) that we can use to study deep learning fundamentals. Here we will learn why they are:

• Similar to biological networks

• Good at function approximation

• Implemented the way they are in PyTorch

# Setup¶

This is a GPU free notebook!

## Install dependencies¶

# @title Install dependencies
from evaltools.airtable import AirtableForm

# Generate airtable form

# Imports
import random

import torch
import numpy as np
import matplotlib.pyplot as plt

import torch.nn as nn
import torch.optim as optim
from tqdm.auto import tqdm
from IPython.display import display


## Figure settings¶

# @title Figure settings
import ipywidgets as widgets       # Interactive display
%config InlineBackend.figure_format = 'retina'


## Plotting functions¶

# @title Plotting functions

def imshow(img):
"""
Helper function to plot unnormalised image

Args:
img: torch.tensor
Image to be displayed

Returns:
Nothing
"""
img = img / 2 + 0.5     # Unnormalize
npimg = img.numpy()
plt.imshow(np.transpose(npimg, (1, 2, 0)))
plt.axis(False)
plt.show()

def plot_function_approximation(x, relu_acts, y_hat):
"""
Helper function to plot ReLU activations and
function approximations

Args:
x: torch.tensor
Incoming Data
relu_acts: torch.tensor
Computed ReLU activations for each point along the x axis (x)
y_hat: torch.tensor
Estimated labels/class predictions
Weighted sum of ReLU activations for every point along x axis

Returns:
Nothing
"""
fig, axes = plt.subplots(2, 1)

# Plot ReLU Activations
axes.plot(x, relu_acts.T);
axes.set(xlabel='x',
ylabel='Activation',
title='ReLU Activations - Basis Functions')
labels = [f"ReLU {i + 1}" for i in range(relu_acts.shape)]
axes.legend(labels, ncol = 2)

# Plot Function Approximation
axes.plot(x, torch.sin(x), label='truth')
axes.plot(x, y_hat, label='estimated')
axes.legend()
axes.set(xlabel='x',
ylabel='y(x)',
title='Function Approximation')

plt.tight_layout()
plt.show()


## Set random seed¶

Executing set_seed(seed=seed) you are setting the seed

# @title Set random seed

# @markdown Executing set_seed(seed=seed) you are setting the seed

# For DL its critical to set the random seed so that students can have a
# baseline to compare their results to expected results.

# Call set_seed function in the exercises to ensure reproducibility.
import random
import torch

def set_seed(seed=None, seed_torch=True):
"""
Function that controls randomness. NumPy and random modules must be imported.

Args:
seed : Integer
A non-negative integer that defines the random state. Default is None.
seed_torch : Boolean
If True sets the random seed for pytorch tensors, so pytorch module
must be imported. Default is True.

Returns:
Nothing.
"""
if seed is None:
seed = np.random.choice(2 ** 32)
random.seed(seed)
np.random.seed(seed)
if seed_torch:
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.cuda.manual_seed(seed)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

print(f'Random seed {seed} has been set.')

# In case that DataLoader is used
def seed_worker(worker_id):
"""
DataLoader will reseed workers following randomness in

Args:
worker_id: integer
ID of subprocess to seed. 0 means that
the data will be loaded in the main process

Returns:
Nothing
"""
worker_seed = torch.initial_seed() % 2**32
np.random.seed(worker_seed)
random.seed(worker_seed)


## Set device (GPU or CPU). Execute set_device()¶

# @title Set device (GPU or CPU). Execute set_device()
# especially if torch modules used.

# Inform the user if the notebook uses GPU or CPU.
# NOTE: This is mostly a GPU free tutorial.

def set_device():
"""
Set the device. CUDA if available, CPU otherwise

Args:
None

Returns:
Nothing
"""
device = "cuda" if torch.cuda.is_available() else "cpu"
if device != "cuda":
print("GPU is not enabled in this notebook. \n"
"If you want to enable it, in the menu under Runtime -> \n"
"Hardware accelerator. and select GPU from the dropdown menu")
else:
print("GPU is enabled in this notebook. \n"
"If you want to disable it, in the menu under Runtime -> \n"
"Hardware accelerator. and select None from the dropdown menu")

return device

SEED = 2021
set_seed(seed=SEED)
DEVICE = set_device()

Random seed 2021 has been set.
GPU is not enabled in this notebook.
If you want to enable it, in the menu under Runtime ->
Hardware accelerator. and select GPU from the dropdown menu


# Section 1: The Need for MLPs¶

Time estimate: ~35 mins

## Coding Exercise 1: Function approximation with ReLU¶

Through the Universal Approximation Algorithm, we learned that one hidden layer MLPs are enough to approximate any smooth function! Now let’s manually fit a sine function using ReLU activation.

We will approximate the sine function using a linear combination (a weighted sum) of ReLUs with slope 1. We need to determine the bias terms (which determines where the ReLU inflection point from 0 to linear occurs) and how to weight each ReLU. The idea is to set the weights iteratively so that the slope changes in the new sample’s direction.

First, we generate our “training data” from a sine function using torch.sine function.

>>> import torch
>>> torch.manual_seed(2021)
<torch._C.Generator object at 0x7f8734c83830>
>>> a = torch.randn(5)
>>> print(a)
tensor([ 2.2871,  0.6413, -0.8615, -0.3649, -0.6931])
>>> torch.sin(a)
tensor([ 0.7542,  0.5983, -0.7588, -0.3569, -0.6389])


These are the points we will use to learn how to approximate the function. We have 10 training data points so we will have 9 ReLUs (we don’t need a ReLU for the last data point as we don’t have anything to the right of it to model).

We first need to figure out the bias term for each ReLU and compute the activation of each ReLU where:

(34)$\begin{equation} y(x) = \text{max}(0, x+b) \end{equation}$

We then need to figure out the correct weights on each ReLU so the linear combination approximates the desired function.

def approximate_function(x_train, y_train):
"""
Function to compute and combine ReLU activations

Args:
x_train: torch.tensor
Training data
y_train: torch.tensor
Ground truth labels corresponding to training data

Returns:
relu_acts: torch.tensor
Computed ReLU activations for each point along the x axis (x)
y_hat: torch.tensor
Estimated labels/class predictions
Weighted sum of ReLU activations for every point along x axis
x: torch.tensor
x-axis points
"""
####################################################################
# Fill in missing code below (...),
# then remove or comment the line below to test your function
raise NotImplementedError("Complete approximate_function!")
####################################################################

# Number of relus
n_relus = x_train.shape - 1

# x axis points (more than x train)
x = torch.linspace(torch.min(x_train), torch.max(x_train), 1000)

## COMPUTE RELU ACTIVATIONS

# First determine what bias terms should be for each of n_relus ReLUs
b = ...

# Compute ReLU activations for each point along the x axis (x)
relu_acts = torch.zeros((n_relus, x.shape))

for i_relu in range(n_relus):
relu_acts[i_relu, :] = torch.relu(x + b[i_relu])

## COMBINE RELU ACTIVATIONS

# Set up weights for weighted sum of ReLUs
combination_weights = torch.zeros((n_relus, ))

# Figure out weights on each ReLU
prev_slope = 0
for i in range(n_relus):
delta_x = x_train[i+1] - x_train[i]
slope = (y_train[i+1] - y_train[i]) / delta_x
combination_weights[i] = ...
prev_slope = slope

# Get output of weighted sum of ReLU activations for every point along x axis
y_hat = ...

return y_hat, relu_acts, x

atform.add_event('Coding Exercise 1: Function approximation with ReLU')

# Make training data from sine function
N_train = 10
x_train = torch.linspace(0, 2*np.pi, N_train).view(-1, 1)
y_train = torch.sin(x_train)

## Uncomment the lines below to test your function approximation
# y_hat, relu_acts, x = approximate_function(x_train, y_train)
# plot_function_approximation(x, relu_acts, y_hat)


Click for solution

Example output: As you see in the top panel, we obtain 10 shifted ReLUs with the same slope. These are the basis functions that MLP uses to span the functional space, i.e., MLP finds a linear combination of these ReLUs.

# Section 2: MLPs in Pytorch¶

Time estimate: ~1hr and 20 mins

## Video 2: Building MLPs in PyTorch¶

In the previous segment, we implemented a function to approximate any smooth function using MLPs. We saw that using Lipschitz continuity; We can prove that our approximation is mathematically correct. MLPs are fascinating, but before we get into the details on designing them, let’s familiarize ourselves with some basic terminology of MLPs - layer, neuron, depth, width, weight, bias, and activation function. Armed with these ideas, we can now design an MLP given its input, hidden layers, and output size.

## Coding Exercise 2: Implement a general-purpose MLP in Pytorch¶

The objective is to design an MLP with these properties:

• Works with any input (1D, 2D, etc.)

• Construct any number of given hidden layers using nn.Sequential() and add_module() function

• Use the same given activation function (i.e., Leaky ReLU) in all hidden layers

Leaky ReLU is described by the following mathematical formula:

(35)\begin{align} \text{LeakyReLU}(x) &= \text{max}(0,x) + \text{negative_slope} \cdot \text{min}(0, x) \\ &= \left\{ \begin{array}{ll} x & ,\; \text{if} \; x \ge 0 \\ \text{negative_slope} \cdot x & ,\; \text{otherwise} \end{array} \right. \end{align}
class Net(nn.Module):
"""
Initialize MLP Network
"""

def __init__(self, actv, input_feature_num, hidden_unit_nums, output_feature_num):
"""
Initialize MLP Network parameters

Args:
actv: string
Activation function
input_feature_num: int
Number of input features
hidden_unit_nums: list
Number of units in the hidden layer
output_feature_num: int
Number of output features

Returns:
Nothing
"""
super(Net, self).__init__()
self.input_feature_num = input_feature_num # Save the input size for reshaping later
self.mlp = nn.Sequential() # Initialize layers of MLP

in_num = input_feature_num # Initialize the temporary input feature to each layer
for i in range(len(hidden_unit_nums)): # Loop over layers and create each one

####################################################################
# Fill in missing code below (...),
# Then remove or comment the line below to test your function
raise NotImplementedError("Create MLP Layer")
####################################################################

out_num = hidden_unit_nums[i] # Assign the current layer hidden unit from list
layer = ... # Use nn.Linear to define the layer
in_num = out_num # Assign next layer input using current layer output
self.mlp.add_module('Linear_%d'%i, layer) # Append layer to the model with a name

actv_layer = eval('nn.%s'%actv) # Assign activation function (eval allows us to instantiate object from string)
self.mlp.add_module('Activation_%d'%i, actv_layer) # Append activation to the model with a name

out_layer = nn.Linear(in_num, output_feature_num) # Create final layer
self.mlp.add_module('Output_Linear', out_layer) # Append the final layer

def forward(self, x):
"""
Simulate forward pass of MLP Network

Args:
x: torch.tensor
Input data

Returns:
logits: Instance of MLP
Forward pass of MLP
"""
# Reshape inputs to (batch_size, input_feature_num)
# Just in case the input vector is not 2D, like an image!
x = x.view(-1, self.input_feature_num)

####################################################################
# Fill in missing code below (...),
# then remove or comment the line below to test your function
raise NotImplementedError("Run MLP model")
####################################################################

logits = ... # Forward pass of MLP
return logits

atform.add_event('Coding Exercise 2: Implement a general-purpose MLP in Pytorch')

input = torch.zeros((100, 2))
## Uncomment below to create network and test it on input
# net = Net(actv='LeakyReLU(0.1)', input_feature_num=2, hidden_unit_nums=[100, 10, 5], output_feature_num=1).to(DEVICE)
# y = net(input.to(DEVICE))
# print(f'The output shape is {y.shape} for an input of shape {input.shape}')


Click for solution

The output shape is torch.Size([100, 1]) for an input of shape torch.Size([100, 2])


## Section 2.1: Classification with MLPs¶

### Video 3: Cross Entropy¶

The main loss function we could use out of the box for multi-class classification for N samples and C number of classes is:

• CrossEntropyLoss: This criterion expects a batch of predictions x with shape (N, C) and class index in the range $$[0, C-1]$$ as the target (label) for each N samples, hence a batch of labels with shape (N, ). There are other optional parameters like class weights and class ignores. Feel free to check the PyTorch documentation here for more detail. Additionally, here you can learn where is appropriate to use the CrossEntropyLoss.

To get CrossEntropyLoss of a sample $$i$$, we could first calculate $$-\log(\text{softmax(x}))$$ and then take the element corresponding to $$\text { labels }_i$$ as the loss. However, due to numerical stability, we implement this more stable equivalent form,

(36)$\begin{equation} \operatorname{loss}(x_i, \text { labels }_i)=-\log \left(\frac{\exp (x[\text { labels }_i])}{\sum_{j} \exp (x[j])}\right)=-x_i[\text { labels }_i]+\log \left(\sum_{j=1}^C \exp (x_i[j])\right) \end{equation}$

### Coding Exercise 2.1: Implement Batch Cross Entropy Loss¶

To recap, since we will be doing batch learning, we’d like a loss function that given:

• A batch of predictions x with shape (N, C)

• A batch of labels with shape (N, ) that ranges from 0 to C-1

Returns the average loss $$L$$ calculated according to:

(37)\begin{align} loss(x_i, \text { labels }_i) &= -x_i[\text { labels }_i]+\log \left(\sum_{j=1}^C \exp (x_i[j])\right) \\ L &= \frac{1}{N} \sum_{i=1}^{N}{loss(x_i, \text { labels }_i)} \end{align}

Steps:

1. Use indexing operation to get predictions of class corresponding to the labels (i.e., $$x_i[\text { labels }_i]$$)

2. Compute $$loss(x_i, \text { labels }_i)$$ vector (losses) using torch.log() and torch.exp() without Loops!

3. Return the average of the loss vector

def cross_entropy_loss(x, labels):
"""
Helper function to compute cross entropy loss

Args:
x: torch.tensor
Model predictions we'd like to evaluate using labels
labels: torch.tensor
Ground truth

Returns:
avg_loss: float
Average of the loss vector
"""
x_of_labels = torch.zeros(len(labels))
####################################################################
# Fill in missing code below (...),
# then remove or comment the line below to test your function
raise NotImplementedError("Cross Entropy Loss")
####################################################################
# 1. Prediction for each class corresponding to the label
for i, label in enumerate(labels):
x_of_labels[i] = x[i, label]
# 2. Loss vector for the batch
losses = ...
# 3. Return the average of the loss vector
avg_loss = ...

return avg_loss

atform.add_event('Coding Exercise 2.1: Implement Batch Cross Entropy Loss')

labels = torch.tensor([0, 1])
x = torch.tensor([[10.0, 1.0, -1.0, -20.0],  # Correctly classified
[10.0, 10.0, 2.0, -10.0]])  # Not correctly classified
CE = nn.CrossEntropyLoss()
pytorch_loss = CE(x, labels).item()
## Uncomment below to test your function
# our_loss = cross_entropy_loss(x, labels).item()
# print(f'Our CE loss: {our_loss:0.8f}, Pytorch CE loss: {pytorch_loss:0.8f}')
# print(f'Difference: {np.abs(our_loss - pytorch_loss):0.8f}')


Click for solution

Our CE loss: 0.34672737, Pytorch CE loss: 0.34672749
Difference: 0.00000012


## Section 2.2: Spiral Classification Dataset¶

Before we could start optimizing these loss functions, we need a dataset!

Let’s turn this fancy-looking equation into a classification dataset

(38)$\begin{equation} \begin{array}{c} X_{k}(t)=t\left(\begin{array}{c} \sin \left[\frac{2 \pi}{K}\left(2 t+k-1\right)\right]+\mathcal{N}\left(0, \sigma\right) \\ \cos \left[\frac{2 \pi}{K}\left(2 t+k-1\right)\right]+\mathcal{N}\left(0, \sigma\right) \end{array}\right) \end{array}, \quad 0 \leq t \leq 1, \quad k=1, \ldots, K \end{equation}$
def create_spiral_dataset(K, sigma, N):
"""
Function to simulate spiral dataset

Args:
K: int
Number of classes
sigma: float
Standard deviation
N: int
Number of data points

Returns:
X: torch.tensor
Spiral data
y: torch.tensor
Corresponding ground truth
"""

# Initialize t, X, y
t = torch.linspace(0, 1, N)
X = torch.zeros(K*N, 2)
y = torch.zeros(K*N)

# Create data
for k in range(K):
X[k*N:(k+1)*N, 0] = t*(torch.sin(2*np.pi/K*(2*t+k)) + sigma*torch.randn(N))
X[k*N:(k+1)*N, 1] = t*(torch.cos(2*np.pi/K*(2*t+k)) + sigma*torch.randn(N))
y[k*N:(k+1)*N] = k

return X, y

# Set parameters
K = 4
sigma = 0.16
N = 1000

set_seed(seed=SEED)
X, y = create_spiral_dataset(K, sigma, N)
plt.scatter(X[:, 0], X[:, 1], c = y)
plt.show()

Random seed 2021 has been set. ## Section 2.3: Training and Evaluation¶

### Coding Exercise 2.3: Implement it for a classfication task¶

Now that we have the spiral dataset and a loss function, it’s your turn to implement a simple train/test split for training and validation.

Steps to follow:

• Dataset shuffle

• Train/Test split (20% for test)

• Training and Evaluation

def shuffle_and_split_data(X, y, seed):
"""
Helper function to shuffle and split incoming data

Args:
X: torch.tensor
Input data
y: torch.tensor
Corresponding target variables
seed: int
Set seed for reproducibility

Returns:
X_test: torch.tensor
Test data [20% of X]
y_test: torch.tensor
Labels corresponding to above mentioned test data
X_train: torch.tensor
Train data [80% of X]
y_train: torch.tensor
Labels corresponding to above mentioned train data
"""
torch.manual_seed(seed)
# Number of samples
N = X.shape
####################################################################
# Fill in missing code below (...),
# then remove or comment the line below to test your function
raise NotImplementedError("Shuffle & split data")
####################################################################
# Shuffle data
shuffled_indices = ...   # Get indices to shuffle data, could use torch.randperm
X = X[shuffled_indices]
y = y[shuffled_indices]

# Split data into train/test
test_size = ...    # Assign test datset size using 20% of samples
X_test = X[:test_size]
y_test = y[:test_size]
X_train = X[test_size:]
y_train = y[test_size:]

return X_test, y_test, X_train, y_train

## Uncomment below to test your function
# X_test, y_test, X_train, y_train = shuffle_and_split_data(X, y, seed=SEED)
# plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test)
# plt.title('Test data')
# plt.show()


Click for solution

Example output: And we need to make a Pytorch data loader out of it. Data loading in PyTorch can be separated in 2 parts:

• Data must be wrapped on a Dataset parent class where the methods getitem and len must be overrided. Note that, at this point, the data is not loaded on memory. PyTorch will only load what is needed to the memory. Here TensorDataset does this for us directly.

• Use a Dataloader that will actually read the data in batches and put into memory. Also, the option of num_workers > 0 allows multithreading, which prepares multiple batches in the queue to speed things up.

g_seed = torch.Generator()
g_seed.manual_seed(SEED)

batch_size = 128
test_data = TensorDataset(X_test, y_test)
shuffle=False, num_workers=2,
worker_init_fn=seed_worker,
generator=g_seed)

train_data = TensorDataset(X_train, y_train)
shuffle=True, num_workers=2,
worker_init_fn=seed_worker,
generator=g_seed)


Let’s write a general-purpose training and evaluation code and keep it in our pocket for next tutorial as well. So make sure you review it to see what it does.

Note that model.train() tells your model that you are training the model. So layers like dropout, batch norm etc. which behave different on the train and test procedures know what is going on and hence can behave accordingly. And to turn off training mode we set model.eval().

def train_test_classification(net, criterion, optimizer, train_loader,
training_plot=False, device='cpu'):
"""
Accumulate training loss/Evaluate performance

Args:
net: instance of Net class
Describes the model with ReLU activation, batch size 128
criterion: torch.nn type
Criterion combines LogSoftmax and NLLLoss in one single class.
optimizer: torch.optim type
Combines the train dataset and sampler, and provides an iterable over the given dataset.
Combines the test dataset and sampler, and provides an iterable over the given dataset.
num_epochs: int
Number of epochs [default: 1]
verbose: boolean
If True, print statistics
training_plot=False
If True, display training plot
device: string
CUDA/GPU if available, CPU otherwise

Returns:
Nothing
"""
net.train()
training_losses = []
for epoch in tqdm(range(num_epochs)):  # Loop over the dataset multiple times
running_loss = 0.0
for i, data in enumerate(train_loader, 0):
# Get the inputs; data is a list of [inputs, labels]
inputs, labels = data
inputs = inputs.to(device).float()
labels = labels.to(device).long()

# forward + backward + optimize
outputs = net(inputs)

loss = criterion(outputs, labels)
loss.backward()
optimizer.step()

# Print statistics
if verbose:
training_losses += [loss.item()]

net.eval()

"""
Function to gauge network performance

Args:
Combines the test dataset and sampler, and provides an iterable over the given dataset.

Returns:
acc: float
Performance of the network
total: int
Number of datapoints in the dataloader
"""
correct = 0
total = 0
inputs, labels = data
inputs = inputs.to(device).float()
labels = labels.to(device).long()

outputs = net(inputs)
_, predicted = torch.max(outputs, 1)
total += labels.size(0)
correct += (predicted == labels).sum().item()

acc = 100 * correct / total

if verbose:
print(f"Accuracy on the {train_total} training samples: {train_acc:0.2f}")
print(f"Accuracy on the {test_total} testing samples: {test_acc:0.2f}")

if training_plot:
plt.plot(training_losses)
plt.xlabel('Batch')
plt.ylabel('Training loss')
plt.show()

return train_acc, test_acc


### Think! 2.3.1: What’s the point of .eval() and .train()?¶

Is it necessary to use net.train() and net.eval() for our MLP model? why?

#### Student Response¶

# @title Student Response
from ipywidgets import widgets

text=widgets.Textarea(
value='Type your answer here and click on Submit!',
placeholder='Type something',
description='',
disabled=False
)

button = widgets.Button(description="Submit!")

display(text,button)

def on_button_clicked(b):
print("Submission successful!")

button.on_click(on_button_clicked)


Click for solution

Now let’s put everything together and train your first deep-ish model!

set_seed(SEED)
net = Net('ReLU()', X_train.shape, , K).to(DEVICE)
criterion = nn.CrossEntropyLoss()
num_epochs = 100

_, _ = train_test_classification(net, criterion, optimizer, train_loader,
training_plot=True, device=DEVICE)


And finally, let’s visualize the learned decision-map. We know you’re probably running out of time, so we won’t make you write code now! But make sure you have reviewed it since we’ll start with another visualization technique next time.

def sample_grid(M=500, x_max=2.0):
"""
Helper function to simulate sample meshgrid

Args:
M: int
Size of the constructed tensor with meshgrid
x_max: float
Defines range for the set of points

Returns:
X_all: torch.tensor
Concatenated meshgrid tensor
"""
ii, jj = torch.meshgrid(torch.linspace(-x_max, x_max, M),
torch.linspace(-x_max, x_max, M))
X_all = torch.cat([ii.unsqueeze(-1),
jj.unsqueeze(-1)],
dim=-1).view(-1, 2)
return X_all

def plot_decision_map(X_all, y_pred, X_test, y_test,
M=500, x_max=2.0, eps=1e-3):
"""
Helper function to plot decision map

Args:
X_all: torch.tensor
Concatenated meshgrid tensor
y_pred: torch.tensor
Labels predicted by the network
X_test: torch.tensor
Test data
y_test: torch.tensor
Labels of the test data
M: int
Size of the constructed tensor with meshgrid
x_max: float
Defines range for the set of points
eps: float
Decision threshold

Returns:
Nothing
"""
decision_map = torch.argmax(y_pred, dim=1)

for i in range(len(X_test)):
indices = (X_all[:, 0] - X_test[i, 0])**2 + (X_all[:, 1] - X_test[i, 1])**2 < eps
decision_map[indices] = (K + y_test[i]).long()

decision_map = decision_map.view(M, M)
plt.imshow(decision_map, extent=[-x_max, x_max, -x_max, x_max], cmap='jet')
plt.show()

X_all = sample_grid()
y_pred = net(X_all)
plot_decision_map(X_all, y_pred, X_test, y_test)


### Think! 2.3.2: Does it generalize well?¶

Do you think this model is performing well outside its training distribution? Why?

#### Student Response¶

# @title Student Response
from ipywidgets import widgets

text=widgets.Textarea(
value='Type your answer here and click on Submit!',
placeholder='Type something',
description='',
disabled=False
)

button = widgets.Button(description="Submit!")

display(text,button)

def on_button_clicked(b):
print("Submission successful!")

button.on_click(on_button_clicked)


Click for solution

What would be your suggestions to increase models ability to generalize? Think about it and discuss with your pod.

Click for solution

# Summary¶

In this tutorial, we have explored the Multi-leayer Perceptrons (MLPs). More specifically, we have discussed the similarities between artificial and biological neural networks (for more information see the Bonus section). we have also learned the Universal Approximation Theorem and implemented MLPs in PyTorch.

# Bonus: Neuron Physiology and Motivation to Deep Learning¶

This section will motivate one of the most popular nonlinearities in deep learning, the ReLU nonlinearity, by starting from the biophysics of neurons and obtaining the ReLU nonlinearity through a sequence of approximations. We will also show that neuronal biophysics sets a time scale for signal propagation speed through the brain. This time scale implies that neural circuits underlying fast perceptual and motor processing in the brain may not be excessively deep.

## Leaky Integrate-and-fire (LIF) neuronal model¶

The basic idea of LIF neuron was proposed in 1907 by Louis Édouard Lapicque, long before we understood the electrophysiology of a neuron (see a translation of Lapicque’s paper ). More details of the model can be found in the book Theoretical neuroscience by Peter Dayan and Laurence F. Abbott.

The model dynamics is defined with the following formula,

(39)$\begin{equation} \frac{d V_m}{d t}=\left\{\begin{array}{cc} \frac{1}{C_m}\left(-\frac{V_m}{R_m} + I \right) & t>t_{rest} \\ 0 & \text { otherwise } \end{array}\right. \end{equation}$

Note that $$V_{m}$$, $$C_{m}$$, and $$R_{m}$$ are the membrane voltage, capacitance, and resitance of the neuron, respectively, so the $$-\frac{V_{m}}{R_{m}}$$ denotes the leakage current. When $$I$$ is sufficiently strong such that $$V_{m}$$ reaches a certain threshold value $$V_{\rm th}$$, it momentarily spikes and then $$V_{m}$$ is reset to $$V_{\rm reset}< V_{\rm th}$$, and voltage stays at $$V_{\rm reset}$$ for $$\tau_{\rm ref}$$ ms, mimicking the refractoriness of the neuron during an action potential (note that $$V_{\rm reset}$$ and $$\tau_{\rm ref}$$ is assumed to be zero in the lecture):

(40)$\begin{eqnarray} V_{m}(t)=V_{\rm reset} \text{ for } t\in(t_{\text{sp}}, t_{\text{sp}} + \tau_{\text{ref}}] \end{eqnarray}$

where $$t_{\rm sp}$$ is the spike time when $$V_{m}(t)$$ just exceeded $$V_{\rm th}$$.

Thus, the LIF model captures the fact that a neuron:

• Performs spatial and temporal integration of synaptic inputs

• Generates a spike when the voltage reaches a certain threshold

• Goes refractory during the action potential

• Has a leaky membrane

For in-depth content on computational models of neurons, follow the NMA tutorial 1 of Biological Neuron Models. Specifically, for NMA-CN 2021 follow this Tutorial.

## Simulating an LIF Neuron¶

In the cell below is given a function for LIF neuron model with it’s arguments described.

Note that we will use Euler’s method to make a numerical approximation to a derivative. Hence we will use the following implementation of the model dynamics,

(41)$\begin{equation} V_m^{[n]}=\left\{\begin{array}{cc} V_m^{[n-1]} + \frac{1}{C_m}\left(-\frac{V_m^{[n-1]}}{R_m}+I \right) \Delta t & t>t_{r e s t} \\ 0 & \text { otherwise } \end{array}\right. \end{equation}$

where the superscript $$[\cdot]$$ denotes the time point.

def run_LIF(I, T=50, dt=0.1, tau_ref=10,
Rm=1, Cm=10, Vth=1, V_spike=0.5):
"""
Simulate the LIF dynamics with external input current

Args:
I          : int
Input current (mA)
T          : int
Total time to simulate (msec)
dt         : float
Simulation of time step (msec)
tau_ref    : int
Refractory period (msec)
Rm         : int
Resistance (kOhm)
Cm         : int
Capacitance (uF)
Vth        : int
Spike threshold (V)
V_spike    : float
Spike delta (V)

Returns:
time       : list
Time points
Vm         : list
Tracking membrane potentials
"""

# Set up array of time steps
time = torch.arange(0, T + dt, dt)

# Set up array for tracking Vm
Vm = torch.zeros(len(time))

# Iterate over each time step
t_rest = 0
for i, t in enumerate(time):

# If t is after refractory period
if t > t_rest:
Vm[i] = Vm[i-1] + 1/Cm*(-Vm[i-1]/Rm + I)  * dt

# If Vm is over the threshold
if Vm[i] >= Vth:

# Increase volatage by change due to spike
Vm[i] += V_spike

# Set up new refactory period
t_rest = t + tau_ref

return time, Vm

sim_time, Vm = run_LIF(1.5)
# Plot the membrane voltage across time
plt.plot(sim_time, Vm)
plt.title('LIF Neuron Output')
plt.ylabel('Membrane Potential (V)')
plt.xlabel('Time (msec)')
plt.show() ### Interactive Demo: Neuron’s transfer function explorer for different $$R_m$$ and $$\tau_{ref}$$¶

We know that real neurons communicate by modulating the spike count meaning that more input current causes a neuron to spike more often. Therefore, to find an input-output relationship, it makes sense to characterize their spike count as a function of input current. This is called the neuron’s input-output transfer function. Let’s plot the neuron’s transfer function and see how it changes with respect to the membrane resistance and refractory time?

#### ¶

Make sure you execute this cell to enable the widget!

# @title

# @markdown Make sure you execute this cell to enable the widget!
my_layout = widgets.Layout()

@widgets.interact(Rm=widgets.FloatSlider(1., min=1, max=100.,
step=0.1, layout=my_layout),
tau_ref=widgets.FloatSlider(1., min=1, max=100.,
step=0.1, layout=my_layout)
)

def plot_IF_curve(Rm, tau_ref):
"""
Helper function to plot frequency-current curve

Args:
Rm         : int
Resistance (kOhm)
tau_ref    : int
Refractory period (msec)

Returns:
Nothing
"""
T = 1000  # Total time to simulate (msec)
dt = 1  # Simulation time step (msec)
Vth = 1  # Spike threshold (V)
Is_max = 2
Is = torch.linspace(0, Is_max, 10)
spike_counts = []
for I in Is:
_, Vm = run_LIF(I, T=T, dt=dt, Vth=Vth, Rm=Rm, tau_ref=tau_ref)
spike_counts += [torch.sum(Vm > Vth)]

plt.plot(Is, spike_counts)
plt.title('LIF Neuron: Transfer Function')
plt.ylabel('Spike count')
plt.xlabel('I (mA)')
plt.xlim(0, Is_max)
plt.ylim(0, 80)
plt.show()


### Think!: Real and Artificial neuron similarities¶

What happens at infinite membrane resistance ($$R_m$$) and small refactory time ($$\tau_{ref}$$)? Why?

Take 10 mins to discuss the similarity between a real neuron and an artificial one with your pod.

Click for solution