Tutorial 2: Introduction to RNNs

Week 2, Day 1: Convnets And Recurrent Neural Networks

By Neuromatch Academy

Content creators: Dawn McKnight, Richard Gerum, Cassidy Pirlot, Rohan Saha, Liam Peet-Pare, Saeed Najafi, Alona Fyshe

Content reviewers: Saeed Salehi, Lily Cheng, Yu-Fang Yang, Polina Turishcheva, Nina Kudryashova, Kelson Shilling-Scrivo

Content editors: Gagana B, Nina Kudryashova

Production editors: Anmol Gupta, Spiros Chavlis

Post-Production team: Gagana B, Spiros Chavlis

Based on material from: Konrad Kording, Hmrishav Bandyopadhyay, Rahul Shekhar, Tejas Srivastava

Our 2021 Sponsors, including Presenting Sponsor Facebook Reality Labs

Tutorial Objectives

At the end of this tutorial, we will be able to:

  • Understand the structure of a Recurrent Neural Network (RNN)

  • Build a simple RNN model

Tutorial slides

These are the slides for the videos in this tutorial. If you want to download the slides locally, click here.


Install dependencies

# @title Install dependencies
!pip install livelossplot --quiet
!pip install unidecode --quiet

!pip install git+ --quiet
from evaltools.airtable import AirtableForm

# Generate airtable form
atform = AirtableForm('appn7VdPRseSoMXEG','W2D1_T2','')

# Imports
import time
import math
import torch
import string
import random
import unidecode
import numpy as np
import matplotlib.pyplot as plt

import torch.nn as nn

from tqdm.notebook import tqdm

Figure settings

# @title Figure settings
import ipywidgets as widgets       # Interactive display
%config InlineBackend.figure_format = 'retina'

plt.rcParams["mpl_toolkits.legacy_colorbar"] = False

import warnings
warnings.filterwarnings("ignore", category=UserWarning, module="matplotlib")

Helper functions

# @title Helper functions

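def read_file(filename):
  """
  Helper function to read file

  Args:
    filename: string
      Filename

  Returns:
    file: string
      Contents of file
    file_len: int
      Length of file
  """
  # Transliterate Unicode text to ASCII while reading
  with open(filename) as f:
    file = unidecode.unidecode(f.read())
  return file, len(file)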

def char_tensor(string):
  """
  Turning a string into a tensor

  Args:
    string: string
      Input string

  Returns:
    tensor: torch.tensor
      Tensor from input string
  """
  tensor = torch.zeros(len(string)).long()
  for c in range(len(string)):
    # Each character is encoded as its index in `all_characters`
    tensor[c] = all_characters.index(string[c])
  return tensor

def time_since(since):
  """
  Readable time elapsed

  Args:
    since: time
      Input time

  Returns:
    out: string
      Time elapsed since `since`.
  """
  s = time.time() - since
  m = math.floor(s / 60)
  s -= m * 60
  out = f"{m}min {s}sec"
  return out

def generate(decoder, prime_str='A', predict_len=100,
             temperature=0.8, device='cpu'):
  """
  Function to generate predicted future sequence

  Args:
    decoder: nn.module
      Decoder model
    prime_str: string
      Prime string [default: A]
    predict_len: int
      Predict length [default: 100]
    temperature: float
      Temperature [default: 0.8]
    device: string
      GPU/CUDA if available, CPU otherwise

  Returns:
    predicted: string
      Predicted sequence
  """
  hidden = decoder.init_hidden(1)
  prime_input = char_tensor(prime_str).unsqueeze(0)

  # Move the hidden state (a tuple for the LSTM) and the input to the device
  if isinstance(hidden, tuple):
    hidden = tuple(h.to(device) for h in hidden)
  else:
    hidden = hidden.to(device)
  prime_input = prime_input.to(device)
  predicted = prime_str

  # Use priming string to "build up" hidden state
  for p in range(len(prime_str) - 1):
    _, hidden = decoder(prime_input[:,p], hidden)

  inp = prime_input[:,-1]

  for p in range(predict_len):
    output, hidden = decoder(inp, hidden)

    # Sample from the network as a multinomial distribution:
    # divide the outputs by the temperature, then exponentiate
    output_dist = output.detach().view(-1).div(temperature).exp()
    top_i = torch.multinomial(output_dist, 1)[0]

    # Add predicted character to string and use as next input
    predicted_char = all_characters[top_i]
    predicted += predicted_char
    inp = char_tensor(predicted_char).unsqueeze(0)
    inp = inp.to(device)

  return predicted

Set random seed

Executing set_seed(seed=seed) you are setting the seed

# @title Set random seed

# @markdown Executing `set_seed(seed=seed)` you are setting the seed

# For deep learning, it's critical to set the random seed so that students
# have a baseline to compare their results to expected results.
# Read more here:

# Call `set_seed` function in the exercises to ensure reproducibility.
import random
import torch

def set_seed(seed=None, seed_torch=True):
  """
  Function that controls randomness.
  NumPy and random modules must be imported.

  Args:
    seed : Integer
      A non-negative integer that defines the random state. Default is `None`.
    seed_torch : Boolean
      If `True` sets the random seed for pytorch tensors, so pytorch module
      must be imported. Default is `True`.

  Returns:
    Nothing.
  """
  if seed is None:
    seed = int(np.random.choice(2 ** 32))
  # Seed the Python and NumPy random number generators
  random.seed(seed)
  np.random.seed(seed)
  if seed_torch:
    # Seed PyTorch on CPU and on all GPUs, and make cuDNN deterministic
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

  print(f'Random seed {seed} has been set.')

# In case that `DataLoader` is used
def seed_worker(worker_id):
  """
  DataLoader will reseed workers following randomness in
  multi-process data loading algorithm.

  Args:
    worker_id: integer
      ID of subprocess to seed. 0 means that
      the data will be loaded in the main process
      Refer: for more details

  Returns:
    Nothing
  """
  worker_seed = torch.initial_seed() % 2**32
  # Seed the NumPy and Python generators inside each worker
  np.random.seed(worker_seed)
  random.seed(worker_seed)

Set device (GPU or CPU). Execute set_device()

#@title Set device (GPU or CPU). Execute `set_device()`
# especially if torch modules used.

# Inform the user if the notebook uses GPU or CPU.

def set_device():
  """
  Set the device. CUDA if available, CPU otherwise

  Args:
    None

  Returns:
    device: string
      "cuda" if a GPU is available, "cpu" otherwise
  """
  device = "cuda" if torch.cuda.is_available() else "cpu"
  if device != "cuda":
    print("WARNING: For this notebook to perform best, "
        "if possible, in the menu under `Runtime` -> "
        "`Change runtime type.`  select `GPU` ")
  else:
    print("GPU is enabled in this notebook.")

  return device

SEED = 2021
set_seed(seed=SEED)
DEVICE = set_device()
Random seed 2021 has been set.
WARNING: For this notebook to perform best, if possible, in the menu under `Runtime` -> `Change runtime type.`  select `GPU` 

Section 1: Recurrent Neural Networks (RNNs)

Time estimate: ~20mins

Video 1: RNNs

RNNs are compact models that operate over time series and can remember past inputs. They also save parameters by using the same weights at every time step. If you’ve heard of Transformers, those models don’t have this kind of temporal weight sharing, and so they are much larger.
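
To make the weight sharing concrete, here is a minimal sketch of a vanilla RNN with a single scalar hidden unit, written in plain Python rather than PyTorch. The function names `rnn_step` and `run_rnn` and the weight values are illustrative assumptions, not part of the tutorial code.

```python
import math


def rnn_step(x, h, w_xh, w_hh, b):
    """One vanilla RNN update: h_new = tanh(w_xh * x + w_hh * h + b)."""
    return math.tanh(w_xh * x + w_hh * h + b)


def run_rnn(xs, w_xh=0.5, w_hh=0.9, b=0.0, h0=0.0):
    """Apply the same three parameters at every time step of `xs`."""
    h = h0
    states = []
    for x in xs:
        h = rnn_step(x, h, w_xh, w_hh, b)  # identical weights at each step
        states.append(h)
    return states


# A single input pulse followed by silence (assumed toy inputs)
short_states = run_rnn([1.0, 0.0, 0.0])
long_states = run_rnn([1.0] + [0.0] * 9)

# The hidden state stays non-zero after the pulse: the network "remembers" it
print([round(h, 3) for h in short_states])
# Both runs use the same three parameters, whatever the sequence length
print(len(short_states), len(long_states))
```

The hidden state carries information about the first input forward in time, and a longer sequence needs more computation but no additional parameters.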

The code below is adapted from this GitHub repository.


class CharRNN(nn.Module):
  """
  Recurrent Neural Network Implementation
  """

  def __init__(self, input_size, hidden_size, output_size,
               model="gru", n_layers=1):
    """
    Initialise CharRNN parameters

    Args:
      input_size: int
        Size of the input layer.
      hidden_size: int
        Size of the hidden layers.
      output_size: int
        Size of the output layer.
      model: string
        `model` can take the values "gru", "rnn", "lstm". Default is "gru".
      n_layers: int
        Number of layers [default: 1]

    Returns:
      Nothing
    """
    super(CharRNN, self).__init__()
    self.model = model.lower()
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.output_size = output_size
    self.n_layers = n_layers

    # Embed each character index, run the recurrent stack, decode to scores
    self.encoder = nn.Embedding(input_size, hidden_size)
    if self.model == "gru":
      self.rnn = nn.GRU(hidden_size, hidden_size, n_layers)
    elif self.model == "lstm":
      self.rnn = nn.LSTM(hidden_size, hidden_size, n_layers)
    elif self.model == "rnn":
      self.rnn = nn.RNN(hidden_size, hidden_size, n_layers)
    self.decoder = nn.Linear(hidden_size, output_size)

  def forward(self, input, hidden):
    """
    Forward pass of CharRNN

    Args:
      input: torch.tensor
        Input to CharRNN (one character index per sequence in the batch)
      hidden: torch.tensor
        Hidden state (a tuple of hidden and cell states for the LSTM)

    Returns:
      output: torch.tensor
        Output of CharRNN
      hidden: torch.tensor
        Output of CharRNN hidden layer
    """
    batch_size = input.size(0)
    encoded = self.encoder(input)
    # The recurrent layer expects (sequence length, batch, features)
    output, hidden = self.rnn(encoded.reshape(1, batch_size, -1), hidden)
    output = self.decoder(output.reshape(batch_size, -1))
    return output, hidden

  def init_hidden(self, batch_size):
    """
    Initialise hidden dimension of CharRNN

    Args:
      batch_size: int
        Batch Size

    Returns:
      hidden: torch.tensor or tuple of torch.tensor
        (torch.zeros(self.n_layers, batch_size, self.hidden_size),
        torch.zeros(self.n_layers, batch_size, self.hidden_size)) if LSTM,
        torch.zeros(self.n_layers, batch_size, self.hidden_size) otherwise
    """
    if self.model == "lstm":
      return (torch.zeros(self.n_layers, batch_size, self.hidden_size),
              torch.zeros(self.n_layers, batch_size, self.hidden_size))

    return torch.zeros(self.n_layers, batch_size, self.hidden_size)

This next section of code takes care of training the RNN on several of Mark Twain’s books. In this short section, we won’t dive into the code, but you’ll get to learn a lot more about RNNs in a few days! For now, we are just going to observe the training process.

Run Me to get the data

# @title Run Me to get the data
import requests

url = ''
r = requests.get(url, stream=True)

with open('twain.txt', 'wb') as fd:
  # Save the downloaded bytes to a local file
  fd.write(r.content)

One cool thing about RNNs is that they can be used to generate language based on what the network sees during training. As the network makes predictions, instead of checking whether those predictions are correct against some training text, we just feed them back into the model as the next observed token. Starting from an initial hidden state and a short priming string, we can generate many original sentences! And what the network generates will reflect the text it was trained on.
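
The feedback loop can be sketched without a neural network at all. In the minimal example below, a table of character bigram counts stands in for the trained model (an assumption made purely for illustration: unlike an RNN, it has no hidden state and only looks at the previous character). The names `fit_bigrams` and `generate_text` and the toy text are hypothetical.

```python
import random
from collections import Counter, defaultdict

text = "the cat sat on the mat. the hat sat on the cat."  # toy training text


def fit_bigrams(corpus):
    """Count which character follows which in the training text."""
    counts = defaultdict(Counter)
    for current_char, next_char in zip(corpus[:-1], corpus[1:]):
        counts[current_char][next_char] += 1
    return counts


def generate_text(counts, prime_char, length, rng):
    """Sample a character, append it, and feed it back in as the next input."""
    generated = prime_char
    current_char = prime_char
    for _ in range(length):
        followers = counts[current_char]
        if not followers:  # character never seen with a successor
            break
        chars = list(followers.keys())
        weights = list(followers.values())
        current_char = rng.choices(chars, weights=weights, k=1)[0]
        generated += current_char  # the prediction becomes the next input
    return generated


bigram_counts = fit_bigrams(text)
sample = generate_text(bigram_counts, prime_char="t", length=40,
                       rng=random.Random(0))
print(sample)
```

Every character pair in the sample occurs somewhere in the toy text, which is the same sense in which generated text reflects the training data.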


def random_training_set(file, file_len, chunk_len, batch_size,
                        device='cpu', seed=0):
  """
  Generates random training set

  Args:
    file: string
      Contents of file
    file_len: int
      Length of file
    chunk_len: int
      Length of chunk
    batch_size: int
      Batch size
    device: string
      GPU/CUDA if available. CPU otherwise [default: CPU]
    seed: int
      Set seed for reproducibility [default: 0]

  Returns:
    inp: torch.tensor
      Input tensor
    target: torch.tensor
      Target tensor
    chunk_len: int
      Length of chunk
    batch_size: int
      Batch size
    device: string
      GPU/CUDA if available. CPU otherwise [default: CPU]
  """
  # Seed the chunk sampling so that each call is reproducible
  random.seed(seed)

  inp = torch.LongTensor(batch_size, chunk_len).to(device)
  target = torch.LongTensor(batch_size, chunk_len).to(device)

  for bi in range(batch_size):
    start_index = random.randint(0, file_len - chunk_len - 1)
    end_index = start_index + chunk_len + 1
    chunk = file[start_index:end_index]
    # The target is the input shifted by one character
    inp[bi] = char_tensor(chunk[:-1])
    target[bi] = char_tensor(chunk[1:])

  return inp, target, chunk_len, batch_size, device

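def train(decoder, criterion, inp, target, chunk_len, batch_size, device):
  """
  Training function

  Args:
    decoder: nn.module
      Decoder model
    criterion: function
      Loss function
    inp: torch.tensor
      Input tensor
    target: torch.tensor
      Target tensor
    chunk_len: int
      Length of chunk
    batch_size: int
      Batch size
    device: string
      GPU/CUDA if available. CPU otherwise [default: CPU]

  Returns:
    loss: float
      Decoder loss, averaged over the characters of the chunk
  """
  hidden = decoder.init_hidden(batch_size)
  # Move the hidden state (a tuple for the LSTM) to the device
  if isinstance(hidden, tuple):
    hidden = tuple(h.to(device) for h in hidden)
  else:
    hidden = hidden.to(device)
  decoder.zero_grad()
  loss = 0

  for c in range(chunk_len):
    output, hidden = decoder(inp[:, c].to(device), hidden)
    loss += criterion(output.reshape(batch_size, -1), target[:,c])

  # Backpropagate and update the weights with the global `decoder_optimizer`
  loss.backward()
  decoder_optimizer.step()
  return loss.item() / chunk_len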
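
The key idea in `random_training_set` is that the target is the input shifted by one character, so at every position the network is asked to predict the next character. Here is a minimal plain-Python sketch of that slicing for a single example; the helper names `encode` and `make_example` and the sample sentence are assumptions for illustration.

```python
import random
import string

all_characters = string.printable  # same vocabulary as the tutorial


def encode(text):
    """Map each character to its index in `all_characters`."""
    return [all_characters.index(ch) for ch in text]


def make_example(corpus, chunk_len, rng):
    """Cut a random chunk of chunk_len + 1 characters and split it in two."""
    start_index = rng.randint(0, len(corpus) - chunk_len - 1)
    chunk = corpus[start_index:start_index + chunk_len + 1]
    inp = encode(chunk[:-1])    # all characters except the last
    target = encode(chunk[1:])  # the same characters shifted by one
    return chunk, inp, target


corpus = "The quick brown fox jumps over the lazy dog."
chunk, inp, target = make_example(corpus, chunk_len=8, rng=random.Random(0))

# Each input character is paired with the character that follows it
for i, t in zip(inp, target):
    print(repr(all_characters[i]), "->", repr(all_characters[t]))
```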

First, let’s load the text file, and define the model and its hyperparameters.

# Reading and un-unicode-encoding data
all_characters = string.printable
n_characters = len(all_characters)

# Load the text file
file, file_len = read_file('twain.txt')

# Hyperparams
batch_size = 50
chunk_len = 200
model = "rnn"  # Other options: `lstm`, `gru`

n_layers = 2
hidden_size = 200
learning_rate = 0.01

# Define the model, optimizer, and the loss criterion
# The model must live on the same device as the inputs it receives
decoder = CharRNN(n_characters, hidden_size, n_characters,
                  model=model, n_layers=n_layers).to(DEVICE)

decoder_optimizer = torch.optim.Adagrad(decoder.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()
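
To get a feel for how small this model is, the sketch below counts its trainable parameters by hand for the hyperparameters above. It assumes PyTorch's usual layout for recurrent layers (input-to-hidden weights, hidden-to-hidden weights, and two bias vectors, stacked once for a plain RNN, three times for a GRU, and four times for an LSTM). The function names are hypothetical.

```python
def recurrent_layer_params(input_size, hidden_size, n_blocks):
    """Parameters in one recurrent layer: two weight matrices and two bias
    vectors, stacked `n_blocks` times (once per gate or candidate)."""
    per_block = (hidden_size * input_size      # input-to-hidden weights
                 + hidden_size * hidden_size   # hidden-to-hidden weights
                 + 2 * hidden_size)            # two bias vectors
    return n_blocks * per_block


def char_rnn_params(n_characters, hidden_size, n_layers, model):
    """Total parameters: embedding + recurrent stack + linear decoder."""
    n_blocks = {"rnn": 1, "gru": 3, "lstm": 4}[model]
    embedding = n_characters * hidden_size
    # Every layer receives a hidden_size-dimensional input
    recurrent = n_layers * recurrent_layer_params(hidden_size, hidden_size,
                                                  n_blocks)
    decoder = hidden_size * n_characters + n_characters  # weights + bias
    return embedding + recurrent + decoder


for model_name in ("rnn", "gru", "lstm"):
    total = char_rnn_params(n_characters=100, hidden_size=200,
                            n_layers=2, model=model_name)
    print(f"{model_name}: {total:,} parameters")
```

Note that `chunk_len` does not appear anywhere in the count: because the weights are reused at every time step, the sequence length has no effect on the model size.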

Let’s try it! Run the code below. As the network trains, it will output samples of generated text every 50 epochs. Notice that as the training progresses, the model learns to spell short words, then learns to string some words together, and eventually can produce meaningful sentences (sometimes)! Keep in mind that this is a relatively small network, and doesn’t employ some of the cool things you’ll learn about later in the week (e.g., LSTMs, though you can switch to one by changing the value of the model variable in the code above if you wish).

After running the model, and observing the output, get together with your pod, and talk about what you noticed during training. Did your network produce anything interesting? Did it produce anything characteristic of Twain?

Note: training for the full 2000 epochs is likely to take a while, so you may need to stop it before it finishes. If you have time left, set n_epochs to 2000 below.

n_epochs = 1000   # Initial was set to 2000

print_every = 50  # Frequency of printing the outputs

start = time.time()
all_losses = []
loss_avg = 0

print(f"Training for {n_epochs} epochs...\n")
for epoch in tqdm(range(1, n_epochs + 1), position=0, leave=True):
  loss = train(decoder, criterion,
               *random_training_set(file, file_len, chunk_len, batch_size,
                                    device=DEVICE, seed=epoch))
  loss_avg += loss

  if epoch % print_every == 0:
    print(f"[{time_since(start)} {epoch/n_epochs * 100}%) {loss:.4f}]")
    print(f"{generate(decoder, prime_str='Wh', predict_len=150, device=DEVICE)}")
Training for 1000 epochs...
[0min 15.362743377685547sec 5.0%) 2.1638]
Whyd, Thon.

of to dekt seed sortectisthen stuicecty beall ilead, of onsingon it the dile mint hean whend witure to speaghan the say and allather a them
[0min 30.648447275161743sec 10.0%) 1.9462]
Wher saing forted shat his, and see instent st@on cother. And by pook a plant. and to beding.  Ildn't shen inde doch the bupcty, wouks it was the cound 
[0min 45.982569217681885sec 15.0%) 1.9125]
Whan and the was prepricely
firste abath to thinded the start the noses, and mace praked wesandy goied, sone to the his sitere had gaspain:

"She hunger
[1min 1.4012203216552734sec 20.0%) 1.8870]
Where he fure filled to seefry, and says pont to king, over, in a do lece this dome to apped of so the lious to-now do ow that on the slight a for the s
[1min 16.782801389694214sec 25.0%) 1.8738]
Whing and vercace the _went of any
So it. O will plop it will, and pirst fort to fire
hen seap along shourded it. I fale of he cray down it, and up the 
[1min 32.059624433517456sec 30.0%) 1.7899]

"Dolite her lougled the bath's koon to the traight, and ham a gircour the reclockep the new not what I dother the olm of their ulver one wy the 
[1min 47.40600228309631sec 35.0%) 1.7620]
Whour the had mode
and the got a like boy dlad.  The didn't stempin to go tell the gart he fear?. There was
clet in it, nevel and said the
got about a f
[2min 2.71657395362854sec 40.0%) 1.7369]
Wh got the may down live it us and berand it, and a moppesty took at more and what the murnidess, and did. They had in the mimver inther sumple and said
[2min 18.095199823379517sec 45.0%) 1.7477]
Why hurdant, your did began was still. HA PPTom that as been says, and
by thou was ssourself it. It was boys purpiousts some, and My
she begold the stuc
[2min 33.37091422080994sec 50.0%) 1.6915]
Why, good and the fist go to disn and hombsern shart nother mays the trump with the stange, chands and this can an that bet and

And said thea
[2min 48.69055366516113sec 55.00000000000001%) 1.6896]
Whon had stistiun's _that's come maniggant
a spated that so the see
to leave and Joes mand!" she ragged."

"But over the that with the formation of lie,
[3min 4.0842180252075195sec 60.0%) 1.6591]
Where, and was a looa sectter? He did that benried this cuppine their secouses his a comes, and the clave if then into the
cite--out, and hard, from no 
[3min 19.375298023223877sec 65.0%) 1.6145]
Whow she moved as a body and shore his heart of the nearly
mart. A she could far me no. Well, there was the Dilase of him with to the wonder's so at thi
[3min 34.604671239852905sec 70.0%) 1.6671]
Whing and supper. betchead, when all felsicping the thought amoverty, when any fine the Silling
itstroff her said the bone and harm as the says:

[3min 49.896562576293945sec 75.0%) 1.6385]
Where fatirated of the most the storred done a town the King the strunting him; and the cojections the ceesently dack. I was your put a suppent can abou
[4min 5.143733024597168sec 80.0%) 1.6331]
Whered because, and one of that said "Ster me in the bedberg the (art wibl, and we lead to de id and then ain't towe peflling by shook you conting his h
[4min 20.491488218307495sec 85.0%) 1.6179]
What of I was a hows back?"

"Oh, in Enget! Te long here on the Some it is not throu could all they was genter the helpances in because of recembly; for
[4min 35.74625039100647sec 90.0%) 1.6034]
Where, big river out as with brother gone and partimes of the sort a bed "Well, so jurt I was protity for the the the got the
other, findling the place,
[4min 51.00572490692139sec 95.0%) 1.6064]
Where of the chay in dripped in his strung you, but her pround to hee shapt to see because the surplace and for, en most at the being and he was a long 
[5min 6.255812644958496sec 100.0%) 1.5523]
Whe marted him, and dask we tears,
a work of you three? And there."

Buck it aball he went and his been that was tracive to be
wathing for to him a litt
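
One way to read the loss values in this log: `nn.CrossEntropyLoss` reports the average negative log-likelihood in nats, and `train` divides the summed loss by `chunk_len`, so each number is a per-character average. Exponentiating it gives the perplexity, which can be thought of as the effective number of equally likely characters the model is choosing between. The sketch below compares the logged losses with a uniform guess over the vocabulary; the variable names are illustrative.

```python
import math
import string

n_characters = len(string.printable)  # vocabulary size used in the tutorial

# Loss of a model that assigns equal probability to every character
uniform_loss = math.log(n_characters)

# Losses copied from the first and last lines of the training log
first_logged_loss = 2.1638
last_logged_loss = 1.5523

for name, loss in [("uniform guess", uniform_loss),
                   ("epoch 50", first_logged_loss),
                   ("epoch 1000", last_logged_loss)]:
    perplexity = math.exp(loss)  # effective number of equally likely choices
    print(f"{name:>13}: loss = {loss:.4f} nats, perplexity = {perplexity:.1f}")
```

Seen this way, training narrows the model's uncertainty from 100 candidate characters to fewer than 5 per step.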

Now you can generate more examples using the trained model. Recall that generate takes the following arguments:

generate(decoder, prime_str='A', predict_len=100, temperature=0.8, device='cpu')
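
A common way to use a temperature, and presumably what `generate` does with its `temperature` argument, is to divide the network's output scores by it before turning them into sampling probabilities. The plain-Python sketch below shows the effect on three made-up scores; `temperature_softmax` and the score values are assumptions for illustration.

```python
import math


def temperature_softmax(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [z / temperature for z in logits]
    max_scaled = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(z - max_scaled) for z in scaled]
    total = sum(weights)
    return [w / total for w in weights]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate characters

for temperature in (0.2, 0.8, 2.0):
    probs = temperature_softmax(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Low temperatures concentrate nearly all the probability on the top-scoring character, giving conservative and repetitive text. High temperatures flatten the distribution, giving more varied text with more spelling mistakes.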

Try it by yourself

print(f"{generate(decoder, prime_str='Wh', predict_len=100, device=DEVICE)}\n")
Whermbles where that thind off and goor going there going of preceable didn't do niggered to far was t

Section 2: Power consumption in Deep Learning

Time estimate: ~20mins

Training NN models can be incredibly costly, both in money and in power consumption.

Video 2: Carbon Footprint of AI

Take a few moments to chat with your pod about the following points:

  • Which societal costs of training do you find most compelling?

  • When is training an AI model worth the cost? Who should make that decision?

  • Should there be additional taxes on energy costs for compute centers?

Exercise 2: Calculate the carbon footprint that your pod generated today.

You can use this online calculator.
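
For a rough cross-check of whatever the calculator reports, a back-of-the-envelope estimate multiplies the energy used by the carbon intensity of the electricity. The sketch below is one such estimate, not the calculator's actual method. The function name is hypothetical and every number is a placeholder to be replaced with your pod's own values.

```python
def training_emissions_kg(power_watts, hours, pue, grid_kg_co2_per_kwh):
    """Estimate CO2-equivalent emissions of a compute job in kilograms.

    energy (kWh) = power (kW) * time (h) * PUE
    emissions    = energy * carbon intensity of the electricity grid
    """
    energy_kwh = (power_watts / 1000.0) * hours * pue
    return energy_kwh * grid_kg_co2_per_kwh


# All numbers below are placeholders, not measurements
emissions = training_emissions_kg(
    power_watts=250.0,        # assumed average power draw of one GPU
    hours=2.0,                # assumed total training time today
    pue=1.5,                  # assumed data-center overhead factor
    grid_kg_co2_per_kwh=0.4,  # assumed carbon intensity of the local grid
)
print(f"Estimated emissions: {emissions:.2f} kg CO2e")
```

Here PUE (power usage effectiveness) accounts for cooling and other data-center overhead on top of the hardware's own consumption.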

Student Response

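# @title Student Response
from ipywidgets import widgets
from IPython.display import display

# Text box for the answer
text = widgets.Textarea(
   value='Type your answer here and click on `Submit!`',
   placeholder='Type something',
)

button = widgets.Button(description="Submit!")

display(text, button)

def on_button_clicked(b):
   atform.add_answer('q1', text.value)
   print("Submission successful!")

# Run the callback whenever the button is clicked
button.on_click(on_button_clicked)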



What a day! We’ve learned a lot: the basics of CNNs and RNNs, and how architectural changes that let a model share parameters can greatly reduce its size. We learned about convolution and pooling, as well as the basic idea behind RNNs. To wrap up, we thought about the impact of training large NN models.