Tutorial 1: Deep Learning Thinking 2: Architectures and Multimodal DL thinking¶

Week 3, Day 2: DL Thinking 2

Content creators: Konrad Kording, Lyle Ungar

Content reviewers: Kelson Shilling-Scrivo

Content editors: Kelson Shilling-Scrivo

Production editors: Gagana B, Spiros Chavlis

Tutorial Objectives¶

In this tutorial, you will practice thinking like a deep learning practitioner and figure out how to design architectures for different scenarios.

By the end of this tutorial, you will be better able to:

• Know how to proceed when low on data

• Have a toolbox of what to do in non-standard situations

We will also continue to practice getting relevant information out of domain experts, arguably the central skill of DL, and converting insights about a domain into the logic of an actual approach.

Setup¶

Install dependencies¶

# @title Install dependencies

from evaltools.airtable import AirtableForm


Section 1: Intro Deep Learning Thinking 2¶

Time estimate: ~4 mins

Video 1: Intro to DL Thinking 2¶

Like Deep Learning thinking 1 last week, this tutorial is a bit different from others - there will be no coding! Instead, you will watch a series of vignettes about various scenarios where you want to use a neural network. This tutorial will focus on various architectures and multimodal thinking.

Each section below will start with a vignette where either Lyle or Konrad is trying to figure out how to set up a neural network for a specific problem. Try to think of questions you want to ask them as you watch, then pay attention to what questions Lyle and Konrad are asking. Were they what you would have asked? How do their questions help quickly clarify the situation?

Section 2: Getting More Data¶

Time estimate: ~15 mins

Video 2: Getting More Data Vignette¶

Konrad wants to build a neural network that classifies images based on the objects contained within them. He needs more data to help him train an accurate network, but buying more images is costly. He needs a different solution.

Think! 1: Designing a strategy to get more data¶

Given everything you know, how would you design a strategy to get some more data (pairs of images and the labels of the objects they show) for the image classification neural network that Konrad is training? Be specific & write down a procedure.

Please discuss as a group. If you get stuck, you can uncover the hints below one at a time. Please spend some time discussing before uncovering the next hint, though! You are being real deep learning scientists now, and the answers won’t be easy.

Student Response¶

# @title Student Response
from ipywidgets import widgets
from IPython.display import display

# Free-text box for the group's answer
text = widgets.Textarea(
    value='Type your answer here and click on Submit!',
    placeholder='Type something',
    description='',
    disabled=False
)

button = widgets.Button(description="Submit!")

display(text, button)

def on_button_clicked(b):
    # Confirm the submission to the student
    print("Submission successful!")

button.on_click(on_button_clicked)


Look at a few photos of dogs (use an image search engine). How are they similar? How are they different? What makes them all be dogs?

We don’t need to obtain any new images in order to give more examples of each object to our neural network.

Think about color, orientation, flipping, pixel noise, color noise, shearing, contrast, brightness, and scaling.

Discuss where each of these ideas will break down. Can too much of a good thing be bad?

Instead of collecting new data, we can create multiple training examples from each existing image by transforming it: flipping it horizontally, shifting it horizontally or vertically by some number of pixels, scaling it to be larger or smaller (and cropping), rotating it, and changing its contrast and brightness.

This is called data augmentation, and it is a widely used and important strategy for training neural networks.

Importantly, we need to be careful about how much we change each image, because the results still need to be useful training images! Let’s say you have a photo of a dog, and you scale it to be 1000x bigger and crop out the middle. You’d have just an image of fur, which would not be a very useful training example for classifying dogs. So we want to change aspects of the images, but not so much that they are no longer recognizable as the original object.
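The transformations described above can be sketched in a few lines. This is a minimal illustration using only NumPy, not the pipeline used in the video: the function name `augment`, its parameters (`max_shift`, `brightness`, `contrast`), and the random stand-in image are all assumptions made for the example. In practice you would more likely reach for an image library's built-in transforms.

```python
import numpy as np

def augment(image, rng, max_shift=4, brightness=0.2, contrast=0.2):
    """Return one randomly perturbed copy of an H x W x C image with values in [0, 1]."""
    out = image.copy()
    # Horizontal flip with probability 0.5 (a mirrored dog is still a dog).
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Small random translation. np.roll wraps pixels around the border,
    # a simplification of the usual shift-and-pad.
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))
    # Mild contrast and brightness jitter around the image mean.
    c = 1.0 + rng.uniform(-contrast, contrast)
    b = rng.uniform(-brightness, brightness)
    mean = out.mean()
    out = (out - mean) * c + mean + b
    # Keep pixel values in the valid range.
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # hypothetical stand-in for a real photo
label = "dog"                    # augmentation leaves the label unchanged

# Eight new (image, label) training pairs from a single original image.
augmented = [(augment(image, rng), label) for _ in range(8)]
```

Keeping `max_shift`, `brightness`, and `contrast` small is the code-level version of the warning above: the larger these ranges, the more likely an augmented image no longer shows a recognizable object.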

Video 3: Getting More Data Wrap-up¶

Check out the paper mentioned in the above video:

• Balestriero, R., Bottou, L., LeCun, Y. (2022). The Effects of Regularization and Data Augmentation are Class Dependent. arXiv:2204.03632

(Bonus) Think!: Class-based strategies¶

Discuss how you may want to vary these strategies based on the class of the object/images.

Section 3: Detecting Tumors - What to do if there still isn’t enough data¶

Time estimate: ~15 mins

Video 5: Detecting Tumors Set-up¶

Konrad works for a hospital and wants to train a neural network to detect tumors in brain scans automatically. This type of tumor is pretty rare, which is great for humanity but means we only have a few thousand training examples for our neural network. This isn’t enough.

Even with adding in images of other types of tumors, we don’t have enough data. We have a lot of images of other things in ImageNet, like cats and dogs, though! Maybe we can use that?

Think! 2: Designing a strategy for detecting tumors¶

Given everything you know, how would you design a strategy to be able to train an accurate tumor-detecting neural network? Be specific & write down a procedure.

Please discuss as a group. If you get stuck, you can uncover the hints below one at a time. Please spend some time discussing before uncovering the next hint, though! You are being real deep learning scientists now, and the answers won’t be easy.

Student Response¶

# @title Student Response
from ipywidgets import widgets
from IPython.display import display

# Free-text box for the group's answer
text = widgets.Textarea(
    value='Type your answer here and click on Submit!',
    placeholder='Type something',
    description='',
    disabled=False
)

button = widgets.Button(description="Submit!")

display(text, button)

def on_button_clicked(b):
    # Confirm the submission to the student
    print("Submission successful!")

button.on_click(on_button_clicked)


Data augmentation is always something to consider.

A human learning to detect tumors is not learning how to see from scratch just based on the tumor images.

You could use another dataset to help. What properties should such a dataset have?

Even though the images in ImageNet are not of tumors, natural images contain information about aspects of visual objects that tumors share (they are coherent, locally smooth, etc.).

If you train a neural network on ImageNet first so that it learns general vision and embeddings of images, what might you want to change when training on the tumor images dataset?

Humans don’t learn to see when they learn a new classification task. We already have a trained visual system that is good at processing and learning embeddings for natural images.

We can replicate this in neural networks! First, we can train our neural network on ImageNet alone to do object classification. This gives us a neural network that has already learned how to process and embed images.

Then, we want to take this neural network and continue to train it on just the tumor classification dataset. We can chop off the existing final layer (that outputs the probabilities of all the ImageNet classes) and train a new one that outputs the probability of there being a tumor in the image.

We could keep all the weights in the convolutional layers fixed, so that they do not change after the ImageNet training, or we could fine-tune them on the tumor data. People use both strategies!

This whole process is called pre-training. We have pre-trained the neural network on ImageNet before training on our actual task, the detection of tumors.

We should mention here that there are many ways of doing this. We could train the whole network after training on a first task. We could train only the top layers after training the bottom layers on the first task. We could potentially do the latter first and then the former. Pre-training can be done in many ways, and looking for opportunities to use it is important.
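The "frozen layers plus a new final layer" strategy can be sketched as follows. This is a toy NumPy illustration under loud assumptions: `W_backbone` is a fixed random projection standing in for convolutional layers pre-trained on ImageNet, the "scans" and labels are synthetic, and the new head is a single logistic unit trained by plain gradient descent. None of these names or numbers come from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for layers pre-trained on ImageNet:
# a fixed random projection followed by a ReLU.
W_backbone = rng.normal(size=(64, 16)) / 8.0

def backbone(x):
    return np.maximum(x @ W_backbone, 0.0)

# Synthetic "tumor" dataset: 200 flattened scans with binary labels.
X = rng.normal(size=(200, 64))
feats = backbone(X)  # the backbone is frozen, so features are computed once
scores = feats @ rng.normal(size=16)
y = (scores > np.median(scores)).astype(float)

# New final layer replacing the ImageNet classifier: one logistic unit
# that outputs the probability of a tumor.
w_head = np.zeros(16)
b_head = 0.0

def predict_from_feats(f):
    return 1.0 / (1.0 + np.exp(-(f @ w_head + b_head)))

def loss():
    p = np.clip(predict_from_feats(feats), 1e-7, 1 - 1e-7)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

W_before = W_backbone.copy()
loss_before = loss()

lr = 0.2
for _ in range(500):
    err = predict_from_feats(feats) - y  # gradient of the loss w.r.t. the logit
    # Only the head is updated; W_backbone is never touched.
    w_head -= lr * feats.T @ err / len(y)
    b_head -= lr * err.mean()

loss_after = loss()
```

To fine-tune instead of freezing, one would also propagate the gradient into `W_backbone` and update it, typically with a smaller learning rate than the head so that the pre-trained features are not destroyed early in training.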

Video 6: Detecting Tumors Wrap-up¶

Check out the paper mentioned in the above video:

• Tschandl, P., Rinner, C., Apalla, Z. et al. (2020). Human–computer collaboration for skin cancer recognition. Nat Med 26: 1229–1234. doi: https://doi.org/10.1038/s41591-020-0942-0

Section 4: Brains on Forrest Gump¶

Time estimate: ~17 mins

Video 8: Brains on Forrest Gump Set-up¶

Konrad has a great dataset: MRI data (brain imaging) recorded from someone over the entire time they watched the movie Forrest Gump. So, basically, he has the video stream over time and the brain data over time. He wants to figure out what those two data streams have in common. In other words, he wants to pull the shared information from two data modalities.

Think! 3: Designing a strategy for pulling shared info about brain data and Forrest Gump¶

Given everything you know, how would you design a strategy to get a shared embedding for the brain and video data? Be specific & write down a procedure.

Please discuss as a group. If you get stuck, you can uncover the hints below one at a time. Please spend some time discussing before uncovering the next hint, though! You are being real deep learning scientists now, and the answers won’t be easy.

Student Response¶

# @title Student Response
from ipywidgets import widgets
from IPython.display import display

# Free-text box for the group's answer
text = widgets.Textarea(
    value='Type your answer here and click on Submit!',
    placeholder='Type something',
    description='',
    disabled=False
)

button = widgets.Button(description="Submit!")

display(text, button)

def on_button_clicked(b):
    # Confirm the submission to the student
    print("Submission successful!")

button.on_click(on_button_clicked)


We want the two datasets to share something. What does that mean?

Where could the vectors $$\bar{X}_1$$ and $$\bar{X}_2$$ come from? How could they relate to the brain data and video data?

You may want to use more than one neural network!

What do we want our neural network solution to do here? Is there anything you want it to maximize or minimize?

What happens if we multiply all activities by 2? We need a scale-invariant solution.

The first thing to note is that we want two embeddings, one for the brain data and a second for the video data.

The second thing to note is that we want these embeddings to capture shared information between the two.

The key is to realize that if both embeddings contain the same information, they should be correlated.

Looking at the formula for Pearson correlation:

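$$\rho = \frac{\text{cov}(X_1, X_2)}{\sqrt{\text{var}(X_1) \cdot \text{var}(X_2)}}$$

where $$X_1$$ and $$X_2$$ are our two embeddings. To find the correlation between our two embeddings, we take their covariance and normalize it by the product of their standard deviations (the square root of the product of their variances). This gives us a scale-invariant quantity to optimize: multiplying an embedding by 2 doubles both the covariance and that embedding's standard deviation, so $$\rho$$ is unchanged.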

Imagine the extreme case where there was no noise, and both embeddings extracted the same information. Both embeddings would be perfectly correlated with each other. Conversely, if the two embeddings had no shared information, there would be little to no correlation between them. Therefore, by maximizing the correlation between the two embedding spaces, we’re maximizing the shared information between the two embeddings.

Another way to think about it is that by maximizing the correlation, we’re attempting to have one common embedding between the brain data and the video data. If both networks extract the same information, this will be possible.

The two embeddings will be slightly different if they extract slightly different information. Therefore, the more similar (and thus, more correlated) the embeddings are, the more similar the information extracted.
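The correlation objective can be made concrete with a short sketch. The function `mean_correlation` below is a hypothetical helper, not the method from the video or the Deep CCA paper: it averages the Pearson correlation over matching embedding dimensions, which is a simplification of the canonical correlation that Deep CCA optimizes. The embeddings here are synthetic arrays standing in for the outputs of a brain network and a video network.

```python
import numpy as np

def mean_correlation(z1, z2, eps=1e-8):
    """Mean Pearson correlation across matching embedding dimensions.

    z1, z2: arrays of shape (time points, embedding dims).
    """
    z1c = z1 - z1.mean(axis=0)
    z2c = z2 - z2.mean(axis=0)
    cov = (z1c * z2c).mean(axis=0)
    # Normalize by the product of standard deviations (eps avoids division by zero).
    return float(np.mean(cov / (z1c.std(axis=0) * z2c.std(axis=0) + eps)))

rng = np.random.default_rng(0)
shared = rng.normal(size=(500, 3))                     # information present in both streams
brain_emb = shared + 0.5 * rng.normal(size=(500, 3))   # hypothetical brain embedding
video_emb = shared + 0.5 * rng.normal(size=(500, 3))   # hypothetical video embedding
unrelated = rng.normal(size=(500, 3))                  # embedding with no shared information

r_shared = mean_correlation(brain_emb, video_emb)
r_scaled = mean_correlation(2 * brain_emb, video_emb)  # multiply all activities by 2
r_unrelated = mean_correlation(brain_emb, unrelated)
```

Embeddings that share information give a high value, unrelated embeddings give a value near zero, and scaling one embedding leaves the value essentially unchanged. To train two networks with this objective, one would minimize the negative of this quantity with respect to the weights of both networks.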

Video 9: Brains on Forrest Gump Wrap-up¶

Check out the paper mentioned in the above video:

• Andrew, G., Arora, R., Bilmes, J., Livescu, K. (2013). Deep Canonical Correlation Analysis. Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):1247-1255. url: proceedings.mlr.press/v28/andrew13

Summary¶

Time estimate: ~2 mins

Video 10: Wrap up of DL thinking¶

In this set of DL Thinks, we saw several tricks for doing well when there is very limited data:

• Data augmentation

• Pretraining

• Canonical Correlation Analysis (CCA)

All three can be used in cases where there is limited data available. All three also teach us that the relevant information may be quite clear once we think about it, and they show how ideas about the world translate into approaches in deep learning.