Tutorial 1: Deep Learning Thinking 2: Architectures and Multimodal DL thinking#
Week 3, Day 2: DL Thinking 2
By Neuromatch Academy
Content creators: Konrad Kording, Lyle Ungar
Content reviewers: Kelson Shilling-Scrivo
Content editors: Kelson Shilling-Scrivo
Production editors: Gagana B, Spiros Chavlis
Tutorial Objectives#
In this tutorial, you will practice thinking like a deep learning practitioner and figure out how to design architectures for different scenarios.
By the end of this tutorial, you will be better able to:
Know how to proceed when low on data
Have a toolbox of what to do in non-standard situations
We will also continue to practice getting relevant information out of domain experts, arguably the central skill of DL, and converting insights about a domain into the logic of actual approaches.
Setup#
Install and import feedback gadget#
# @title Install and import feedback gadget
!pip3 install vibecheck datatops --quiet

from vibecheck import DatatopsContentReviewContainer


def content_review(notebook_section: str):
    return DatatopsContentReviewContainer(
        "",  # No text prompt
        notebook_section,
        {
            "url": "https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab",
            "name": "neuromatch_dl",
            "user_key": "f379rz8y",
        },
    ).render()


feedback_prefix = "W3D2_T1"
Section 1: Intro Deep Learning Thinking 2#
Time estimate: ~4 mins
Video 1: Intro to DL Thinking 2#
Submit your feedback#
# @title Submit your feedback
content_review(f"{feedback_prefix}_Intro_to_DL_Thinking_2_Video")
Like Deep Learning thinking 1 last week, this tutorial is a bit different from others - there will be no coding! Instead, you will watch a series of vignettes about various scenarios where you want to use a neural network. This tutorial will focus on various architectures and multimodal thinking.
Each section below will start with a vignette where either Lyle or Konrad is trying to figure out how to set up a neural network for a specific problem. Try to think of questions you want to ask them as you watch, then pay attention to what questions Lyle and Konrad are asking. Were they what you would have asked? How do their questions help quickly clarify the situation?
Section 2: Getting More Data#
Time estimate: ~15 mins
Video 2: Getting More Data Vignette#
Submit your feedback#
# @title Submit your feedback
content_review(f"{feedback_prefix}_Getting_More_Vignette_Video")
Konrad wants to build a neural network that classifies images based on the objects contained within them. He needs more data to help him train an accurate network, but buying more images is costly. He needs a different solution.
Think! 1: Designing a strategy to get more data#
Given everything you know, how would you design a strategy to get more data (pairs of images and labels of the objects they show) for the image classification neural network that Konrad is training? Be specific & write down a procedure.
Please discuss as a group. If you get stuck, you can uncover the hints below one at a time. Please spend some time discussing before uncovering the next hint, though! You are being real deep learning scientists now, and the answers won’t be easy.
Click here for hint 1
Look at a few photos of dogs (use an image search engine). How are they similar? How are they different? What makes them all dogs?
Click here for hint 2
We don’t need to obtain any new images in order to give more examples of each object to our neural network.
Click here for hint 3
Think about color, orientation, flipping, pixel noise, color noise, shearing, contrast, brightness and scaling
Click here for hint 4
Discuss where each of these ideas will break down. Can too much of a good thing be bad?
Click here for solution
Instead of collecting new data, we can create multiple training examples from each of our existing images by transforming them: flipping them horizontally, shifting them horizontally or vertically by some number of pixels, scaling them to be larger or smaller (and cropping), rotating them, and changing their contrast and brightness.
This is called data augmentation, and it is a very commonly used and important strategy for training neural networks.
Importantly, we need to be careful about how much we change each image, because we still want the results to be useful training images! Let’s say you have a photo of a dog, and you scale it to be 1000x bigger and crop the middle out. You’d have just an image of fur - this would not be very useful as a training example on how to classify dogs. So we want to change factors about the images but not so much that they are no longer recognizable as the original object.
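The transformations described in this solution can be sketched in a few lines of NumPy. This is a minimal illustration rather than the tutorial's own code: the function names, the random 32x32 stand-in image, and the parameter ranges (`max_shift=4`, `max_delta=0.2`) are all assumptions chosen for the example. In practice, libraries such as torchvision provide ready-made versions of these transforms.

```python
import numpy as np


def random_flip(image, rng):
    """Flip the image left-right with probability 0.5."""
    return image[:, ::-1] if rng.random() < 0.5 else image


def random_shift(image, rng, max_shift=4):
    """Shift by up to max_shift pixels per axis, filling gaps with edge pixels."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    padded = np.pad(
        image,
        ((max_shift, max_shift), (max_shift, max_shift), (0, 0)),
        mode="edge",
    )
    height, width = image.shape[:2]
    top, left = max_shift - dy, max_shift - dx
    return padded[top:top + height, left:left + width]


def random_brightness_contrast(image, rng, max_delta=0.2):
    """Rescale contrast around the mean and add a brightness offset."""
    contrast = 1.0 + rng.uniform(-max_delta, max_delta)
    brightness = rng.uniform(-max_delta, max_delta)
    mean = image.mean()
    return np.clip((image - mean) * contrast + mean + brightness, 0.0, 1.0)


def augment(image, rng):
    """Apply a random chain of mild, label-preserving transformations."""
    image = random_flip(image, rng)
    image = random_shift(image, rng)
    image = random_brightness_contrast(image, rng)
    return image


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # stand-in for a real photo, values in [0, 1]
augmented = [augment(image, rng) for _ in range(8)]  # 8 new examples, same label
```

The small ranges reflect the caution above: each transformation is kept mild enough that the result should still be recognizable as the original object.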
Submit your feedback#
# @title Submit your feedback
content_review(f"{feedback_prefix}_Getting_More_Data_Discussion")
Wrap-up#
Video 3: Getting More Data Wrap-up#
Submit your feedback#
# @title Submit your feedback
content_review(f"{feedback_prefix}_Getting_More_Data_WrapUp_Video")
Check out the paper mentioned in the above video:
Balestriero, R., Bottou, L., LeCun, Y. (2022). The Effects of Regularization and Data Augmentation are Class Dependent. arXiv: 2204.03632
(Bonus) Think!: Class-based strategies#
Discuss how you may want to vary these strategies based on the class of the object/images.
Submit your feedback#
# @title Submit your feedback
content_review(f"{feedback_prefix}_ClassBased_strategies_Bonus_Discussion")
Section 3: Detecting Tumors - What to do if there still isn’t enough data#
Time estimate: ~15 mins
Video 4: Detecting Tumors Vignette#
Submit your feedback#
# @title Submit your feedback
content_review(f"{feedback_prefix}_Detecting_Tumors_Vignette_Video")
Video 5: Detecting Tumors Set-up#
Submit your feedback#
# @title Submit your feedback
content_review(f"{feedback_prefix}_Detecting_Tumors_SetUp_Video")
Konrad works for a hospital and wants to train a neural network to detect tumors in brain scans automatically. This type of tumor is pretty rare, which is great for humanity but means we only have a few thousand training examples for our neural network. This isn’t enough.
Even with adding in images of other types of tumors, we don’t have enough data. We have a lot of images of other things in ImageNet, like cats and dogs, though! Maybe we can use that?
Think! 2: Designing a strategy for detecting tumors#
Given everything you know, how would you design a strategy to be able to train an accurate tumor-detecting neural network? Be specific & write down a procedure.
Please discuss as a group. If you get stuck, you can uncover the hints below one at a time. Please spend some time discussing before uncovering the next hint, though! You are being real deep learning scientists now, and the answers won’t be easy.
Click here for hint 1
Data augmentation is always something to consider
Click here for hint 2
A human learning to detect tumors is not learning how to see from scratch just based on the tumor images.
Click here for hint 3
You could use another dataset to help. What properties should such a dataset have?
Click here for hint 4
Even though the images in ImageNet are not of tumors, natural images contain information on aspects of visual objects that are similar to tumors (for example, that they’re coherent and locally smooth).
If you train a neural network on ImageNet first so that it learns general vision and embeddings of images, what might you want to change when training on the tumor images dataset?
Click here for solution
Humans don’t learn to see when they learn a new classification task. We already have a trained visual system that is good at processing and learning embeddings for natural images.
We can replicate this in neural networks! First, we can train our neural network on ImageNet alone to do object classification. This gives us a neural network that has already learned how to process and embed images.
Then, we want to take this neural network and continue to train it on just the tumor classification dataset. We can chop off the existing final layer (that outputs the probabilities of all the ImageNet classes) and train a new one that outputs the probability of there being a tumor in the image.
We could either keep all the weights in the convolutional layers fixed, not allowing them to change after the ImageNet training, or fine-tune them on the tumor images. People take both strategies!
This whole process is called pre-training. We have pre-trained the neural network on ImageNet before training on our actual task, the detection of tumors.
We should mention here that there are many ways of doing this: train the whole network after training on a first task; train only the top layers after pre-training the bottom layers; or first do the latter and then the former. Pre-training can be done in many ways, and the important thing is to look for opportunities to use it.
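The "chop off the final layer, then freeze or fine-tune" procedure can be sketched in PyTorch as follows. This is only an illustration under stated assumptions: the tiny `backbone` is a hypothetical stand-in for a network actually trained on ImageNet (no pretrained weights are loaded here), and the random `scans` and `labels` tensors stand in for the tumor dataset.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for a network pre-trained on ImageNet.
# In a real workflow, pretrained weights would be loaded at this point.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
model = nn.Sequential(backbone, nn.Linear(8, 1000))  # 1000 ImageNet classes

# Step 1: replace the ImageNet output layer with a single tumor logit.
model[1] = nn.Linear(8, 1)

# Step 2 (one of the two strategies): freeze the convolutional weights.
# To fine-tune them instead, skip this loop.
for param in backbone.parameters():
    param.requires_grad = False

# Only parameters that still require gradients are handed to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()

# Made-up batch standing in for brain scans and tumor / no-tumor labels.
scans = torch.randn(16, 3, 32, 32)
labels = torch.randint(0, 2, (16, 1)).float()

conv_before = backbone[0].weight.detach().clone()
head_before = model[1].weight.detach().clone()

# One training step on the new task.
optimizer.zero_grad()
loss = loss_fn(model(scans), labels)
loss.backward()
optimizer.step()
```

After this step the new output layer has been updated while the frozen convolutional weights are unchanged. The "first the latter, then the former" schedule would correspond to training like this for a while, then setting `requires_grad = True` on the backbone and continuing, typically with a smaller learning rate.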
Submit your feedback#
# @title Submit your feedback
content_review(f"{feedback_prefix}_Detecting_Tumors_Discussion")
Wrap-up#
Video 6: Detecting Tumors Wrap-up#
Submit your feedback#
# @title Submit your feedback
content_review(f"{feedback_prefix}_Detecting_Tumors_WrapUp_Video")
Check out the paper mentioned in the above video:
Tschandl, P., Rinner, C., Apalla, Z. et al. (2020). Human–computer collaboration for skin cancer recognition. Nat Med 26: 1229–1234. doi: 10.1038/s41591-020-0942-0
Section 4: Brains on Forrest Gump#
Time estimate: ~17 mins
Video 7: Brains on Forrest Gump Vignette#
Submit your feedback#
# @title Submit your feedback
content_review(f"{feedback_prefix}_Brains_on_Forrest_Gump_Vignette_Video")
Video 8: Brains on Forrest Gump Set-up#
Submit your feedback#
# @title Submit your feedback
content_review(f"{feedback_prefix}_Brains_on_Forrest_Gump_SetUp_Video")
Konrad has a great dataset - MRI data (brain imaging) recorded from a person over the whole time they were watching the movie Forrest Gump. So, basically, he has the video stream over time and the brain data over time. He wants to figure out what those two data streams have in common. In other words, he wants to pull the shared information from two data modalities.
Wrap-up#
Video 9: Brains on Forrest Gump Wrap-up#
Submit your feedback#
# @title Submit your feedback
content_review(f"{feedback_prefix}_Brains_on_Forrest_Gump_WrapUp_Video")
Check out the paper mentioned in the above video:
Andrew, G., Arora, R., Bilmes, J., Livescu, K. (2013). Deep Canonical Correlation Analysis. Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):1247-1255. url: proceedings.mlr.press/v28/andrew13
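Deep CCA builds on classical linear Canonical Correlation Analysis, which finds one projection per modality such that the projected signals are maximally correlated. The sketch below implements linear CCA in NumPy on synthetic data. It is not the method from the paper or the video: the `linear_cca` helper, the regularization value, and the simulated `video` and `brain` features (both driven by one made-up `shared` signal) are assumptions for illustration only.

```python
import numpy as np


def inv_sqrt(mat):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def linear_cca(X, Y, n_components=1, reg=1e-6):
    """Return projections A, B and the canonical correlations of X and Y."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Covariances, with a small ridge term for numerical stability.
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    # Whiten each modality, then take the SVD of the cross-covariance.
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, S, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    A = Wx @ U[:, :n_components]
    B = Wy @ Vt.T[:, :n_components]
    return A, B, S[:n_components]


rng = np.random.default_rng(0)
n_timepoints = 2000
shared = rng.standard_normal(n_timepoints)  # signal present in both streams

# Each modality = shared signal spread over its features + independent noise.
video = shared[:, None] * np.ones(5) + rng.standard_normal((n_timepoints, 5))
brain = shared[:, None] * np.ones(8) + rng.standard_normal((n_timepoints, 8))

A, B, corrs = linear_cca(video, brain)

# Project both streams onto their first canonical direction.
video_c = (video - video.mean(axis=0)) @ A
brain_c = (brain - brain.mean(axis=0)) @ B
```

Here `video_c` and `brain_c` are one-dimensional time series that capture what the two streams have in common, and `corrs[0]` measures how strongly they agree. Deep CCA replaces the linear projections with neural networks trained toward the same correlation objective.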
Summary#
Time estimate: ~2 mins
Video 10: Wrap-up of DL thinking#
Submit your feedback#
# @title Submit your feedback
content_review(f"{feedback_prefix}_WrapUp_of_DL_thinking_Video")
In this tutorial, we saw several tricks for doing well when there is very limited data:
Data augmentation
Pretraining
Canonical Correlation Analysis (CCA)
All three can be used in cases where there is limited data available. All three also teach us that the relevant information may be quite clear once we think about it, and that ideas about the world translate into approaches in deep learning.
Daily survey#
Don’t forget to complete your reflections and content check in the daily survey! Please be patient after logging in as there is a small delay before you will be redirected to the survey.