\n",
"## Click here for hint 1

\n",
"\n",
"You get time-stamps for the spikes. You will want to do binning into 50 ms bins. You get $k_{i, t}$ for every neuron $i$ and time bin $t$, the spike count for that neuron in that time bin. What will the neural network predict?"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"

\n",
"## Click here for hint 2

\n",
"\n",
"For each bin you can use your neural network model to predict an estimate of $\\lambda_{i,t}$, the number of spikes for neuron $i$ expected at that time bin $t$. The network should get as input the relevant aspects of the motorcycle riding at the relevant times (and potentially of the previous times). "
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"

\n",
"## Click here for hint 3

\n",
"\n",
"You need an equation relating $\\lambda_{i,t}$ (the model prediction) with $k_{i, t}$ (your data) where changing $\\lambda_{i,t}$ to minimize or maximize the number resulting from this equation results in better predictions. What do we already know about the relationship between $\\lambda_{i,t}$ and $k_{i, t}$ that helps us here?\n",
"\n",
"Once you have that, how do you extend to incorporate all neurons and time bins?"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"

\n",
"## Click here for hint 4

\n",
"\n",
"We can treat the bins independently as the spikes are random and independent every millisecond. "
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"

\n",
"## Click here for the solution

\n",
"\n",
"\n",
"First, we will convert our spike timing data to the number of spikes per time bin for time bins of size 50 ms. This gives us $k_{i,t}$ for every neuron $i$ and time bin $t$.\n",
"\n",
"We are assuming a Poisson distribution for our spiking. That means that we get the probability of seeing spike count $k_{i, t}$ given underlying firing rate $\\lambda_{i, t}$ using this equation:\n",
"\n",
"\n",
"\\begin{equation}\n",
"\\mathcal{f(k_{i,t}:\\lambda_{i,t})} = \\mathcal{Pr}(X=k_{i,t}) = \\frac {\\lambda_{i,t}^{k_{i,t}}e^{-\\lambda_{i,t}}}{k_{i,t}!}\n",
"\\end{equation}\n",
"\n",
"That seems a pretty good thing to optimize to make our predictions as good as possible! We want a high probability of seeing the actual spike count we recorded given the neural network prediction of the underlying firing rate. \n",
"\n",
"We will make this negative later so we have an equation that we want to minimize rather than maximize, so we can use all our normal tricks for minimization (instead of maximization). First though, let's scale up to include all our neurons and time bins. \n",
"\n",
"We can treat each time bin as independent because, while the underlying probability of firing changes slowly, every milisecond spiking is random and independent. From probability, we know that we can compute the probability of a set of independent events (all the spike counts) by multiplying the probabilities of each event. So the probability of seeing all of our data given the neural network predictions is all of our probabilities of $k_{i,t}$ multiplied together:\n",
"\n",
"\\begin{align}\n",
"\\mathcal{Pr}(\\text{all_data}) &= \\prod_{i=1}^{N}\\prod_{t=1}^\\top \\mathcal{Pr}(X=k_{i,t})\\\\\n",
"&= \\prod_{i=1}^{N}\\prod_{t=1}^\\top \\frac {\\lambda_{i,t}^{k_{i,t}}e^{-\\lambda_{i,t}}}{k_{i,t}!}\n",
"\\end{align}\n",
"\n",
"This is also known as our likelihood!\n",
"\n",
"We usually use the log likelihood instead of the likelihood when minimizing or maximizing for numerical computation reasons. W We can convert the above equation to log likelihood:\n",
"\n",
"\\begin{align}\n",
"\\text{log likelihood} &= \\sum_{i=1}^N\\sum_{t=1}^\\top \\text{log}(\\mathcal{Pr}(X=k_{i,t}) \\\\\n",
"&= \\sum_{i=1}^N\\sum_{t=1}^\\top k_{i,t} \\text{log}(\\lambda_{i,t}) - \\lambda_{i,t} - \\text{log}(k_{i,t}!)\n",
"\\end{align}\n",
"\n",
"And last but not least, we want to make it negative so we can minimize instead of maximize:\n",
"\n",
"\\begin{equation}\n",
"\\text{negative log likelihood} \n",
"= \\sum_{i=1}^N\\sum_{t=1}^\\top - k_{i,t} \\text{log}(\\lambda_{i,t}) + \\lambda_{i,t} + \\text{log}(k_{i,t}!)\n",
"\\end{equation}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Video 4: Spiking Neurons Wrap-up\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 4: Spiking Neurons Wrap-up\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'fb6A03B2U5g'), ('Bilibili', 'BV1K94y117rH')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Check out the papers mentioned in the above video:\n",
"\n",
"- [Fast inference in generalized linear models via expected log likelihood](https://link.springer.com/article/10.1007/s10827-013-0466-4)\n",
"\n",
"- [Machine Learning for Neural Decoding](https://www.eneuro.org/content/7/4/ENEURO.0506-19.2020)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## (Bonus) Think!: Non-Poisson neurons\n",
"\n",
"If you have time discuss the following. The spiking distributions don't seem quite Poisson. Find a good replacement for your cost function."
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Section 3: How can an ANN know its uncertainty"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Video 5: ANN Uncertainty Vignette\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 5: ANN Uncertainty Vignette\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'b2N2OJ2u4AM'), ('Bilibili', 'BV1UN4y1u7Ws')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Video 6: ANN Uncertainty Set-up\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 6: ANN Uncertainty Set-up\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'Reh-gNiOwkQ'), ('Bilibili', 'BV1B34y1W7F8')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Lyle wants to build an artificial neural network that has a measure of its own uncertainty about it's predictions. He wants the neural network to give a prediction/estimate and an uncertainty, or standard deviation, measurement on it. \n",
"\n",
"Let's say Lyle wants to estimate the location of an atom in a chemical molecule based on various inputs. He wants to have the estimate of the location and an estimate of the variance. We don't train neural networks on one data point at a time though - he wants a cost function that takes in N data points (input and atom location pairings).\n",
"\n",
"We think we may be able to use a Gaussian distribution to help Lyle here:\n",
"\n",
"\\begin{equation}\n",
"g(x) = \\frac{1}{\\sigma\\sqrt{2\\pi}} \\text{exp} \\left( -\\frac{1}{2}\\frac{(x-\\mu)^2}{\\sigma^2} \\right)\n",
"\\end{equation}"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Think! 2: Designing a cost function so we measure uncertainty\n",
"\n",
"Given everything you know, how would you design a cost function for a neural network that Lyle is training so that he can get the estimate and the uncertainty of the estimate? Try to write out an equation!\n",
"\n",
"Please discuss as a group. If you get stuck, you can uncover the hints below one at a time. Please spend some time discussing before uncovering the next hint, though! You are being real deep learning scientists now, and the answers won't be easy."
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"

\n",
"## Click here for hint 1

\n",
"\n",
"Look at the Gaussian equation. What is the true location? Where is there the estimate of location? Where is there the uncertainty? \n",
"\n",
"What do you want the neural network to predict for one data point (recorded location) given the inputs?"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"

\n",
"## Click here for hint 2

\n",
"\n",
"What did you learn from working through Section 2 that you can use here?"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"

\n",
"## Click here for hint 3

\n",
"\n",
"In section 2, you learned that you want to go from probabilities to negative log likelihoods to form cost functions."
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"

\n",
"## Click here for the solution

\n",
"\n",
"For a given set of inputs, we want the neural network to predict the location of the atom and the uncertainty of that estimate. Standard deviation is a great measure of uncertainty so we can predict the mean and standard deviation of the location (instead of just the mean as is more common).\n",
"\n",
"So how do we a design a cost function that involves the mean and standard deviation? We can assume a Gaussian distribution over the location. The neural network can predict the mean of that Gaussian (that's the estimate of the location) and the standard deviation of that Gaussian (that's the uncertainty measure) for a given set of inputs.\n",
"\n",
"Now that we've got that figured out, we can take a very similar approach to what we did in Section 2 with spiking neurons. For a given data point $i$, the neural network predicts the mean ($\\mu_i$) and standard deviation ($\\sigma_i$) of the location given the inputs. We can then compute the probability of seeing the actual recorded location ($x_i$) given these predictions:\n",
"\n",
"\\begin{equation}\n",
"g(x) = \\frac{1}{\\sigma\\sqrt{2\\pi}} \\text{exp}\\left( -\\frac{1}{2}\\frac{(x_i-\\mu_i)^2}{\\sigma_i^2} \\right)\n",
"\\end{equation}\n",
"\n",
"The location of the atom is independent in each data point so we can get the overall likelihood by multiplying the probabilities for the individual data points.\n",
"\\begin{equation}\n",
"\\text{likelihood} = \\prod_{i=1}^N\\frac{1}{\\sigma\\sqrt{2\\pi}} \\text{exp}\\left( -\\frac{1}{2}\\frac{(x_i-\\mu_i)^2}{\\sigma_i^2} \\right)\n",
"\\end{equation}\n",
"\n",
"\n",
"And, as before, we want to take the log of this for numerical reasons and convert to negative log likelihood:\n",
"\n",
"\\begin{equation}\n",
"\\text{negative log likelihood} = \\sum_{i=1}^N \\text{log} \\left( \\frac{1}{\\sigma\\sqrt{2\\pi}} \\text{exp}\\left( -\\frac{1}{2}\\frac{(x_i-\\mu_i)^2}{\\sigma_i^2} \\right) \\right) \n",
"\\end{equation}\n",
"\n",
"Changing the parameters of the neural network so it predicts $\\mu_i$ and $\\sigma_i$ that minimize this equation will give us (hopefully fairly accurate) predictions of the location and the network uncertainty about the location!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Video 7: ANN Uncertainty Wrap-up\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 7: ANN Uncertainty Wrap-up\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'QBKAFRaC8SY'), ('Bilibili', 'BV1zv4y1M7C8')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Check out the papers mentioned in the above video:\n",
"\n",
"- [Rapid prediction of NMR spectral properties with quantified uncertainty](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-019-0374-3)\n",
"\n",
"- [Deep imitation learning for molecular inverse problems](https://papers.nips.cc/paper/2019/file/b0bef4c9a6e50d43880191492d4fc827-Paper.pdf)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## (Bonus) Think!: Negative standard deviations\n",
"\n",
"If the standard deviation is negative, the negative log-likelihood will fail as you'd take the log of a negative number. What should we do to ensure we don't run into this while training our neural network?"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Section 4: Embedding faces"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Video 8: Embedding Faces Vignette\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 8: Embedding Faces Vignette\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'tF0iYBAnyrI'), ('Bilibili', 'BV1NY411K7f6')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Video 9: Embedding Faces Set-up\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 9: Embedding Faces Set-up\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'JrzicfOxqP0'), ('Bilibili', 'BV1fv4y1M7eQ')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Konrad needs help recognizing faces. He wants to build a network that embeds photos of faces so that photos of the same person are nearby in the embedding space and photos of different people are far in the embedding space. We can't just use pixel space because the pixels will be very different between a photo of someone straight on vs. from their side! \n",
"\n",
"We will use a neural network to go from the pixels of each image to an embedding space. Let's say you have a convolutional neural network with m units in the last layer. If you feed a face photo $i$ through the CNN, the activities of the units in the last layer form an $m$ dimensional vector $\\bar{y}_i$ - this is an embedding of that face photo in $m$ dimensional space. \n",
"\n",
"We think we might be able to incorporate Euclidean distance to help us here. The Euclidean distance between two vectors is:\n",
"\n",
"\\begin{equation}\n",
"d(\\bar{y}_i, \\bar{y}_j) = \\sqrt{\\sum_{c=1}^m(\\bar{y}_{i_c} - \\bar{y}_{j_c})^2}\n",
"\\end{equation}\n",
"\n",
"

\n", "\n", "**Note:** a minor remark here, there is an indexing error in the video where it says $i$ instead of $j$." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Think! 3: Designing a cost function for face embedding\n", "\n", "Given everything you know, how would you design a cost function for a neural network that Konrad is training so that he can get a helpful embedding of faces? Try to write out an equation!\n", "\n", "Please discuss as a group. If you get stuck, you can uncover the hints below one at a time. Please spend some time discussing before uncovering the next hint, though! You are being real deep learning scientists now, and the answers won't be easy." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "

\n", "\n", "**Note:** a minor remark here, there is an indexing error in the video where it says $i$ instead of $j$." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Think! 3: Designing a cost function for face embedding\n", "\n", "Given everything you know, how would you design a cost function for a neural network that Konrad is training so that he can get a helpful embedding of faces? Try to write out an equation!\n", "\n", "Please discuss as a group. If you get stuck, you can uncover the hints below one at a time. Please spend some time discussing before uncovering the next hint, though! You are being real deep learning scientists now, and the answers won't be easy." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "

\n",
"## Click here for hint 1

\n",
"\n",
"How do we want to deal with the same faces? Can we just build a cost function based on similar faces? What would happen?"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"

\n",
"## Click here for hint 2

\n",
"\n",
"You need to also include different faces. How do you want to deal with different faces? "
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"

\n",
"## Click here for hint 3

\n",
"\n",
"Similar faces should have low Euclidean distance between their embeddings. Different faces should have high Euclidean distance between their embeddings. Can we phrase this with 3 faces?"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"

\n",
"## Click here for the solution

\n",
"\n",
"We want the same faces to have similar embeddings. Let's say we have one photo of Lyle $a$ and another photo of Lyle $p$. We want the embeddings of those photos to be very similar: we want the Euclidean distance between $\\bar{y}_a$ and $\\bar{y}_p$ (the activitys of the last layer of the CNN when photo $a$ and $p$ are fed through) to be small. \n",
"\n",
"So one possible cost function is:\n",
"\n",
"\\begin{equation}\n",
"\\text{Cost function} = d(\\bar{y}_a, \\bar{y}_p)\n",
"\\end{equation}\n",
"\n",
"Imagine if we just feed in pairs of the same face and minimize that though. There would be no motivation to ever have different embeddings, we would be only minimizing the distance between embeddings. If the CNN was smart, it would just have the same embedding for every single photo - then the cost function would equal 0!\n",
"\n",
"This is clearly not what we want. We want to motivate the CNN to have similar embeddings only when the faces are the same. This means we need to also train it to maximize distance when the faces are different.\n",
"\n",
"We could choose another two photos of different people and maximize that distance but then there's no relation to the embeddings we've already established of the two photos of Lyle. Instead, we will add one more photo to the mix: a photo of Konrad $n$. We want the distance of this photo to be far from our original photos of Lyle $a$ and $p$. So we want the distance between $a$ and $p$ to be small and the distance between $a$ and $n$ for example to be large:\n",
"\n",
"\\begin{equation}\n",
"\\text{Cost function} = d(\\bar{y}_a, \\bar{y}_p) - d(\\bar{y}_a, \\bar{y}_n)\n",
"\\end{equation}\n",
"\n",
"We could compare $n$ to both $a$ and $p$:\n",
"\\begin{equation}\n",
"\\text{Cost function} = d(\\bar{y}_a, \\bar{y}_p) - d(\\bar{y}_a, \\bar{y}_n) - d(\\bar{y}_p, \\bar{y}_n)\n",
"\\end{equation}\n",
"\n",
"But then the cost function is a bit unbalanced, there are two dissimiliarty terms and they might dominate (so achieving the similarity is less important). So let's go with just including one dissimilarity term.\n",
"\n",
"This is an established cost function - triplet loss! We chose the subscripts $a$, $p$, and $n$ for a reason: we have an anchor image, a positive image (the same person's face as the anchor) and a negative image (a different person's face as the anchor). We can then sum over N data points where each data point is a set of three images: \n",
"\n",
"\\begin{equation}\n",
"\\text{Cost function} = \\sum_{i=1}^N [d(\\bar{y}_{a, i}, \\bar{y}_{p, i}) - d(\\bar{y}_{a, i}, \\bar{y}_{n, i})]\n",
"\\end{equation}\n",
"\n",
"There's one little addition in triplet loss. Instead of just using the above cost function, researchers add a constant $\\alpha$ and then make the cost function 0 if it becomes negative. Why do you think they do this?\n",
"\n",
"\\begin{equation}\n",
"\\text{Cost function} = \\text{max} \\left( \\sum_{i=1}^N \\left[ d(\\bar{y}_{a, i}, \\bar{y}_{p, i}) - d(\\bar{y}_{a, i}, \\bar{y}_{n, i}) + \\alpha \\right], 0 \\right)\n",
"\\end{equation}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Video 10: Embedding Faces Wrap-up\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 10: Embedding Faces Wrap-up\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'mVk1W7x6Nps'), ('Bilibili', 'BV1nf4y1f7oL')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Check out the papers mentioned in the above video:\n",
"\n",
"- [Large Scale Online Learning of Image Similarity Through Ranking](https://www.jmlr.org/papers/volume11/chechik10a/chechik10a.pdf)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Summary\n",
"\n",
"Today we have seen a range of different cost functions. So we want to dwell a bit on what we want people to take away from these exercises. We have seen several cost functions:\n",
"\n",
"* Log Poisson likelihood for neurons\n",
"* Uncertainty as a modeled entity\n",
"* Face embeddings\n",
"\n",
"What we saw in all these cases is that these cost functions emerge from insights into the problem domain. We saw how one needs to, in a way, pull these insights out of the domain experts. And how, at the same time, the cost functions come from computational insights. Coming up with the proper cost functions requires listening to what domain experts say and probing the things they may mean but not say."
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"include_colab_link": true,
"name": "W2D2_Tutorial2",
"provenance": [],
"toc_visible": true
},
"kernel": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}