
Modeling Steps 1 - 2#

By Neuromatch Academy

Content creators: Marius ‘t Hart, Megan Peters, Paul Schrater, Gunnar Blohm

Content reviewers: Eric DeWitt, Tara van Viegen, Marius Pachitariu

Production editors: Ella Batty, Spiros Chavlis


Note: This is the same as NMA CN W1D2 Tutorial 1 - we provide it here as well for ease of access.


Objectives#

We deconstruct the modeling process and break it down into 10 easy steps. Following the thought process of these steps will help you design and complete a Deep Learning (DL) project.

We assume that you have a general idea of a project in mind, i.e., a preliminary question, goal, and/or phenomenon you would like to investigate. These 10 steps were originally developed for computational neuroscience models, but they really apply to any research project. We will now work through the 10 steps of modeling (Blohm et al., 2019).

We provide 3 example projects:

  • a neuro theory model (if you’re comp neuro inclined) - this is also our roleplay example! See the corresponding notebook here

  • a brain decoding model (simple logistic regression; if you’re data science inclined)! See the corresponding notebook here

  • a movement classification model (Convolutional Neural Network; if you’re DL inclined)! See the corresponding notebook here

Video 0: 10 steps overview#


Setup#

# Imports
import numpy as np
import matplotlib.pyplot as plt

# for random distributions:
from scipy.stats import norm, poisson

# for logistic regression:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

Plotting Functions#

# @title Plotting Functions

def rasterplot(spikes, movement, trial):
  # `spikes` has shape [movements, trials, neurons, timepoints]
  [movements, trials, neurons, timepoints] = np.shape(spikes)

  # pick the spike counts for one movement condition and one trial
  trial_spikes = spikes[movement, trial, :, :]

  # convert bins with at least one spike to event times in seconds
  # (bins are 10 ms wide, time runs from -1.5 s to +1.5 s, so bin 150 is t = 0)
  trial_events = [((trial_spikes[x, :] > 0).nonzero()[0] - 150) / 100 for x in range(neurons)]

  plt.figure()
  plt.eventplot(trial_events, linewidths=1)
  plt.title('movement: %d - trial: %d' % (movement, trial))
  plt.ylabel('neuron')
  plt.xlabel('time [s]')

def plotCrossValAccuracies(accuracies):
  # box plot of cross-validation fold accuracies, with the individual folds overlaid
  f, ax = plt.subplots(figsize=(8, 3))
  ax.boxplot(accuracies, vert=False, widths=.7)
  ax.scatter(accuracies, np.ones(len(accuracies)))
  ax.set(
    xlabel="Accuracy",
    yticks=[],
    title=f"Average test accuracy: {accuracies.mean():.2%}"
  )
  ax.spines["left"].set_visible(False)

Generate Data#

# @title Generate Data

# @markdown `generateSpikeTrains(seed=37)`

# @markdown `subsetPerception(spikes, seed=0)`

def generateSpikeTrains(seed=37):

  gain = 2
  neurons = 50
  movements = [0, 1, 2]
  repetitions = 800

  np.random.seed(seed)

  # set up the basic parameters:
  dt = 1/100
  start, stop = -1.5, 1.5
  t = np.arange(start, stop + dt, dt)  # a time interval
  Velocity_sigma = 0.5  # std dev of the velocity profile
  Velocity_Profile = norm.pdf(t, 0, Velocity_sigma)/norm.pdf(0, 0, Velocity_sigma)  # The Gaussian velocity profile, normalized to a peak of 1

  # set up the neuron properties:
  Gains = np.random.rand(neurons) * gain  # random sensitivity between 0 and `gain`
  FRs   = (np.random.rand(neurons) * 60 ) - 10  # random base firing rate between -10 and 50

  # output matrix will have this shape:
  target_shape = [len(movements), repetitions, neurons, len(Velocity_Profile)]

  # build matrix for spikes, first, they depend on the velocity profile:
  Spikes = np.repeat(Velocity_Profile.reshape([1, 1, 1, len(Velocity_Profile)]),
                     len(movements)*repetitions*neurons, axis=2).reshape(target_shape)

  # multiplied by gains:
  S_gains = np.repeat(np.repeat(Gains.reshape([1, 1, neurons]),
                                len(movements)*repetitions, axis=1).reshape(target_shape[:3]),
                      len(Velocity_Profile)).reshape(target_shape)
  Spikes = Spikes * S_gains

  # and multiplied by the movement:
  S_moves = np.repeat( np.array(movements).reshape([len(movements), 1, 1, 1]),
                      repetitions*neurons*len(Velocity_Profile),
                      axis=3 ).reshape(target_shape)

  Spikes = Spikes * S_moves

  # on top of a baseline firing rate:
  S_FR = np.repeat(np.repeat(FRs.reshape([1, 1, neurons]),
                             len(movements)*repetitions, axis=1).reshape(target_shape[:3]),
                   len(Velocity_Profile)).reshape(target_shape)
  Spikes = Spikes + S_FR

  # the Poisson random number generator cannot take rates below 0, so clip at 0:
  Spikes = np.where(Spikes < 0, 0, Spikes)

  # so far these were expected firing rates per second; scale by dt to get the
  # expected count per 10 ms bin, then draw Poisson-distributed spike counts:
  Spikes = poisson.rvs(Spikes * dt)

  return Spikes


def subsetPerception(spikes, seed=0):

  movements = [0, 1, 2]
  split = 400
  subset = 40
  hwin = 3

  [num_movements, repetitions, neurons, timepoints] = np.shape(spikes)

  decision = np.zeros([num_movements, repetitions])

  # ground truth for logistic regression:
  y_train = np.repeat([0, 1, 1], split)
  y_test  = np.repeat([0, 1, 1], repetitions - split)

  m_train = np.repeat(movements, split)
  m_test  = np.repeat(movements, repetitions - split)

  # reproduce the time points:
  dt = 1 / 100
  start, stop = -1.5, 1.5
  t = np.arange(start, stop+dt, dt)

  # indices of the time bins within half a window (hwin bins) of t = 0
  w_idx = list((abs(t) < (hwin*dt)).nonzero()[0])
  w_0 = min(w_idx)
  w_1 = max(w_idx) + 1  # +1 because the end of a Python slice is exclusive

  # get the total spike counts from stationary and movement trials:
  spikes_stat = np.sum(spikes[0, :, :, :], axis=2)
  spikes_move = np.sum(spikes[1:, :, :, :], axis=3)

  train_spikes_stat = spikes_stat[:split, :]
  train_spikes_move = spikes_move[:, :split, :].reshape([-1 ,neurons])

  test_spikes_stat = spikes_stat[split:, :]
  test_spikes_move = spikes_move[:, split:, :].reshape([-1, neurons])

  # data to use to predict y:
  x_train = np.concatenate((train_spikes_stat, train_spikes_move))
  x_test = np.concatenate(( test_spikes_stat, test_spikes_move))

  # this line creates a logistic regression model object and immediately fits it:
  population_model = LogisticRegression(solver='liblinear',
                                        random_state=seed).fit(x_train, y_train)

  # solver, one of: 'liblinear', 'newton-cg', 'lbfgs', 'sag', and 'saga'
  # some of those require certain other options
  # print(population_model.coef_)       # slope
  # print(population_model.intercept_)  # intercept

  ground_truth = np.array(population_model.predict(x_test))
  ground_truth = ground_truth.reshape([3, -1])

  output = {}
  output['perception'] = ground_truth
  output['spikes'] = spikes[:, split:, :subset, :]

  return output


def getData():

  spikes = generateSpikeTrains()
  dataset = subsetPerception(spikes=spikes)

  return dataset


dataset = getData()
perception = dataset['perception']
spikes = dataset['spikes']
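
If you want a quick look at what the hidden cell produced, a minimal sketch like this inspects the structure of the dataset returned by `getData()`:

# inspect the generated dataset: spike counts and perceptual judgements
print(f'spikes shape: {spikes.shape}')          # (movements, trials, neurons, timepoints)
print(f'perception shape: {perception.shape}')  # (movements, trials)
print(f'spike count range: {spikes.min()} to {spikes.max()}')
print(f'unique judgements: {np.unique(perception)}')  # 0 = no motion, 1 = motion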

Demos#

We will demo the modeling process to you based on the train illusion. The introductory video will explain the phenomenon. Then we will use a roleplay to showcase some common pitfalls, based on a computational modeling project around the train illusion. In addition to the computational model, we also provide a data neuroscience project example so you can appreciate the similarities and differences.

Enjoy!

Disclaimer#

The pitfalls roleplay videos were developed for a computational neuroscience modeling project. But all steps and pitfalls also apply to Deep Learning projects. There is a DL joke throughout these videos; that does NOT mean we do not like or appreciate DL (on the contrary, otherwise we would not be teaching it here). But it should serve as a warning that DL is not a magic answer to all questions… 😉

Video 1: Introduction to tutorial#


Step 1: Finding a phenomenon and a question to ask about it#

Video 2: Asking a question#

Example projects step 1#

# @title Example projects step 1
from ipywidgets import widgets
from IPython.display import Markdown

markdown1 = '''

## Step 1
<br>
<font size='3pt'>
The train illusion occurs when sitting on a train and viewing another train outside the window. Suddenly, the other train *seems* to move, i.e. you experience visual motion of the other train relative to your train. But which train is actually moving?

Often people have the wrong percept. In particular, they think their own train might be moving when it's the other train that moves; or vice versa. The illusion is usually resolved once you gain vision of the surroundings that lets you disambiguate the relative motion; or if you experience strong vibrations indicating that it is indeed your own train that is in motion.


We asked the following (arbitrary) question for our demo project: "How do noisy vestibular estimates of motion lead to illusory percepts of self motion?"
</font>
'''

markdown2 = '''
## Step 1
<br>

<font size='3pt'>
The train illusion occurs when sitting on a train and viewing another train outside the window. Suddenly, the other train *seems* to move, i.e. you experience visual motion of the other train relative to your train. But which train is actually moving?

Often people mix this up. In particular, they think their own train might be moving when it's the other train that moves; or vice versa. The illusion is usually resolved once you gain vision of the surroundings that lets you disambiguate the relative motion; or if you experience strong vibrations indicating that it is indeed your own train that is in motion.


We assume that we have built the train illusion model (see the other example project colab). That model predicts that accumulated sensory evidence from vestibular signals determines the decision of whether self-motion is experienced or not. We now have vestibular neuron data (simulated in our case, but let's pretend) and would like to see if that prediction holds true.

The data contains *N* neurons and *M* trials for each of 3 motion conditions: no self-motion, slowly accelerating self-motion and faster accelerating self-motion. In our data,
*N* = 40 and *M* = 400.

**So we can ask the following question**: "Does accumulated vestibular neuron activity correlate with self-motion judgements?"
</font>
'''

markdown3 = '''
## Step 1
<br>
<font size='3pt'>
There are many different questions we could ask with the MoVi dataset. We will start with a simple question: "Can we classify movements from skeletal motion data, and if so, which body parts are the most informative ones?"

Our goal is to perform a pilot study to see if this is possible in principle. We will therefore use "ground truth" skeletal motion data that has been computed using an inference algorithm (see MoVi paper). If this works out, then as a next step we might want to use the raw sensor data or even videos...

The ultimate goal could, for example, be to figure out which body parts we need to record movements from (e.g., is just a wristband enough?) in order to classify movements.
</font>
'''

out2 = widgets.Output()
with out2:
  display(Markdown(markdown2))

out1 = widgets.Output()
with out1:
  display(Markdown(markdown1))

out3 = widgets.Output()
with out3:
  display(Markdown(markdown3))

out = widgets.Tab([out1, out2, out3])
out.set_title(0, 'Computational Model')
out.set_title(1, 'Data Analysis')
out.set_title(2, 'Deep Learning')

display(out)
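
For the Data Analysis question above ("Does accumulated vestibular neuron activity correlate with self-motion judgements?"), here is a minimal sketch of how one might start to operationalize it with the simulated data from the Setup section. It assumes that the total spike count per trial, summed over neurons and time, is an acceptable stand-in for "accumulated activity":

# sketch: relate accumulated vestibular activity to self-motion judgements
# (assumption: total spike count per trial approximates "accumulated activity")
accumulated = spikes.sum(axis=(2, 3)).reshape(-1)  # one value per trial
judgements = perception.reshape(-1)                # 0 = no motion, 1 = motion

# point-biserial correlation between accumulated counts and binary judgements
r = np.corrcoef(accumulated, judgements)[0, 1]
print(f'correlation between accumulated spike counts and judgements: r = {r:.2f}')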

Asking your own question#

You should already have a project idea from your brainstorming yesterday. Write down the phenomenon, question and goal(s) if you have them.

As a reminder, here is what you should discuss and write down:

  • What exact aspect of data needs modeling?

    • Answer this question clearly and precisely! Otherwise you will get lost (almost guaranteed)

    • Write everything down!

    • Also identify aspects of data that you do not want to address (yet)

  • Define an evaluation method!

    • How will you know your modeling is good?

    • E.g., comparison to specific data (what quantitative method of comparison?); see the sketch after this list for one example

  • For computational models: think of an experiment that could test your model

    • You essentially want your model to interface with this experiment, i.e. you want to simulate this experiment
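
To make the "evaluation method" point concrete, here is a minimal sketch of one possible quantitative evaluation. It reuses `cross_val_score` from the Setup imports and the `plotCrossValAccuracies` helper defined above, and assumes that cross-validated decoding accuracy is a sensible metric for the example question:

# sketch of one possible evaluation: cross-validated accuracy of a decoder
# that predicts the self-motion judgement from per-trial spike counts
X = spikes.sum(axis=3).reshape(-1, spikes.shape[2])  # total count per neuron, per trial
y = perception.reshape(-1)                           # judgement per trial (0 or 1)

accuracies = cross_val_score(LogisticRegression(solver='liblinear'), X, y, cv=8)
plotCrossValAccuracies(accuracies)
plt.show()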

You can find interesting questions by looking for phenomena that differ from your expectations. In what way does it differ? How could that be explained (starting to think about mechanistic questions and structural hypotheses)? Why could it be the way it is? What experiment could you design to investigate this phenomenon? What kind of data would you need?

Make sure to avoid the pitfalls!

Click here for a recap on pitfalls
  1. Question is too general

  • Remember: science advances one small step at a time. Get the small step right…

  2. Precise aspect of phenomenon you want to model is unclear

  • You will fail to ask a meaningful question

  3. You have already chosen a toolkit

  • This will prevent you from thinking deeply about the best way to answer your scientific question

  4. You don’t have a clear goal

  • What do you want to get out of modeling?

  5. You don’t have a potential experiment in mind

  • This will help concretize your objectives and think through the logic behind your goal

Note

The hardest part is Step 1. Once that is properly set up, all the other steps should be easier. BUT: you will often think that Step 1 is done, only to figure out in later steps (anywhere, really) that you were not as clear on your question and goal as you thought. Revisiting Step 1 is a frequent necessity. Don’t feel bad about it. You can revisit Step 1 later; for now, let’s move on to the next step.


Step 2: Understanding the state of the art & background#

Here you will do a literature review (to be done AFTER this tutorial!).

Video 3: Literature Review & Background Knowledge#

Example projects step 2#

# @title Example projects step 2
from ipywidgets import widgets
from IPython.display import Markdown
import numpy as np

markdown1 = '''

## Step 2

<br>
<font size='3pt'>

You have learned all about the vestibular system in the Intro video. This is also where you would do a literature search to learn more about what's known about self-motion perception and vestibular signals. You would also want to examine any attempts to model self-motion, perceptual decision making and vestibular processing.</font>
'''

markdown21 = '''
## Step 2

<br>
<font size='3pt'>
While it seems a well-known fact that vestibular signals are noisy, we should check if we can also find this in the literature.

Let's also see what's in our data: there should be a 4d array called `spikes` that contains spike counts (non-negative integers), and a 2d array called `perception` with self-motion judgements (0 = no motion, 1 = motion). Let's see what this data looks like:

</font><br>
'''

markdown22 = '''
<br>
<font size='3pt'>
In the `spikes` array, we see our 3 acceleration conditions (first dimension), with 400 trials each (second dimension) and simultaneous recordings from 40 neurons (third dimension), across 3 seconds in 10 ms bins (fourth dimension). The first two dimensions are also present in the `perception` array.

Perfect perception would have looked like [0, 1, 1]. The average judgements are far from correct (lots of self-motion illusions) but they do make some sense: it's closer to 0 in the no-motion condition and closer to 1 in both of the real-motion conditions.

The idea of our project is that the vestibular signals are noisy so that they might be mis-interpreted by the brain. Let's see if we can reproduce the stimuli from the data:
</font>
<br>
'''

markdown23 = '''
<br>
<font size='3pt'>
Blue is the no-motion condition, and produces flat average spike counts across the 3 s time interval. The orange and green lines do show a bell-shaped curve that corresponds to the acceleration profile. But there also seems to be considerable noise: exactly what we need. Let's see what the spike trains for a single trial look like:
</font>
<br>
'''

markdown24 = '''
<br>
<font size='3pt'>
You can change the trial number in the bit of code above to compare what the rasterplots look like in different trials. You'll notice that they all look kind of the same: the 3 conditions are very hard (impossible?) to distinguish by eye-balling.

Now that we have seen the data, let's see if we can extract self-motion judgements from the spike counts.
</font>
<br>
'''

display(Markdown(r""))


markdown3 = '''

## Step 2

<br>
<font size='3pt'>

Most importantly, our literature review needs to address the following:
* what modeling approaches make it possible to classify time series data?
* how is human motion captured?
* what exactly is in the MoVi dataset?
* what is known regarding classification of human movement based on different measurements?

What we learn from the literature review is too long to write out here... But we would like to point out that human motion classification has been done before, so we're not proposing a very novel project here. That's OK for an NMA project, though!
</font>
<br>
'''



out2 = widgets.Output()
with out2:
  display(Markdown(markdown21))
  print(f'The shape of `spikes` is: {np.shape(spikes)}')
  print(f'The shape of `perception` is: {np.shape(perception)}')
  print(f'The mean of `perception` is: {np.mean(perception, axis=1)}')
  display(Markdown(markdown22))

  for move_no in range(3):
    plt.plot(np.arange(-1.5, 1.5 + (1/100),
                       (1/100)),
             np.mean(np.mean(spikes[move_no, :, :, :],
                             axis=0),
                     axis=0),
             label=['no motion', '$1 m/s^2$', '$2 m/s^2$'][move_no])
  plt.xlabel('time [s]');
  plt.ylabel('averaged spike counts');
  plt.legend()
  plt.show()

  display(Markdown(markdown23))

  for move in range(3):
    rasterplot(spikes = spikes, movement = move, trial = 0)
  plt.show()

  display(Markdown(markdown24))

out1 = widgets.Output()
with out1:
  display(Markdown(markdown1))

out3 = widgets.Output()
with out3:
  display(Markdown(markdown3))

out = widgets.Tab([out1, out2, out3])
out.set_title(0, 'Computational Model')
out.set_title(1, 'Data Analysis')
out.set_title(2, 'Deep Learning')

display(out)

Here you will do a literature review. For the projects, do not spend too much time on this. A thorough literature review could take weeks or months depending on your prior knowledge of the field…

The important thing for your project here is not to exhaustively survey the literature but rather to learn the process of modeling. 1-2 days of digging into the literature should be enough!

Here is what you should get out of it:

  • Survey the literature

    • What’s known?

    • What has already been done?

    • Previous models as a starting point?

    • What hypotheses have been put forward in the field?

    • Are there any alternative / complementary modeling approaches?

  • What skill sets are required?

    • Do I need to learn something before I can start?

    • Ensure that no important aspect is missed

  • Potentially provides specific data sets / alternative modeling approaches for comparison