
Music classification and generation with spectrograms

By Neuromatch Academy

Content creators: Beatrix Benkő, Lina Teichmann


This notebook

This notebook loads the GTZAN dataset, which includes audio files and spectrograms. You can use this dataset or find your own. The first part of the notebook covers data visualization and shows how to make spectrograms from audio files. The second part includes a CNN that is trained on the spectrograms to predict music genre. Below we also provide links to tutorials and other resources in case you want to try some of the harder project ideas.

Have fun :)

Acknowledgements

This notebook was written by Beatrix Benkő and Lina Teichmann.

Useful code examples:

https://towardsdatascience.com/music-genre-classification-with-python-c714d032f0d8

https://pytorch.org/vision/stable/models.html

https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vision_transformer.py

https://github.com/kamalesh0406/Audio-Classification

https://github.com/zcaceres/spec_augment

https://musicinformationretrieval.com/ipython_audio.html


Setup

Install dependencies

# @title Install dependencies
!sudo apt-get install -y ffmpeg --quiet
!pip install librosa --quiet
!pip install imageio --quiet
!pip install imageio-ffmpeg --quiet
# Import necessary libraries.
import os
import glob
import imageio
import random, shutil
import torch
import torch.nn as nn
from tqdm.notebook import tqdm
import torch.nn.functional as F
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as display
import librosa
import librosa.display
import requests

fname = "music.zip"
url = "https://osf.io/drjhb/download"

if not os.path.isfile(fname):
  try:
    r = requests.get(url)
  except requests.ConnectionError:
    print("!!! Failed to download data !!!")
  else:
    if r.status_code != requests.codes.ok:
      print("!!! Failed to download data !!!")
    else:
      with open(fname, "wb") as fid:
        fid.write(r.content)

Loading GTZAN dataset (includes spectrograms)

The GTZAN dataset for music genre classification can be downloaded from Kaggle: https://www.kaggle.com/andradaolteanu/gtzan-dataset-music-genre-classification.

The download cell above already fetches a zipped copy (music.zip) from OSF, so the Kaggle route is optional. To download from Kaggle yourself, you need your Kaggle API token. On Kaggle, go to the upper right corner -> Account -> API -> Create API token. This downloads a JSON file. Copy its contents into api_token. It should look like this:

api_token = {"username": "johnsmith", "key": "123a123a123"}
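If you do go the Kaggle route, the token usually has to be saved as a file named kaggle.json where the Kaggle command-line tool can find it (by default, a .kaggle folder in your home directory). The sketch below shows one way to write that file. The helper name write_kaggle_token is hypothetical, the credentials are the placeholder values from above, and the demo writes to a temporary folder rather than your real home directory.

```python
import json
import os
import tempfile


def write_kaggle_token(api_token, config_dir):
  """Write the token dict to <config_dir>/kaggle.json (hypothetical helper)."""
  os.makedirs(config_dir, exist_ok=True)
  token_path = os.path.join(config_dir, "kaggle.json")
  with open(token_path, "w") as f:
    json.dump(api_token, f)
  # Restrict the file to the current user, since it holds a secret key.
  os.chmod(token_path, 0o600)
  return token_path


# Placeholder credentials; replace them with the contents of your own JSON file.
api_token = {"username": "johnsmith", "key": "123a123a123"}

# Demo only: write into a temporary folder. For real use, pass
# os.path.expanduser("~/.kaggle") as config_dir instead.
demo_dir = tempfile.mkdtemp()
token_path = write_kaggle_token(api_token, demo_dir)
print(f"Token written to {token_path}")
```

Keeping the target folder as an argument makes it easy to try the helper without touching any real credentials.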

from zipfile import ZipFile

with ZipFile(fname, 'r') as zipObj:
  # Extract all contents of the zip file into the current working directory
  zipObj.extractall()

Have a look at the data

In this section we look at an example of an audio waveform. We then transform the waveform into a spectrogram and compare it with the spectrogram that was included with the downloaded dataset.
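To build some intuition for what "transforming a sound wave to a spectrogram" means, here is a minimal NumPy-only sketch of a short-time Fourier transform applied to a synthetic 440 Hz tone. It is an illustration under stated assumptions, not the method used later in this notebook: the function name magnitude_spectrogram, the window and hop sizes, and the 22050 Hz sample rate are all choices made for this example.

```python
import numpy as np


def magnitude_spectrogram(signal, n_fft=512, hop_length=256):
  """Return a (frequency, time) magnitude spectrogram of a 1-D signal."""
  window = np.hanning(n_fft)
  n_frames = 1 + (len(signal) - n_fft) // hop_length
  # Cut the signal into overlapping frames and taper each one with the window.
  frames = np.stack([
      signal[i * hop_length: i * hop_length + n_fft] * window
      for i in range(n_frames)
  ])
  # One FFT per frame; keep the magnitude and put frequency on the first axis.
  return np.abs(np.fft.rfft(frames, axis=1)).T


# Assumed sample rate for this example: one second of a 440 Hz sine wave.
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)

spec = magnitude_spectrogram(tone)

# Frequency (in Hz) of each row of the spectrogram.
freqs = np.fft.rfftfreq(512, d=1 / sr)
peak_hz = freqs[spec.mean(axis=1).argmax()]
print(f"Spectrogram shape (freq bins, frames): {spec.shape}")
print(f"Strongest frequency bin: {peak_hz:.1f} Hz")
```

Each column of spec describes which frequencies are present in one short slice of the sound, which is why a steady tone shows up as a single horizontal band. Libraries such as librosa wrap the same idea and add mel scaling and decibel conversion on top.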

# Inspect an audio file from the dataset.

sample_path = 'Data/genres_original/jazz/jazz.00000.wav'

# Play the audio inline; comment out the line below to skip playback.
display.Audio(sample_path)