Contact information
Name: Munish Kumar
IRC nick: munishk
Discord: thedarkbot.
Email: munishkumar19042002@gmail.com
Github: munish0838
HuggingFace: munish0838
Timezone: UTC +5:30
[GSoC 24] BPM Detection Service for Accurate Tempo Estimation
Proposed mentors: mayhem
Languages/skills: Python/Flask
Estimated Project Length: 350 hours
Tasks for the Summer:
The project is divided into 4 phases:
- Phase 1: Plan and build the Docker container that runs all the algorithms over any audio files it finds and submits the values to the server; set up the server to store the submitted values.
- Phase 2: Ask community members to run the Docker container on their music collections, to gather as much data as possible.
- Phase 3: Analyze the collected data to determine the relative reliability of the algorithms, and optimize the container to submit a value once 3 algorithms reach consensus.
- Phase 4: Open the improved container to a wider audience and fix any problems or issues reported by users.
Project Overview:
The BPM of a song is an important characteristic for music information retrieval as well as music recommendation. BPM prediction is difficult: algorithms follow different techniques, and most struggle with songs whose tempo varies. This project aims to develop a robust and accurate BPM detection service by combining the outputs of several state-of-the-art algorithms, delivered as a Docker container. As a stretch goal, the project aims to create API server endpoints and a database to integrate the service into ListenBrainz.
My contributions till now:
- Literature review on the latest BPM models: Link
- Testing pre-existing models for BPM prediction: Colab
- Attempt at fine-tuning audio transformers: Colab
- My contributions to ListenBrainz: My Commits
Implementations:
BPM detection, known as tempo estimation in the research literature, is a crucial feature for music recommendation. Estimating tempo accurately can enhance the user experience, enrich music metadata, and support music classification.
Currently, there is no implementation of BPM detection in ListenBrainz. An algorithm using peak selection was tested (link), but the peak value is often misleading when the audio has a varying tempo. To overcome this, I have explored several algorithms for global tempo estimation, ranging from mathematical functions to pre-trained recurrent neural networks and convolutional neural networks with directional filters. Since the algorithms use varying techniques, the plan is to run all of them and let them agree on the BPM value for each song. Their predictions, along with confidence scores, will be used to determine the reliability of each algorithm. Additionally, data from Spotify may be stored as another source of BPM values. The process will be optimized to register a BPM value once 3 algorithms (out of a total of 10+) agree on it.
The algorithms currently proposed for use are:
- TempoCNN: the library contains a total of 32 pre-trained deep learning models, including convolutional neural networks with directional filters (the DeepTemp, ShallowTemp, and DeepSquare models)
- DeepRhythm: a convolutional neural network implementation of the DeepRhythm paper
- Rhythm Beat Detection: estimates BPM using onset detection and beat tracking
- PercivalBpmEstimator: estimates tempo directly from audio data
- Aubio: tempo estimation based on an autocorrelation method
- Librosa: estimates tempo based on the Fourier transform
To get the final result from the predictions, a voting algorithm will be used. Humans perceive integer multiples of a BPM value as the same tempo, so this must be accounted for when deciding the final value. Moreover, float predictions vary slightly, e.g. (97.6, 96.9, 97.1). Hence the error needs to be minimized by evaluating different techniques against ground-truth values from Spotify.
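To make the voting idea concrete, here is a minimal sketch of octave folding plus tolerance-based voting. The function names, the [70, 140) folding range, the 2-BPM tolerance, and the three-vote threshold are illustrative assumptions, not the final design:

```python
def fold_to_octave(bpm, lo=70.0, hi=140.0):
    """Fold a BPM estimate into one canonical octave, since listeners
    perceive half/double tempi (e.g. 97 vs. 194 BPM) as the same tempo.
    The [70, 140) range is an illustrative assumption."""
    while bpm < lo:
        bpm *= 2.0
    while bpm >= hi:
        bpm /= 2.0
    return bpm

def consensus_bpm(estimates, tolerance=2.0, min_votes=3):
    """Return the mean of agreeing estimates once `min_votes` algorithms
    fall within `tolerance` BPM of each other (after folding); else None."""
    folded = [fold_to_octave(b) for b in estimates]
    for candidate in folded:
        votes = [b for b in folded if abs(b - candidate) <= tolerance]
        if len(votes) >= min_votes:
            return sum(votes) / len(votes)
    return None
```

For example, `consensus_bpm([97.6, 96.9, 194.2, 120.5])` folds 194.2 down to 97.1 and returns the average of the three agreeing values; with fewer than three agreeing estimates it returns `None`, signalling that more algorithms should be run on that file.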
Workflow:
- Audio pre-processing: Since the algorithms are trained on different datasets, with varying audio durations and sampling rates, the audio files must be processed before inference for accurate results. When the batch-inference command is run, audio files will be loaded one at a time; for each algorithm, the file will be converted to the appropriate format and the predicted value stored. For example, TempoCNN divides the audio clip into windows of 256 frames each and averages the predictions from the windows, while Librosa derives its estimate from the spectrogram of the audio file.
- Audio inference: Different algorithms use different methods under the hood; depending on the algorithm, computing the tempo value and confidence score to a defined precision may be framed as a classification task or a regression task. The confidence scores are also critical for evaluating the reliability of each algorithm.
- Analysis of the test set: After calculating and storing predictions and confidence scores, the test sets will be evaluated with the help of the community to determine the reliability of the different algorithms and suitable tempo values for the audio files. This will help determine the order in which the algorithms are executed when finding the BPM of a song.
- Docker containerization: The service will be shipped as a Docker container for ease of setup and to automate the processing of batches of files.
- Documentation: Detailed documentation of the project, codebase, and setup will be created, including
- Overview of the project
- Details about each underlying algorithm
- Details of the algorithm analysis and how the final approach works
- Guidelines to use the docker container and script
- Future Scope
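The pre-processing and inference steps above can be sketched as a per-algorithm adapter: each estimator declares the sample rate it expects, and the loader converts the audio once per algorithm before collecting predictions. The class and function names, and the naive linear resampler, are illustrative assumptions for this sketch (a real pipeline would resample with librosa or ffmpeg):

```python
def resample_linear(samples, sr_in, sr_out):
    """Naive linear-interpolation resampler, kept dependency-free for
    the sketch; a real pipeline would use librosa.resample or ffmpeg."""
    if sr_in == sr_out or len(samples) < 2:
        return list(samples)
    n_out = max(int(len(samples) * sr_out / sr_in), 2)
    out = []
    for i in range(n_out):
        pos = i * (len(samples) - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

class TempoAlgorithm:
    """Base adapter: each concrete estimator declares the sample rate
    it was trained on and implements `estimate`."""
    name = "base"
    expected_sr = 22050

    def estimate(self, samples, sr):
        raise NotImplementedError

def run_all(algorithms, samples, sr):
    """Convert the audio to each algorithm's expected sample rate and
    collect a {algorithm name: predicted BPM} mapping."""
    results = {}
    for algo in algorithms:
        prepared = resample_linear(samples, sr, algo.expected_sr)
        results[algo.name] = algo.estimate(prepared, algo.expected_sr)
    return results
```

Each concrete algorithm (TempoCNN, Aubio, Librosa, etc.) would subclass `TempoAlgorithm`, so the batch loop stays identical regardless of what each estimator does internally.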
Timeline:
Pre-GSoC (April-May)
- Explore Docker concepts, installation, and building sample images
- Set up a Docker environment on the local system for development
- Continue contributing to codebases
Community bonding period (1 May - 26 May)
- Finalize output formats (JSON, CSV, etc.) for BPM and associated metadata
- Determine audio input structure (folders, nested folders, list of paths, etc.)
- Decide on supported audio formats (wav, mp3, etc.) and scripts to convert them to the .wav format
- Plan and finalize the Docker container architecture and test arguments for the Docker image build
- Understand the server side of ListenBrainz to better understand the system requirements
- Get all the algorithms running and check their compatibility with each other
- Explore beat-tracking algorithms and how they can help with BPM estimation
Week 1-2 (27 May - Jun 9)
- Write a Python script to read audio files from the specified input structure and process album tracks in sequential order
- Integrate audio processing libraries (librosa, essentia, etc.)
- Preprocess the audio files so each algorithm receives input in its expected format
- Test script on sample audio files for correct loading
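A minimal sketch of the file-reading step, assuming album folders map to directories and a fixed set of extensions (both are assumptions to be finalized during community bonding):

```python
import os

# Illustrative assumption: the formats the container will accept.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg", ".m4a"}

def find_audio_files(root):
    """Yield audio file paths under `root`, sorting directories and
    filenames so album tracks are processed in a deterministic order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk album folders in sorted order
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in AUDIO_EXTENSIONS:
                yield os.path.join(dirpath, name)
```

Sorting `dirnames` in place makes `os.walk` descend into album folders in a predictable order, which matters when community members compare runs over the same collection.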
Week 3-4 (Jun 10 - Jun 23)
- Set up and test the Docker container for automated processing of songs and saving of BPM values
Week 5-6 (Jun 24 - Jul 7)
- Run the Docker container on as many songs as possible with the help of community members
- Fix any bugs or issues reported by the community
- Work on feedback received from community members
Week 7 (Jul 8 - Jul 14)
- Prepare for mid-term evaluation materials and demo
- Buffer week
Mid-Term Evaluation
Week 8-9 (Jul 15 - Jul 28)
- Analyse the collected data to determine the reliability and preference order of the algorithms
- Optimize the Python script to run algorithms in preference order and save a value to the server once 3 algorithms agree
- Modify and refine the Docker container for the service
Week 10-11 (Jul 29 - Aug 11)
- Open the Docker container to a wider audience
- Fixing bugs and issues reported by users, and incorporating feedback
- Write comprehensive documentation
- Add instructions to run the Docker image
- Document all algorithms, final approach, and output formats
Week 12 (Aug 12 - Aug 18)
- Prepare final project presentation, demo, and deliverables
- Buffer week
About Me
I am Munish Kumar, a final-year B.Tech Computer Science student at Punjab Engineering College, Chandigarh. I worked as a Solution Delivery Analyst Intern at McKinsey & Co. from January to July 2023, where I mainly worked on data science and data engineering projects.
Community affinities
What type of music do you listen to? (please list a series of MBIDs as examples)
- I mostly listen to Indian singers such as Anuradha, Anup Jalota, and Arijit Singh
What aspects of MusicBrainz/ListenBrainz/BookBrainz/Picard interest you the most?
- I find MusicBrainz and ListenBrainz very interesting, especially since they collect and leverage data for recommendations, listen tracking, and valuable insights.
Have you ever used MusicBrainz Picard to tag your files or use any of our projects in the past?
- I have recently used ListenBrainz and MusicBrainz while contributing to issues and preparing for GSoC
Programming precedents
When did you first start programming?
- I started programming in my first year of college, with Python as my first language. Since then, I have mainly worked in Python for deep learning, but have also explored C++ for data structures and JavaScript for some group projects.
Have you contributed to other open-source projects? If so, which projects, and can we see some of your code?
- I have created some open-source models and contributed to open-source datasets which can be found at: munish0838 (Munish Kumar)
What sorts of programming projects have you done on your own time?
- In my first year, I created a Discord bot in Python for rendering lecture timetables and notes
- I created a speech emotion recognition model on the RAVDESS dataset for my minor project; this was my first project in the world of audio processing
- I created a chat-with-your-document application using OpenAI and LangChain in Python
- I have worked on a cross-chain research project in blockchain
Practical requirements
What computer(s) do you have available for working on your SoC project?
- I have an Asus TUF A15 laptop with 16 GB RAM, a Ryzen 7 4800H processor, and Windows OS
How much time do you have available per week, and how would you plan to use it?
- I have my end-semester exams in the first week of May. I graduate in mid-May and am completely available for the second half of May and all of June, so I can contribute 30-40+ hours per week during this period. I will join a full-time job in the first week of July, but since the first two months there are a training period (i.e. the remaining duration of GSoC), I can still contribute 25-35 hours per week.