Contact information
Name: Munish Kumar
IRC nick: munishk
Discord: thedarkbot.
Email: munishkumar19042002@gmail.com
Github: munish0838
HuggingFace: munish0838
Timezone: UTC +5:30
[GSoC 24] BPM Detection Service for Accurate Tempo Estimation
Proposed mentors: mayhem
Languages/skills: Python/Flask
Estimated Project Length: 350 hours
Tasks for the Summer:
The project is divided into 4 phases:
- Phase 1: Plan and build the Docker container that runs all the algorithms over any audio files it finds and submits the values to the server; set up the server to store the submitted values.
- Phase 2: Ask community members to run the Docker container on their music collections, to gather as much data as possible.
- Phase 3: Analyze the collected data to determine the relative reliability of the algorithms, and optimize the container to submit a value once 3 algorithms reach consensus.
- Phase 4: Open the improved container to a wider audience and fix any problems or issues reported by users.
Project Overview:
The BPM of a song is an important characteristic for music information retrieval as well as music recommendation. BPM prediction is difficult: algorithms follow different techniques, and most struggle with songs whose tempo varies. This project aims to develop a robust and accurate BPM detection service by combining the outputs of several state-of-the-art algorithms, delivered as a Docker container. As a stretch goal, the project aims to create API server endpoints and a database to integrate the service into ListenBrainz.
My contributions till now:
- Literature review on the latest BPM models: Link
- Testing pre-existing models for BPM prediction: Colab
- Attempt at fine-tuning audio transformers: Colab
- My contributions to ListenBrainz: My Commits
Implementations:
BPM detection, known as tempo estimation in the research literature, is a crucial feature for music recommendation. Estimating tempo accurately can enhance the user experience, enrich music metadata, and support music classification.
Currently, there is no implementation of BPM detection in ListenBrainz. An algorithm using peak selection was tested (link), but the peak value is often misleading when the audio has a varying tempo. To overcome this, I have explored several algorithms for global tempo estimation, ranging from mathematical functions to pre-trained recurrent neural networks and convolutional neural networks with directional filters. Since the algorithms use varying techniques, the plan is to run all of them and let them agree on the BPM value for each song. Their predictions, along with confidence scores, will be used to determine the reliability of each algorithm. Additionally, data from Spotify may be stored as another source of BPM values. The process will be optimized to register a BPM value once 3 algorithms (out of a total of 10+) agree on it.
The algorithms currently proposed for use are:
- TempoCNN: the library contains a total of 32 pre-trained deep learning models, including convolutional neural networks with directional filters (the DeepTemp, ShallowTemp, and DeepSquare models)
- DeepRhythm: a convolutional neural network implementation of the DeepRhythm paper
- Rhythm Beat Detection: estimates BPM using onset detection and beat tracking
- PercivalBpmEstimator: estimates tempo directly from audio data
- Aubio: tempo estimation based on an autocorrelation method
- Librosa: estimates tempo based on the Fourier transform
To get the final result from the predictions, a voting algorithm will be used. Humans perceive integer multiples of a BPM value as the same tempo, so this must be accounted for when deciding the final value. Moreover, float predictions vary slightly, e.g. (97.6, 96.9, 97.1). Hence the error needs to be minimized by evaluating different techniques against ground-truth values from Spotify.
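To make the voting idea concrete, here is a minimal sketch of octave folding plus tolerance-based voting. The function names, the [70, 140) folding range, the 2-BPM tolerance, and the three-vote threshold are illustrative assumptions, not the final design:

```python
def fold_to_octave(bpm, lo=70.0, hi=140.0):
    """Fold a BPM estimate into one canonical octave, since listeners
    perceive half/double tempi (e.g. 97 vs. 194 BPM) as the same tempo.
    The [70, 140) range is an illustrative assumption."""
    while bpm < lo:
        bpm *= 2.0
    while bpm >= hi:
        bpm /= 2.0
    return bpm

def consensus_bpm(estimates, tolerance=2.0, min_votes=3):
    """Return the mean of agreeing estimates once `min_votes` algorithms
    fall within `tolerance` BPM of each other (after folding); else None."""
    folded = [fold_to_octave(b) for b in estimates]
    for candidate in folded:
        votes = [b for b in folded if abs(b - candidate) <= tolerance]
        if len(votes) >= min_votes:
            return sum(votes) / len(votes)
    return None
```

For example, `consensus_bpm([97.6, 96.9, 194.2, 120.5])` folds 194.2 down to 97.1 and returns the average of the three agreeing values; with fewer than three agreeing estimates it returns `None`, signalling that more algorithms should be run on that file.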
Workflow:
- Audio pre-processing: Since the algorithms are trained on different datasets, with varying audio durations and sampling rates, the audio files must be processed before inference for accurate results. When the batch-inference command is run, audio files will be loaded one at a time; for each algorithm, the file will be converted to the appropriate format and the predicted value stored. For example, TempoCNN divides the audio clip into windows of 256 frames each and averages the predictions from the windows, while Librosa derives its estimate from the spectrogram of the audio file.
- Audio inference: Different algorithms use different methods under the hood; depending on the algorithm, computing the tempo value and confidence score to a defined precision may be framed as a classification task or a regression task. The confidence scores are also critical for evaluating the reliability of each algorithm.
- Analysis of the test set: After calculating and storing predictions and confidence scores, the test sets will be evaluated with the help of the community to determine the reliability of the different algorithms and suitable tempo values for the audio files. This will help determine the order in which the algorithms are executed when finding the BPM of a song.
- Docker containerization: The service will be shipped as a Docker container for ease of setup and to automate the processing of batches of files.
- Documentation: Detailed documentation of the project, codebase, and setup will be created, including
- Overview of the project
- Details about each underlying algorithm
- Details of the algorithm analysis and how the final approach works
- Guidelines to use the docker container and script
- Future Scope
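The pre-processing and inference steps above can be sketched as a per-algorithm adapter: each estimator declares the sample rate it expects, and the loader converts the audio once per algorithm before collecting predictions. The class and function names, and the naive linear resampler, are illustrative assumptions for this sketch (a real pipeline would resample with librosa or ffmpeg):

```python
def resample_linear(samples, sr_in, sr_out):
    """Naive linear-interpolation resampler, kept dependency-free for
    the sketch; a real pipeline would use librosa.resample or ffmpeg."""
    if sr_in == sr_out or len(samples) < 2:
        return list(samples)
    n_out = max(int(len(samples) * sr_out / sr_in), 2)
    out = []
    for i in range(n_out):
        pos = i * (len(samples) - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

class TempoAlgorithm:
    """Base adapter: each concrete estimator declares the sample rate
    it was trained on and implements `estimate`."""
    name = "base"
    expected_sr = 22050

    def estimate(self, samples, sr):
        raise NotImplementedError

def run_all(algorithms, samples, sr):
    """Convert the audio to each algorithm's expected sample rate and
    collect a {algorithm name: predicted BPM} mapping."""
    results = {}
    for algo in algorithms:
        prepared = resample_linear(samples, sr, algo.expected_sr)
        results[algo.name] = algo.estimate(prepared, algo.expected_sr)
    return results
```

Each concrete algorithm (TempoCNN, Aubio, Librosa, etc.) would subclass `TempoAlgorithm`, so the batch loop stays identical regardless of what each estimator does internally.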
Timeline:
Pre-GSoC (April-May)
- Explore Docker concepts, installation, and building sample images
- Set up a Docker environment on the local system for development
- Continue contributing to codebases
Community bonding period (1 May - 26 May)
- Finalize output formats (JSON, CSV, etc.) for BPM and associated metadata
- Determine audio input structure (folders, nested folders, list of paths, etc.)
- Decide on supported audio formats (wav, mp3, etc.) and scripts to convert them to the .wav format
- Plan and finalize the Docker container architecture and test arguments for the Docker image build
- Understand the server side of ListenBrainz to better understand the system requirements
- Get all the algorithms running and check their compatibility with each other
- Explore beat-tracking algorithms and how they can help with BPM estimation
Week 1-2 (27 May - Jun 9)
- Write a Python script to read audio files from the specified input structure and process album tracks in sequential order
- Integrate audio processing libraries (librosa, essentia, etc.)
- Preprocess the audio files so each algorithm receives input in its expected format
- Test script on sample audio files for correct loading
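A minimal sketch of the file-reading step, assuming album folders map to directories and a fixed set of extensions (both are assumptions to be finalized during community bonding):

```python
import os

# Illustrative assumption: the formats the container will accept.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg", ".m4a"}

def find_audio_files(root):
    """Yield audio file paths under `root`, sorting directories and
    filenames so album tracks are processed in a deterministic order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk album folders in sorted order
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in AUDIO_EXTENSIONS:
                yield os.path.join(dirpath, name)
```

Sorting `dirnames` in place makes `os.walk` descend into album folders in a predictable order, which matters when community members compare runs over the same collection.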
Week 3-4 (Jun 10 - Jun 23)
- Set up and test the Docker container for automated processing of songs and saving of BPM values
Week 5-6 (Jun 24 - Jul 7)
- Run the Docker container on as many songs as possible with the help of community members
- Fix any bugs or issues reported by the community
- Work on feedback received from community members
Week 7 (Jul 8 - Jul 14)
- Prepare for mid-term evaluation materials and demo
- Buffer week
Mid-Term Evaluation
Week 8-9 (Jul 15 - Jul 28)
- Analyse the collected data to determine the reliability and preference order of the algorithms
- Optimize the Python script to run algorithms in preference order and save a value to the server once 3 algorithms agree
- Modify and refine the Docker container for the service
Week 10-11 (Jul 29 - Aug 11)
- Open the Docker container to a wider audience
- Fixing bugs and issues reported by users, and incorporating feedback
- Write comprehensive documentation
- Add instructions to run the Docker image
- Document all algorithms, final approach, and output formats
Week 12 (Aug 12 - Aug 18)
- Prepare final project presentation, demo, and deliverables
- Buffer week
About Me
I am Munish Kumar, a final-year B.Tech Computer Science student at Punjab Engineering College, Chandigarh. I worked as a Solution Delivery Analyst Intern at McKinsey & Co. from January to July 2023, where I mainly worked on data science and data engineering projects.
Community affinities
What type of music do you listen to? (please list a series of MBIDs as examples)
- I mostly listen to Indian singers such as Anuradha, Anup Jalota, and Arijit Singh
What aspects of MusicBrainz/ListenBrainz/BookBrainz/Picard interest you the most?
- I find MusicBrainz and ListenBrainz very interesting, especially since they collect and leverage data for recommendations, listen tracking, and valuable insights.
Have you ever used MusicBrainz Picard to tag your files or use any of our projects in the past?
- I have recently used ListenBrainz and MusicBrainz while contributing to issues and preparing for GSoC
Programming precedents
When did you first start programming?
- I started programming in my first year of college, with Python as my first language. Since then, I have mainly worked in Python for deep learning, but have also explored C++ for data structures and JavaScript for some group projects.
Have you contributed to other open-source projects? If so, which projects, and can we see some of your code?
- I have created some open-source models and contributed to open-source datasets which can be found at: munish0838 (Munish Kumar)
What sorts of programming projects have you done on your own time?
- In my first year, I created a Discord bot in Python for rendering lecture timetables and notes
- I created a speech emotion recognition model on the RAVDESS dataset for my minor project; this was my first project in the world of audio processing
- I created a chat-with-your-document application using OpenAI and LangChain in Python
- I have worked on a cross-chain research project in blockchain
Practical requirements
What computer(s) do you have available for working on your SoC project?
- I have an Asus TUF A15 laptop with 16 GB RAM, a Ryzen 7 4800H processor, and Windows OS
How much time do you have available per week, and how would you plan to use it?
- I have my end-semester exams in the first week of May. I graduate in mid-May and am completely available for the second half of May and all of June, so I can contribute 30-40+ hours per week during this period. I will join a full-time job in the first week of July, but since the first two months there are a training period (i.e. the remaining duration of GSoC), I can still contribute 25-35 hours per week.