GSOC 2024: Add Machine Learning to the BPM detection code

mara42 · March 17, 2024, 2:52am

Personal Information

Name: Maradana Jeevana

IRC nick: mara42

Email: jeevanamaradana@gmail.com

Github: Jeevana52

Time Zone: UTC +05:30

Project Preview:

ListenBrainz currently consists of an existing BPM algorithm integrated with MusicBrainz. The peak selection portion of this BPM detection needs to be improved in order to raise the accuracy and reliability of the BPM values for music represented in MusicBrainz. A suitable training dataset comprising a few thousand audio files with known BPM values will be used to train a neural network model. The neural network will learn from the audio files and their corresponding BPM values to identify and select the music’s most significant beats or peaks.

Implementation of the project:

1. Data Collection:

To begin the project, it is necessary to collect a diverse set of a few thousand audio files, each with a known BPM value.
The dataset must cover a range of music genres and styles so that the model sees many different kinds of beats.
Convert these audio files into a frequency spectrogram representation, which shows how the frequency content and patterns of the audio change over time.
The spectrogram is then stored as a 2D matrix, with time on the x-axis and frequency bins on the y-axis. Each value in the matrix represents the intensity of the audio signal at that time and frequency.
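
As a rough illustration of the matrix described above, here is a minimal sketch that builds a magnitude spectrogram with a naive short-time DFT using only the Python standard library. The function name `magnitude_spectrogram` and the parameters `frame_size` and `hop` are assumptions made for this example, not part of any existing MetaBrainz code. A real pipeline would more likely use an FFT-based library routine such as `scipy.signal.spectrogram`, which is far faster.

```python
import cmath
import math


def magnitude_spectrogram(samples, frame_size=64, hop=32):
    """Return a 2D list indexed as [frequency_bin][time_frame]."""
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    n_bins = frame_size // 2 + 1  # non-negative frequencies only
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT, O(frame_size^2) per frame: fine for a demo, slow for real audio.
        column = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            column.append(abs(acc))
        frames.append(column)
    # Transpose so rows are frequency bins (y-axis) and columns are time frames (x-axis).
    return [list(row) for row in zip(*frames)]


# Hypothetical input: a 125 Hz sine tone sampled at 1000 Hz.
sample_rate = 1000
tone = [math.sin(2 * math.pi * 125 * t / sample_rate) for t in range(512)]
spec = magnitude_spectrogram(tone)
strongest_bin = max(range(len(spec)), key=lambda k: spec[k][0])
```

With these settings each frequency bin is 1000 / 64 = 15.625 Hz wide, so the 125 Hz tone should show up as the strongest value in bin 8 of every time frame.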

2. Data Preprocessing:

Next, create a binary vector that marks the locations of beats or pulses in the audio signal.

The binary vector provides a temporal reference for the beats in the audio and allows the neural network to learn and predict the occurrence and timing of beats.
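
A minimal sketch of such a binary vector, assuming beat timestamps (in seconds) are already available as annotations for the training audio. The function name `beats_to_pulse_vector` and the `frame_rate` parameter (vector entries per second) are hypothetical names chosen for this example.

```python
def beats_to_pulse_vector(beat_times, duration, frame_rate):
    """Return a list of 0/1 values with a 1 in every frame that contains a beat."""
    n_frames = int(round(duration * frame_rate))
    pulse = [0] * n_frames
    for t in beat_times:
        idx = int(round(t * frame_rate))
        # Ignore annotations that fall outside the clip.
        if 0 <= idx < n_frames:
            pulse[idx] = 1
    return pulse


# Hypothetical example: 120 BPM means one beat every 0.5 seconds.
beats = [0.0, 0.5, 1.0, 1.5]
pulse = beats_to_pulse_vector(beats, duration=2.0, frame_rate=10)
```

Here `pulse` has 20 entries, with ones at indices 0, 5, 10, and 15. The frame rate should match the time resolution of the spectrogram so that each pulse entry lines up with one spectrogram column.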

3. Model Architecture Design:

For the neural network architecture, I decided to use a Convolutional Neural Network, which is well suited to audio and image processing tasks. It will combine convolutional layers, dense layers, and time-distributed layers to process the spectrogram input and predict the pulse vector output.
The time-distributed convolutional layer is used to extract temporal features from each time step of the spectrogram.

4. Training the Model:

1. Preparing the training data:

a. Randomly select a segment from the spectrogram as the input training example.
b. Determine the corresponding time interval for the selected segment in the audio.
c. Based on the time interval, locate the beats or pulses in the audio and create the corresponding pulse vector for the selected segment.
d. Repeat steps a-c to generate multiple training samples from the spectrogram.
The random selection of segments introduces variability and ensures that the network learns to detect beats at different positions within the audio.
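
The steps above can be sketched as follows. This is only an illustration under a strong assumption: the beat times for each training file are already known from annotations, so the sketch covers building training pairs and does not itself locate beats. The names `sample_segment`, `segment_frames`, and `frame_rate` are hypothetical.

```python
import random


def sample_segment(spectrogram, beat_times, frame_rate, segment_frames, rng):
    """Pick a random spectrogram segment and build its matching pulse vector.

    spectrogram is indexed as [frequency_bin][time_frame];
    beat_times are annotated beat positions in seconds (assumed to be given).
    """
    n_frames = len(spectrogram[0])
    # Step a: choose a random start frame so the segment fits inside the clip.
    start = rng.randrange(0, n_frames - segment_frames + 1)
    end = start + segment_frames
    segment = [row[start:end] for row in spectrogram]
    # Step b: the time interval covered by the segment.
    interval = (start / frame_rate, end / frame_rate)
    # Step c: mark the annotated beats that fall inside the segment.
    pulse = [0] * segment_frames
    for t in beat_times:
        idx = int(round(t * frame_rate)) - start
        if 0 <= idx < segment_frames:
            pulse[idx] = 1
    return segment, interval, pulse


# Hypothetical data: 4 frequency bins x 100 frames at 10 frames per second,
# with a beat every 0.5 seconds (120 BPM).
dummy_spectrogram = [[0.0] * 100 for _ in range(4)]
annotated_beats = [i * 0.5 for i in range(20)]
rng = random.Random(42)
# Step d: repeat to generate several training samples.
samples = [sample_segment(dummy_spectrogram, annotated_beats, 10, 20, rng)
           for _ in range(5)]
```

Each 2-second segment in this example contains exactly four beats, whatever start position is drawn.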

2. Training the neural network model

Splitting the Dataset:

The first step is to split the dataset into training, validation, and test sets. The purpose of this split is to evaluate the performance of the trained model on unseen data and prevent overfitting.
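
A minimal sketch of such a split, assuming an 80/10/10 ratio and a list of track identifiers (both are assumptions for illustration; the proposal does not fix the ratio). Splitting by track rather than by segment keeps segments of the same recording out of both the training and test sets.

```python
import random


def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle the items and split them into train, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


# Hypothetical track identifiers standing in for the audio files.
track_ids = [f"track_{i:04d}" for i in range(1000)]
train, val, test = split_dataset(track_ids)
```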

Fitting Network Parameters:

The training process involves optimizing the network’s parameters to minimize the difference between the predicted BPM values and the ground truth values. This optimization is achieved using gradient descent algorithms such as stochastic gradient descent (SGD) or Adam.

Monitoring Validation Error:

The validation set is used to evaluate the model’s performance during training and prevent overfitting. After each training epoch, the model’s predictions on the validation set are compared to the ground truth BPM values, and a mean absolute error is calculated.
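
For reference, the mean absolute error over a validation set can be computed as in this small sketch. The BPM values are made up for illustration.

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over all validation examples."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical validation results (BPM).
predicted_bpm = [120.0, 98.0, 140.0, 87.0]
true_bpm = [120.0, 100.0, 70.0, 90.0]
val_mae = mean_absolute_error(predicted_bpm, true_bpm)  # (0 + 2 + 70 + 3) / 4 = 18.75
```

Note how the single double-tempo prediction (140 instead of 70) dominates the error. Tempo estimators commonly make this kind of factor-of-two mistake, so it may be worth tracking such cases separately from small deviations.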

Final Evaluation with the Test Set:

Once the training process is complete, the model’s performance is evaluated on the test set, which was not used during training or validation. This evaluation provides an unbiased measure of how well the model generalizes to unseen data.

5. Evaluation and Testing:

Once the neural network model for BPM detection has been trained, it is crucial to evaluate its performance and test its ability to predict BPM values accurately. This section covers the evaluation process and how to utilize the trained model for predicting BPM and beat locations in new audio samples.
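
One simple way to turn predicted beat locations into a single BPM value is to take the median interval between consecutive beats. This is a sketch under the assumption that the model's output has already been converted to beat timestamps in seconds; the function name `bpm_from_beat_times` is hypothetical.

```python
import statistics


def bpm_from_beat_times(beat_times):
    """Estimate tempo as 60 divided by the median inter-beat interval (seconds)."""
    if len(beat_times) < 2:
        raise ValueError("need at least two beats to estimate a tempo")
    ordered = sorted(beat_times)
    intervals = [b - a for a, b in zip(ordered, ordered[1:])]
    # The median is less sensitive than the mean to a few misplaced beats.
    return 60.0 / statistics.median(intervals)


# Hypothetical predicted beats with slight timing jitter around 0.5 s spacing.
predicted_beats = [0.50, 1.00, 1.50, 2.00, 2.52, 3.00]
bpm = bpm_from_beat_times(predicted_beats)
```

For these timestamps the median interval is 0.5 seconds, which corresponds to 120 BPM.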

Timeline

The timeline below shows how I will spend my time during GSOC 2024.

Pre-Community Bonding Period (2 April - 1 May)

Understand the existing BPM detection algorithm in the the-BPM-detector project. Also, discuss the project requirements and goals with my mentor, Mayhem.

Community Bonding Period (2 May - 27 May):

Collaborate with my mentor and the MetaBrainz team to gather the necessary audio dataset for training.

I have my college exams during this period, so my availability may be affected.

Week 1 (June 4th - June 10th):

Dataset Collection and Preparation: Collect a suitable training dataset of audio files with known BPM values.
Preprocess and clean the dataset to ensure its quality and compatibility with the machine learning model.

Week 2 (June 11th - June 17th):

Design the neural network architecture suitable for the BPM detection task.
Implement the necessary code to preprocess the audio data and extract relevant features.

Week 3-4 (June 18th - July 1st):

Model Design and Training: Train the neural network using the prepared dataset.
Optimize the model’s performance through fine-tuning and experimentation.

Week 5-6 (July 2nd - July 15th):

Model Evaluation and Testing: Evaluate the trained model’s accuracy and performance using the testing dataset.
Identify and address any issues or limitations of the model.

Week 7 (July 16th - July 22nd):

Integration and Deployment: Integrate the trained machine learning model into the existing MetaBrainz projects.
Develop the necessary code and APIs to enable the BPM detection functionality.

Week 8 (July 23rd - July 30th):

Integration and Deployment: Test the integration to ensure smooth operation and compatibility with the existing systems.
Deploy the model and associated code to a production environment.

Week 9 (July 31st - August 7th):

Documentation and Finalization: Document the entire process, including the model design, training methodology, and integration steps.
Create user guides and documentation to assist developers and users in utilizing the BPM detection functionality.

Week 10 (August 8th - August 15th):

Buffer Period for Refinements, Bug Fixes, and Extra Tasks: Allocate additional time for addressing any unforeseen challenges, refining the model, fixing bugs, and completing any extra tasks that may arise during the project.
Conduct thorough testing, optimize performance, and ensure the overall quality of the implemented model.

Detailed Information About Yourself

I am Maradana Jeevana, a native of Andhra Pradesh and a sophomore pursuing a Bachelor’s in Computer Science at Lendi Institute of Engineering and Technology, Visakhapatnam, India. I have been coding since my higher secondary schooling. I was very interested in Python then, and in my first year I explored and learned a lot of new technologies. When I later started learning ML and working in this field, I enjoyed it a lot, which helped me find my niche.

When did you first start programming?

I started with Python as my first programming language.

What made you interested in applying to this project?

I focused only on the Machine Learning project because I love machine learning. Honestly speaking, MetaBrainz has a collaborative environment and its members answer queries patiently, which made me choose MetaBrainz.

What sorts of programming projects have you done on your own time?

My projects include Paris house pricing, flower classification, and MyChat-bot. In web development, I built beach-a-sports. I am currently developing my skills further.

How much time do you have available, and how would you plan to use it?

I will be able to dedicate 35 to 40 hours per week to the project.

lucifer · March 17, 2024, 9:16am

Hi!

Thanks for the proposal. @rob will probably give a more detailed review, but I want to point out two things first.

This needs to happen now before you submit the final proposal.

You need to create a simple working prototype or make other relevant code contributions before submitting the final proposal otherwise it will not be considered.

mara42 · March 17, 2024, 10:15am

Okay, I’ll make sure to add a code contribution and make the changes you have mentioned.

rob · March 18, 2024, 2:10pm

Hi!

Thanks for submitting your proposal! However, your proposal is lacking a lot of detail that we require in our proposals and other parts are actually not correct.

We currently do not have any BPM support in ListenBrainz. Please review your whole proposal for accuracy and make sure that you’ve answered all of the questions on the proposal template.

Also, you contacted me with a private message, which we request that applying students not do. Consider reading our blog post on how to avoid common mistakes:

The Top 8 Mistakes of GSoC Applicants – MetaBrainz Blog

This sentence demonstrates that you do not actually know how you’re going to do what you propose. Locating the beats in the audio is literally the whole point of the project, and you listed this as a sub-task without specifying how you’re going to do it.

This project is a fairly advanced project and we’re not going to accept proposals that don’t make it abundantly clear what the proposer is talking about. Your proposal hasn’t given me a single indication that you’re qualified to take on this project.

In order for me to seriously consider this project, you’re going to have to convince me with loads and loads of details as to how you’re going to implement this feature.

mara42 · March 18, 2024, 5:21pm

Thanks for the detailed feedback! I’ll make sure to add every detail in brief. As this is my first open source experience, I’m sorry for my mistakes, and I’ll make sure not to repeat them.