GSoC 2022: Proposal for 'Cover Art Processing' feature in Picard GUI

kLambda · April 27, 2022, 10:09am

Cover Art Processing

Personal Details

Name: Krishna Kumar
IRC: kLambda
University: National Institute of Technology, Rourkela
Email: krishnakumar76068@gmai.com
Phone: (+91) 9337956022
Country of Residence: India
Timezone: IST (GMT + 0530)
Primary Language: English

I am a final-year B.Tech undergraduate pursuing Mechanical Engineering at the National Institute of Technology, Rourkela. My semester ends in late April, leaving me enough time to prepare for my GSoC project. If I’m selected, I shall be able to work around 40 hours a week on the project, though I am open to putting in more time if the work requires it.

Why this project?

Adding image processing to cover art opens up a new user experience and removes the need to pre-process images outside Picard to meet file type or size constraints. I approach this project as adding a new feature to an end-user application, so the aim is to offer it to users and developers in as flexible a form as possible.

Technical Knowledge

I am a final-year B.Tech undergraduate from NIT Rourkela. My major is Mechanical Engineering, and my minor is in Electronics and Communication Engineering. My specialization is in image processing and web and GUI-based deployment.

The courses that I’ve done include

Fundamentals of Signals and Systems
Digital System Design
Fundamentals of Communication Systems
Computer Vision
Algorithms for Image Processing

I’ve also done image processing-based courses on Coursera.

Some of my previous projects in image processing and computer vision include

Denoising MRI images.
Estimating the number of tiles in a pack using edge detection techniques.
Hand gesture recognition-based robotic arm.

Some of my PyQt5-based projects include:

A graphical user interface to upload an image, run a neural network model on it, and derive certain features.
A GUI to run specific OS-based file processing techniques.

I have previously worked with Python, Dart, MATLAB, CUDA, and assembly language, among others.

Project

Project Abstract

MusicBrainz Picard is a free and open-source, cross-platform graphical user interface developed by the MetaBrainz Foundation, a non-profit organization, for the following operations on digital audio recordings:

Identifying
Tagging
Organizing

This proposal aims to add manual and automatic image processing options for cover art. It would add the following features to the current GUI to give users a more interactive cover art processing experience:

Automatic image resizing when the image exceeds a specified maximum size.
Automatic conversion to a compatible image format.
Overlay filters for the cover art.
A built-in canvas drawing feature that lets digital artists design cover art within the Picard GUI.
A grid feature that lets the user crop or rotate cover art.
Custom resizing and a format selection dropdown.
Image post-processing plugins for further image manipulations.
Effects such as sharpening, sepia, blurring, and embossing.
A few transformations of the cover art.

Together, these would add cover art editing to the current Picard GUI so that users can manipulate cover art interactively.

Technical Details

Below I have explained the significant steps of the algorithm.

We start by considering a single image uploaded by the user.
We shall be using the OpenCV package for different image alterations.
An initial check would be the image size and image format.

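import cv2
img = cv2.imread("<path of the image>")  # returns None if the file cannot be read
dimensions = img.shape  # (height, width, channels) for a color image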
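
A minimal sketch of how the initial size and format check might look, working from img.shape and the file name. The helper name initial_check, the 1000-pixel limit, and the set of allowed formats are all hypothetical values chosen for illustration, not anything defined by Picard:

```python
import os

# Assumed limits for illustration only; real values would come from user settings.
MAX_SIDE = 1000
ALLOWED_FORMATS = {"jpg", "jpeg", "png"}


def initial_check(path, shape, max_side=MAX_SIDE, allowed=ALLOWED_FORMATS):
    """Report whether a cover image needs resizing or format conversion.

    `shape` is the tuple returned by img.shape: (height, width[, channels]).
    """
    height, width = shape[:2]
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    return {
        "needs_resize": max(height, width) > max_side,
        "needs_conversion": extension not in allowed,
    }
```

The result can then decide which of the automatic steps below should run before the image is shown in the preview.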

If the image size exceeds a certain threshold, automatic resizing would be carried out using the resize() function of OpenCV.
Cropping shall be carried out with NumPy slicing, img[y1:y2, x1:x2] (rows first, then columns), with the bounds taken from the user's drag input on the canvas in the GUI.
Changes in HSV can be made after converting the image with cvtColor(<image>, cv2.COLOR_BGR2HSV), with the input values taken from the track bars the user interacts with.
Saving the image in a different format can be carried out with the imwrite("<filename>.<desired_image_extension>", <image>) function of OpenCV, which picks the format from the file extension.
Morphological image processing can help smooth the image using opening and closing operations, which are combinations of two basic operations:

Dilation

Erosion

Rotating an image can be done with the getRotationMatrix2D(<center>, <angle>, <scale>) and warpAffine(<image_variable>, <rotation_matrix_variable>, (<width>, <height>)) functions.
Contrast can be adjusted using addWeighted(<source_image1>, <alpha>, <source_image2>, <beta>, <gamma>), where alpha and beta are the weights of the two images and gamma is a scalar added to the result.
Gaussian blur, to make the image blurry, is available through the GaussianBlur() method of OpenCV.
To detect the edges in an image, we can use the Canny() method of cv2, which implements Canny edge detection (often described as an optimal edge detector) and has the syntax Canny(<image>, <minVal>, <maxVal>).
To convert a color image into a grayscale image, we can use cvtColor(<image>, cv2.COLOR_BGR2GRAY).
Image masking is the technique of applying another image as a mask on the original image or changing the pixel values in the picture. To facilitate this, we can use the HoughCircles() method of the OpenCV module to detect circular regions that can serve as a mask.
To reduce noise in an image, the OpenCV module provides methods such as:
fastNlMeansDenoising(): to remove noise from a grayscale image.
fastNlMeansDenoisingColored(): to remove noise from a color image.
To remove the background from an image, we shall first find the contours of the detected edges of the main object, create a mask for the background with the zeros method of NumPy, and then combine the mask and the image using a bitwise AND operation.
All the methods are independent of one another. The manipulations shall be stacked one after another in the order in which the user applies them, and a preview panel shall show the final cover art after the image processing.

Why OpenCV over PIL?

OpenCV

Open-source computer vision library.
Supports Python, C++, and Java.
Processes images and videos for feature extraction.
Reads images in BGR format by default.

PIL (Python Imaging Library)

Image processing package exclusively for Python.
Pillow is a fork of the original PIL library that supports Python 3.x.
Reads images in RGB format by default.

Processing Time comparison between OpenCV and PIL on 8800 images (Source)

Because of its cross-language compatibility, faster processing (1.4 times faster, as per the blog), and better-documented methods compared with PIL, OpenCV has been chosen for the current proposal.

Timeline

Before May 20:

Familiarize myself thoroughly with MusicBrainz Picard’s functionality and architecture.
Study the Picard source files available in the metabrainz/picard GitHub repository.

May 20 - June 12 (Before the official coding time):

Experiment with the Picard source files on my own to improve my understanding and my ease with classes and objects in PyQt5.
I will remain in constant touch with my mentor and the MetaBrainz community during this period. I will stay active on IRC and the mailing lists to discuss and finalize any modifications needed to the existing structure and design.
Thus, with the help of my mentor, I will become clear about my goals and the final UI that needs to be implemented to improve the user experience.
Create a design wireframe of the required additional user interfaces using Balsamiq and validate it with the mentors.

June 13 - September 12 (Official coding period starts)

Start creating the UI as per the wireframes finalized in the previous phase.
Define all the required classes that need to be added to the current source files.
Create functions for the different groups of features, organized into classes, that implement the image processing techniques using the OpenCV package.
Link all the functions created in the above step with the static GUI so that the image processing techniques work end to end.
Test that all of the code changes work properly.
Make further changes to the code to improve functionality and exception handling and to remove bugs.
Stay in constant touch with the MetaBrainz developers to keep them informed of progress.
Most of the time will be spent on rigorous testing and bug fixes.

The last seven days would be spent on documentation covering every significant change in code structure and every functionality added.

A buffer of two weeks has been kept for any unpredictable delay.

Personal Inspiration for the Project

I am excited to work on the idea of adding various image processing features to the current Picard GUI. One of the projects I worked on counted the number of tiles in a pack by detecting the edges of the tiles. I first applied a blur to remove noise from the image. After that, I used adaptive thresholding (to eliminate the effect of changing lighting conditions on thresholding) and Canny edge detection to determine the number of edges, and finally reported the number of tiles in the pack to the user. The whole idea was to give tile retailers buying boxes of tiles a way to instantly cross-check the number of tiles present. The entire algorithm was deployed as an Android/iOS mobile application built with the Flutter framework.

The current project adds a few image processing techniques to assist users better. Creating a better user experience by applying techniques I have expertise in is a huge motivation.

Other Information

Tell us about the computer(s) you have available for working on your SoC project!

ASUS ROG Strix G531GT
Processor: Intel(R) Core™ i7-9750H CPU @ 2.60GHz
RAM: 16GB

When did you first start programming?

It has been 4 years since I started programming.

What type of music do you listen to?

I have quite a varied taste, including mainstream Bollywood and some semi-classical genres like ghazal, qawwali, and Sufi music.

What aspects of the project you’re applying for (e.g., MusicBrainz, AcousticBrainz, etc.) interest you the most?

The aspect of the project that interests me the most is adding new features to a product so that users get a better experience. More importantly, the community support would allow me to brainstorm problems and come up with refined ideas.

Have you ever used MusicBrainz to tag your files?

As of now, I have not used MusicBrainz to tag files, but I will certainly be doing so in the near future.

If you have not contributed to open source projects, do you have other code we can look at?

Yes, I’ve worked on different projects such as tracking unicellular organisms in a medium, an RFID GUI, a physiotherapy-aiding Android application, and many other projects available on my GitHub.

What sorts of programming projects have you done on your own time?

I’ve worked on various projects involving the deployment of different DL/ML models into web applications using tech stacks like MERN, FReMP, and Django, Android applications using Flutter, and desktop graphical user interfaces using PyQt5 and Tkinter. I’ve also worked on IoT projects such as a robotic arm that clones external gestures.

How much time do you have available, and how would you plan to use it?

I plan to dedicate 40+ hours per week and complete the upgrade of the Picard GUI with a cover art image processing feature within 10 weeks, keeping 2-3 weeks as a buffer for further testing of the integration.