Thanks for the question.
AcousticBrainz uses two pieces of software: Essentia computes the low-level information (your point 1.), and Gaia handles the machine learning component.
The tools map onto the individual steps like this:
1. Essentia - create low-level data
2. Gaia - train models using a dataset of low-level data and the SVM algorithm
3. Essentia [essentia_streaming_extractor_music_svm program] + Gaia [SVM library] - use the models trained in 2. and the low-level data from 1. to create high-level data
4. Confusion matrices - these don't use any special libraries. We just have some Python code in the Gaia package which compares the ground truth from a dataset with the freshly computed high-level data.
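
To give a rough idea of what step 4 involves, here is a minimal sketch in plain Python. This is not the code from the Gaia package: the function name, the recording IDs, and the "rock"/"jazz" labels are all made up for illustration, and it assumes both the ground truth and the high-level results are available as simple mappings from recording ID to class label.

```python
from collections import Counter


def confusion_matrix(ground_truth, predicted):
    """Count (actual, predicted) label pairs.

    Both arguments are hypothetical dicts mapping a recording ID to a
    class label. Recordings missing from `predicted` are skipped.
    """
    labels = sorted(set(ground_truth.values()) | set(predicted.values()))
    counts = Counter(
        (ground_truth[rec_id], predicted[rec_id])
        for rec_id in ground_truth
        if rec_id in predicted
    )
    # Rows are the actual classes, columns are the predicted classes.
    return {
        actual: {pred: counts[(actual, pred)] for pred in labels}
        for actual in labels
    }


# Illustrative data only: ground truth from a dataset, and the class
# assigned to each recording in the freshly computed high-level data.
ground_truth = {"rec1": "rock", "rec2": "rock", "rec3": "jazz", "rec4": "jazz"}
predicted = {"rec1": "rock", "rec2": "jazz", "rec3": "jazz", "rec4": "jazz"}

matrix = confusion_matrix(ground_truth, predicted)
for actual, row in matrix.items():
    print(actual, row)
```

Each row shows how the recordings of one ground-truth class were classified, so the diagonal holds the correct predictions.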
My expectation is that the scope of this project would cover steps 2-4. The idea is to no longer have to install Gaia. Instead, we will have another package (perhaps part of acousticbrainz-server, but probably an external dependency), written in Python, which does these steps.
We have already made a start on this here: https://github.com/MTG/acousticbrainz-sklearn
Feel free to take a look at the code and ask if you have any questions.
We have two other people actively working on this task as well. This means that during the proposal period we will have to coordinate closely to decide which tasks potential students can do, and which tasks will be done by others. Please feel free to read the tickets on the repository I linked above, and start asking questions about specific implementation details. From there we can work out what a good split of tasks is.