How should we test the methods of MusicBrainz database access?

I would like to work on the idea "More detailed integration with MusicBrainz in AcousticBrainz" for GSoC '18. Since the first step of the project is to run a test with data at the scale of AB to see which method of MB database access works best for us, I have already started on the first method (a direct connection to the MB database): I set up the AB development environment and tried fetching data for the recording entity using mbdata.models. I have opened a PR for this work.

I am working on a proposal for this idea, and I would like to know exactly how we plan to test the two methods of DB access (a direct connection to the MB database, and copying the relevant information from MB into a separate schema in AB). Would we run two versions of AB in production, each using a different access method? What criteria would we use to compare them? This seems like something that needs input from the MetaBrainz team.

Regarding importing MB data into the AB database, my current thinking is: copying the entire MB database to the AB server might not be feasible. Instead, we could store only the subset of information we need, in a denormalized form (maybe JSON). When fetching data, we would first check AB's subset of MB; on a miss, we would query the full MB (NGS) schema and save the result into the subset. I would appreciate the community's opinion on this approach.
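To make the idea concrete, here is a minimal sketch of the "check the AB subset first, fall back to MB" lookup. All names here (`ab_subset`, `fetch_from_mb`, `get_recording`) are hypothetical, not actual AB code, and the dict stands in for a real denormalized table:

```python
# Sketch of the proposed lookup: try AB's cached subset of MB first,
# and only query the full MB (NGS) schema on a miss.
import json

# Stand-in for AB's denormalized subset of MB data,
# e.g. a table of JSON blobs keyed by MBID.
ab_subset = {}

def fetch_from_mb(mbid):
    """Placeholder for a query against the full MB (NGS) schema.

    In the real project this would go through mbdata.models / SQLAlchemy.
    """
    return {"mbid": mbid, "title": "A Recording"}

def get_recording(mbid):
    # 1. Try AB's local subset first.
    cached = ab_subset.get(mbid)
    if cached is not None:
        return json.loads(cached)
    # 2. On a miss, query MB and save the denormalized result
    #    so later lookups are served locally.
    data = fetch_from_mb(mbid)
    ab_subset[mbid] = json.dumps(data)
    return data
```

The design choice here is a read-through cache: AB never needs a full copy of MB up front, and the subset grows only with the entities AB actually uses.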

We could also add a command to update this cached data and run it periodically with cron, so that it stays in sync with the actual MB database. If we record timestamps, each run would only fetch data that is new or has changed since the last import.
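The incremental update could look something like the sketch below. The function and data names are hypothetical; the assumption is that rows carry a last-updated timestamp (the MB schema has `last_updated` columns the real query would filter on):

```python
# Sketch of an incremental update: re-import only the rows that have
# changed since the last cron run, instead of copying everything again.
from datetime import datetime

def rows_to_update(rows, last_import):
    """Return only the rows updated after the previous import."""
    return [r for r in rows if r["last_updated"] > last_import]

# Hypothetical example data standing in for rows from MB.
rows = [
    {"mbid": "a", "last_updated": datetime(2018, 1, 10)},
    {"mbid": "b", "last_updated": datetime(2018, 3, 1)},
]

# Only rows changed since the last import (here, 2018-02-01) are fetched.
changed = rows_to_update(rows, datetime(2018, 2, 1))
```

In the real command, the previous import's timestamp would be persisted (e.g. in a small state table) and the filter would be pushed down into the SQL query rather than done in Python.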

@alastairp Please take a look! :slight_smile: