It’s true that “Matching metadata when you have a whole album of metadata” is a much easier task, but our current goal isn’t to collect album metadata. Instead, we need a way to handle the fact that many people are going to be sending us data from ListenBrainz that contains only an artist credit and track name. Of course, some submissions may also contain an album name, track number, Spotify ids, or MusicBrainz ids. The more data we get, the easier it will be to map these submissions directly to a MusicBrainz id. In fact, accepting full AcoustID fingerprints is an interesting idea! I wonder how many people have fingerprints but no MBIDs in their metadata?
Neither do we. I suspect we’re going to have a huge number of easy matches (where the artist credit and track name pair is unique in MusicBrainz) and a huge number of hard ones (where a recording exists under many MBIDs, or artist names are duplicated, or recording names are duplicated, or…). As a starting point, I suspect that we will map each item in MessyBrainz to 0 or more MBIDs, not exactly 1.
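To make the “0 or more MBIDs” idea concrete, here is a minimal sketch in Python. It is not how MessyBrainz actually works: the function names, the normalisation rule and the toy data (including the placeholder ids, which are not real MBIDs) are all assumptions made up for illustration. It only shows how the same lookup can return no match, one easy match, or several candidates.

```python
from collections import defaultdict


def normalise(text):
    """Lowercase and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(text.lower().split())


def build_index(recordings):
    """Map each (artist credit, track name) pair to the set of ids seen for it."""
    index = defaultdict(set)
    for artist_credit, track_name, mbid in recordings:
        index[(normalise(artist_credit), normalise(track_name))].add(mbid)
    return index


def match(index, artist_credit, track_name):
    """Return candidate ids: empty (no match), one (easy) or several (hard)."""
    key = (normalise(artist_credit), normalise(track_name))
    return sorted(index.get(key, ()))


# Hypothetical toy data; the ids are placeholders, not real MBIDs.
RECORDINGS = [
    ("Artist A", "Song One", "mbid-0001"),
    ("Artist B", "Song Two", "mbid-0002"),
    ("Artist B", "Song Two", "mbid-0003"),
]

index = build_index(RECORDINGS)
print(match(index, "artist a", "Song  One"))  # one candidate: easy match
print(match(index, "Artist B", "song two"))   # two candidates: hard match
print(match(index, "Nobody", "Nothing"))      # no candidates
```

Returning a list rather than a single id keeps the hard cases visible, so something else (more metadata, a fingerprint, a human) can decide between the candidates later.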
We’ve just had a proposal for an SoC project, which might be a first step towards understanding this data. There’s also a data dump available if you want to take a look at the data right now. As I said previously, up to now we’ve only been collecting data in a way that we think is useful. Now we’re finally starting to look at it and see how achievable this task actually is.
Although the MessyBrainz website mentions AB, we currently don’t use it for submissions sent to AcousticBrainz, and neither MessyBrainz nor AcousticBrainz has any concept of an “album” of music at submission time. This is definitely a future possibility, but for now it adds much more complexity to the problem, perhaps for not much value.
Because we don’t want to prevent people from using ListenBrainz just because they don’t have audio with MBID tags, or they use software that doesn’t send them with the request.