Is there any kind of project to improve holes in MusicBrainz coverage

It is. If your language is not already there, go ahead and propose it (I think you can do that from the Transifex site directly), and we’ll get to approving it ASAP. :slight_smile:

4 Likes

My main concern with regard to automatic importing is that there might be ‘hijacked’ artists somewhere in the MusicBrainz database. The classic example of a hijacked artist I have already encountered: Editor 1 creates a group in MusicBrainz, adds 3 albums, attaches no links. Editor 2 adds a single and 4 links (Discogs, YouTube, Twitter, Facebook). However, the single and the links don’t belong to this group, but to another group (or even a solo artist) with the same name. If you then imported automatically based on the attached Discogs link, even more releases would be added to the incorrect artist.

4 Likes

I’d also be nervous of automatic importing of anything. If a bot is set to do the import, a human needs to double check. I would be especially worried when there is an overlap and a bot is trying to add releases to an artist already here.

The way this place works now, with lots of fussy OCD geeks patrolling the quality of the data, means it is hard for bad data to settle in for long. If a bot import from Discogs is allowed to kick off, then the volume of data is going to be hard to check.

Currently MusicBrainz data is assumed to have come from someone who actually had a copy of the release in hand, or who cares enough to make sure the data is correct. An auto-import loses that.

Personally I think Discogs is a great resource. I use it in parallel alongside MusicBrainz. Though with a different tilt on how they check their data, it is not logical to assume a direct one-to-one match.

I always thought that Quality over Quantity was the priority here. :slight_smile:

3 Likes

I totally agree, automatic importing is out of the question, but we could use those reports and a community effort to reduce the number of missing links.
Discogs links are very helpful to editors imho, plus, when they exist, the Discogs Importer userscript takes advantage of them, prefilling with (usually) correct artists.

5 Likes

I am not a big fan of automation. But, in fairness, programs are only as good as the program writer. So, some programs are going to be better than others.

Automation just means that their errors become our errors.

1 Like

Totally agree (I’m also an ex-programmer). Programs are only as good as the data input and the assumptions being made by the programmer.

Automation needs checking. What would scare me even more is if someone at Discogs got the same idea and started importing from MB at the same time… the amplification of errors would get insane.

Humans are always better at spotting data errors than bots.

If data is missing from MB then it should be 100% checked as it is entered - bot or human entry. Getting a bot to do the import is fine - but someone needs to check the data fits.

“Filling Holes” is a good plan - but it needs to be done carefully.

Is “Garbage in, Garbage out” still a phrase used?

1 Like

Just my 2 cents:
Do you assume that a human check is 100% error free?
Do you assume that the used sources are 100% error free?

And about your “that Quality over Quantity was the priority here”.
Maybe you should think about it from another view:
What do you think is the next step for a user if he/she doesn’t find an artist or album on MB?
Does he/she add it, or search for a source where it is available?

I really hope all the GSoC additions and changes help to feed MB with more data.

If there were something like “data upload for quality check” I could provide many hundreds or thousands of albums. Currently it’s just an excessive amount of manual work (including the use of Development/Seeding/Release Editor - MusicBrainz Wiki)

If it was, then automation wouldn’t be a problem simply because the human information would already be correct, leading to no imported errors by automation.

I dealt with a hoaxer on WP. The information he hoaxed was then copied by other sites automatically. The information was copied by people into forum posts. News outlets would write real articles and insert lines from Wikipedia into them, which furthered the hoax.
Then those sites would get cited back in Wikipedia articles to make the information look even more legitimate.

It is the reason why WP has a list of reliable and unreliable sources. It avoids the circular information.

1 Like

This report only includes artists where
1> There is only one artist with this name in MusicBrainz
2> There is only one artist with this name in Discogs
3> Both the artist in MusicBrainz and the artist in Discogs have a release (which isn’t called Greatest Hits/Best Of) with the same name

So I don’t really see how this scenario could occur.

This is about linking artists, not adding releases.
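For anyone who wants to see the three conditions spelled out, here is a minimal sketch in Python. It is not the actual report code: the input shape (tuples of id, name, release titles), the `norm` helper and the case-insensitive comparison are all assumptions made for illustration.

```python
from collections import Counter

# Titles too generic to count as evidence (condition 3).
GENERIC_TITLES = {"greatest hits", "best of"}


def norm(text):
    """Hypothetical normaliser: lower-case and collapse whitespace."""
    return " ".join(text.casefold().split())


def candidate_links(mb_artists, discogs_artists):
    """Return (mb_id, discogs_id) pairs that pass all three conditions.

    Each input is a list of (artist_id, name, [release titles]).
    """
    mb_counts = Counter(norm(name) for _, name, _ in mb_artists)
    dc_counts = Counter(norm(name) for _, name, _ in discogs_artists)
    dc_by_name = {norm(name): (aid, titles) for aid, name, titles in discogs_artists}

    links = []
    for mb_id, name, titles in mb_artists:
        key = norm(name)
        # Conditions 1 and 2: the name is unique in both databases.
        if mb_counts[key] != 1 or dc_counts[key] != 1:
            continue
        dc_id, dc_titles = dc_by_name[key]
        # Condition 3: at least one shared, non-generic release title.
        shared = {norm(t) for t in titles} & {norm(t) for t in dc_titles}
        if shared - GENERIC_TITLES:
            links.append((mb_id, dc_id))
    return links
```

Under these assumptions, an artist whose name appears twice on either side is skipped outright, and a pair that only shares a "Greatest Hits" title is not linked.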

2 Likes

Two things I would like to point out

1> We are only talking about linking artists here, not adding artists or adding releases.
2> There are various bots that do import, and all you can do is check the code and trust that it’s correct. But in this case I’m presenting you with the list so you can manually check it before the data is added, so you don’t have to make any assumptions about the program; you can check the results. In theory you could manually check all 60,000, but that’s a lot of work. Surely if you checked a small sample (say 300) and found them all to be valid, wouldn’t that be enough verification to assume that 99.9% (or maybe 100%) are valid?
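As a rough check on what a clean sample of 300 can support, here is a small Python sketch. It assumes the 300 are drawn at random, that errors are independent of each other, and that the list is large enough for a binomial approximation; the function name is made up for illustration.

```python
def max_error_rate(sample_size, confidence=0.95):
    """Upper bound on the true error rate after a sample with zero errors.

    If the true error rate were p, the chance of seeing no errors in
    n random checks is (1 - p) ** n. Solving (1 - p) ** n = 1 - confidence
    for p gives the largest rate still consistent with a clean sample.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / sample_size)


bound = max_error_rate(300)
print(f"Error rate could still be up to {bound:.2%} at 95% confidence")
```

Under those assumptions, 300 clean checks put the bound at roughly 1%, so they would support something like "at least 99% valid" rather than 99.9%.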

1 Like

Freso, I remember @Rob asking, during the youtubed meeting, to be informed if there were plans to (was it scrape or import?) data from another db automatically.

I’m not clear about what is being proposed on this thread or exactly what @Rob’s request was - could you check if they overlap?

I have updated the reports, filtering most of these sorts of artists out of the artists not in MusicBrainz report, and refreshed them with the latest Discogs data, hence the reason the number of artists has increased.

Filtering is done by looking for name conjunctives such as and, y, et, so some names such as Don Kosaken Chor Serge Jaroff have not been filtered out, but since this is only a manual list to suggest artists probably not in MusicBrainz I think that is okay.
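A minimal Python sketch of that kind of filter, for illustration only: the word list holds just the three examples mentioned above, and the function name and the "interior words only" rule are assumptions, not the report's actual logic.

```python
import re

# Only the conjunctives named in the post; a real list would be longer.
CONJUNCTIVES = {"and", "y", "et"}


def looks_like_collaboration(name):
    """True if a conjunctive appears between two other words of the name."""
    tokens = re.findall(r"\w+", name.casefold())
    # Ignore the first and last word so that a name that merely starts
    # or ends with "and"/"y"/"et" is not flagged as a joint credit.
    return any(token in CONJUNCTIVES for token in tokens[1:-1])


print(looks_like_collaboration("Alice and Bob"))                   # made-up name
print(looks_like_collaboration("Don Kosaken Chor Serge Jaroff"))   # no conjunctive
```

The second name has no conjunctive in it, which is why a word-based filter like this lets it through.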

Not always true, but in the case of MB metadata, yes, since we are far from having an AI advanced enough to do the work :wink: (see The Stanford Question Answering Dataset for AI vs humans…)

On topic, we need both quantity & quality, and Discogs clearly beats MB on quantity, while its quality is far from bad. Discogs is considered a reliable source by all editors, though, of course, it has errors (MB too…).

Ijabz identified a bunch of artists for which Discogs links are missing. I see nothing wrong in trying to improve the situation, as it would help editors & users.

3 Likes

Interesting discussion. I personally don’t think there’s much point in simply trying to edit or import more. Discogs has an advantage over MB because there’s a very practical application of the page - buying and selling items. Because of this there’s a comprehensive database, and because of that database, the collections feature, and ‘what is my collection worth’ etc, is also very good.

I came to MB because I heard editing on here was how to improve Last.fm data, which I was using at the time. I’m not sure where or how other users came here, but I imagine something similar? Perhaps via Picard? Google search results (where Google is using MB data)?

Anyway, I think the solution is to have stronger integration. Maybe plugins for other websites or apps that can easily be configured to display discographies or collections, which will then motivate people to fill in the gaps when they see that theirs is incorrect/not complete.

I feel like MB is getting close though. It’s got a pretty broad range of uses, it just has never tipped over the edge - the UI redesign is probably the first step there, a mentoring/tutorial system might be the next. I feel like the true killer app is going to be events - I don’t know of another site that is so well equipped for these, Discogs certainly isn’t. I really want to see event images added, and then integration of that data/people’s event collections into a big site or application, and then that will have a big knock-on effect.

In any case I think trying to brute force the gaps in the database isn’t a sustainable solution. Following new users and being encouraging might do more in the end than hours of editing. That’s my thinking these days…

8 Likes

I just wish we could get some libraries to use our music data and contribute back :frowning:

I’m not fully convinced because setlist.fm seems to have a very strong userbase who probably won’t want to migrate… despite setlist.fm using MusicBrainz artist info to begin with. I don’t think just images would be enough to entice people, but maybe proper work integration would (“songs most often played during the current Nine Inch Nails tour” kind of thing - or “holy shit, this is the first time they play this song live since 1989!”)

2 Likes

Certainly spending time doing more editing is not practical, but I assume you are not against more data being in MB. That is why I suggest an auto import of this correct data: it improves the database with very limited actual effort. If anyone can actually show that the data in report 2 would incorrectly link artists, then I would better understand the negativity around the idea.

2 Likes

A statistician could tell us how many we’d need to check for a given confidence IF (and it is a big if) any errors are randomly distributed.
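For the randomly distributed case, the sum is short enough to sketch in Python. This assumes a random sample, independent errors, a binomial approximation, and that every checked link turns out to be valid; the function name is hypothetical.

```python
import math


def sample_size_needed(max_error_rate, confidence=0.95):
    """Number of clean random checks needed to rule out a higher error rate.

    With a true error rate p, the chance of zero errors in n checks is
    (1 - p) ** n. We want that chance to be at most 1 - confidence, so
    n >= log(1 - confidence) / log(1 - p).
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_error_rate))


for rate in (0.01, 0.001):
    print(f"To rule out more than {rate:.1%} errors: {sample_size_needed(rate)} clean checks")
```

If the errors cluster (one bad source, one genre, one naming pattern), a random sample can miss them entirely, which is the big "if" above.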

Another approach would be to gauge community interest and, if interest is there, check these artists as a One Month Project.

Updated the report again. We now have 64,000 artists with a possible link to a Discogs artist where it seems highly unlikely any are wrong: http://reports.albunack.net/mbartist_discogsartist_report2.html

and another 360,00 artists with link to a Discogs artist where it seems highly unlikely that more than a handful are wrong http://reports.albunack.net/mbartist_discogsartist_report3.html

This is good data, and I would like to get it into MusicBrainz, but I’m not convinced that a bot would be accepted if I created one, which is frustrating because there is no other practical way to get the data into MusicBrainz.

6 Likes
