Digital releases

One project that I would like is a music index that sits next to MusicBrainz (call it botbrainz, indexbrainz, spiderbrainz?).
Music stores have websites and some of them have APIs.
Can we write scrapers that go out and scrape these sites for information?
They could also scrape things like Spotify, YouTube, SoundCloud etc.
This would build an index of music and where it is available.

Let the computers keep track of this information and keep this information up to date instead of having to maintain this ourselves.
If there are different editions it could let us know that some stores have these songs and other stores have a different list and track who has what.
The API would be useful for MusicBrainz: you could go to a digital release and find which stores have which edition of the release with the same number of tracks.
It would also be useful for ListenBrainz, to let you build playlists for Spotify, YouTube or another streaming service.

Very interesting idea… but does that involve more work than a modification to MB?

MusicBrainz is not friendly to bots and this is a policy that I generally agree with.
There are a few bots that do some minor fixes but generally the focus is on human editors.
Bots can start adding garbage so it is something that you need to be careful about.
If we can make something that makes life easier for human editors and allows them to add missing information without too much work, it will help things.

If you have indexbrainz and design the web services right, it should be useful for MusicBrainz, ListenBrainz and other uses.
Suggested web services:
Store lookup:

  • Give it a MusicBrainz release ID, return a list of stores and URLs to go directly to that release.

Missing lookup:

  • Give it a musicbrainz release id or artist id, return a status code if there is a release with a different set of recordings or a missing album.
  • Seed the editor with the list of tracks.
  • Use A multi‐source seeder for digital releases to seed the release editor.
  • For things such as SoundCloud / YouTube, suggest adding a release as a single or just adding standalone recordings.

Song lookup:

  • For things like listenbrainz have a central index of songs instead of trying one service after another.
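
A minimal sketch of what the store lookup and missing lookup could look like, assuming an in-memory index filled by the scrapers. The names `StoreListing`, `ReleaseIndex`, `store_lookup` and `missing_lookup` are made up for illustration; nothing like this exists in MusicBrainz today.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoreListing:
    """One store's view of a release (hypothetical structure)."""
    store: str                     # e.g. "Bandcamp"
    url: str                       # direct link to the release in that store
    track_titles: tuple[str, ...]  # track list as the store shows it


@dataclass
class ReleaseIndex:
    """Maps a MusicBrainz release ID to the listings the scrapers found."""
    listings: dict[str, list[StoreListing]] = field(default_factory=dict)

    def add(self, release_mbid: str, listing: StoreListing) -> None:
        self.listings.setdefault(release_mbid, []).append(listing)

    def store_lookup(self, release_mbid: str) -> list[tuple[str, str]]:
        """Store lookup: (store, url) pairs for one release."""
        return [(l.store, l.url) for l in self.listings.get(release_mbid, [])]

    def missing_lookup(self, release_mbid: str,
                       known_titles: list[str]) -> list[StoreListing]:
        """Missing lookup: listings whose track list differs from the known one."""
        return [l for l in self.listings.get(release_mbid, [])
                if list(l.track_titles) != known_titles]
```

A user script could call a web service wrapping `store_lookup` to show store links on a release page, and use the result of `missing_lookup` to seed the release editor with the differing track list.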

Initially, you should be able to call this API from a Greasemonkey (or equivalent) user script without needing to modify MusicBrainz.
Once things have been tested and the web service is stable, you can add this to MusicBrainz like they did with AcousticBrainz.
In case you missed it, you can go to a recording’s page and it will look up acousticbrainz.org for an entry with that recording ID and display the key and BPM on the page.

I have yet to see an argument for tracking store availability that gives a user more information than what is currently possible with the “can be streamed/purchased for download” links.

With a CD or other physical media, we can track all these minor details, because they are all verifiable in the end. If you have the CD in hand, you can just check whether it says “Booklet printed in X” vs. “Booklet printed in Y”. With digital releases, the information can change under our feet (since vendors have total control over their databases). The popular streaming platform import tool demonstrates this: it adds a huge list of country codes as release events if it’s unavailable in some of them, but if you run the same import a few days later it’s not unusual to find that the list of available countries has changed again, e.g. now including one that was indicated as unavailable before.
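
To make the drift concrete, here is a small sketch that compares the country lists from two runs of the same import. The function name and the sample country codes are invented for illustration and do not come from any real import.

```python
def availability_changes(earlier: set[str], later: set[str]) -> dict[str, list[str]]:
    """Report which country codes appeared or disappeared between two runs."""
    return {
        "added": sorted(later - earlier),
        "removed": sorted(earlier - later),
    }


# Made-up sample data: the same release imported a few days apart.
first_import = {"DE", "FR", "GB", "US"}
second_import = {"DE", "FR", "JP", "US"}
print(availability_changes(first_import, second_import))
# → {'added': ['JP'], 'removed': ['GB']}
```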

Interesting points, thanks. I did just jump into this topic, so apologies if I missed some concepts explained in previous parts of the discussion.

Personally, after adding hundreds of digital releases, I’ve come up with a few rough guidelines to determine if a separate release is needed:

  1. Differing number of tracks, track lengths or track order (this in particular is surprisingly common).
  2. Different release label (again reasonably common).
  3. Different catalogue numbers (however this is often quite ambiguous due to “standardisation” of numbers on platforms like Beatport and Juno Download).
  4. Different barcodes.

I always start from the position of “a new release is not required” so I am looking for notable differences rather than similarities. Some differences I personally think are generally not significant enough to warrant creating a separate release are:

  1. Release dates (within reason), e.g. artists often release on Bandcamp first and then streaming sites a few weeks later.
  2. File formats and bitrates. Most stores offer downloads in several formats, and streaming sites such as Deezer offer “HQ” lossless subscriptions. I’ve seen master releases on Discogs containing 6 releases which are identical except for the file format.
  3. Minor differences in track titles (e.g. “Original Mix”) or featuring artist credits.
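
The two lists above could be sketched as a comparison that starts from “no new release required” and only collects notable differences. This is an illustration under assumptions: the `Track` and `Listing` structures and the two-second length tolerance are invented here, and release dates, file formats and titles are deliberately left out of the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    title: str            # not compared: minor title differences are ignored
    length_seconds: int


@dataclass(frozen=True)
class Listing:
    tracks: tuple[Track, ...]
    label: str
    catalogue_number: str
    barcode: str
    release_date: str = ""   # not compared (within reason)
    file_format: str = ""    # not compared


def notable_differences(a: Listing, b: Listing, length_tolerance: int = 2) -> list[str]:
    """Return the reasons two listings might need separate releases."""
    reasons = []
    if len(a.tracks) != len(b.tracks):
        reasons.append("track count")
    elif any(abs(x.length_seconds - y.length_seconds) > length_tolerance
             for x, y in zip(a.tracks, b.tracks)):
        # Comparing position by position also catches reordered tracks.
        reasons.append("track lengths or order")
    if a.label != b.label:
        reasons.append("label")
    if a.catalogue_number != b.catalogue_number:
        reasons.append("catalogue number")
    if a.barcode != b.barcode:
        reasons.append("barcode")
    return reasons
```

An empty result means no notable difference was found, so the listings can share one release. The catalogue number check is the weakest signal, for the “standardisation” reason given in the list.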

Do you have a link to an edit note or forum post about this? I’d be interested to read it.

Isn’t the purpose of a barcode to uniquely identify a given “item” that is offered for sale? I appreciate that most people neither know nor care what the barcode of a given release is, but it surely still serves a useful purpose for a music database as a unique identifier?

I agree that the acoustic fingerprint is important and it’s entirely reasonable to have multiple releases in the database with the exact same AcoustIDs. However I would also say that differing AcoustIDs is a strong indicator that a separate release is required.

I’d be interested to hear your view on why the current URL relationships aren’t sufficient to provide the “second layer” to enable someone to identify the release by the store or distributor (if known)? I do appreciate that this information isn’t available in Picard, so it would require the user to look at the release pages on MB.

I agree and this is why I have issues with adding 180+ release events. It makes sense in a perfect system, but in the real world it doesn’t really capture any useful information.

I think many of the difficulties we have around adding digital releases are due to people trying to treat digital releases like physical releases, when they are and always will be fundamentally different. The complete control platforms have over their databases means that there is no canonical source of information for a digital release, so as submitters we are forced to create the most accurate release we can based on information from multiple sources. This naturally makes many people (particularly those who are likely to contribute to MB) uncomfortable, because we expect the process to be almost completely objective.

A good example of the way in which the schemas of different platforms’ databases cause problems is the “standardisation” of catalog numbers:

Would anyone seriously argue that separate releases should be created for Juno Download and Beatport because they strip the hyphen from the catalogue number?
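
One way to avoid that trap is to normalise catalogue numbers before comparing them. A minimal sketch, assuming that hyphens, spaces and letter case carry no meaning; only the hyphen case comes from the example above, the rest is an assumption, and the function names are made up.

```python
import re


def normalise_catalogue_number(raw: str) -> str:
    """Drop everything that is not a letter or digit, then upper-case."""
    return re.sub(r"[^0-9A-Za-z]", "", raw).upper()


def same_catalogue_number(a: str, b: str) -> bool:
    """True if two catalogue numbers differ only in punctuation or case."""
    return normalise_catalogue_number(a) == normalise_catalogue_number(b)


# Made-up catalogue numbers for illustration.
print(same_catalogue_number("ABC-123", "ABC123"))   # → True
print(same_catalogue_number("ABC-123", "ABC-124"))  # → False
```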

I very much agree with the catalog numbers example, and I would go one step further and say to ignore the catalog numbers from Beatport (I don’t know about Juno Download), because as far as we can tell, they are not assigned by the artist or even the label, but by the store, based on an algorithm.

If they have the same artwork, barcode, labels, tracklisting and recordings, then it’s the same release. Storefront alone, I believe, is irrelevant. It seems the same to me as physical media. So, I agree.

You can also search YouTube and 7digital by barcode to see if they are the same release as iTunes, Spotify & Deezer. Also, HD Tracks now has a great API that gives barcodes as well at: https://hdtracks.azurewebsites.net/api/v1/album/[HDTracks album ID string]. Also, many Bandcamp releases actually do have barcodes as well, though many don’t.
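
Once barcodes have been collected from the stores, matching listings is just a matter of grouping. A rough sketch, assuming each listing is a (store, barcode) pair gathered by hand or by a scraper; the function name and the sample barcodes are placeholders, and no real store API is called.

```python
from collections import defaultdict


def group_by_barcode(listings: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group store names by the barcode they report for a release."""
    groups: dict[str, list[str]] = defaultdict(list)
    for store, barcode in listings:
        if barcode:  # skip stores that expose no barcode
            groups[barcode].append(store)
    return dict(groups)


# Placeholder barcodes, not real ones.
found = [
    ("iTunes", "0000000000017"),
    ("Spotify", "0000000000017"),
    ("Bandcamp", ""),
    ("HDtracks", "0000000000024"),
]
print(group_by_barcode(found))
```

Stores that land in the same group are candidates for the same release; a store with no barcode stays unmatched and needs a manual check.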

There are actually cat #s on Beatport that look like legitimate cat #s. But, yes, many are just the barcode thrown in the field.

It doesn’t seem to be possible to assign different barcodes to physical and digital releases on Bandcamp, so most barcodes apply to the physical release. As a result I would only trust the barcode on BC if the release is digital only.

Doesn’t matter. If a digital release on Bandcamp has a barcode, then it’s the barcode for the digital release. I’ve edited many Bandcamp releases that share barcodes with other digital outlets. Also, many iTunes, etc. releases share barcodes with physical media. It’s still the barcode for that release.

So all the similar releases from the release group below could be merged together?

Regarding the quality, will the new field be available in the Media drop-down box (e.g. Lossy/Lossless/HD) or only on the store link with the format? The first option would be nice for better organizing one’s collection.

Thanks

I personally would never want those merged… It looks (at a glance) like someone’s gone to the effort to make a really comprehensive release group. Amazing.

(Something like a separator or filter for digital vs physical could help for those who don’t like it)

For me it would be like trying to keep records of all the shops that had a specific CD available, including their internal references, catalogues and other marketing tools, which is not the purpose of MusicBrainz :)
Also it could be quite complex, as “Release mode” was not created to handle the nature of this data, which is not fixed in time (a release could be available in a shop and/or country, then removed, then back again).
And there are the other issues already discussed, such as labels (not the same meaning as for physical), the impossibility of verifying the data from public sources,…

Nevertheless, it seems some people are interested in keeping this info, so we could, but “Relationship” seems more appropriate to handle the evolution in time and/or the differences between shops.

Taking the old Release group example I would see:

Releases:
- 1 for the 11 tracks CD
- 1 for the 11 tracks Digital Media showing Lossy/Lossless (5 to merge together)
- 1 for the 15 tracks CD
- 1 for the 15 tracks Digital Media Lossy/Lossless/HD xxbits/xxxKhz (6 to merge together)
- 1 for the 20 tracks CD
- 1 for the 20 tracks Vinyl
- 1 for the 20 tracks Digital Media (3 to merge together)

With this example of relationships (fake data):

Stream for free: Spotify under [no label]
in: Albania from 2018-04-06
Vietnam from 2018-04-06 to 2019-01-14

Purchase for download: Qobuz under [no label] with barcode XXXXXX1
in: France from 2018-04
Belgium from 2018-04 to 2019-12

Purchase for download: HDTracks under Initial Artist Services with barcode XXXXXX1
in: USA from 2019

Purchase for download: iTunes under Initial Artist Services and Mastered for iTunes with barcode XXXXXX2
in: USA from 2019

Purchase for download: Qobuz under IAMNEW with barcode XXXXXX3
in: France from 2020-01
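
The relationship example above could be modelled roughly like this. It is only a sketch of the data shape: the class and field names are invented and this is not the MusicBrainz schema. Dates are kept as partial ISO strings because the example mixes years, months and full dates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Availability:
    country: str
    start: str                 # partial ISO date: "2019", "2018-04" or "2018-04-06"
    end: Optional[str] = None  # None means still available


@dataclass(frozen=True)
class StoreRelationship:
    kind: str                  # "stream for free" or "purchase for download"
    store: str
    label: str
    barcode: Optional[str]     # None when the store shows no barcode
    availability: tuple[Availability, ...]


def currently_available(rel: StoreRelationship) -> list[str]:
    """Countries whose availability period has no end date."""
    return [a.country for a in rel.availability if a.end is None]


# The first entry of the fake data above.
spotify = StoreRelationship(
    kind="stream for free",
    store="Spotify",
    label="[no label]",
    barcode=None,
    availability=(
        Availability("Albania", "2018-04-06"),
        Availability("Vietnam", "2018-04-06", "2019-01-14"),
    ),
)
print(currently_available(spotify))  # → ['Albania']
```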

In detail, this means a specific release should be created only in case of major difference(s) in the number of tracks or in the music files (e.g. real remasters).
Marketing stuff from shops (e.g. Mastered for iTunes) should be ignored, like rips from CD (see 1).

For the main release data:

  • Date: The first one from the different platforms
  • Country: To grey out (not really relevant as restrictions can easily be bypassed and change in time)
  • Label: To grey out (as there are no imprints on digital releases and the ones from platforms don’t refer to the same notion)
  • Cat number: To grey out (as for labels, it would mostly end up as wrong information)
  • Barcode: To grey out or to allow multiple values

1 Not sure about this one. To my mind they are not a new release but a rip of the CD version, not a specific release of media (files in this case). But I’m missing knowledge on this: how were releases made in the 2000s (real remasters or just rips), and what about legally (e.g. can a shop provide digital files along with a CD it is selling to a customer without a specific contract)?

No. If they have different barcodes, they are different releases.

I am not a collector of digital, but I don’t see the problem with this Release Group. If the barcodes are different, then they are different Releases. We use far smaller differences in the CD versions to allow different releases. A small change in artwork is enough for a new Release, so the same should be fine for Digital.

Too much information is better than too little. A difference is a difference and is interesting to someone…

I have yet to see a good reason for not storing info as granular for digital as we do for physical, at least none that couldn’t be solved via UI (for instance being able to filter or separate digital media on a label page).

That release group in MB must be the only place in existence where someone who asks themselves ‘I wonder what the difference between all these flippin’ digital versions of Sainte-Victoire is’ can find an answer, thanks to the hard work of @Fabe56.

Why we would not work to integrate that into MB so everyone (including people who only want to see physical details) is happy, but want to just remove the detailed data, I can’t understand ¯_(ツ)_/¯

I have a few comments on this statement…
I generally agree; however, when it comes to digital releases, I think a few changes of thought need to happen. For example, when I have the release “in hand”, how often is it that I know the barcode? With a CD, yes, distinctions are made, but based on what we can see on the release in our hand. If a CD does not list a featured artist, it is not supposed to be listed for that release. That means that information not on the release is disregarded for that release. Why is this logic not applied to digital releases?

I do agree that all information has value. I, as well as others, have proposed ways to address this with a tiered approach to releases. But having release details that a person in possession of a digital release will not know as primary attributes seems senseless. You will be stating identifying factors that do not actually help identify the release.

I would love to see more digital release detail, but not to a point where it becomes meaningless at a primary level. Primary identifying details need to be found on the release itself, the same logic as physical releases. Attention also needs to be given to the fact that those factors are not the same.

Just my opinion. I hope the logic makes sense. You cannot apply all the same rules to physical and digital, they are totally different. Physical releases nitpick every aspect of the physical release, why not the same on digital (being the metadata and encoder specs)?

I’m not sure what you mean here. I think that is currently the case on digital releases. The only thing I can think of that may be an unknown is barcodes on sites that don’t provide them, e.g. Amazon, Tidal, etc. Some have started to do catch-alls for non-barcode releases. I’m not sure if that’s the way to go or not. I admit that if a release is on both Spotify & Deezer with the same barcode, I typically treat the other digital releases as the same release unless they have a different barcode. I also count a difference in artwork, if everything else is the same, as a different release, just like I would if it was a physical release. As far as the “feat.” goes, apparently the guidelines quietly removed the “not featured on back cover art” thing, and now if they are listed in the liner notes you can add them. I had voted no to a few until they pointed out that the guidelines don’t state that. I know the guidelines used to, so why was that removed? I think because so many rap releases, etc., have “feat.” artists that show on every digital release, but were only found in the liner notes on physical media.
