Personal Information
- Name: Kartikeya Sharma
- IRC nick: kartikeyaSh
- Email: 09kartikeya@gmail.com
- Github: https://github.com/kartikeyaSh
- Blog: https://kartikeyaSh.github.io
- Time Zone: UTC+0530
Project Details:
Currently ListenBrainz uses MSIDs (MessyBrainz IDs) to compute useful user stats (e.g. user listens). ListenBrainz now also plans to generate data that MusicBrainz could use to show useful information such as artist popularity. MusicBrainz has an MBID (MusicBrainz ID) associated with each artist, recording, and release. To let MusicBrainz access this information by MBID, we have to associate recording_mbids, artist_mbids and release_mbids with listens wherever we can. Most listens don't have artist_mbids and release_mbids associated with them, but many do have recording_mbids. So I plan to associate MBIDs with MSIDs. To do this, I've divided the project into four parts.
- Create clusters and associations based on MBIDs present in the recording.
- Create clusters and associations for artists and releases using recording MBIDs.
- Make sure new listens are clustered properly on insertion whenever possible.
- Create API endpoints in MessyBrainz.
Note: I've assumed that there will be a MusicBrainz database which can be queried.
Goals of the project:
- Create infrastructure for creating, updating, collapsing, and deleting clusters and associating MBIDs with the created clusters.
- Execute the devised algorithms on the database using the infrastructure created.
I've written this proposal with the mindset that I'll design algorithms and build whatever infrastructure is needed to execute them. This infrastructure can be reused in the future.
In part 1, I'll create infrastructure for creating, updating, collapsing and deleting clusters.
In part 2, I'll create infrastructure to access the MusicBrainz database and store the required information in our database for future use.
In part 3, I'll create infrastructure to cluster listens newly inserted into the MessyBrainz database, using the cache tables created in part 2.
In part 4, I'll create infrastructure for retrieving information from the MessyBrainz database using MSIDs and MBIDs, which ListenBrainz and MusicBrainz can use to show useful stats.
Part 1:
Create clusters based on MBIDs already in the database.
In this phase, I'll create clusters based on MBIDs that were already included when listens were submitted.
Why not create a cluster for MSIDs and then associate MBIDs?
We could use meta_sha256 to cluster the MSIDs and then assign MBIDs to those clusters. The meta_sha256 field is computed from the values of the (artist, title) fields in the JSON data stored in the recording_json table, so entries with the same artist name and recording title would be clustered together. But this approach can't handle cases where the same artist has different recordings with the same name (e.g. two different recordings of Summertime by Miles Davis), or cases where recording, artist, and release names are all the same for different recordings (see track 1 and track 4 of the release We Want Miles). Such recordings have the same (artist, title) fields and hence identical meta_sha256 values, so clustering on meta_sha256 would put both recordings into one cluster. That is incorrect: in the end we want to assign MBIDs to clusters, and we won't be able to assign a unique MBID to such a cluster. If instead we use the MBIDs present in the data to create clusters from the start, there is no such ambiguity, because MBIDs are unique for every recording, artist, and release. Using MBIDs also lets us cluster recordings whose artist or title fields contain spelling mistakes.
All of this will first be done for recordings.
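The collision described above can be shown with a small sketch. It assumes meta_sha256 is the SHA-256 of compact, key-sorted JSON containing only artist and title; the exact serialisation MessyBrainz uses may differ, but any hash of those two fields alone behaves the same way. The recording MBIDs below are hypothetical placeholders.

```python
import hashlib
import json


def meta_sha256(data: dict) -> str:
    """Hash only the artist and title fields (assumed canonical JSON form)."""
    meta = {"artist": data["artist"], "title": data["title"]}
    meta_json = json.dumps(meta, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(meta_json.encode("utf-8")).hexdigest()


# Two distinct recordings (placeholder MBIDs) that share artist and title.
first = {"artist": "Miles Davis", "title": "Summertime", "recording_mbid": "mbid-A"}
second = {"artist": "Miles Davis", "title": "Summertime", "recording_mbid": "mbid-B"}

# Same meta_sha256, so clustering on it would merge them ...
print(meta_sha256(first) == meta_sha256(second))
# ... while their recording MBIDs keep them apart.
print(first["recording_mbid"] == second["recording_mbid"])
```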
Here is the schema of MessyBrainz database:
List of foreign keys:
recording.data --> recording_json.id
recording.artist --> artist_credit.gid
recording.release --> release.gid
recording_cluster.recording_gid --> recording.gid
recording_redirect.recording_cluster_id --> recording_cluster.cluster_id
artist_credit_redirect.artist_credit_cluster_id --> artist_credit_cluster.cluster_id
artist_credit_cluster.artist_credit_gid --> artist_credit.gid
release_redirect.release_cluster_id --> release_cluster.cluster_id
release_cluster.release_gid --> release.gid
--> represents the "foreign key to" relationship,
e.g. recording.data is a "foreign key to" recording_json.id.
How to access MusicBrainz database:
As is done in CritiqueBrainz, I'll use the Docker image that the MusicBrainz project uses, and I'll use mbdata to access the MusicBrainz data.
Before proceeding further, we need to get the unique recording_mbids from the recording_json table, which we can do with a query like:
SELECT DISTINCT data ->> 'recording_mbid' FROM recording_json WHERE data ->> 'recording_mbid' IS NOT NULL;
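For illustration, the same extraction can be done in Python over already-loaded recording_json.data values. This is a sketch, not part of MessyBrainz: the function name is hypothetical, and it adds one step the SQL does not, normalising each value with the standard uuid module and dropping strings that are not valid UUIDs.

```python
import uuid
from typing import Iterable, Set


def distinct_recording_mbids(rows: Iterable[dict]) -> Set[str]:
    """Collect unique, well-formed recording MBIDs from recording_json.data values."""
    mbids = set()
    for data in rows:
        value = data.get("recording_mbid")
        if value is None:
            continue  # mirrors the IS NOT NULL filter in the SQL query
        try:
            # Canonical lowercase, hyphenated form; malformed values are skipped.
            mbids.add(str(uuid.UUID(value)))
        except (ValueError, TypeError, AttributeError):
            continue
    return mbids


rows = [
    {"artist": "The Oh Hellos", "recording_mbid": "58e48a5d-0ce7-49b8-b1f9-b96a56892eec"},
    {"artist": "The Oh Hellos", "recording_mbid": "58E48A5D-0CE7-49B8-B1F9-B96A56892EEC"},
    {"artist": "Unknown"},
    {"artist": "Broken", "recording_mbid": "not-a-uuid"},
]
print(distinct_recording_mbids(rows))
```

Unlike SELECT DISTINCT, which compares text, this treats differently-cased spellings of the same UUID as one MBID.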
Validate data in recordings:
It may happen that the MBIDs submitted with a recording are incorrect. Before clustering or associating an MBID, we can check whether the other data in the JSON is similar to what the MusicBrainz database holds for that MBID. I am thinking of a check based on edit distance (https://en.wikipedia.org/wiki/Edit_distance): we set a threshold, and only if the distance crosses that threshold do we say that the recording may have a wrong MBID, because it is quite easy for manually tagged files to have a few errors. The comparison can be made case-insensitive. We can also check whether the submitted artist MBIDs and release MBID correspond to the recording.
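A minimal sketch of such a check, using the classic Levenshtein edit distance on case-folded strings. The helper names are hypothetical, and normalising by the longer string with a 0.3 threshold is an arbitrary assumption that would need tuning on real data.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def looks_mismatched(submitted: str, canonical: str, max_ratio: float = 0.3) -> bool:
    """Flag a possibly wrong MBID when the names differ by more than max_ratio."""
    a, b = submitted.casefold().strip(), canonical.casefold().strip()
    longest = max(len(a), len(b)) or 1
    return levenshtein(a, b) / longest > max_ratio


print(looks_mismatched("cold is the night", "Cold Is the Night"))
print(looks_mismatched("Hawaa Hawaa", "Cold Is the Night"))
```

Normalising by length keeps a one-letter typo in a long title from counting as much as one in a very short title.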
Create clusters for recordings:
Here are the steps to create a cluster for a single recording using recording MBID present in the data:
- For a given recording MBID in the recording_json table, get all recording_json.id which contain this recording MBID.
- From this list of recording_json.id query the recording table to get the recording.gid associated with this recording_json.id.
- For this list of recording gids, increment the cluster_id in the recording_cluster table and associate this cluster_id with all the recording gids.
- In the recording_redirect table, add an entry for the cluster_id and the recording MBID that represents this cluster.
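The four steps can be sketched with in-memory dicts and lists standing in for the recording_json, recording, recording_cluster and recording_redirect tables. The function name and this data layout are hypothetical; the real script would run the equivalent SQL against MessyBrainz.

```python
from itertools import count
from typing import Dict, Iterator, List, Tuple


def create_recording_cluster(
    recording_mbid: str,
    recording_json: Dict[int, dict],                # recording_json.id -> data
    recording: Dict[str, int],                      # recording.gid -> recording.data
    recording_cluster: List[Tuple[int, str]],       # (cluster_id, recording_gid)
    recording_redirect: List[Tuple[int, str]],      # (recording_cluster_id, recording_mbid)
    cluster_ids: Iterator[int],                     # stands in for the serial column
) -> int:
    # Step 1: recording_json ids whose JSON carries this recording MBID.
    json_ids = {jid for jid, data in recording_json.items()
                if data.get("recording_mbid") == recording_mbid}
    # Step 2: recording gids (MSIDs) that point at those JSON rows.
    gids = sorted(gid for gid, data_id in recording.items() if data_id in json_ids)
    # Step 3: one new cluster_id shared by all of these gids.
    cluster_id = next(cluster_ids)
    recording_cluster.extend((cluster_id, gid) for gid in gids)
    # Step 4: record which recording MBID represents the cluster.
    recording_redirect.append((cluster_id, recording_mbid))
    return cluster_id


recording_json = {
    1: {"artist": "A", "title": "T", "recording_mbid": "mbid-x"},
    2: {"artist": "A", "title": "T (live)", "recording_mbid": "mbid-x"},
    3: {"artist": "B", "title": "U"},
}
recording = {"gid-1": 1, "gid-2": 2, "gid-3": 3}
clusters, redirects = [], []
cid = create_recording_cluster("mbid-x", recording_json, recording,
                               clusters, redirects, count(1))
print(clusters, redirects)
```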
Example:
Here is an example of how it will be done.
I've used the data in the data dump available here.
Let's create clusters for 'recording_mbid' = '58e48a5d-0ce7-49b8-b1f9-b96a56892eec'.
- Get the IDs from the recording_json table for 'recording_mbid' = '58e48a5d-0ce7-49b8-b1f9-b96a56892eec'.
SELECT * FROM recording_json WHERE data ->> 'recording_mbid' ='58e48a5d-0ce7-49b8-b1f9-b96a56892eec';
id | data | data_sha256 | meta_sha256 |
---|---|---|---|
1054240 | {"artist":"The Oh Hellos","recording_mbid":"58e48a5d-0ce7-49b8-b1f9-b96a56892eec","release":"The Oh Hellos EP","title":"Cold Is the Night"} | d8134f6e80c8d841ceb229024942b73154668ea0da20d83eeebf1bd29d0d01bb | 550185d5dc2b9a269db15e2c94fffa22b0ab4cea89023770358a8b310ac58ca5 |
1107275 | {"artist":"The Oh Hellos","recording_mbid":"58e48a5d-0ce7-49b8-b1f9-b96a56892eec","release":"The Oh Hello's","title":"Cold Is the Night"} | 2cdea46c3446d0cca1732df5bce12cec3305b94b6c7bdcff9af87f376fbbd84b | 550185d5dc2b9a269db15e2c94fffa22b0ab4cea89023770358a8b310ac58ca5 |
5864854 | {"artist":"The Oh Hellos","recording_mbid":"58e48a5d-0ce7-49b8-b1f9-b96a56892eec","title":"Cold Is the Night"} | ebbce841db049579f6350c5e8cc665439e9bb7a16bf7ecb8a50b0bc1cc8abaaa | 550185d5dc2b9a269db15e2c94fffa22b0ab4cea89023770358a8b310ac58ca5 |
8772454 | {"artist":"The Oh Hellos","recording_mbid":"58e48a5d-0ce7-49b8-b1f9-b96a56892eec","release":"The Summer Kollection","title":"Cold Is the Night"} | 33cf15c39be9368094480a3483923b759422f3bf0e551c475efed80bf42af98a | 550185d5dc2b9a269db15e2c94fffa22b0ab4cea89023770358a8b310ac58ca5 |
- Now, using the IDs from the recording_json table, we query the gids from the recording table and get the following table:
SELECT gid FROM recording WHERE data IN (8772454, 1054240, 1107275, 5864854);
gid |
---|
f9c89d8b-b6d8-4996-83fa-8a9444962b98 |
28aae0dd-6f0a-49da-92a0-1ddddcf0ea90 |
1e30ac1b-6d1a-4e0d-bd12-4177eb7d9ecb |
8ec1db13-aa0c-4a52-9885-2ffcd031534d |
- These four gids represent the same recording, so we should put them in the same cluster. For that, we first increment the serial column cluster_id in the recording_cluster table and put these gids into the recording_cluster table. The following table depicts this:
cluster_id | recording_gid |
---|---|
1 | f9c89d8b-b6d8-4996-83fa-8a9444962b98 |
1 | 28aae0dd-6f0a-49da-92a0-1ddddcf0ea90 |
1 | 1e30ac1b-6d1a-4e0d-bd12-4177eb7d9ecb |
1 | 8ec1db13-aa0c-4a52-9885-2ffcd031534d |
- Now we put this cluster_id into the recording_redirect table with the recording_mbid we were working on, and get the following table:
recording_cluster_id | recording_mbid |
---|---|
1 | 58e48a5d-0ce7-49b8-b1f9-b96a56892eec |
Doing this for all unique MBIDs gives us MBID-based clusters for all the listens that have an MBID.
Based on the data in the available data dump, here are some stats:
We have 9335675 recordings in the recording_json table, of which 4377964 contain a recording_mbid. So by doing the above process we will end up associating 46.89% of recording MSIDs with recording MBIDs.
The same process can be applied to create clusters of artists and releases where MBIDs are present in the JSON data in the recording_json table. A small modification is needed for creating clusters for artists.
For artists we may get a list of MBIDs, and all those MBIDs are to be associated with a single MSID. For this we have the artist_credit_redirect table, where we will associate one cluster_id with multiple MBIDs.
For example, the recording I Don't Wanna Live Forever (Fifty Shades Darker) has two artists, ZAYN and Taylor Swift, so there will be a single artist_credit cluster_id which maps, through artist_credit_redirect, to both the ZAYN and Taylor Swift artist MBIDs.
Here are a few stats based on the data in the data dump:
We have 689185 artists in the artist_credit table, and 12875 recordings contain artist_mbids. So by doing the above process we will end up associating 1.87% of artist MSIDs with artist MBIDs, assuming MSIDs are not clustered at all, i.e. MSIDs are already unique.
We have 1066187 releases in the release table, and 10691 recordings contain a release_mbid. So by doing the above process we will end up associating 1.0027% of release MSIDs with release MBIDs, assuming MSIDs are not clustered at all, i.e. MSIDs are already unique.
Even with some clustering these results will not be great, which is why I've proposed a second method in part 2.
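The percentages above are plain ratios, reproduced below from the counts given in the text. For artists and releases the numerator counts recordings, while the denominator counts artist_credit and release rows, so those two figures are rough indicators rather than exact coverage.

```python
def coverage(matched: int, total: int) -> float:
    """Percentage of rows that carry an MBID, as used in the stats above."""
    return 100 * matched / total


recording_pct = coverage(4377964, 9335675)  # recordings with recording_mbid / recording_json rows
artist_pct = coverage(12875, 689185)        # recordings with artist_mbids / artist_credit rows
release_pct = coverage(10691, 1066187)      # recordings with release_mbid / release rows
print(f"{recording_pct:.2f}% {artist_pct:.2f}% {release_pct:.4f}%")
```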
Implementation details:
Scripts will be written to execute the above algorithm on the database. The scripts that create clusters for the three entities have a similar structure: the code for creating clusters, merging two clusters, deleting clusters, and updating clusters (adding, removing, or editing a field) is much the same for all three. While creating clusters for recordings I'll keep this in mind and put this shareable code in a file like data_utils.py, so that the artist and release scripts can simply reuse it.
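A minimal, in-memory sketch of the kind of entity-agnostic code data_utils.py could hold. The class and method names are hypothetical; a real version would issue SQL against the *_cluster and *_redirect tables instead of keeping dicts.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterable, Iterator, Set


@dataclass
class ClusterTable:
    """Stand-in for one entity's *_cluster table plus its *_redirect table."""
    members: Dict[int, Set[str]] = field(default_factory=dict)  # cluster_id -> MSIDs
    mbids: Dict[int, Set[str]] = field(default_factory=dict)    # cluster_id -> MBIDs
    _ids: Iterator[int] = field(default_factory=lambda: count(1), repr=False)

    def create(self, msids: Iterable[str], mbids: Iterable[str] = ()) -> int:
        cluster_id = next(self._ids)
        self.members[cluster_id] = set(msids)
        self.mbids[cluster_id] = set(mbids)
        return cluster_id

    def add(self, cluster_id: int, msid: str) -> None:
        self.members[cluster_id].add(msid)

    def merge(self, keep: int, drop: int) -> None:
        """Collapse cluster `drop` into cluster `keep`."""
        self.members[keep] |= self.members.pop(drop)
        self.mbids[keep] |= self.mbids.pop(drop)

    def delete(self, cluster_id: int) -> None:
        self.members.pop(cluster_id, None)
        self.mbids.pop(cluster_id, None)


recordings = ClusterTable()  # artists and releases would get their own instances
a = recordings.create(["msid-1"], ["mbid-x"])
b = recordings.create(["msid-2"], ["mbid-x"])
recordings.merge(a, b)
```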
Recording MBIDs and MSIDs will be fetched many times, so to save time we can create a table with fields for (recording MBIDs, recording MSIDs, release names), indexed on recording MBIDs and recording MSIDs for fast lookup.
The script should let the user run it only on data submitted after a specified timestamp. The existing (submitted, updated) fields in the tables can be used for this.
A dry-run feature will also be written for the scripts. It will not modify anything in the database, but will keep track of information such as how many clusters are formed, how many entities are examined, and how many MSID-to-MBID associations are made. A dry run will only create temporary tables to store information if required; otherwise the script keeps this information in variables and logs it after the run finishes.
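One way to combine the timestamp filter and the dry run is to split each step into a pure "plan" (what would happen) and an "apply" (the database write), and only call apply on a real run. This is a sketch under that assumption; run_clustering, plan and apply are hypothetical names, and one cluster per distinct MBID is assumed when counting clusters.

```python
import logging
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Iterable, Optional, Set


@dataclass
class RunStats:
    examined: int = 0        # entities looked at
    associations: int = 0    # MSID -> MBID associations (made or planned)
    cluster_mbids: Set[str] = field(default_factory=set)

    @property
    def clusters(self) -> int:
        return len(self.cluster_mbids)  # one cluster per distinct MBID


def run_clustering(rows: Iterable[dict],
                   plan: Callable[[dict], Optional[str]],
                   apply: Callable[[dict, str], None],
                   since: Optional[datetime] = None,
                   dry_run: bool = False) -> RunStats:
    stats = RunStats()
    for row in rows:
        if since is not None and row["submitted"] <= since:
            continue  # only data submitted after `since` is processed
        stats.examined += 1
        mbid = plan(row)  # pure decision, no database writes
        if mbid is None:
            continue
        stats.associations += 1
        stats.cluster_mbids.add(mbid)
        if not dry_run:
            apply(row, mbid)  # a real run writes; a dry run only counts
    logging.info("examined=%d clusters=%d associations=%d",
                 stats.examined, stats.clusters, stats.associations)
    return stats


rows = [
    {"msid": "m1", "submitted": datetime(2018, 1, 1), "recording_mbid": "x"},
    {"msid": "m2", "submitted": datetime(2018, 3, 1), "recording_mbid": "x"},
    {"msid": "m3", "submitted": datetime(2018, 3, 2)},
]
writes = []
stats = run_clustering(rows, plan=lambda r: r.get("recording_mbid"),
                       apply=lambda r, m: writes.append((r["msid"], m)),
                       since=datetime(2018, 2, 1), dry_run=True)
```

Because the dry run goes through exactly the same plan step as a real run, its logged numbers should match what a real run would do.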
Part 2:
Creating clusters for artist_credits and releases where we only have a recording MBID in recording_json.data
I'll explain the process for the artist_credit table; the same process applies to the release table.
To create these clusters we will also use the MusicBrainz database:
we will use the recording MBIDs to fetch artist_mbids from it.
As the documentation in MessyBrainz's create_tables.sql file says, "Messybrainz artists are artist credits. That is, they could represent more than 1 MusicBrainz id. These are linked in the artist_credit_redirect table." Since an artist_credit can represent more than one MBID, an artist_credit_cluster will also represent multiple MBIDs.
Creating cache tables:
To access the required information from the MusicBrainz database, we should create two new tables that store:
- artist MBIDs corresponding to a recording MBID.
- release MBID corresponding to a recording MBID.
With these tables we won't have to query the MusicBrainz database for MBIDs every time. If for some reason the cache tables don't have the artist MBIDs for a recording MBID, we will query the MusicBrainz database and insert that information into the cache tables. I use the word cache in the sense that we first try to get the required information from these tables and only query the MusicBrainz database if it is not there; the data in these tables is permanent.
Before proceeding further, I'll create scripts to fill these tables. These scripts can be executed periodically and will fetch artist MBIDs and release MBIDs (where possible, as a recording can appear on multiple releases) from the MusicBrainz database for the recordings that only contain recording MBIDs. We can use the "submitted" field in the recording_json table to run the script only for recordings submitted after some timestamp.
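The cache behaviour described above is a read-through lookup. A minimal sketch, with a dict standing in for the artist-MBID cache table and a caller-supplied function standing in for the MusicBrainz query (for example via mbdata); the function name is hypothetical, and the choice not to cache empty results is an assumption of this sketch.

```python
from typing import Callable, Dict, List


def artist_mbids_for(recording_mbid: str,
                     cache: Dict[str, List[str]],
                     query_musicbrainz: Callable[[str], List[str]]) -> List[str]:
    """Read-through lookup: cache table first, MusicBrainz only on a miss."""
    if recording_mbid in cache:
        return cache[recording_mbid]
    mbids = query_musicbrainz(recording_mbid)
    if mbids:
        # Permanent entry; this sketch does not cache empty answers, so a
        # recording MBID unknown to MusicBrainz today is retried later.
        cache[recording_mbid] = mbids
    return mbids


calls = []


def fake_musicbrainz(mbid: str) -> List[str]:
    calls.append(mbid)  # count how often the slow path is taken
    return ["artist-1", "artist-2"]


cache: Dict[str, List[str]] = {}
artist_mbids_for("rec-1", cache, fake_musicbrainz)
artist_mbids_for("rec-1", cache, fake_musicbrainz)  # served from the cache
```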
Algorithm to create clusters for artist_credit:
Let me explain the algorithm to create a cluster using a single recording_mbid.
- For a recording_mbid fetch cluster_id from the recording_redirect table.
- Now, using this cluster_id, find all the MSIDs that this recording_mbid represents, and for these recording MSIDs find all the artist MSIDs they correspond to in the recording table.
- Put all these artist MSIDs in the same cluster, as they all belong to the same recording and should represent the same artist credit.
- Fetch the artist MBIDs for this recording_mbid from the MusicBrainz database. There can be multiple MBIDs.
- For each of the MBIDs we got from the MusicBrainz database, put it in the artist_credit_redirect table with the same cluster_id value.
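The five steps can be sketched with in-memory structures standing in for recording_redirect, recording_cluster, recording, artist_credit_cluster and artist_credit_redirect. All names and identifiers below are hypothetical placeholders; fetch_artist_mbids stands for the cache-then-MusicBrainz lookup.

```python
from itertools import count
from typing import Callable, Dict, Iterator, List, Set, Tuple


def cluster_artist_credits(
    recording_mbid: str,
    recording_redirect: Dict[str, int],             # recording_mbid -> recording cluster_id
    recording_cluster: Dict[int, Set[str]],         # cluster_id -> recording MSIDs
    recording_artist: Dict[str, str],               # recording MSID -> artist_credit MSID
    fetch_artist_mbids: Callable[[str], List[str]],
    artist_credit_cluster: List[Tuple[int, str]],   # (cluster_id, artist_credit_gid)
    artist_credit_redirect: List[Tuple[int, str]],  # (artist_credit_cluster_id, artist_mbid)
    cluster_ids: Iterator[int],
) -> int:
    # Step 1: the recording cluster for this MBID.
    recording_cluster_id = recording_redirect[recording_mbid]
    # Step 2: recording MSIDs in that cluster, then their artist_credit MSIDs.
    artist_msids = sorted({recording_artist[msid]
                           for msid in recording_cluster[recording_cluster_id]})
    # Step 3: all artist_credit MSIDs go into one new cluster.
    cluster_id = next(cluster_ids)
    artist_credit_cluster.extend((cluster_id, gid) for gid in artist_msids)
    # Steps 4-5: every artist MBID of the recording points at that cluster.
    for mbid in fetch_artist_mbids(recording_mbid):
        artist_credit_redirect.append((cluster_id, mbid))
    return cluster_id


artist_clusters, artist_redirects = [], []
cid = cluster_artist_credits(
    "rec-mbid",
    recording_redirect={"rec-mbid": 7},
    recording_cluster={7: {"rec-msid-1", "rec-msid-2"}},
    recording_artist={"rec-msid-1": "artist-msid-1", "rec-msid-2": "artist-msid-1"},
    fetch_artist_mbids=lambda mbid: ["artist-mbid-1", "artist-mbid-2"],
    artist_credit_cluster=artist_clusters,
    artist_credit_redirect=artist_redirects,
    cluster_ids=count(1),
)
```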
Example:
Let's create a cluster for artists using "recording_mbid":"0b2432c3-9215-4115-a1c8-87ef048bd3df".
- Get the list of ids for the recording_mbid from the recording_json table.
id | data | data_sha256 | meta_sha256 |
---|---|---|---|
228602 | {"artist":"Mohit Chauhan, Viviane Chaix, Tanvi Shah, Suvi Suresh & Shalini","recording_mbid":"0b2432c3-9215-4115-a1c8-87ef048bd3df","release":"Rockstar","title":"Hawaa Hawaa"} | 2248cd19ba67bb99ab82bcabb8ef0806916aca0e5df069121c661f37f053d24c | 791f18d4c22c67d3b3ed2b4c9bfa01a20220511e4cf26da542313b17bba1eb08 |
9594 | {"artist":"Mohit Chauhan, Viviane Chaix, Tanvi Shah, Suvi Suresh & Shalini","recording_mbid":"0b2432c3-9215-4115-a1c8-87ef048bd3df","title":"Hawaa Hawaa"} | c14c9b3308fabd40878b43d2b6864a4e7934474e29a9b392d8dc086041f9e692 | 791f18d4c22c67d3b3ed2b4c9bfa01a20220511e4cf26da542313b17bba1eb08 |
- Now, using these IDs, fetch the artist MSIDs from the recording table.
select distinct artist from recording where data=228602 or data=9594;
artist |
---|
afa7da69-ba11-4cc3-9193-3b67903f72b5 |
- Now put all these artist MSID values in the same cluster.
cluster_id | artist_credit_gid | updated |
---|---|---|
1 | afa7da69-ba11-4cc3-9193-3b67903f72b5 | 2018-03-05 20:06:00.425561+05:30 |
- Fetch the artist MBIDs for "recording_mbid":"0b2432c3-9215-4115-a1c8-87ef048bd3df" from the MusicBrainz database.
artist_MBIDs |
---|
1dd28f27-4ab3-4a3f-8174-4ccd571a9dce |
e58e9ad6-66be-4ec2-b1d1-9f7f6def9711 |
fc8ee5d5-f03a-4e7e-97c5-624ee35c9894 |
10300673-e9b8-40ba-a7aa-5954238bb3e6 |
edd8f606-b78e-4410-baff-eacf17f169cc |
- For each of the MBIDs we got from the MusicBrainz database, put it in the artist_credit_redirect table with the same cluster_id value.
artist_credit_cluster_id | artist_mbid |
---|---|
1 | 1dd28f27-4ab3-4a3f-8174-4ccd571a9dce |
1 | e58e9ad6-66be-4ec2-b1d1-9f7f6def9711 |
1 | fc8ee5d5-f03a-4e7e-97c5-624ee35c9894 |
1 | 10300673-e9b8-40ba-a7aa-5954238bb3e6 |
1 | edd8f606-b78e-4410-baff-eacf17f169cc |
Using this approach we will get clusters of artist_credits wherever possible. We will use the same approach to get clusters for releases.
When using this approach for releases, we will also use the release field in the JSON data. We will fetch the list of releases (and their release MBIDs) on which the recording appears, and among these look for the release whose name matches the release field in the JSON data. If we find a match, this release MSID will be mapped to the release MBID with that name. If the release name cannot disambiguate the release, we will have to rely on additional information to disambiguate it. We don't cluster releases which have different names.
Even if the MusicBrainz database lists only a single release for the recording, we can't simply cluster the releases, because the release in the JSON data may be new and not yet present in the MusicBrainz database. Clustering such releases would make the data in our database incorrect, which we don't want.
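The name check can be sketched as follows, assuming "matching" means case-insensitive equality after trimming whitespace (a looser comparison, such as the edit-distance check from part 1, could be swapped in). The function name is hypothetical, and candidates stands for the (release MBID, release name) pairs fetched for the recording.

```python
from typing import List, Optional, Tuple


def match_release(json_release: Optional[str],
                  candidates: List[Tuple[str, str]]) -> Optional[str]:
    """Return the release MBID whose name matches the submitted release name.

    Returns None when there is no release name, no match, or more than one
    match (ambiguous), so nothing is clustered in those cases.
    """
    if not json_release:
        return None
    wanted = json_release.casefold().strip()
    matches = {mbid for mbid, name in candidates if name.casefold().strip() == wanted}
    return matches.pop() if len(matches) == 1 else None


candidates = [("r1", "Rockstar"), ("r2", "Rockstar (Deluxe)")]
print(match_release("rockstar", candidates))
```

Even a single candidate is only accepted when its name matches, which covers the case of a new release that MusicBrainz does not list yet.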
For listens that don't contain MBIDs, we simply create a cluster for each recording, artist, and release, put them in a one-to-one relationship in the respective cluster tables, and don't put these cluster_ids in the redirect tables.
Here are some stats based on the data available in the data dump.
We have 2200662 distinct recording_mbids in the data dump, so by applying the above approach we will be able to associate MBIDs with a significant number of listens.
We have 7512343 recordings in recording_json which contain all three fields, i.e. artist, title and release, so we will also be able to match most of the release MBIDs that we get with the above approach.
To get exact stats, we will first have to execute this approach on the data in the data dump.
Implementation details:
After the scripts are created, manage.py will provide commands with options to create clusters for an entity (recording, artist, or release), delete any formed clusters, and perform a dry run. As in part 1, a dry run does not modify the database (apart from temporary tables, if required) but keeps track of how many clusters are formed, how many entities are examined, and how many MSID-to-MBID associations are made, and logs this after the script finishes.
Tests to verify the correctness of these scripts will also be written.
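A sketch of how such commands could be laid out. argparse is used only to keep the sketch dependency-free; the existing manage.py may use a different CLI library, and the command and option names here are hypothetical.

```python
import argparse

ENTITIES = ["recording", "artist", "release"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="manage.py")
    commands = parser.add_subparsers(dest="command", required=True)

    create = commands.add_parser("create-clusters", help="create clusters for an entity")
    create.add_argument("entity", choices=ENTITIES)
    create.add_argument("--dry-run", action="store_true",
                        help="only count what would change; do not write")
    create.add_argument("--since",
                        help="only process data submitted after this timestamp")

    delete = commands.add_parser("delete-clusters", help="delete clusters for an entity")
    delete.add_argument("entity", choices=ENTITIES)
    return parser


args = build_parser().parse_args(["create-clusters", "artist", "--dry-run"])
```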
Part 3:
Cluster newly submitted data into appropriate clusters.
When new listens are inserted into the database, we should put them into a cluster whenever possible. Access to MusicBrainz data will happen in two steps: first the cache tables created in part 2 are queried, and if the MBIDs are not found there, we query the MusicBrainz database.
Here is the pseudocode for clustering a newly inserted recording:
Input to this algorithm is the recording JSON.
First, check whether the same recording already exists in the database using its SHA-256 hash, as done here.
If the recording is already in the database:
    We must have clustered it and associated its recording MBID when that was possible, so we don't work on it again.
Else:
    Insert this recording into our database and assign it a new MSID.
    If this recording contains a recording MBID:
        If this recording MBID is present in the recording_redirect table:
            Add the new recording MSID to the cluster represented by the recording MBID.
        Else:
            Create a new cluster and associate it with the MBID present in the recording.
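The recording pseudocode above can be turned into a small runnable sketch. The class is a hypothetical in-memory stand-in for the recording, recording_cluster and recording_redirect tables, and hashing compact, key-sorted JSON is an assumption about how the data SHA-256 is computed.

```python
import hashlib
import json
import uuid
from itertools import count
from typing import Dict, Iterator, Set


class RecordingStore:
    """In-memory stand-in for the recording, recording_cluster and recording_redirect tables."""

    def __init__(self) -> None:
        self.msid_by_sha: Dict[str, str] = {}    # data_sha256 -> recording MSID
        self.clusters: Dict[int, Set[str]] = {}  # cluster_id -> recording MSIDs
        self.redirect: Dict[str, int] = {}       # recording_mbid -> cluster_id
        self._ids: Iterator[int] = count(1)

    def submit(self, data: dict) -> str:
        data_json = json.dumps(data, sort_keys=True, separators=(",", ":"))
        sha = hashlib.sha256(data_json.encode("utf-8")).hexdigest()
        if sha in self.msid_by_sha:
            # Already seen: it was clustered (if possible) when first inserted.
            return self.msid_by_sha[sha]
        msid = str(uuid.uuid4())  # new MSID for a new recording
        self.msid_by_sha[sha] = msid
        mbid = data.get("recording_mbid")
        if mbid is not None:
            if mbid in self.redirect:
                # Join the cluster that already represents this recording MBID.
                self.clusters[self.redirect[mbid]].add(msid)
            else:
                # New cluster, associated with the MBID carried by the recording.
                cluster_id = next(self._ids)
                self.clusters[cluster_id] = {msid}
                self.redirect[mbid] = cluster_id
        return msid


store = RecordingStore()
a = store.submit({"artist": "A", "title": "T", "recording_mbid": "x"})
b = store.submit({"artist": "A", "title": "T", "release": "R", "recording_mbid": "x"})
c = store.submit({"artist": "A", "title": "T", "recording_mbid": "x"})  # exact duplicate of a
```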
Since an artist_credit can be associated with more than one MBID, some modifications are made to the approach used for recordings.
For clustering artists, here is the pseudocode:
Input to this algorithm is the recording JSON.
First, check whether the same artist already exists in the database by querying the database, as done here.
If the artist is already in the database:
    If the recording contains artist MBIDs:
        If the database already contains a cluster which points to exactly this list of MBIDs:
            We have already clustered and associated this artist_credit if it was possible, so we don't work on it again.
        Else:
            Create a new cluster and assign the MSID of the artist_credit to it.
            Assign these MBIDs to this cluster.
    Else:
        If the recording contains a recording MBID:
            Fetch the artist MBIDs (from the cache tables, falling back to the MusicBrainz database).
            If the database already contains a cluster which points to exactly this list of MBIDs:
                We have already clustered this artist_credit, so we don't work on it again.
            Else:
                Create a new cluster and assign the MSID of the artist_credit to it.
                Assign these MBIDs to this cluster.
Else:
    Insert this artist into our database and assign it a new MSID.
    If the recording contains artist MBIDs:
        If the database already contains a cluster which points to exactly this list of MBIDs:
            We already have a cluster for this artist_credit, so add the new MSID to the list of MSIDs in that cluster.
        Else:
            Create a new cluster and assign the MSID of the artist_credit to it.
            Assign these MBIDs to this cluster.
    Else:
        If the recording contains a recording MBID:
            Fetch the artist MBIDs (from the cache tables, falling back to the MusicBrainz database).
            If the database already contains a cluster which points to exactly this list of MBIDs:
                We already have a cluster for this artist_credit, so add the new MSID to the list of MSIDs in that cluster.
            Else:
                Create a new cluster and assign the MSID of the artist_credit to it.
                Assign these MBIDs to this cluster.
The same process also applies to releases, with an additional validation check on the release name after finding the releases in the MusicBrainz database.
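The key operation in the artist pseudocode above is finding "a cluster which points to exactly this list of MBIDs". One way to make that a single lookup is to key clusters by a frozenset of MBIDs, so the order of the MBIDs does not matter. This is a sketch with a hypothetical class; for releases, the name check would run before place() is called.

```python
from itertools import count
from typing import Callable, Dict, FrozenSet, Iterable, Iterator, List, Optional, Set


class ArtistClusters:
    """Stand-in for artist_credit_cluster + artist_credit_redirect, keyed by MBID set."""

    def __init__(self) -> None:
        self.by_mbids: Dict[FrozenSet[str], int] = {}  # exact MBID set -> cluster_id
        self.members: Dict[int, Set[str]] = {}         # cluster_id -> artist_credit MSIDs
        self._ids: Iterator[int] = count(1)

    def place(self, msid: str, mbids: Iterable[str], is_new_msid: bool) -> int:
        key = frozenset(mbids)  # order-insensitive "exactly this list of MBIDs"
        cluster_id = self.by_mbids.get(key)
        if cluster_id is not None:
            if is_new_msid:
                self.members[cluster_id].add(msid)  # new artist_credit joins the cluster
            return cluster_id  # existing artist_credit: nothing more to do
        cluster_id = next(self._ids)  # no cluster with exactly these MBIDs yet
        self.by_mbids[key] = cluster_id
        self.members[cluster_id] = {msid}
        return cluster_id

    def handle(self, data: dict, msid: str, is_new_msid: bool,
               fetch: Callable[[str], List[str]]) -> Optional[int]:
        """Use submitted artist MBIDs, else look them up via the recording MBID."""
        mbids = data.get("artist_mbids") or (
            fetch(data["recording_mbid"]) if data.get("recording_mbid") else [])
        if not mbids:
            return None  # no MBIDs available: nothing to cluster
        return self.place(msid, mbids, is_new_msid)


ac = ArtistClusters()
first = ac.place("msid-1", ["a", "b"], True)
second = ac.place("msid-2", ["b", "a"], True)  # same MBID set, different order
```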
Implementation details:
Since a lot of listens are submitted to MessyBrainz, we can't do the above computation in the web container itself. Instead, newly submitted listens will be sent to a RabbitMQ queue and consumed by a separate container that continuously runs a script implementing the above functionality, similar to what influx-writer does in ListenBrainz. This script can use the utility functions in data_utils.py created during parts 1 and 2.
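The consumer side of this design can be sketched without a broker. Here Python's queue.Queue stands in for RabbitMQ and a thread stands in for the separate container; the function name and the None "stop" sentinel are assumptions of this sketch, not part of RabbitMQ. In the real setup the web container would publish each listen and the worker would call the clustering code for each one.

```python
import queue
import threading
from typing import Callable, List


def start_worker(listens: "queue.Queue", cluster: Callable[[dict], None]) -> threading.Thread:
    """Consume listens until a None sentinel arrives (stand-in for a RabbitMQ consumer)."""
    def run() -> None:
        while True:
            listen = listens.get()
            try:
                if listen is None:
                    return  # sentinel: stop the worker
                cluster(listen)  # e.g. the clustering steps for a new listen
            finally:
                listens.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


q: "queue.Queue" = queue.Queue()
seen: List[dict] = []
worker = start_worker(q, seen.append)
q.put({"artist": "A", "title": "T"})  # what the web container would publish
q.put(None)
worker.join()
```

Keeping the clustering out of the request path means a slow MusicBrainz lookup delays only the worker, not listen submission.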
Part 4:
Create Endpoints for API.
After we have created clusters, we also need the MessyBrainz API to provide the following functionality:
- Get a list of all MSIDs given a single MSID.
- Get a list of all MSIDs given an MBID.
ListenBrainz will use this information to calculate stats based on MBIDs and MSIDs. I'm mostly interested in adding functionality to MessyBrainz that lets ListenBrainz fetch the above information. ListenBrainz does not use the MessyBrainz API but uses MessyBrainz directly, so for the sake of completeness I will add API endpoints too.
GET /msid/?{params=value}
The parameter values are used to generate the list of all MSIDs that the MSID in the URL is equivalent to.
- URL Parameters
Required
id=[UUID]: The value of id is an MSID. All the MSIDs which represent the same entity as the MSID in the URL will be returned.
request_type=[string]: The type of request being made. Can be one of three types: artist, release, and recording.
Optional
mbid=[boolean]: Set it to true if the MBIDs associated with the MSID should also be included in the response, else set it to false. It defaults to false. MBIDs won't be returned if no association is present in the database.
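Parameter handling for this endpoint could look like the sketch below, which works on a plain dict of query-string values so it does not depend on any web framework. The function name and error messages are hypothetical.

```python
import uuid
from typing import Dict, Tuple

REQUEST_TYPES = {"artist", "release", "recording"}


def parse_msid_params(args: Dict[str, str]) -> Tuple[str, str, bool]:
    """Validate the query string of GET /msid/; raises ValueError on bad input."""
    if "id" not in args:
        raise ValueError("missing required parameter: id")
    try:
        msid = str(uuid.UUID(args["id"]))  # canonical lowercase UUID text
    except ValueError:
        raise ValueError("id must be a UUID") from None
    request_type = args.get("request_type")
    if request_type not in REQUEST_TYPES:
        raise ValueError("request_type must be one of artist, release, recording")
    mbid = args.get("mbid", "false").lower()  # optional, defaults to false
    if mbid not in ("true", "false"):
        raise ValueError("mbid must be true or false")
    return msid, request_type, mbid == "true"


params = parse_msid_params({"id": "baf4f1b8-0665-4bba-b7fc-b23aa9cf0c95",
                            "request_type": "recording", "mbid": "true"})
```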
Sample Call: Request from curl to get MSIDs for request_type = recording.
$ curl "https://messybrainz.com/msid/?id=baf4f1b8-0665-4bba-b7fc-b23aa9cf0c95&request_type=recording&mbid=true" -X GET
- Response Example:
{
"count":"1",
"ids":[
{
"mbid_count": "1",
"msid_count": "2",
"mbid": "0b2432c3-9215-4115-a1c8-87ef048bd3df",
"msid": ["baf4f1b8-0665-4bba-b7fc-b23aa9cf0c95","dc696544-108f-4217-90da-f2b377b7327e"]
}
]
}
Sample Call: Request from curl to get MSIDs for request_type = artist.
$ curl "https://messybrainz.com/msid/?id=afa7da69-ba11-4cc3-9193-3b67903f72b5&request_type=artist&mbid=true" -X GET
- Response Example:
{
"count":"2",
"ids":[
{
"mbid_count": "1",
"msid_count": "1",
"mbid": ["88a8d8a9-7c9b-4f7b-8700-7f0f7a503688"],
"msid": ["afa7da69-ba11-4cc3-9193-3b67903f72b5"]
},
{
"mbid_count": "1",
"msid_count": "2",
"mbid": ["b49a9595-3576-44bb-8ac0-e26d3f5b42ff"],
"msid": ["afa7da69-ba11-4cc3-9193-3b67903f72b5", "93ec2d51-983d-4ce4-85c9-1380d63d86c0"]
}
]
}
This is a typical case in which a single artist_credit corresponds to multiple artist MBIDs; here MSID afa7da69-ba11-4cc3-9193-3b67903f72b5 is such an MSID. This can happen when artists have the same name, for example James Morrison (UK singer) and James Morrison (Australian jazz musician). In this case the same MSID represents both MBIDs, and we may get a response like the one above.
GET /mbid/?{params=value}
The parameter values are used to generate the list of all MSIDs that the MBID in the URL is equivalent to.
- URL Parameters
Required
id=[UUID]: The value of id is an MBID. All the MSIDs which represent the MBID in the URL will be returned.
request_type=[string]: The type of request being made. Can be one of three types: artist, release, and recording.
Sample Call: Request from curl to get MSIDs for MBID.
$ curl "https://messybrainz.com/mbid/?id=93ec2d51-983d-4ce4-85c9-1380d63d86c0&request_type=recording" -X GET
- Response Example:
{
"msid_count": "2",
"msids": ["baf4f1b8-0665-4bba-b7fc-b23aa9cf0c95", "dc696544-108f-4217-90da-f2b377b7327e"]
}
Sample Call: Request from curl to get MSIDs for MBID.
$ curl "https://messybrainz.com/mbid/?id=afa7da69-ba11-4cc3-9193-3b67903f72b5&request_type=artist" -X GET
- Response Example:
{
"msid_count": "3",
"msids": ["93ec2d51-983d-4ce4-85c9-1380d63d86c0", "61746abb-76a5-465d-aee7-c4c42d61b7c4", "0b2432c3-9215-4115-a1c8-87ef048bd3df"]
}
If we query MSIDs for some MBID we will always get only one cluster with a list of MSIDs which represent the same MBID.
Implementation details:
To generate the response for a GET request using an MSID, we take the following steps:
- Using this messybrainz_id, we get the cluster_id from the recording_cluster table.
- From that cluster_id, we get all the ids inside this cluster from the recording_cluster table.
For MBIDs, we get the cluster_id from the *_redirect table and fetch the MSIDs from the *_cluster table.
Most of the code will be written in data.py, which will include functions implementing this functionality, and api.py will contain endpoints for clients to access it.
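The two lookups could be sketched as data.py-style functions over in-memory maps standing in for recording_cluster and recording_redirect; the function names are hypothetical. The sketch covers recordings, where each MSID belongs to one cluster. An artist_credit MSID can belong to several clusters (the James Morrison case), so the artist version would return one entry per cluster, as in the sample /msid/ response. Counts are returned as strings to match the sample responses.

```python
from typing import Dict, List, Set


def msids_for_msid(msid: str, cluster_of: Dict[str, int],
                   members: Dict[int, Set[str]]) -> List[str]:
    """MSID -> its cluster_id (*_cluster) -> every MSID in that cluster."""
    cluster_id = cluster_of.get(msid)
    if cluster_id is None:
        return []  # unknown MSID
    return sorted(members[cluster_id])


def msids_for_mbid(mbid: str, redirect: Dict[str, int],
                   members: Dict[int, Set[str]]) -> Dict[str, object]:
    """MBID -> cluster_id (*_redirect) -> MSIDs (*_cluster), shaped like the /mbid/ response."""
    cluster_id = redirect.get(mbid)
    msids = sorted(members[cluster_id]) if cluster_id is not None else []
    return {"msid_count": str(len(msids)), "msids": msids}


cluster_of = {"m1": 1, "m2": 1}
members = {1: {"m1", "m2"}}
redirect = {"x": 1}
response = msids_for_mbid("x", redirect, members)
```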
Here is a link to my gist for additional ideas, timeline, about me and Q & A.