Project Summary
- Title: GraphQL server as a MusicBrainz API alternative
- Proposed mentors: @bitmap, @jadedblueeyes
- Languages/skills: Rust, GraphQL, SQL
- Estimated Project Length: 350 hours
Expected Outcomes
- Working GraphQL server in Rust covering a specific subset of entity types
- Schema design with depth limiting and query cost analysis
- In-process caching layer
- Multi-entity lookup in a single query
Extension Objectives
- Cover additional related entity types beyond the initial subset
- Add new database indexes or materialised views to support expensive links
- Integrate with other MetaBrainz projects
Contact Information
- Matrix: @op3kay
- Email: sreeharirathish128@email.com
- GitHub: owlpharoah
- My page where I dump stuff periodically: site
- Timezone: IST
Personal Introduction
Hi everyone! I'm Hari, also known online as owlpharoah (op3kay), and I'm a second-year student at IIIT Jabalpur. I've been an avid music fan ever since I was a child: I was brought up in a household of artists, and music played a huge role in my early years. I also really enjoy Rust and backend programming, so this project of building a GraphQL server as a MusicBrainz API alternative felt like a sweet dream come true!
Why This Project
The current MusicBrainz XML/JSON API works, but it's not really optimal. You need a string of inc parameters to get related data, browsing support is uneven across entity types, and you can't look up, say, five artists at once.
GraphQL is a really good fit here. Clients request exactly the fields and relationships they want, without custom server logic per link type, and multi-entity lookup comes for free. Query depth and cost can be analysed and limited before execution, rather than after the database has already been hit. And this is something I'd want to spend the summer on.
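To make that concrete, here's a sketch of the kind of query the server could answer. The field and argument names here are hypothetical placeholders, not a final schema:

```graphql
query {
  # several artists in one round trip, returning only the requested fields
  artists(mbids: ["mbid-1", "mbid-2", "mbid-3"]) {
    name
    sortName
    releaseGroups {
      name
    }
  }
}
```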
Prior Work
Before writing this proposal, I built a rough prototype to validate the approach and get a better feel for the real problems. It covers Artist, Release Group, Release, and Recording, focusing on basic field resolution, the relationships between them, and an enforceable depth limit set at the schema level. The stack is async-graphql, sqlx, and Axum, which is what I'd use for the real thing.
Building it showed me a few issues I'll need to focus on:
N+1 queries
In the prototype, resolving artist_type, gender, and area on an Artist each fires a separate query. That's fine for a single artist lookup, but would hammer the database for a list. DataLoaders fix this by batching lookups, effectively acting as a request-scoped cache.
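As a toy sketch of the batching idea (pure std, with an in-memory map standing in for the area table; the real version would implement async-graphql's dataloader::Loader and issue a single sqlx query per batch):

```rust
use std::collections::{HashMap, HashSet};

// Stand-in for the database: the real loader would run one
// `SELECT id, name FROM area WHERE id = ANY($1)` for the whole batch.
fn batch_load_areas(table: &HashMap<i32, String>, keys: &[i32]) -> HashMap<i32, String> {
    // Deduplicate keys so repeated lookups within one request cost one fetch.
    let unique: HashSet<i32> = keys.iter().copied().collect();
    unique
        .into_iter()
        .filter_map(|k| table.get(&k).map(|v| (k, v.clone())))
        .collect()
}

fn main() {
    let table: HashMap<i32, String> =
        HashMap::from([(1, "Japan".to_string()), (2, "Norway".to_string())]);
    // Ten artists referencing two distinct areas -> one batched fetch of two rows,
    // instead of ten separate area queries.
    let keys = [1, 2, 1, 1, 2, 2, 1, 2, 1, 1];
    let loaded = batch_load_areas(&table, &keys);
    assert_eq!(loaded.len(), 2);
    println!("{}", loaded[&1]); // prints "Japan"
}
```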
Depth limiting
async-graphql makes direct depth limiting pretty easy, but it doesn't catch everything: a shallow query can still be expensive. Resolvers for different types will have to be analysed individually and assigned weights, and an overall query cost limit must also be enforced.
let schema = Schema::build(...)
    .limit_complexity(200)
    .limit_depth(5)
    .data(pool)
    .finish();

#[ComplexObject]
impl Artist {
    #[graphql(complexity = "10 * child_complexity")] // per-field weight
    async fn release(&self, ..) -> async_graphql::Result<Vec<Release>> { ... }
}
Proposed Project
Scope
Rather than trying to cover everything, I'll pick a focused subset of entity types and do them properly. My current plan is:
- First priority entity fields: Artist, Release, Release Group, Recording, Label
- Core fields:
  - Artist: gid, name, sort_name, comment, type, gender, area
  - Release Group: gid, name, comment, artist credit
  - Release: gid, name, comment, artist credit, release group, status, language, script, country, packaging
  - Recording: gid, name, comment, length, video
  - Label: gid, name, comment, type, area, label_code, begin/end dates
- Standard relationships between them
- URL relationships
Schema Design
A sketch of the Artist type:
#[derive(SimpleObject)]
#[graphql(complex)]
pub struct Artist {
pub gid: Uuid,
pub name: String,
pub sort_name: String,
pub comment: Option<String>,
#[graphql(skip)]
pub id: i32, // internal DB id, i.e. not exposed
}
#[ComplexObject]
impl Artist {
async fn area(&self, ctx: &Context<'_>) -> Result<Option<String>> { ... }
async fn release_groups(&self,...) -> async_graphql::Result<Vec<ReleaseGroup>> { ... }
}
A similar sketch for ArtistCredit would be:
type ArtistCredit {
artist: Artist!
name: String!
joinPhrase: String # e.g. "&", "feat."
}
Architecture
The server will be written using:
- async-graphql: Solid support for DataLoader pattern, query depth and complexity limiting.
- sqlx: handles the actual database connections and queries
- axum: the HTTP framework for the server
A request flows through the system as: axum HTTP handler → async-graphql executor (depth and complexity checks) → resolvers and DataLoaders → sqlx → the MusicBrainz PostgreSQL database.
Caching Strategy
The caching will be at two levels, both in-process to avoid external dependencies or extra server requirements:
- DataLoader batching: within a single request, identical lookups are deduplicated automatically.
- moka response cache: entity reads are cached in-process with a TTL. moka runs in the same process and evicts the least-used entries, which keeps hot lookups fast.
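moka provides TTL expiry, size bounds, and eviction out of the box; purely to illustrate the idea behind the response cache (a toy std-only sketch, not what I'd ship), a minimal time-bounded cache looks like:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Toy TTL cache illustrating the response-cache idea; moka adds
// memory bounds and smarter eviction on top of this.
struct TtlCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, String)>,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    fn insert(&mut self, key: String, value: String) {
        self.entries.insert(key, (Instant::now(), value));
    }

    // Expired entries are treated as misses, so stale data is never served.
    fn get(&self, key: &str) -> Option<&String> {
        self.entries
            .get(key)
            .filter(|(at, _)| at.elapsed() < self.ttl)
            .map(|(_, v)| v)
    }
}

fn main() {
    let mut cache = TtlCache::new(Duration::from_millis(50));
    // "artist:<mbid>" is a hypothetical cache-key scheme for entity reads.
    cache.insert("artist:mbid-1".into(), "some artist".into());
    assert!(cache.get("artist:mbid-1").is_some()); // fresh -> hit
    std::thread::sleep(Duration::from_millis(60));
    assert!(cache.get("artist:mbid-1").is_none()); // expired -> miss
}
```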
Timeline
- Week 1
  - Align with mentors on the final entity subset and schema conventions.
  - I've already spent time with the MusicBrainz database schema and the existing API source, and built a prototype, so this week would mainly be about locking in decisions.
  - Set up the production project structure and testing scaffolding.
- Week 2
  - Schema design review with mentors.
  - Field naming, nullability, and relationship directions need to be finalized before I start writing resolvers.
- Week 3
  - Start implementing the resolvers from the approved schema. These build the base foundation for the entire server, after which I'll move on to DataLoaders and the other parts that depend on them.
- Weeks 4-5
  - DataLoader implementation for all entity types.
  - This is where N+1 query problems get fixed. I'll write tests to verify that fetching 100 artists with their release groups produces a properly batched, bounded number of database queries.
- Week 6
  - Query depth limiting and complexity analysis with per-field weights.
  - async-graphql has extension points for this; I need to decide on defaults and make them configurable.
  - I'll also identify any relationship links that are structurally expensive and either add indexes or block them entirely.
- Week 7
  - moka caching layer.
  - This covers cache key design, TTL configuration, memory bounds, and making sure nothing serves stale data. Cache hit/miss metrics go in here too.
- Week 8
  - Multi-entity lookup.
  - A single query should be able to return 50 artists by MBID. This is mostly about making sure the DataLoader handles list inputs correctly and the schema exposes it cleanly.
- Week 9
  - Deployment work: Dockerfile, Docker Compose setup, and documentation for running the server locally and against the MusicBrainz database.
- Week 10
  - Integration and documentation: API documentation, schema reference, and any integration work needed to connect the server to the broader MusicBrainz infrastructure.
- Weeks 11-12
  - Buffer. Something will take longer than expected; if not, I'll use the time to extend the entity coverage or add materialised views for the most expensive relationship queries. This slot takes Murphy's Law into consideration.
Community Affinities
What music do you listen to?
I was brought up in a house of artists and hence developed a love for it early on. Some tracks that I play on repeat constantly:
What aspects of MusicBrainz interest you most?
The scale of it. Collecting structured metadata for all recorded music, maintained by contributors rather than a big company. I'm all for open-source communities and data, and anyone being able to contribute to helping it grow is what excites me the most.
Programming Background
I ran my first "hello world" in Python back when I was a freshman in high school. I still remember all the possibilities my mind spun up as soon as the text "hello world" popped up on the terminal. That moment was phenomenal and something I'll never forget. After that I moved on to writing simple CLI games, like a stock-trading game with PnL and other metrics. Another instance I still remember was using Python to generate fake handwritten notes that I had to submit instead of actually writing them by hand (it actually worked). I then went on to exploring programming languages like I was Pokémon hunting, from Java to JavaScript, Go, C, and C++, and I've finally ended up at Rust and haven't looked back since.
During this time I also started exploring web dev and made a few fun projects, like gitmaps, which lets you see how spread across the world map your OSS contributors are. Through web dev I realised backend is something I really enjoy doing, and that's what I'm working on now.
Some other projects I've built include:
- RFstarstarKC - CLI learning tool to learn about RFCs and their implementations using animations and markdown documents.
- VReWind - an NPM package to scaffold React + Tailwind projects.
- lobstorrent - CLI to find blazingly fast torrents for anything.
Practical Requirements
- Available equipment: My primary laptop runs Arch with Niri (Wayland); my secondary runs Windows 11. Both have 16 GB RAM and 512 GB storage.
- Time available: My university semester ends by May 5th, and my summer vacation starts from then on. From that point I'm free of other commitments and can work roughly 30-35 hours a week on the project.
