Mohammad Shahnawaz
Metabrainz: veldora
discord : veldora
Summary:
BrainzPlayer is the custom React component in ListenBrainz that searches for and plays tracks from multiple sources, e.g. YouTube and Spotify, drawing on several data sources to facilitate track playback.
This project aims to let users play audio files from the Internet Archive (IA) by integrating it into BrainzPlayer. This integration will help ListenBrainz (LB) users discover and play songs that might not be available on other platforms.
My approach involves creating an Internet Archive indexer. It will extract audio file metadata and URLs from the Internet Archive using IA's API and store them in a database. The indexer will stay synchronized with IA: it will index metadata items and monitor them for changes.
An API will then be implemented to search the indexed data and return the media URL for a track. InternetArchivePlayer will use this API to play the track.
For the frontend, I will create an InternetArchivePlayer similar to the existing players in BrainzPlayer. It will search the indexed data through the API, find the relevant audio file URL, and play the track using an HTML5 audio element in BrainzPlayer.
With backend indexing and a lightweight HTML5 audio player, this project lets ListenBrainz play songs that are not readily available elsewhere.
Project Overview
This project involves developing an Internet Archive player and integrating it with BrainzPlayer. It will connect ListenBrainz to the vast audio collection of the Internet Archive, giving users more options for playback in BrainzPlayer.
The key goals of this project are:
- Develop an indexer to index music metadata items and media file URLs from the Internet Archive.
- Develop an API for BrainzPlayer to search this index for media files.
- Integrate the API with BrainzPlayer to play the relevant track.
- Ensure the indexer can index metadata items in the Internet Archive and monitor them for changes.
Understanding of the Project
Phase 1: Implement Internet Archive Indexer
Part 1:
Currently ListenBrainz has indexers (e.g. Spotify, Apple, SoundCloud) in listenbrainz/metadata_cache to extract track metadata from external sources.
In this part we will implement an indexer for Internet Archive data.
The Internet Archive (IA) has a vast repository of music and offers APIs and a Python library to search and retrieve metadata and media URLs. Using the IA Python library, we will extract track URLs along with their metadata.
    from internetarchive import search_items, get_item

    # Search one IA collection; each result carries the item's identifier.
    search_results = search_items('collection:78rpm')
    urls = []
    for result in search_results:
        identifier = result['identifier']
        # Fetch the full item so its files can be listed.
        item = get_item(identifier)
        for file in item.files:
            name = file.get('name')
            # Keep only audio files.
            if name and name.endswith(('.mp3', '.ogg', '.flac', '.wav')):
                url = f"https://archive.org/download/{identifier}/{name}"
                urls.append(url)
Each URL will be stored together with its track metadata. The identifier for the track will be created following the pattern of existing identifiers such as spotify:track:<spotify_id>.

    {
        "title": "Song Name",
        "artist": "Artist Name",
        "collection": "78rpm",
        "url": "https://archive.org/download/...",
        "identifier": "InternetArchive:track:<IA_identifier>"
    }
Part 2: Store the formatted data in the database using Redis.
In this part we will use existing functionality in ListenBrainz, e.g. brainzutils.cache.
    from brainzutils import cache

    def store_ia_recording(metadata: dict) -> None:
        # Key each recording by its identifier under the ia:recording: prefix.
        identifier = metadata.get("identifier")
        redis_key = f"ia:recording:{identifier}"
        cache.set(redis_key, metadata)
Phase 2: Develop Search API in ListenBrainz
In this phase we will create a new API endpoint that will be able to:
- Search the indexed data stored in the Redis database for the requested item.
- Return the query result as JSON, including the URL of the track.
Part 1: Create a new API route 1/internet_archive/search.
This API route will be implemented in listenbrainz/webserver/views/internet_archive_api.py.
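A minimal sketch of the lookup such an endpoint could perform, assuming indexed entries are dicts shaped like the formatted metadata shown earlier. A plain dict stands in for Redis here, and search_index with its artist, title and limit parameters is an illustrative name, not existing ListenBrainz code.

```python
from typing import Dict, List, Optional


def search_index(
    index: Dict[str, dict],
    artist: Optional[str] = None,
    title: Optional[str] = None,
    limit: int = 10,
) -> dict:
    """Return entries whose artist/title contain the query text (case-insensitive)."""

    def matches(value: Optional[str], query: Optional[str]) -> bool:
        # A missing query parameter matches everything.
        return query is None or query.lower() in (value or "").lower()

    results: List[dict] = []
    for entry in index.values():
        if matches(entry.get("artist"), artist) and matches(entry.get("title"), title):
            results.append(entry)
            if len(results) >= limit:
                break
    # Same envelope as the JSON response described in this proposal.
    return {"results": results}
```

A view function would only need to read the query parameters, call a function like this, and serialize the returned dict as JSON.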
Phase 3: Integrate indexer with BrainzPlayer
In this phase we will extend the current BrainzPlayer functionality to support the new InternetArchivePlayer.
Part 1:
Create an Internet Archive player, similar to the current players, in frontend/js/src/common/brainzplayer/InternetArchivePlayer.tsx.
This React component will query the new internet_archive_api to get track metadata and URL.
Part 2: Create a custom audio player for InternetArchivePlayer
In this part we will create a custom audio player in frontend/js/src/common/brainzplayer/AudioPlayer.tsx that plays the audio file using the HTML5 audio element. This gives a simple audio player that does not rely on wrappers the way the existing players (e.g. Spotify, Apple, SoundCloud) do.
    import * as React from "react";

    interface Props {
      audioUrl: string;
      onEnded: () => void;
      onPause: () => void;
      onPlay: () => void;
      onTimeUpdate: (currentTime: number, duration: number) => void;
      onError?: () => void;
    }

    // Thin wrapper around the HTML5 audio element; the ref exposes the element to the parent.
    export const AudioPlayer = React.forwardRef<HTMLAudioElement, Props>(
      ({ audioUrl, onEnded, onPause, onPlay, onTimeUpdate, onError }, ref) => {
        return (
          <audio
            ref={ref}
            src={audioUrl}
            controls
            onEnded={onEnded}
            onPause={onPause}
            onPlay={onPlay}
            onError={onError}
            onTimeUpdate={(e) => {
              onTimeUpdate(e.currentTarget.currentTime, e.currentTarget.duration);
            }}
          />
        );
      }
    );
Part 3: Integrate InternetArchivePlayer in BrainzPlayer
In this part we will extend the existing BrainzPlayer to handle the new InternetArchivePlayer. We will make changes in frontend/js/src/common/brainzplayer/BrainzPlayer.tsx.
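    // Check whether a listen belongs to this data source.
    if (datasource instanceof InternetArchivePlayer) {
      return InternetArchivePlayer.isListenFromThisService(listen);
    }

    // Register the player as a data source.
    switch (key) {
      case "InternetArchive":
        dataSources.push(InternetArchivePlayerRef);
        break;
    }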
Phase 4: Testing and Error Handling
In this phase we will comprehensively test the indexer and API and add error handling. I will also check the functionality of the player.
I will also discuss any potential changes with my mentor.
Macro-Implementation Details with timelines:
Compile project details and community bonding. (Week 1)
Milestone 1: Develop an Indexer for Internet Archive (Weeks 1-5)
Deliverables:
An indexer similar to the current ones in listenbrainz/metadata_cache with these functions:
- Search Internet Archive collection: search_ia_items(page: int) (Weeks 1-2)
- Extract URLs from search result: extract_urls(identifier: str) (Week 3)
- Format track metadata to store it in the database: format_track(item: Dict) (Week 4)
- Store metadata of tracks in the database using Redis: store_track_db(entry: Dict) (Week 5)
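As a rough illustration of the format_track step listed above, the sketch below builds one index entry from an item. The shape of the item argument (identifier, file_name and metadata keys) and the use of IA's creator field for the artist are assumptions made for this example, not a settled design.

```python
from typing import Dict, Optional
from urllib.parse import quote

AUDIO_EXTENSIONS = (".mp3", ".ogg", ".flac", ".wav")


def format_track(item: Dict) -> Optional[Dict]:
    """Build one index entry for an audio file, or return None for non-audio files.

    Assumed (hypothetical) input shape:
    {"identifier": str, "file_name": str, "metadata": {...}}
    """
    identifier = item["identifier"]
    file_name = item["file_name"]
    metadata = item.get("metadata", {})
    if not file_name.lower().endswith(AUDIO_EXTENSIONS):
        return None
    return {
        "title": metadata.get("title"),
        # Assumption: the artist is taken from the item's "creator" field.
        "artist": metadata.get("creator"),
        "collection": metadata.get("collection"),
        # File names can contain spaces, so percent-encode the path segment.
        "url": f"https://archive.org/download/{identifier}/{quote(file_name)}",
        "identifier": f"InternetArchive:track:{identifier}",
    }
```

One item can hold several audio files, so an identifier built from the item identifier alone may not be unique per track; appending the file name is one option to discuss with the mentor.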
Milestone 2: Develop API for ListenBrainz (Weeks 6-8)
Deliverables:
- New API endpoint in ListenBrainz: 1/internet_archive/search (Week 6)
- Accept query parameters, e.g. artists, title (Week 7)
- Query the indexed database and search for matching items (Week 8)
- Return the JSON result of the query (Week 8)

    {
        "results": [
            {
                "title": "Song Name",
                "artist": "Artist Name",
                "collection": "78rpm",
                "url": "https://archive.org/download/...",
                "identifier": "InternetArchive:track:<IA_identifier>"
            }
        ]
    }

- The indexer is able to monitor changes in the IA database: sync_IA()
Milestone 3: Integrate Internet Archive tracks with BrainzPlayer (Weeks 8-11)
- Modify the current BrainzPlayer in frontend/js/src/common/brainzplayer.
- It will query the IA search endpoint (1/internet_archive/search).
- Update BrainzPlayer to add an HTML5 audio element to play audio.
Milestone 4: Testing and Error Handling ()
During this period I will do comprehensive testing and get feedback from my mentor on any code changes and error handling.
Deliverable:
BrainzPlayer will be able to play recordings from the Internet Archive without any errors.
Potential Issues
1. Redis Search Limitations and Scalability
- Issue: Redis may not be designed for large-scale fuzzy searches or full-text queries across thousands of keys. This can become a bottleneck as the number of indexed recordings grows.
- Support Needed:
  - Guidance from the ListenBrainz team on the expected scale and whether it's acceptable to query Redis with keys() or switch to an alternate lightweight search solution (e.g., RediSearch or SQLite).
- Possible Solutions:
  - Use limited namespaces (ia:recording:) and apply pagination or caching on results.
  - If scale becomes a concern, switch to a batched index export and query from a local in-memory structure.
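If the local in-memory structure option is pursued, one possible shape is a small inverted index that maps lowercase tokens to track identifiers, so a query does not have to scan every key. This is only a sketch; tokenize, build_token_index and lookup are hypothetical names, and the entries are assumed to carry artist, title and identifier keys.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def tokenize(text: str) -> List[str]:
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_token_index(entries: Iterable[dict]) -> Dict[str, Set[str]]:
    """Map each token in an entry's artist/title to the identifiers that contain it."""
    token_index: Dict[str, Set[str]] = defaultdict(set)
    for entry in entries:
        text = f"{entry.get('artist') or ''} {entry.get('title') or ''}"
        for token in tokenize(text):
            token_index[token].add(entry["identifier"])
    return token_index


def lookup(token_index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return identifiers whose artist/title contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    matches = set(token_index.get(tokens[0], set()))
    for token in tokens[1:]:
        matches &= token_index.get(token, set())
    return matches
```

This gives exact token matching only; fuzzy matching would still need a dedicated search solution such as the ones named above.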
2. Metadata Inconsistencies from Internet Archive
- Issue: Internet Archive metadata can vary in format (missing fields, inconsistent naming, etc.).
- Support Needed:
  - Help with setting up fallback or canonical mapping rules, possibly with guidance on how similar issues were handled in other metadata indexers in ListenBrainz.
- Possible Solutions:
  - Define a normalization layer in the indexer to ensure consistent keys (title, artist, identifier, audio_url, etc.).
  - Maintain a validation schema and log skipped or malformed entries.
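A normalization layer of this kind might look like the sketch below. It assumes a raw field can arrive as a single string, a list of strings, or be missing entirely; normalize_entry, first_value and the creator/url fallbacks are illustrative choices rather than agreed mapping rules.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Entries lacking any of these keys are skipped (assumed policy for this sketch).
REQUIRED_KEYS = ("title", "identifier", "audio_url")


def first_value(value) -> Optional[str]:
    """Collapse a string-or-list field to one stripped string, or None if empty."""
    if isinstance(value, (list, tuple)):
        value = value[0] if value else None
    if value is None:
        return None
    value = str(value).strip()
    return value or None


def normalize_entry(raw: dict) -> Optional[dict]:
    """Return an entry with consistent keys, or None (and a log line) if it is malformed."""
    entry = {
        "title": first_value(raw.get("title")),
        "artist": first_value(raw.get("artist") or raw.get("creator")),
        "identifier": first_value(raw.get("identifier")),
        "audio_url": first_value(raw.get("audio_url") or raw.get("url")),
    }
    missing = [key for key in REQUIRED_KEYS if entry[key] is None]
    if missing:
        logger.warning("Skipping entry %r: missing %s", raw.get("identifier"), missing)
        return None
    return entry
```

The artist is left optional here because some recordings may have no creator listed; whether that is acceptable is a question for the mentor.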
3. Changes in Internet Archive APIs or File Structure
- Issue: The structure of files or the API response from Internet Archive might change or contain edge cases that break the indexer.
- Support Needed:
  - Input on how tightly to couple the indexer with IA APIs: should the IA CLI tool (internetarchive Python package) be preferred for future-proofing?
- Possible Solutions:
  - Use the official IA Python CLI/toolkit (internetarchive) to abstract low-level API changes.
  - Set up unit tests on metadata extraction to catch schema drift early.
4. Playback Compatibility in BrainzPlayer
- Issue: Some recordings may be in formats that HTML5 audio does not support in every browser (.flac, .ogg), and some links may require redirection or CORS headers.
- Support Needed:
  - Help from the frontend team to test multiple IA formats and ensure graceful fallback.
- Possible Solutions:
  - Prefer .mp3 links when available.
  - Implement a client-side MIME type check in BrainzPlayer and log/play only compatible formats.
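The preference for .mp3 could also be applied at index time, so the API already returns the most compatible file. The sketch below assumes that alternative encodings of one track share the same file name apart from the extension; pick_preferred_files and the preference order are hypothetical.

```python
import os
from typing import Dict, Iterable

# Most to least preferred; the order is an assumption for this sketch.
FORMAT_PREFERENCE = (".mp3", ".ogg", ".flac", ".wav")


def pick_preferred_files(file_names: Iterable[str]) -> Dict[str, str]:
    """Map each track (file name without extension) to its most preferred audio file."""
    best: Dict[str, str] = {}
    for name in file_names:
        stem, extension = os.path.splitext(name)
        extension = extension.lower()
        if extension not in FORMAT_PREFERENCE:
            continue  # Not an audio format we index.
        current = best.get(stem)
        if current is None:
            best[stem] = name
            continue
        current_extension = os.path.splitext(current)[1].lower()
        # A lower position in FORMAT_PREFERENCE means a more preferred format.
        if FORMAT_PREFERENCE.index(extension) < FORMAT_PREFERENCE.index(current_extension):
            best[stem] = name
    return best
```

The client-side check in BrainzPlayer would still be needed as a fallback for items that only offer a less widely supported format.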
5. Monitoring for Updates to Internet Archive Collection
- Issue: The IA collection is dynamic. New recordings are added and existing ones are sometimes updated.
- Support Needed:
  - Advice on whether a full reindex strategy is acceptable or if ListenBrainz has a preferred method for incremental updates.
- Possible Solutions:
  - Add a lightweight scheduler (e.g., cron) to re-fetch recently modified items.
  - Use the IA API's metadata change logs to limit reprocessing.
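The incremental option could be organized around a stored checkpoint: each run reprocesses only items modified since the previous run and then advances the checkpoint. The sketch below assumes each item carries an ISO 8601 timestamp under a hypothetical updated key; how that timestamp is obtained from IA is left open.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Tuple


def select_changed_items(
    items: Iterable[dict], last_sync: datetime
) -> Tuple[List[dict], datetime]:
    """Return the items modified after last_sync and the checkpoint for the next run."""
    changed: List[dict] = []
    checkpoint = last_sync
    for item in items:
        # "updated" is an assumed field name holding an ISO 8601 timestamp with offset.
        modified = datetime.fromisoformat(item["updated"])
        if modified > last_sync:
            changed.append(item)
            checkpoint = max(checkpoint, modified)
    return changed, checkpoint


# Example checkpoint for a first scheduled run (timezone-aware, UTC).
INITIAL_SYNC = datetime(2024, 1, 1, tzinfo=timezone.utc)
```

A scheduled job would persist the returned checkpoint and pass it back in as last_sync on the next run.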
6. Security or Abuse of the API Endpoint
- Issue: The search endpoint may be spammed or queried heavily, impacting Redis performance.
- Support Needed:
  - Direction on API rate limiting, in particular whether existing ListenBrainz middleware supports this.
- Possible Solutions:
  - Add simple caching or throttling logic to 1/internet_archive/search.
  - Implement max_limit for API calls.
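To make the max_limit and throttling ideas concrete, here is a small in-memory sketch. The constants, clamp_limit and SlidingWindowThrottle are hypothetical; a real deployment would more likely rely on whatever rate limiting ListenBrainz already provides, and a per-process structure like this would not be shared across workers.

```python
import time
from collections import defaultdict, deque
from typing import Optional

MAX_LIMIT = 100      # Assumed upper bound on results per request.
DEFAULT_LIMIT = 25   # Assumed default when no valid limit is supplied.


def clamp_limit(raw_limit, default: int = DEFAULT_LIMIT, max_limit: int = MAX_LIMIT) -> int:
    """Parse a user-supplied limit and keep it within [1, max_limit]."""
    try:
        limit = int(raw_limit)
    except (TypeError, ValueError):
        return default
    return max(1, min(limit, max_limit))


class SlidingWindowThrottle:
    """Allow at most max_requests per client within the last window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

The now parameter exists only so the behaviour can be tested without waiting; callers would normally omit it.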