GSoC 2025 Application: Correcting Out-of-Sync Cover Art and Event Art Metadata on Archive.org

MetaBrainz Summer of Code Application: Implement a Daemon for Correcting Out-of-Sync Cover Art and Event Art Metadata on Archive.org

Personal Information


Synopsis

This project aims to develop a daemon that detects and corrects out-of-sync cover art and event art metadata on archive.org. It extends the functionality of the newly deployed artwork-indexer service by monitoring MusicBrainz entities with associated images, identifying metadata inconsistencies, and automatically resolving them.


Benefits to the Community

Enhancing metadata integrity is crucial for maintaining MusicBrainz’s reliability. This daemon will address the following issues:

  • Outdated or incorrect entity metadata (titles, artists, dates, etc.)
  • Inconsistent image metadata (types, comments, thumbnails, etc.)
  • Missing images (present on archive.org but absent in index.json or MusicBrainz)
  • Incorrectly listed images (present in index.json or MusicBrainz but missing from archive.org)
  • Malformed JSON (encoding issues, incorrect data types)

By integrating this daemon with the artwork-indexer service, the project automates metadata verification and correction, building on initial efforts by bitmap and leveraging insights from Jira issue IMG-129.
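As a rough illustration of the "missing image", "incorrectly listed image", and "malformed JSON" checks listed above, the sketch below parses an index.json document and compares its image IDs with the IDs known to MusicBrainz. It assumes index.json has a top-level "images" list whose entries each carry an "id"; the function names and the shape of the returned report are hypothetical and are not taken from the artwork-indexer code.

```python
import json


def parse_index_image_ids(index_json_text):
    """Return the set of image IDs in an index.json document.

    Returns None when the document is malformed: invalid JSON, a wrong
    top-level type, or an image entry without an "id" (assumed layout).
    """
    try:
        doc = json.loads(index_json_text)
    except json.JSONDecodeError:
        return None
    images = doc.get("images") if isinstance(doc, dict) else None
    if not isinstance(images, list):
        return None
    ids = set()
    for image in images:
        if not isinstance(image, dict) or "id" not in image:
            return None
        # Normalize to strings so that 1 and "1" compare as equal.
        ids.add(str(image["id"]))
    return ids


def diff_image_ids(musicbrainz_ids, index_ids):
    """Compare the image IDs known to MusicBrainz with those in index.json."""
    mb = {str(i) for i in musicbrainz_ids}
    return {
        # Known to MusicBrainz but absent from index.json.
        "missing_from_index": mb - index_ids,
        # Listed in index.json but unknown to MusicBrainz.
        "unknown_to_musicbrainz": index_ids - mb,
    }
```

A caller could treat a None result from the parser as a request to regenerate index.json, and any non-empty set in the report as a candidate fix.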


Deliverables

  • A daemon that continuously monitors and corrects metadata inconsistencies.
  • Seamless integration with the artwork-indexer service for automated corrections.
  • A prioritized task queue system for metadata fixes.
  • Comprehensive project architecture and workflow documentation.
  • Active engagement with the MetaBrainz community for feedback and validation.
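The prioritized task queue among the deliverables could look roughly like the in-memory sketch below, built on Python's heapq. The FixTask fields, the priority values, and the class names are assumptions made for illustration. A real daemon would likely need a persistent queue (for example, a PostgreSQL table) so that tasks survive restarts.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class FixTask:
    """A pending metadata fix (hypothetical structure)."""
    priority: int   # lower number = handled sooner
    sequence: int   # tie-breaker that keeps insertion order stable
    entity_mbid: str = field(compare=False)
    problem: str = field(compare=False)


class FixQueue:
    """Minimal in-memory priority queue of metadata fixes."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, priority, entity_mbid, problem):
        task = FixTask(priority, next(self._counter), entity_mbid, problem)
        heapq.heappush(self._heap, task)

    def pop(self):
        """Return the most urgent task, or None if the queue is empty."""
        return heapq.heappop(self._heap) if self._heap else None

    def __len__(self):
        return len(self._heap)
```

The sequence counter means two tasks with the same priority are handled in the order they were queued.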

Technical Details

Technologies & Tools

  • Programming Languages: Python, SQL
  • Database Management: PostgreSQL
  • Infrastructure: Docker (if required), Git

Proposed Mentor

  • bitmap

Reviewers

  • reosarevok, yvanzo

Discussion Platform

  • MetaBrainz Community Forum

Time Commitment

  • Project Duration: 175 hours (Medium)
  • Availability: 30-40 hours per week throughout the coding period

Timeline

| Phase | Duration | Tasks |
|---|---|---|
| Community Bonding | 2 weeks | Engage with MetaBrainz community, refine project plan, and set up environment |
| Phase 1 (Weeks 1-4) | 4 weeks | Design and implement core daemon functionality |
| Phase 2 (Weeks 5-8) | 4 weeks | Integrate daemon with artwork-indexer and implement metadata corrections |
| Phase 3 (Weeks 9-12) | 4 weeks | Conduct testing, debugging, and refine functionality based on feedback |
| Final Evaluation | 2 weeks | Complete documentation, submit final report, and address any remaining issues |

System Architecture

Below is a high-level architecture of how the daemon interacts with MusicBrainz and archive.org.

flowchart TD
    MB[MusicBrainz Database] -->|Fetch metadata| DAEMON[Metadata Correction Daemon]
    DAEMON -->|Verify metadata| ARTIDX[Artwork-Indexer]
    ARTIDX -->|Retrieve images| ARCHIVE[Archive.org]
    ARCHIVE -->|Send images| ARTIDX
    ARTIDX -->|Update metadata| MB
    DAEMON -->|Correct inconsistencies| MB

Workflow Diagram

sequenceDiagram
    participant MB as MusicBrainz
    participant DAEMON as Correction Daemon
    participant ARTIDX as Artwork-Indexer
    participant ARCHIVE as Archive.org
    
    DAEMON->>MB: Fetch metadata records
    DAEMON->>ARTIDX: Validate image metadata
    ARTIDX->>ARCHIVE: Retrieve images
    ARCHIVE-->>ARTIDX: Send images
    ARTIDX-->>DAEMON: Send validation report
    DAEMON->>MB: Correct metadata inconsistencies
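
One pass of the workflow above could be sketched as follows. The three callables are hypothetical stand-ins for the MusicBrainz query, the artwork-indexer validation step, and the correction step; none of them corresponds to an existing API.

```python
def run_pass(fetch_entities, validate, correct):
    """Run one fetch -> validate -> correct pass.

    fetch_entities() yields entities that have artwork,
    validate(entity) returns a list of detected problems,
    correct(entity, problems) applies the fixes.
    Returns the entities for which a correction was attempted.
    """
    corrected = []
    for entity in fetch_entities():
        problems = validate(entity)
        if problems:
            correct(entity, problems)
            corrected.append(entity)
    return corrected
```

Passing the steps in as callables keeps the loop testable without a database or network access. A long-running daemon would call run_pass repeatedly with a delay between passes.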

Skills & Experience

Programming Background

  • Languages: Python, Java, C/C++, JavaScript
  • Development Experience: AI/ML projects, database management, music applications
  • Notable Projects:
    • Music Player Application: Built in Python with MySQL integration
    • AI-Based Facial Detection & Recognition
    • Trivia Game: Developed using jQuery and JavaScript
    • Study App: Created in Flutter/Dart with AI-powered task organization

Open-Source Contributions

While I have not officially contributed to open-source repositories, my GitHub showcases multiple projects demonstrating experience in AI/ML, database management, and application development.


Music Preferences & Community Engagement

  • Favorite Genres: Alternative rock, indie, electronic
  • Artists I Enjoy: Hozier, Florence + The Machine, Imagine Dragons
  • Interest in MusicBrainz: Passion for metadata organization and automation
  • Previous Usage: Familiar with MusicBrainz Picard’s role in metadata management, though I have not used it extensively.

System Setup

  • Device: MacBook Pro 14-inch (2023, M3 Pro, 18GB RAM)
  • Development Tools: Python, SQL, Git, Docker (if required)

Commitment

I understand the expectations of GSoC and am fully committed to dedicating my time and effort to successfully completing this project.


Thank you for your time and consideration. I look forward to contributing to MetaBrainz!

Hello!

Just wanted to link to some renderings of the Mermaid charts that you wrote out to help potential mentors and reviewers evaluate your proposal.

System Architecture
Workflow Diagram

Hi @nada_mo,

Thanks for your proposal. It’s very light on technical details, however. It’d be nice if you could follow Development/Summer of Code/Getting started - MusicBrainz Wiki and introduce yourself on Matrix/IRC (sorry if I missed it).

I’d also like prospective students to submit at least one pull request to the artwork-indexer before making a proposal. I previously suggested https://tickets.metabrainz.org/browse/IMG-158 as a good first task.