MetaBrainz Summer of Code Application: Implement a Daemon for Correcting Out-of-Sync Cover Art and Event Art Metadata on Archive.org
Personal Information
- Name: Nada Mohamed
- Email: nadamo.cs@gmail.com
- GitHub: https://github.com/noah-mclain
- LinkedIn: www.linkedin.com/in/nada-mohamed-300305ma
- IRC/Matrix Nickname: nada_mo
Synopsis
This project aims to develop a daemon that detects and corrects out-of-sync cover art and event art metadata on archive.org. It extends the functionality of the newly deployed artwork-indexer service by monitoring MusicBrainz entities with associated images, identifying metadata inconsistencies, and automatically resolving them.
Benefits to the Community
Enhancing metadata integrity is crucial for maintaining MusicBrainz’s reliability. This daemon will address the following issues:
- Outdated or incorrect entity metadata (titles, artists, dates, etc.)
- Inconsistent image metadata (types, comments, thumbnails, etc.)
- Missing images (present on archive.org but absent in index.json or MusicBrainz)
- Incorrectly listed images (present in index.json or MusicBrainz but missing from archive.org)
- Malformed JSON (encoding issues, incorrect data types)
By integrating this daemon with the artwork-indexer service, the project automates metadata verification and correction, building on initial efforts by bitmap and leveraging insights from Jira issue IMG-129.
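The image-level categories above can be made concrete with a minimal Python sketch of how the daemon might classify discrepancies for a single release. It assumes the three inputs have already been fetched: the image metadata MusicBrainz holds, the raw text of the item's `index.json`, and the set of image IDs that have a file in the archive.org item. It also assumes that MusicBrainz is the source of truth and that image IDs should be integers. The `Issue` type, the `find_discrepancies` function and the compared fields (`types`, `comment`) are illustrative assumptions, not the artwork-indexer's actual API. Entity-level checks (titles, artists, dates) are out of scope for this sketch.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Issue:
    kind: str                # "malformed_json", "missing_image", "incorrectly_listed" or "metadata_mismatch"
    image_id: Optional[int]  # None when the problem concerns index.json as a whole
    detail: str


def find_discrepancies(mb_images, index_json_text, archive_ids):
    """Compare one release's image data across MusicBrainz, index.json and archive.org (sketch).

    mb_images:       {image_id: {"types": [...], "comment": str}} as stored in MusicBrainz
    index_json_text: raw contents of the item's index.json
    archive_ids:     set of image IDs that have a file in the archive.org item
    """
    # Malformed JSON: undecodable text, or a top-level shape we cannot work with.
    try:
        index = json.loads(index_json_text)
    except json.JSONDecodeError as exc:
        return [Issue("malformed_json", None, str(exc))]
    images = index.get("images") if isinstance(index, dict) else None
    if not isinstance(images, list):
        return [Issue("malformed_json", None, "'images' is missing or not a list")]

    issues = []
    index_images = {}
    for entry in images:
        # Incorrect data types, e.g. an image ID serialized as a string (assumed to be an int).
        if not isinstance(entry, dict) or not isinstance(entry.get("id"), int):
            issues.append(Issue("malformed_json", None, f"bad image entry: {entry!r}"))
            continue
        index_images[entry["id"]] = entry

    # Missing images: a file exists on archive.org, but MusicBrainz or index.json omits it.
    for image_id in sorted(archive_ids):
        absent = [name for name, ids in (("MusicBrainz", mb_images), ("index.json", index_images))
                  if image_id not in ids]
        if absent:
            issues.append(Issue("missing_image", image_id, "absent from " + " and ".join(absent)))

    # Incorrectly listed images: MusicBrainz or index.json lists an image that has no file.
    for image_id in sorted((set(mb_images) | set(index_images)) - set(archive_ids)):
        issues.append(Issue("incorrectly_listed", image_id, "no file on archive.org"))

    # Inconsistent image metadata: MusicBrainz values are treated as authoritative.
    for image_id in sorted(set(mb_images) & set(index_images)):
        for key in ("types", "comment"):
            if mb_images[image_id].get(key) != index_images[image_id].get(key):
                issues.append(Issue("metadata_mismatch", image_id, key))
    return issues


if __name__ == "__main__":
    mb = {1: {"types": ["Front"], "comment": ""}, 2: {"types": ["Back"], "comment": ""}}
    index_text = '{"images": [{"id": 1, "types": ["Front"], "comment": "old"}]}'
    for issue in find_discrepancies(mb, index_text, archive_ids={1, 3}):
        print(issue)
```

In the demo, image 3 is reported as missing (its file exists only on archive.org), image 2 as incorrectly listed (MusicBrainz knows it but no file exists), and image 1 as having a mismatched comment.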
Deliverables
- A daemon that continuously monitors and corrects metadata inconsistencies.
- Seamless integration with the artwork-indexer service for automated corrections.
- A prioritized task queue system for metadata fixes.
- Comprehensive project architecture and workflow documentation.
- Active engagement with the MetaBrainz community for feedback and validation.
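For the prioritized task queue, a minimal in-memory sketch uses Python's `heapq`: fixes are ordered by urgency, a counter keeps first-in, first-out order among tasks of equal priority, and a set prevents the same fix from being queued twice. The priority values and the `FixTask`/`FixQueue` names are hypothetical; a production version would likely need to persist the queue (for example in PostgreSQL) so that pending work survives restarts.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical priority levels: a lower number is fixed first.
PRIORITY = {
    "malformed_json": 0,
    "incorrectly_listed": 1,
    "missing_image": 2,
    "metadata_mismatch": 3,
}


@dataclass(order=True)
class FixTask:
    priority: int
    seq: int                                 # tie-breaker: FIFO order within one priority
    entity_mbid: str = field(compare=False)
    kind: str = field(compare=False)


class FixQueue:
    """Min-heap of pending fixes; the most urgent task is popped first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()
        self._pending = set()                # (entity_mbid, kind) pairs already queued

    def push(self, entity_mbid, kind):
        key = (entity_mbid, kind)
        if key in self._pending:             # ignore duplicates of a fix that is still queued
            return False
        self._pending.add(key)
        heapq.heappush(self._heap, FixTask(PRIORITY[kind], next(self._counter), entity_mbid, kind))
        return True

    def pop(self):
        task = heapq.heappop(self._heap)
        self._pending.discard((task.entity_mbid, task.kind))
        return task

    def __len__(self):
        return len(self._heap)
```

Because `FixTask` compares only `priority` and `seq`, two tasks never need their MBIDs compared, and a malformed `index.json` always jumps ahead of cosmetic metadata mismatches.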
Technical Details
Technologies & Tools
- Programming Languages: Python, SQL
- Database Management: PostgreSQL
- Infrastructure: Docker (if required), Git
Proposed Mentor
- bitmap
Reviewers
- reosarevok, yvanzo
Discussion Platform
- MetaBrainz Community Forum
Time Commitment
- Project Duration: 175 hours (Medium)
- Availability: 30-40 hours per week throughout the coding period
Timeline
| Phase | Duration | Tasks |
|---|---|---|
| Community Bonding | 2 weeks | Engage with the MetaBrainz community, refine the project plan, and set up the development environment |
| Phase 1 (Weeks 1-4) | 4 weeks | Design and implement core daemon functionality |
| Phase 2 (Weeks 5-8) | 4 weeks | Integrate the daemon with the artwork-indexer and implement metadata corrections |
| Phase 3 (Weeks 9-12) | 4 weeks | Test and debug the daemon, then refine its functionality based on feedback |
| Final Evaluation | 2 weeks | Complete documentation, submit the final report, and address any remaining issues |
System Architecture
The flowchart below shows, at a high level, how the daemon interacts with MusicBrainz, the artwork-indexer, and archive.org.
```mermaid
flowchart TD
    MB[MusicBrainz Database] -->|Fetch metadata| DAEMON[Metadata Correction Daemon]
    DAEMON -->|Verify metadata| ARTIDX[Artwork-Indexer]
    ARTIDX -->|Retrieve images| ARCHIVE[Archive.org]
    ARCHIVE -->|Send images| ARTIDX
    ARTIDX -->|Update metadata| MB
    DAEMON -->|Correct inconsistencies| MB
```
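The architecture implies a long-running process that repeatedly pulls entities to check. Below is a minimal sketch of that loop, assuming two injected callables: the hypothetical `fetch_batch`, which returns the MBIDs of entities to examine, and `check_and_fix`, which validates one entity and queues any fixes. The loop isolates per-entity failures, waits when there is no work, and stops cleanly on SIGTERM or SIGINT, which is how `docker stop` and Ctrl+C signal a process.

```python
import logging
import signal
import threading

log = logging.getLogger("metadata-daemon")


class Daemon:
    """Polling loop sketch: fetch a batch of entities, check each one, back off when idle."""

    def __init__(self, fetch_batch, check_and_fix, idle_seconds=60.0):
        self.fetch_batch = fetch_batch        # hypothetical: returns a list of entity MBIDs
        self.check_and_fix = check_and_fix    # hypothetical: verifies one entity, queues fixes
        self.idle_seconds = idle_seconds
        self._stop = threading.Event()

    def install_signal_handlers(self):
        # Must be called from the main thread; both signals just request a clean stop.
        signal.signal(signal.SIGTERM, lambda *_: self._stop.set())
        signal.signal(signal.SIGINT, lambda *_: self._stop.set())

    def run(self, max_cycles=None):
        cycles = 0
        while not self._stop.is_set() and (max_cycles is None or cycles < max_cycles):
            cycles += 1
            batch = self.fetch_batch()
            for mbid in batch:
                if self._stop.is_set():
                    break
                try:
                    self.check_and_fix(mbid)
                except Exception:
                    # One bad entity must not kill the daemon: log it and move on.
                    log.exception("failed to process %s", mbid)
            if not batch:
                # Nothing to do: wait, but wake up immediately if a stop is requested.
                self._stop.wait(self.idle_seconds)
        return cycles
```

Passing `max_cycles` bounds the loop, which is convenient in tests; in deployment, `run()` would be called without a limit after `install_signal_handlers()`.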
Workflow Diagram
```mermaid
sequenceDiagram
    participant MB as MusicBrainz
    participant DAEMON as Correction Daemon
    participant ARTIDX as Artwork-Indexer
    participant ARCHIVE as Archive.org
    DAEMON->>MB: Fetch metadata records
    DAEMON->>ARTIDX: Validate image metadata
    ARTIDX->>ARCHIVE: Retrieve images
    ARCHIVE-->>ARTIDX: Send images
    ARTIDX-->>DAEMON: Send validation report
    DAEMON->>MB: Correct metadata inconsistencies
```
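The last step of this workflow turns a validation report into corrective actions. The sketch below shows one deliberately conservative way to do this, assuming the report arrives as `(entity_mbid, issue_kind, image_id)` tuples. Problems confined to `index.json` collapse into a single "reindex" per entity, because regenerating the file from MusicBrainz data would fix them all at once. Anything involving image files, or any unrecognized issue kind, is routed to human review. The action names and the `plan_corrections` function are placeholders; how a fix is actually handed to the artwork-indexer would need to be confirmed with the mentors.

```python
from collections import defaultdict

# Placeholder actions: "reindex" = regenerate index.json from MusicBrainz data;
# "review" = the daemon cannot safely fix this on its own, so a human decides.
ACTION_FOR_KIND = {
    "malformed_json": "reindex",
    "metadata_mismatch": "reindex",
    "missing_image": "review",
    "incorrectly_listed": "review",
}


def plan_corrections(report):
    """Map [(entity_mbid, issue_kind, image_id), ...] to (entities to reindex, review items)."""
    reindex = set()                  # a set: several issues on one entity -> one reindex
    review = defaultdict(list)
    for entity, kind, image_id in report:
        action = ACTION_FOR_KIND.get(kind, "review")   # unknown kinds default to a human
        if action == "reindex":
            reindex.add(entity)
        else:
            review[entity].append((kind, image_id))
    return sorted(reindex), dict(review)


if __name__ == "__main__":
    report = [("r1", "malformed_json", None), ("r1", "metadata_mismatch", 7),
              ("r2", "incorrectly_listed", 9)]
    print(plan_corrections(report))
```

Defaulting unknown kinds to review trades throughput for safety: the daemon never takes an automatic action it was not explicitly designed for.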
Skills & Experience
Programming Background
- Languages: Python, Java, C/C++, JavaScript
- Development Experience: AI/ML projects, database management, music applications
- Notable Projects:
  - Music Player Application: Built in Python with MySQL integration
  - AI-Based Facial Detection & Recognition
  - Trivia Game: Developed using jQuery and JavaScript
  - Study App: Created in Flutter/Dart with AI-powered task organization
Open-Source Contributions
I have not yet contributed to open-source repositories, but my GitHub profile showcases multiple projects that demonstrate experience in AI/ML, database management, and application development.
Music Preferences & Community Engagement
- Favorite Genres: Alternative rock, indie, electronic
- Artists I Enjoy: Hozier, Florence + The Machine, Imagine Dragons
- Interest in MusicBrainz: Passion for metadata organization and automation
- Previous Usage: Familiar with MusicBrainz Picard’s role in metadata management, though I have not used it extensively.
System Setup
- Device: MacBook Pro 14-inch (2023, M3 Pro, 18GB RAM)
- Development Tools: Python, SQL, Git, Docker (if required)
Commitment
I understand the expectations of GSoC and am fully committed to dedicating my time and effort to successfully completing this project.
Thank you for your time and consideration. I look forward to contributing to MetaBrainz!