GSoC 2025: Proposal for New Calibre plugin

Introduction

Contact Information

Name: Ankit Matth

IRC nick/Matrix handle: Ankit_Matth

Email: ankitmatth101@gmail.com

Time zone: Indian Standard Time (GMT +5:30)

GitHub: Ankit-Matth

LinkedIn: ankit-matth

About Me

I am a sophomore at BRCM College of Engineering & Technology, Bahal (BRCM-CET), pursuing a B.Tech in Computer Science and Engineering. I am a Full Stack Developer with a passion for creating scalable and efficient web applications and have extensive experience with the MERN stack using TypeScript, Testing (Cypress, Jest), MySQL, and code formatting tools like ESLint, Prettier, and Husky. Additionally, I have worked on deployment using Vercel and AWS (EC2). My primary programming language is C++, but I also have a strong understanding of Python, as it is part of my college curriculum. I have participated in a hackathon, HackUniv where my team I secured 1st position, and I successfully completed Hacktoberfest 2023, making contributions to DocsGPT, for which I received a cool T-shirt as recognition.

More importantly, I am genuinely passionate about the New Calibre plugin project because it directly enhances e-book management and metadata organization, making it easier for Calibre’s users to efficiently manage their digital libraries. BookBrainz is one of the best tools for providing accurate and enriched data about editions, e-books, and more. Contributing to this project and improving how users interact with their digital collections is an exciting opportunity for me.

Additionally, this summer, I am fully committed to working on this project, as I have no other obligations—no job or university coursework. With ample free time, I am completely focused on contributing to GSoC and making a meaningful impact.

You can check out more about me here: https://ankit-matth-portfolio.vercel.app/.

Project Overview : New Calibre Plugin

Problem statement –

As stated on the ideas page, Calibre, an established open-source e-book library manager, lacks a functional BookBrainz plugin. Although a plugin previously existed, known as CaliBBre, it was abandoned several years ago. As a result, its codebase, now 8–9 years old, is outdated and incompatible with the latest versions of Calibre and modern BookBrainz.

To address this, a new installable Calibre plugin needs to be developed with essential features, such as searching for editions by name and author and improving e-book metadata. Additionally, implementing synchronization of collections between Calibre and BookBrainz is a desirable feature, though optional.

Project goals (Deliverables)

  1. Revival of CaliBBre Plugin: This project aims to revive the old plugin by rewriting it from scratch, considering the outdated 8–9-year-old code. While implementing it alongside the CaliBBre, it must support modern versions of Calibre.

  2. Interactive User Interface: Develop a user-friendly interface that enables users to search for editions by name and author, select and browse through search results, improve the metadata of e-book files, and sync collections between Calibre and BookBrainz.

  3. Search Functionality: Implement search functionality, enabling users to search for editions by name and author directly within Calibre. This will be achieved by integrating the BookBrainz web service (API), which acts as an intermediary between the plugin and the BookBrainz database, serving as the plugin’s data provider.

  4. Metadata Enhancement: Integrate features to improve metadata for e-book files, leveraging Calibre’s functionality to enhance metadata accuracy and completeness.

  5. Synchronizing Collections: Develop a feature that synchronizes collections between Calibre and BookBrainz, specifically syncing BB collections (i.e., personal user collections) with their Calibre library.

  6. Documentation and Support: Provide comprehensive documentation and support resources, including a user guide (compulsory) and a blog (optional, if needed), to help users install, configure, and effectively utilize the plugin.

  7. Publishing the Plugin: Prepare the plugin for distribution by packaging it appropriately in a ZIP file and uploading it to a GitHub repository for public access and installation. Then, submit it in a new thread on the Calibre Plugins forum via MobileRead or publish it on other relevant platforms based on the mentor’s feedback.

A brief walkthrough of the solution

After a deep dive into the Calibre plugin’s API documentation and plugin development guide, and analyzing the style and structure of the old plugin, CaliBBre, and considering strong feedback from my mentor, I decided to refine my approach. Following mentor’s suggestion, I chose to utilize the current web service at https://api.test.bookbrainz.org/1/docs/ to implement the new plugin. I configured the plugin in such a way that the API endpoint can be easily managed using an environment variable or a setting—specifically by declaring API_URL in the plugin’s config.py file. This API_URL points to api.test.bookbrainz.org by default but can be easily switched to api.bookbrainz.org or any internal search endpoint as needed. In essence, the entire plugin is designed so that updating a single line—the API_URL—redirects all plugin requests to the desired endpoint.

Here’s a step-by-step breakdown of my implementation plan:

1. Finalizing the UI/UX

UI/UX is just as important as other aspects of the project. Before starting the coding work, a sufficient amount of time should be dedicated to designing the user interface. Several tools can be used for UI design and prototyping, such as:

  • Figma (Recommended)
  • QtDesigner
  • Mockflow, etc.

Below, I have attached some demo screenshots illustrating how I plan to implement the plugin UI :

Home Page

Update Metadata Page

2. Code Implementation:

After finalizing the UI, I will complete the basic setup of the plugin. I plan to work on UI development first, followed by the BookBrainz integration. This is my implementation strategy described as:

a. Searching for Editions using the BookBrainz web service:

  • By selecting a book from the Calibre library, users can click the plugin button in the UI interface of Calibre. If no book is selected or multiple books are selected, an error is shown. Otherwise, the plugin initiates a search for that e-book by default using the book’s title.
  • Furthermore, users can refine their search by entering the book’s name, author, or BBID. After this UI action, the plugin constructs the URL for the search query using the user’s input and predefined search parameters. It then sends a search request to the BookBrainz web service API using an HTTP client in Python (such as httpx or requests) to retrieve the response. For example:

b. Displaying the search Results:

  • The BB API returns a JSON response for the search request. Since multiple editions might match a search query, users can browse the results and select the most relevant edition. These edition details are displayed in Calibre using UI components such as QListView, QTableWidget provided by PyQt5, or any other suitable UI element that best fits the workflow.

  • For example, a bare-minimum demo is shown below:


    Output of above code :

c. Fetching the metadata of specific edition:

  • Once the user selects an appropriately matched edition by clicking the button in the UI, the plugin sends multiple lookup requests to the BB API using the BBID (extracted from the previous response), such as:
    • GET /edition/{bbid} (Lookup edition by BBID)
    • GET /edition/{bbid}/identifiers (Get a list of identifiers for an edition by BBID)
    • GET /author?edition={BBID} (Get a list of authors related to the edition)
  • Note: It would be more helpful for metadata retrieval if the GET /edition/{bbid} request will updated to include additional details in the response, such as author name, release date, publishers etc..

  • These API requests retrieve metadata related to the selected edition, which is then properly extracted and stored locally(let say inside a variable fetched_metadata_from_BB). After fetching the metadata, the plugin triggers the Update Metadata dialog box, where the retrieved metadata is displayed alongside the current metadata of that edition using QTableWidget, similar to the example given above. For example, fetch_metadata can be implemented as :

d. Updating Metadata:

  • Users can update the metadata by clicking the “Apply Changes” button in the UI, which triggers the apply_changes() function. This function utilizes Calibre’s metadata plugins (calibre.customize), which offer two main classes: MetadataReaderPlugin and MetadataWriterPlugin for updating book details. For example, set_metadata(mi, type), provided by the MetadataWriterPlugin, is used to modify metadata in both the metadata database (metadata.db) and the EPUB format metadata. Demo code -

e. Synchronizing Collections Between Calibre and BookBrainz:

  • I also plan to implement an optional feature - syncing collections between Calibre and BookBrainz. Based on Monkey’s feedback, I will implement it by syncing BB collections (i.e., personal user collections) with their Calibre library. For public collections users can search for their collection using the collection name or editor/owner name. To extract information about them, the plugin makes requests using URLs like: https://api.test.bookbrainz.org/1/search?q=Monkey&type=editor&size=20&from=0 or https://api.test.bookbrainz.org/1/search?q=Monkey&type=collection&size=20&from=0

  • These requests return a JSON response containing matched collections. Users can browse multiple matches and select a specific collection to sync. A demo UI is shown below:


    Once a user clicks the sync button for a specific collection, the plugin fetches all details about that collection. I plan to support adding a collection to Calibre in two ways, as shown in the UI demo:

    When “Sync” is clicked on a collection:
    First, check if the collection already exists in Calibre by querying Calibre’s database (metadata.db). If it doesn’t exist, create a new collection (Virtual Library or Custom Column) inside Calibre. If it does exist, compare and sync books, compare the books in it with the BookBrainz collection and add any missing books. calibredb add or Calibre’s API will be used to add books to the newly created collection. The plugin ensure that books are tagged/grouped under the synced collection name or virtual library properly. A small demo code snippet for the calibredb add or Calibre’s API is given as -

    collection_id = calibre_api.create_collection("New Collection")
    calibre_api.add_book_to_collection(collection_id, book)
    
    Further, which can be implemented as -
    
    calibredb add --with-library "/path/to/Calibre Library" --tags "New Collection" /path/to/book.epub
    
  • I also plan to implement an Optimized Auto-Sync Strategy, as shown in the settings dialog below:

  • The syncing mentioned here mainly focuses on metadata synchronization and further, importing books using identifiers like ISBN . For example, if users have added a book to their collection on BB, then, when they next open Calibre, the syncing process will not fetch the actual ebook file, but it will create a metadata entry for that book in their Calibre library — including fields like title, author, edition, etc., which are fetched from BB. Even if the book file isn’t present , this effectively builds a virtual library view, or a wishlist-like setup inside Calibre. It helps users to plan what books they want to read or acquire later, and when they eventually get the ebook file, they can simply match it to these existing metadata entries.

  • I call it optimized because it is an Event-Based Auto-Sync—
    It triggers Auto-Sync on Calibre Startup. If new books are detected in collections or metadata updates occur, it starts syncing. If nothing has changed, it intelligently skips syncing to improve efficiency. To detect these changes, I will leverage BB’s revision system instead of relying on “last modified” timestamps (as per @mr_monkey’s feedback). This system offers a more reliable and helpful way to track changes in entities (books, editions, collections, etc.).

  • Note: For this to work, the BookBrainz API needs to be updated to expose revision IDs for relevant entities. This could potentially be achieved via an endpoint like /<entity_type>/<entity_bbid>/revisions. Once available, the plugin will use these revision IDs to determine if any entities have been updated since the last sync.

  • Authentication Considerations for Accessing Private Collections

    Currently, BookBrainz (BB) does not provide explicit OAuth authentication support like MusicBrainz. To access private collections, any of the following approaches may need to be implemented:

    • OAuth Bearer Token Authorization – If BB supports authorization via an Authorization: Bearer header, this can be used to access private collections after user authentication.

    • Session-Based Authentication – If BB relies on session cookies, the plugin could persist login sessions or retrieve authentication cookies to make authenticated API requests accordingly.

    • Enhanced BB API Support – Currently, BB’s API lacks a dedicated endpoint to fetch private (or even public) collections. A GET /collection/{BBID} endpoint could be added for direct collection retrieval. Alternatively, the GET /search?type=collection endpoint could be enhanced to return full collection details—including private collections—when authenticated.

    These authentication mechanisms are crucial for seamless syncing of private collections between Calibre and BookBrainz. A detailed discussion and feedback session with the mentor is required to finalize the implementation approach—specifically for handling private collections.

3. Debugging and Testing

Software development inevitably involves bugs, and handling them efficiently is crucial for a smooth development process. I will adopt a structured debugging and testing approach to ensure the plugin’s reliability and functionality. During development irself, I will wrap critical function calls in try-except blocks to ensure that errors are logged and handled properly. Additionally, I will dedicate specific time to thoroughly check and debug the entire plugin code to catch any subtle or overlooked issues.

  • Debugging with calibre-debug -
    Calibre provides a built-in command-line debugging tool called calibre-debug, which allows running and testing plugins interactively. I will use it extensively during development to test individual components of the plugin efficiently. Some useful commands includes calibre-debug -g or calibre-debug -s etc.

  • Automated Testing
    To validate core functionalities, I will write unit tests using pytest. For mocking external API calls, I will use unittest.mock. Alternatively, libraries like responses or requests-mock can be used for declarative HTTP request mocking, but I prefer manual mocking as it’s straightforward and often more flexible for unit tests. These tests will help ensure that all critical features work as intended. I plan to structure all test cases in a dedicated test.py file to thoroughly test the plugin. For example, a test case for the metadata retrieval function(given above in fetching metadata section) might look like this:

    Output of above test case :

4. Documentation and Support

Like testing, documentation is a critical part of any software product, and good documentation is essential for both developers and users. I will provide the following:

  • Technical Documentation: I will create developer-focused documentation explaining the internal structure of the plugin, including its modules, functions, and their interdependencies. It will also cover how metadata is retrieved and processed from BookBrainz, and how the plugin interacts with Calibre’s internal API. Additionally, I will create a detailed and useful README.md file for the plugin, including setup instructions, a user guide, and other relevant documentation. I also plan to add meaningful comments within the code, especially for important or complex functions. For example, a function like fetch_metadata(book_id) will have clear comments:

  • User Guide: A simple step-by-step guide will be created for end users, covering:
    Installation: How to install the plugin via Calibre’s plugin manager. Usage: How to retrieve book metadata, update book details, and sync collections.

  • Additional Resources: If needed, I will write a blog post or record a demo video showcasing the plugin’s features and explaining how users can integrate it into their workflow.

5. Packaging & Distribution

To ensure smooth distribution and installation, I will follow Calibre’s standard packaging guidelines.

  • Packaging the Plugin: Calibre plugins are packaged as .zip files containing essential files such as main.py, ui.py, about.txt, and __init__.py etc. I will ensure the correct folder structure is followed to allow easy installation via Calibre’s plugin manager.

  • Public Distribution: Once finalized, I will submit the plugin for public distribution through Calibre’s Plugin Repository on the MobileRead Forums and also on GitHub to encourage open-source collaboration and further improvements. If required, I will additionally provide versioned releases using GitHub’s release system to ensure users can access stable and trackable updates.

Extras

Apart from that, if time permits and my mentor agrees, I would love to work on additional features that I plan to implement :

1. Bulk Metadata Verification
Instead of checking one edition at a time, allow users to verify metadata for multiple books in their Calibre library at once.
Generate a report highlighting books with complete metadata and those missing details in BookBrainz.

2. Contribution Mode: Add Books to BookBrainz
If a book does not exist in BookBrainz, allow users to submit the missing edition directly from Calibre. Provide a pre-filled form with Calibre metadata, making it easier for users to contribute new books to the database.

Post-GSoC Plan

Even after GSoC ends, I want to continue contributing to BookBrainz. If for some reason I’m unable to complete the extra features during GSoC, they will be my first priority afterward. Apart from that, there are a few areas I’d like to improve:

  1. Responsiveness – Ensure a better UI across various screen sizes.

  2. UI/UX Enhancements & Bug Fixes – Improve the overall design and user experience while addressing minor bugs.

  3. Solr Search Server – Since search is a major pain point, I plan to contribute to improving it. Although I’m not very familiar with Solr infrastructure at the moment, I’m eager to learn and work on it in the future.

Additionally, I’ll stay engaged in discussions and contribute to future updates of the New Calibre Plugin. MetaBrainz has a fantastic open-source community, and I’d love to remain involved in the long run.

Timeline

Phase 1: Community Bonding Period & Initial Development

Week Technical Details
May 8 - June 1 - Complete all tasks related to community bonding period.
- Finalize detailed UI design with mentors and gather feedback.
- Create missing or updated Figma designs based on suggestions.
- Complete all onboarding and setup tasks (repository, issue/PR templates, contribution guide).
- Interact regularly with mentors to align on expectations.
June 2 - June 8 - Initialize plugin development - set up folder structure
- Initial setup of the plugin that is necessary configurations
- Build basic UI components - layout, Search (Home) page etc
June 9 - June 15 - Implement Edition Search functionality: input handling, search by title/author/BBID.
- Set up API calls to BookBrainz API and handle various responses.
- Begin writing internal documentation for project setup and API integration decisions.
- Display results in a user-friendly list.
June 16 - June 22 - Add unit tests for search logic.
- Implement metadata extraction for selected edition: fetch and parse relevant data (title, authors, identifiers, etc).
- Create UI component and display metadata in it.
- Write integration tests for fetching and displaying metadata.
June 23 - June 29 - Add in-code comments and update the documentation.
- Implement metadata update functionality - allow users to save changes.
- Write unit and functional tests for metadata update functionality
June 30 - July 6 - Test complete flow: search → select edition → fetch metadata → save.
- Create user guide with detailed instructions
- Write test cases for all remaining implemented functionalities
July 7 - July 13 - Complete documentation of entire workflow.
- Create and implement dialogue boxes for UI components like About Us or User Guide etc.
- Write test cases for these functionalities - Add in-code comments etc.

Midterm Evaluation (July 14 - July 18)

Focus on submitting the midterm evaluation, bug fixes, and feedback from mentors.

Phase 2: Final Touches to the Plugin & Work on Extras

Week Technical Details
July 19 - July 25 - Fix bugs or incomplete tasks(if any) from Phase 1.
- Implement collection search and result display.
- Write accompanying tests and update internal documentation.
July 26 - August 1 - Complete collection sync implementation: fetch and map local and remote collections.
- Implement UI for sync status display
- Setup logic to handle private/public collections.
- Write tests for sync scenarios and implement proper error handling.
August 2 - August 8 - Implement Auto-Sync Strategy
- Setup settings configuration for the plugin with the UI
- Continue writing and updating technical documentation.
- Write test cases for sync functionality
- Create a detailed user guide, README.md or blog as per requirements.
August 9 - August 15 - Work on extra features.
— Bulk Metadata Verification.
— Contribution Mode: Add Books to BookBrainz.
- Write tests for the new features
- Update feature-specific documentation for extra features.
August 16 - August 22 - Conduct full plugin testing (manual + automated).
- Final adjustments needed to complete development.
- Packaging & distribution: - Compressed ZIP file setup - MobileRead Forum submission.
- Update final user documentation and README.
- Prepare short demo video or screenshots (optional).

Final Evaluation - Final Week

Week Technical Details
August 23 - September 1 - Any remaining critical task - Prepare final presentation and GSoC submission. - Final evaluation. - Future guidance & discussion. - Prepare & submit final GSoC evaluation.

Why me?

Over the past few weeks, I have spent a significant amount of time inspecting the BookBrainz codebase and working on resolving issues. I took a deep dive into the Calibre plugin’s API documentation and plugin development documentation. I also explored the codebase of an old BookBrainz plugin, namely CaliBBre.

I have been actively contributing to BookBrainz since joining the community and have gained a solid understanding of both the frontend and backend codebases. My contributions include multiple UI improvements, bug fixes, and code refactoring. Some of my key PRs are:

  • Converted Promises to async/await in component pages
  • Implemented a new footer design
  • Displayed the full BookBrainz logo on the Home Page
  • Refactored Promises syntax to async/await in form components

More details can be found in my PRs.

In conclusion, this project aligns well with my skill set and interests, and I am extremely excited to join GSoC 2025 while contributing to MetaBrainz as this opportunity adds great value to my portfolio. I assure you that I am a quick, self-motivated learner. While I am a student and a beginner in open source, not an expert, I am eager to embark on this exciting journey with the MetaBrainz community.

Community affinities

What type of music do you listen to? (please list a series of MBIDs as examples)

I mainly listen to Haryanivi, Bollywood and Punjabi music. Some of my favourite songs (recording MBIDs) are :

Same Beef - acc70490-89ff-43d3-8a43-1056f915e3e3
Ok Report - 36f9026f-a9d5-473a-8da0-a4a370978301
White Brown Black - 52293877-3768-4708-bafe-ff6da8fe8609
So High - 5993c11e-df08-487b-95c3-2ea3c8321840
Tum Hi Ho - 3e5bc764-dfa4-454d-a2a9-0ee84ae35db2
Tere Mitti - defbc3b7-4a8d-43e7-a016-a3aa472f3868
Billionaire - 8b6cec9e-bae7-4d3e-a421-784df1a72a6b

My favourite artists are (artist MBIDs):

Arijit Singh - ed3f4831-e3e0-4dc0-9381-f5649e9df221
Sidhu Moose Wala - 119a6864-622b-4e6c-8aab-a422080530c6
Yo Yo Honey Singh - 0dc9c4bc-8bcc-42f1-9033-bec41160377f
Karan Aujila - 4a779683-5404-4b90-a0d7-242495158265

What type of books do you read? (Please list a series of BBIDs)

Books that I read are -

Rich dad poor dad – 3a178573-cec4-4987-8c4e-683d90f8c20f
Think and Grow Rich - 83fd6313-bc4b-4d79-84b2-74890b4eb83c
Atomic Habits - 4ea00ddf-a51d-4e80-96e5-9f723cf87c93
Gitanjali - 6103f0e1-5cc1-4d79-8d89-02ba91668215
Bhagavat Gita - 6cdb82a8-1e4e-4ae8-9fb8-039f77a8cba2

What aspects of MusicBrainz/ListenBrainz/BookBrainz/Picard interest you the most?

I am genuinely impressed by MusicBrainz for its vast database, as it includes almost every song I searched for, even niche tracks. I was particularly surprised to find Haryanvi and other local songs despite their relatively small audience. I also like ListenBrainz, especially BrainzPlayer, as it is very convenient. The way it efficiently tracks my listening habits and allows easy export for analysis is something I appreciate. I was also surprised by the BookBrainz database, as it contains detailed information on book editions, including authors, ISBN identifiers, publishers, and more for most books.

Have you ever used MusicBrainz Picard to tag your files or used any of our projects in the past?

No, I haven’t tried MusicBrainz Picard yet, but I plan to try it soon. As mentioned above, I have used ListenBrainz for listening to and analyzing my music preferences. I have also contributed to BookBrainz by adding editions, works, and other details to explore the workflow and understand the entire process. Additionally, I sometimes explore information about a book, such as its author, publisher, and other details, by searching for it on BookBrainz.

Programming precedents

When did you first start programming?

I started programming in my first year of B.Tech, around early 2022, when C language was introduced in our curriculum. Over time, it became a hobby, and I gradually became more serious about coding.

Have you contributed to other open source projects? If so, which projects and can we see some of your code?

No, I haven’t participated in any major open-source programs before this. However, I made small contributions to DocsGPT during Hacktoberfest 2023. My PRs can be found here.

Currently, I have primarily contributed to BookBrainz in the MetaBrainz ecosystem. You can find all my PRs here.

What sorts of programming projects have you done on your own time?

I primarily focus on project-based learning and aim to become a Full Stack Developer. I chose the MERN stack as my main tech stack, but I also explored testing, deployment, and code formatting tools. Below are some of the projects I have built:

  • Multiplayer Tic-Tac-Toe | React.js, Socket io, Cypress

  • Maze Escape | p5.js, HTML, CSS, JavaScript, DSA

  • My Portfolio | MongoDB, Express, React.js, Node.js

  • Personal Doctor | Handlebars, CSS, JS, Node.js, Express, MongoDB

  • Scientific Calculator | HTML, CSS, JavaScript

  • Advanced To-Do List | HTML, CSS, JavaScript

Note: All the projects are properly deployed. The live links and source codes of these can be found in the pinned repositories on my GitHub account. Feel free to check them out!

Practical requirements

What computer(s) do you have available for working on your SoC project?

I use Windows as my primary operating system and have set up my development environment using WSL (Windows Subsystem for Linux). My HP Pavilion laptop has the following specifications:

CPU: 11th Gen Intel® Core™ i3-1115G4 @ 3.00GHz
RAM: 8GB
Storage: 512GB SSD
System Type: 64-bit OS, x64-based processor
Operating System: Windows 10 Pro
GPU: Intel® UHD Graphics with HD display

How much time do you have available per week, and how would you plan to use it?

I am mostly free during my summer break and can dedicate around 30-40 hours, or even more, per week to the GSoC project. This summer, I am fully committed to working on this project, as I have no other obligations—no job or university coursework. With ample free time, I am entirely focused on contributing to GSoC and making a meaningful impact.

3 Likes

Hey @mr_monkey,

I’ve done my best to cover all aspects of the project, but there may still be some errors. I’d love to discuss and resolve them together.

This journey has been rewarding, and I truly appreciate your patience and support. Looking forward to your feedback! :blush:

I’m also working on a small prototype—soon I will share it…

Thanks for your proposal @Ankit_Matth !

I very much appreciate the disclaimer regarding the use of ChatGPT. I can see the fingerprints of it here and there in the formatting, style and formulation. I also see a few parts the just rephrase the older proposal; the proposal was not very good so that isn’t helping :slight_smile:
For reference, I would much rather read a proposal in your own voice than in that of chatgpt, grammar and typos be damned!
It gives me a better idea of who you are, and I’m not here to mentor an LLM :slight_smile:

With that out of the way, the first piece of feedback is that you should absolutely use the API!
Parsing the HTML output is an odd solution when an API exists (despite its alpha status).
If API changes are required in order to get the relevant metadata then that should be mentioned and worked on (potentially by myself).
[I will add that even without using the API, parsing the HTML would have been wrong as opposed to using the internal search endpoint those web pages use]

I think this requires you to rewrite part 2 of the proposal with that in mind, and I will go part by part below for further feedback:

Part 1: Mockups look all right.
Perhaps a settings page could be good addition, considering you mention a few configurable options.

Part 2: see above the main point

Part 3: Some more mockups here would be helpful. I don’t know what the QT UI components do or look like. Similarly, what would browsing the multiple options look like?

Part 4: This just looks copied from the other proposal. While you describe what the code will do, it would be much better to show an example of the code in question, or even pseudo-code.

Part 5: Similarly, some technical details here would be welcome. I do think the sync feature is useful, but it is not clear to me if you consider this part of your project goals or if it is supposed to be an extended goal (to be worked on if you finish the main part of the project before the end)

Part 6 to 8: These sections read like an LLM response. They don’t really give me much information about how you are going to do these things, just an overview of what the topic is (“running test scripts to simplify the debugging process”, what does that mean??). Please expand on these topics in your own voice. I don’t think bullet points are particularly adapted to answer these topics.

Timeline:I am a bit confused as to why your project fits into the first part of GSOC. If you are planning to work on the extended goals during the second part, then the size of the project is incorrect.
Usually that means you are underestimating the amount of time each part will take. It could also mean that the proposed project is too small for the amount of time, and some of those “extra features” would need to move from extended goals to regular goals.

Small things:

  • In “why me?” section, the two Calibre links are the same.
  • None of the listed personal projects mention python yet you claim “a strong understanding of Python”. Can you elaborate or show anything to that effect?
3 Likes

@mr_monkey Thanks for the detailed feedback! I see your points, and I’ll work on rewriting the proposal in my own words (without ChatGPT) and refining everything you mentioned. I’ll update you once I’ve made these changes. I really appreciate your time and guidance!

To be honest, I’m a complete beginner in open source, and I was a bit nervous about getting things right. That’s why I used ChatGPT and referenced other proposals for guidance. My intention was to create something valuable, but I now understand that writing in my own voice is more important. I truly appreciate your patience, and I’ll make sure to fix everything based on your feedback. Thanks again!

Regarding the points you mentioned:

Small things:

  • In the “Why me?” section, the two Calibre links are the same.
  • None of the listed personal projects mention Python, yet you claim “a strong understanding of Python.” Can you elaborate or show anything to that effect?

The duplicate Calibre link was a typo—I mistakenly placed the same link twice. The correct second link is: https://manual.calibre-ebook.com/creating_plugins.html.

As for Python, you’re absolutely right—I haven’t used it in personal projects yet. My understanding comes from my college coursework, where I’ve learned the fundamentals, but I haven’t built anything substantial with it. I initially included it because I’m comfortable with the language, but I realize now that without real-world projects, it’s not a strong claim. That said, I’m confident in my Python skills, and I’ll share some demo code with you soon to demonstrate my understanding.

Hey @mr_monkey, I understand that using the API is the correct approach rather than parsing HTML or internal search endpoints. However, I’m facing an issue while working with the test API (https://api.test.bookbrainz.org/1/docs/).

Many editions available on the official website (bookbrainz.org) are not found when queried via the test API. This raises concerns about whether I should rely on api.test.bookbrainz.org for development or if I should be sending requests elsewhere to extract the necessary metadata.

For example, When I send a lookup request for the Atomic Habits edition using its BBID, the API returns “No edition found,” even though the edition exists on the official website:

Also, when making a search request for the Query: "Atomic Habits" the API returns only three search results, but no edition details for Atomic Habits—whereas, on BookBrainz, the edition is available:

API response - 

{
  "resultCount": 3,
  "searchResult": [
    {
      "bbid": "aeac8782-a6e2-4de2-b6b8-cef13dbb3e34",
      "defaultAlias": {
        "language": "eng",
        "name": "The 7 Habits of Highly Effective People",
        "primary": true,
        "sortName": "7 Habits of Highly Effective People, The"
      },
      "entityType": "Work"
    },
    {
      "bbid": "bba3086b-d25c-49b5-8412-3fbf47ee5f40",
      "defaultAlias": {
        "language": "deu",
        "name": "Delinquent Habits - Quantensprung zur rechten Zeit",
        "primary": true,
        "sortName": "Delinquent Habits - Quantensprung zur rechten Zeit"
      },
      "entityType": "Work"
    },
    {
      "bbid": "113842c6-a18e-4ee4-b75f-ab64d2e001d6",
      "defaultAlias": {
        "language": "eng",
        "name": "The 7 Effective Habits of Teenagers",
        "primary": true,
        "sortName": "7 Effective Habits of Teenagers, The"
      },
      "entityType": "Work"
    }
  ],
  "totalCount": 3
}

But on BookBrainz it exist:

Could you clarify whether the test API has incomplete data (or operates on a separate database)? If so, should I proceed with it for development, or is there another recommended endpoint for extracting edition details?

Should I send requests to api.test.bookbrainz.org, or is there a more reliable alternative API?

Hello!

You said it at the end: the test API (and the test website) use a separate test database.

I can update it with the latest production dump to update it for now if it makes it easier for your testing, but know that it will slowly become out of date until next time I manually update it.

But at least for initial testing, you can find an edition that does exisdt in the test datbase, and use that as your test, perhaps?

For now, do point your requests to api.test.bookbrainz.org, which will eventually need to be changed to use api.bookbrainz.org
Maybe an environment variable or setting could be used to choose which API endpoint to use?

1 Like

Got it!

No need to update the test database with the latest production dump as it will eventually become outdated. I’ll find an edition that already exists in the test database and use that for testing. Regarding the API endpoint, using an environment variable or a setting to configure it (like declaring API_URL in the config.py of the plugin for configuration) sounds like a great approach.

Thanks!

Also, I have a bit of doubt while refining the proposal. Could you please clarify what the optional feature “Sync collections between Calibre and BookBrainz” refers to exactly?

Is it:

  1. Syncing the user’s “My Collection” from BookBrainz to their Calibre library – syncing missing books from Calibre’s local collection with the BookBrainz “My Collections” of that user?
  2. Metadata sync between Calibre and BookBrainz – ensuring metadata consistency if a book exists in both systems? For example, if the title, author, or edition details change in BookBrainz, would it update in Calibre?
  3. Syncing books from Calibre to BookBrainz – adding an e-book from Calibre to BookBrainz if it exist in Calibre local collection and missing in BookBrainz database?

Is it referring to syncing both ways, such as updating metadata from Calibre to BookBrainz if something is missing? I just need a bit more information on exactly what the feature is aiming for.

With that in mind, I’ve updated the homepage UI, including the settings page button, and also designed the settings page.

Updated Home Page:

Settings Page:

What you say about them?

I think the mockups with a separate settings page looks much better :+1:

As for the syncing, I believe the proposed idea referred to syncing BB collections (i.e. personal user collections like mine) with their Calibre library.

1 Like

Hey @mr_monkey,
I have updated my proposal as per your feedback. Kindly review it again and let me know if required any further updates. Looking forward to your feedback! :blush:

1 Like

Hey @mr_monkey, I also created a PDF for my proposal. You can check it out here: Proposal -Ankit Matth. Looking forward to your feedback!

Thank you for the updated proposal @Ankit_Matth !

First of all I want to say this new version of the proposal is a great improvement over the previous one!
Much more details and more mockups really help in visualizing the project and its complexities.
For example, I love that you pointed out we will need some form of authentication to access private collections.

I have a couple of questions/comments:

  1. How does sync work?
    I know this was a stated goal in the ideas page, but I am unclear as to what this feature does.
    As far as I understand it, Calibre only shows ebooks/files that you own or have downloaded. Considering BookBrainz does not offer any content, what exactly does it sync?
    Say I have added a book in my collection on BookBrainz, what will syncing exactly do when I next open Calibre?
    I can see an automatic metadata syncing, but I’m not sure what else it could do. Is there a “want to read” sort of list in Calibre for books you don’t yet have ? What does “Calibre’s API will be used to add books to the newly created collection” translate to?

  2. last modified timestamp
    It makes a lot of sense to need some way to keep track of changes. However timestamps is not the solution in this case. BookBrainz uses a revision system (see BookBrainz Schema — BookBrainz Developer Docs 0.1 documentation), which is not currently exposed in the API response.
    Should this feature be needed, we would need to expose that revision id through the API (maybe /<entity_type>/<entity_bbid>/revisions ?), instead of introducing a timestamp.

  3. I think the timeline could be improved. Some of the items are a bit too vague (“Implement all basic required features”) while others seem to duplicate, or at least are not clearly differentiable (“Sending requests to BB API to get editions for search query” one week, and the following week “full setup to fetch, extract, display in UI, update and save metadata”)
    Having an even more detailed plan of what you think each week will bring, and thinking about those implementation details, really helps plan accordingly and anticipate issues and bottlenecks.

  4. I would also recommend doing testing and documentation as you go, throughout the project. It is usually easier to follow along with your own implementations, and prevents from having to do it all at the end of the project: 1. it can get boring doing only testing and documentation when everything already works and 2. that way it’s not left as an afterthought and doesn’t get cut out if you end up running out of time.

2 Likes

Hi @mr_monkey ,
Thank you so much for the kind words and your valuable feedback! I’m really glad to hear that the refined proposal shows better clarity and depth — your earlier suggestions truly helped me improve it.

Regarding your questions and comments:

You’re absolutely right — Calibre primarily shows ebooks/files that you own or have downloaded. The sync feature mentioned here mainly focuses on metadata synchronization and further importing books using identifiers like ISBN. For example, if you’ve added a book to your collection on BB, then when you next open Calibre, the syncing process will not fetch the actual ebook file, but it will create a metadata entry for that book in your Calibre library — including fields like title, author, edition, etc., which are fetched from BB. Even if the book file isn’t present, this effectively builds a virtual library view, or a wishlist-like setup inside Calibre. It helps users plan what books they want to read or acquire later, and when they eventually get the ebook file, they can simply match it to these existing metadata entries.

The line “Calibre’s API will be used to add books to the newly created collection” means that the plugin (using the calibredb tool) will programmatically add books to specific collections. For example:

collection_id = calibre_api.create_collection("New Collection")
calibre_api.add_book_to_collection(collection_id, book)

Which further can be implemented as -

calibredb add --with-library "/path/to/Calibre Library" --tags "New Collection" /path/to/book.epub

Thank you for pointing this out — you’re absolutely correct. The revision system clearly has advantages over using a “last modified” timestamp. So the feature you suggested — exposing the revision ID through the API (e.g., /<entity_type>/<entity_bbid>/revisions) — makes perfect sense. I fully support this approach and ready to implement it accordingly.

That makes perfect sense. I will revisit the timeline to break down vague tasks into more actionable and specific sub-items, eliminate any overlaps, and clearly differentiate the goals for each week. I will also update it to incorporate parallel slots for testing and documentation as part of the development process, rather than treating them as afterthoughts. I’ll update the timeline with these improvements and submit the revised version soon.

Hey @mr_monkey ,
I have updated my proposal based on your feedback. Now, the timeline section is more detailed & structured, and I also made adjustments to other parts accordingly. Kindly review it again and let me know if any other changes are needed. Looking forward to your feedback! :blush:

Can you please point me to the documentation of calibre where this is done?
I cannot find a reference to collections.

The command below with the --with-library flag isn’t much clearer: calibredb — calibre 8.2.1 documentation

1 Like

Ahh, I see it now. As you know yesterday was the final day to submit the proposal for GSoC, so I was rushing to update the PDF for my revised proposal. In that hurry, I mistakenly assumed that “collections” might exist as a native concept in Calibre and ended up adding a boilerplate-style code snippet — not realizing it doesn’t actually exist in Calibre’s official API. Extremely sorry for the confusion caused…

As Calibre doesn’t have native support for “collections” as a distinct feature or API. To mimic collection-like behavior, one workaround comes in my mind is to use a custom column, which also appears in the Tag Browser UI. Here’s an example:

If adding a custom column:

calibredb add_custom_column --display"{"enum_values": ["True"]}" "#mycollection" "Collection" text
calibredb set_custom "#mycollection" <book_id> True

Where:
- --display "{"enum_values": ["True"]}" : Sets the display options so that only the value “True” is available.
- "#mycollection": The machine-friendly label (custom column lookup name). By convention, custom columns typically begin with a “#”.
- "Collection": The human-friendly name as it appears in the Calibre interface.
- text: The data type. In this case, it's treated like tags and appears in the tag browser.

With this setup, the collections can be managed easily:

Any book with the custom column set to “True” will appear under the “True” tag in the custom “Collection” column in the tag browser. Books with the column left unset (empty) will not appear under this tag. The “Collection” itself will be shown as a category(custom-tag) in the left-hand tag browser panel in Calibre.

What you think about it?

To be honest, your proposal very much appears to be written by a large language model, including confidently hallucinating code that doesn’t exist to satisfy your prompt (i.e. my questions pressing for technical details).
Also all these “You’re absolutely right — […]” and “Thank you for pointing this out — you’re absolutely correct.” don’t sound like your own style and are exactly what LLMs produce, casting more doubt as to the authenticity of your proposal.

You say “Extremely sorry for the confusion caused…” but this doesn’t seem to me like the “boilerplate code” you claim it to be, and instead of confusion I would call it deceptive, and lazy.
It’s apparent there was no research done initially on the collection syncing part of the proposal.
If you’re getting an LLM to generate your proposal for you the least you can do is to check that it is correct.

In all honesty this is very disappointing. I have no intention of mentoring ChatGPT over the summer and I don’t feel like I can trust your proposal or capacity to research and code the feature, after what I’ve seen.

3 Likes

Hey @mr_monkey ,

Firstly, a big sorry for the disappointment caused. For one of my first proposals, I used an LLM and copy-pasted several parts, including the HTML parsing solution. I had also mentioned this in the proposal itself. However, after your clarification, I have not used any LLM for anything related to the project again. I am fully aware of MetaBrainz’s terms and conditions—that if AI is used, it must be disclosed and the submitted solution must be thoroughly reviewed.

Regarding the hallucinated code that doesn’t exist—please believe me, that was not an AI-generated response. It was part of a plan I had to implement it as a separate module. Here’s the code I had in mind:

class calibre_api:
 
    def create_collection(name):
        # Internally runs using subprocess or something else:
        calibredb add_custom_column --display '{"enum_values": ["True"]}' "#mycollection" "Collection" text
    
        # Simulate the custom column name as collection ID
        return "#mycollection"
 
    def add_book_to_collection(collection_id, book_id):
        # Internally runs using subprocess or something else:
        calibredb set_custom "#mycollection" <book_id> True


And it would be used like this in another module:

collection_id = calibre_api.create_collection("New Collection")
calibre_api.add_book_to_collection(collection_id, book_id)

It might have looked out of place without the full context, but it was genuinely something I was thinking about and working on manually—was not an LLM output. I was considering this approach, but I wasn’t entirely confident it would work properly since calibredb is a command-line tool, while our plugin is interface-based that’s why I not provide exactly this snippet.

In the rush, as I mentioned before, I was super busy since it was the last day for GSoC proposal submissions – Call it laziness or a big blunder—whatever you prefer—but in the process of finalizing the proposal, I definitely cut corners. That was my fault, and I completely understand how it came across as deceptive and careless. It wasn’t my intent, but I take full responsibility for it. I mistakenly assumed that “collections” might exist as a native concept in Calibre and tried to simulate that behaviour.

I completely agree with you that there was no initial deep research done on the collection syncing part of the proposal. But that doesn’t mean I ignored it completely. I did start crafting a potential solution, created some mockups, and even explored parts of the technical implementation.

Regarding the “You’re absolutely right — […]” stuff — I was simply trying to be respectful and formal in our communication. I didn’t realize those phrases might come across as LLM-generated. They weren’t copied or generated — just my attempt to maintain politeness. That said, I completely understand how it could have raised doubts, and I’ll definitely be more mindful of this moving forward. Also, those kinds of phrases were only used in some of our communications, not in the actual proposal itself.

By the way, if you think I done everything using an LLM, please take a look before drawing any conclusions – here is what I have done so far:

  • I created detailed Figma designs: check here

  • Scraped and collected BBIDs of my favorite songs, artists, and books from BookBrainz.

  • Write demo script for testing and created demo table using PyQt5 for BB plugin.

  • Researched and crafted my own solutions for private collections, optimized strategy, and proposed ideas like adding collections as custom columns or virtual libraries.

  • Studied and tested the BB API and its current limitations — you can see screenshots and related explanations in my proposal and previous posts. I even asked for your feedback on this.

  • Set up the BB environment, submitted 4 PRs, and got hands-on experience with the codebase. I know the changes were small, but I genuinely tried my best to contribute meaningfully.

  • Joined all the communication channels of MetaBrainz and have attended two weekly meetings so far — I’m actively exploring about the community.

  • Just yesterday, I worked for around 4–5 hours continuously debugging the prototype — trying different ways to create custom columns in Calibre. The effort is visible through the deleted zip files and timestamps in the screenshot below -

  • Also, for a screen recording showing the prototype I’ve built so far or or what I am currently working on — please have a look here. and apart from it I done so many more things..

From my side also, I have no intention of relying on ChatGPT or any LLM over the summer to do all the work or just keep prompting it for everything. I genuinely want to work on this project myself and put in real effort. I’m not here to let a tool generate code or ideas for me — I’m here because I truly care about the project, GSoC and want to contribute meaningfully.

That said, it’s totally your call — you’re free to decide or think whatever you feel is right. I’m still completely open to making further refinements and more than willing to keep working on this.

In end, I am truly sorry for every mistake or misstep along the way. But just for your info — after doing so much hard work and putting so much time and effort, it really felt disheartening to be reduced to just a label: “LLM.” It honestly hurts a lot. :disappointed_face::disappointed_face::disappointed_face:

@mr_monkey , I don’t say all of above to “prove” anything — I just want to be transparent about the effort I have been putting in. I completely understand if it has shaken your confidence in me, but I hope you might still give me a chance to learn and improve…

@Ankit_Matth Sorry if I was too quick to dismiss your work as AI-generated.
If I was mistaken I apologize for this mischaracterization.

Overall I think your proposal had some merit if you did write everything yourself, but ultimately does not convince me that you can implement the project successfully with minimal mentor intervention.
To name just one example I don’t understand why you mention running a CLI command in a subprocess when you can access the calibre internals in your python file (from calibre.db.cli import ... calibre/src/calibre/db/cli/cmd_add_custom_column.py at master · kovidgoyal/calibre · GitHub).
Or how the get_metadata and set_metadata functions are imported from the wrong module.

Considering this type of internal access underpins the development of the entire plugin, a clear understanding of it would be required for me to feel comfortable mentoring this project, considering my knowledge of Calibre plugin development is basically nonexistent. This clear understanding was not shown despite requests for more details.

For your information, from my point of view this style of formatting with random bold highlighting and em-dashes everywhere reads like LLM-formatted text.

1 Like

@mr_monkey, Thank you so much for taking the time to explain everything. I now completely understand where I went wrong. To be honest, I am quite new to Calibre plugin development, but I genuinely excited to dive deeper and learn. Initially, I spent a lot of time designing detailed mockups for the plugin, and as I was using Figma for the first time, it took longer than I expected. Additionally, I had some college farewell events and tight academic deadlines, which left me with very limited time to explore Calibre’s documentation in depth.
Note: here, I am not trying to make excuses, I just wanted to provide a bit of context about what actually happened. I fully accept that it was my fault and, frankly, a bit of laziness on my part. I’m truly sorry for that, and I promise that I will never let it happen again.

Regarding calibre.db.cli, I couldn’t find any mention in the official documentation about using it as accessing Calibre internals. The documentation mainly focuses on calibredb as a command-line interface, which is why I thought of implementing it via subprocess. But now I agree that accessing it directly is definitely a better approach.

Regarding get_metadata and set_metadata, in my code, I used: self.db = self.gui.current_db.new_api …and then accessed metadata like: self.db.get_metadata(). Actually, I found this approach in the official documentation here, and it worked well for extracting and setting metadata. I think there might have been some confusion from my earlier screenshots, they were incomplete, and I didn’t clarify that self.db was set using self.gui.current_db.new_api.

Regarding the LLM-style formatting, please believe me I wrote that manually. I just tried to make the content easier to follow. I see now that it was inappropriate, and I’ll avoid doing that moving forward. From now on, I’ll keep my responses clean and simple, without special formatting.

All I ask is a little more time to explore Calibre plugin development properly. I’ll come back with complete research, a detailed understanding of the codebase, and a working prototype. I want to prove to you that I am the right fit for this, and that I can implement the project successfully with minimal mentor intervention, just give me little time…