GSOC 2022: CritiqueBrainz reviews for BookBrainz entities

Personal information

Name: Ansh Goyal
IRC nick: ansh
Email: anshg1214@gmail.com
GitHub: anshg1214
Time Zone: UTC+05:30

CritiqueBrainz reviews for BookBrainz entities

Project Overview

Book reviews are a glimpse into a world you may or may not choose to enter. Reviews give books greater visibility and a greater chance of getting found by more readers. BookBrainz and CritiqueBrainz should enable users to rate and write reviews directly through the website using the CritiqueBrainz API. BookBrainz currently lacks this feature. This project aims to extend the infrastructure of CritiqueBrainz to support reviews and ratings for BookBrainz entities.

My PRs: Check Out
My Commits: Check Out

Implementation:

I would first complete the project for the EditionGroup entity. After I complete the work for this functionality, I will begin expanding it for other entities.

Connect the BookBrainz database to CritiqueBrainz:

  1. Create a new Docker container to host the BookBrainz database

    bookbrainz_db:
        image: postgres:12.3
        environment:
            POSTGRES_USER: critiquebrainz
            POSTGRES_PASSWORD: critiquebrainz
            POSTGRES_DB: bookbrainz
            PGDATA: /var/lib/postgresql/data/bbdata
        volumes:
            - ../data/bbdata:/var/lib/postgresql/data/bbdata:z
            - ./pg_custom:/docker-entrypoint-initdb.d/
        ports:
            - "127.0.0.1:35432:5432"
    
  2. Write scripts to download and import data dumps to PostgreSQL.

Add support for the BookBrainz entity types Author, Work, Edition Group and Series.

  1. Modify the existing CB database and existing functions to support new entity types.
    For this, I would add BB entity types in create_types.

    -- PostgreSQL enum labels are string literals, so they take single quotes.
    CREATE TYPE entity_types AS ENUM (
        'release_group',
        'event',
        'place',
        'work',
        'artist',
        'label',
        'recording',
        'Author',
        'EditionGroup',
        'Publisher',
        'Series',
        'Work'
    );
    
  2. Write and modify existing functions to fetch data from BB DB to get the required information from existing views in the database:

    I will create a function for every entity_type that looks up its information in bulk using SQL queries. Each function would take a list of dictionaries containing the entity_type and BBID of each entity and return the related information.

  3. Create functions for fetching relations between various entity types:

    I will create functions that take a BBID and return the necessary details of all entities that have a relationship with it. For an Author, for example, the function would return the details of their works and series.

  4. For searching BB items, I would make the following changes:
    a. I would add support in the code to search for BB entities and then make UI changes to allow users to search.
    b. Then I would add backend support to fetch the search results from the BB search endpoint and return the data in the required format.

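    import requests

    def search_entity(query, entity_type, limit, offset):
        # Query the BookBrainz search endpoint and return the parsed JSON response.
        base_url = 'https://bookbrainz.org/search/search'
        params = {'q': query, 'type': entity_type, 'size': limit, 'from': offset}
        response = requests.get(base_url, params=params, timeout=10)
        response.raise_for_status()  # Fail loudly on HTTP errors instead of parsing an error page.
        return response.json()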
    
  5. Then I will update the APIs and functions that fetch reviews from the CB database.

  6. Update APIs to allow users to fetch, write and delete reviews for BookBrainz entities.
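
The bulk lookup from step 2 above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the `bbid` key, the function names and the per-type fetcher callables are hypothetical, and in CB each fetcher would run one SQL query against the BB database instead of being passed in.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Hypothetical shape of a per-type bulk fetcher: takes a list of BBIDs and
# returns {bbid: entity_info}. In CB this would be one SQL query per type.
BulkFetcher = Callable[[List[str]], Dict[str, dict]]


def group_bbids_by_type(entities: Iterable[dict]) -> Dict[str, List[str]]:
    """Group BBIDs by entity type, dropping duplicates but keeping order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        bbids = grouped[entity["entity_type"]]
        if entity["bbid"] not in bbids:
            bbids.append(entity["bbid"])
    return dict(grouped)


def fetch_multiple_entities(
    entities: Iterable[dict], fetchers: Dict[str, BulkFetcher]
) -> Dict[str, dict]:
    """Look up all requested entities with one fetcher call per entity type."""
    results: Dict[str, dict] = {}
    for entity_type, bbids in group_bbids_by_type(entities).items():
        if entity_type not in fetchers:
            raise ValueError(f"Unsupported entity type: {entity_type}")
        results.update(fetchers[entity_type](bbids))
    return results
```

Grouping first means a page that lists reviews for many entities needs one database round trip per entity type rather than one per review.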

Display the reviews, ratings, and other identifiers for a BBID in CB.

  1. Add BB entities to the reviews page by updating the front-end code.

  2. Add pages for all entity types which would show the following:
    a. Reviews and Ratings
    b. External links
    c. Discography => Edition for Edition Group, Work and Series for Author, Edition for Work

    Example: The author’s page would look like this.

  3. Then I would update the pages for writing a review and editing an existing review to support BookBrainz entities.

Documentation:

  1. I will add the setup process of the BookBrainz database.
  2. Update API documentation to include submission of BB reviews.

Show CB reviews in BB via API without authentication:

  1. I would add functions to get reviews from CB associated with BBID.

    const getReviewsFromCB = async (entity_id, limit, offset) => {
        const url = `https://critiquebrainz.org/ws/1/review?entity_id=${entity_id}&limit=${limit}&offset=${offset}`;
        const response = await fetch(url, {
            method: "GET",
            headers: {
                "Content-Type": "application/json;charset=UTF-8",
            },
        });
        return response.json();
    };
    
  2. Then, I would show reviews and ratings on the entity pages.

Implement OAuth login from BB to CB:

  1. Connect to CB through OAuth2:
    a. I would add functions to get the authorisation URL, fetch the access token, and refresh the access token from CB.
    b. Then I will write functions to fetch and save these tokens in the database.
    c. Then I would create API routes for CB callback, getting tokens, refreshing tokens, and connecting and disconnecting users, which the React front-end components would call.

  2. Add front-end components to support connecting to CB:
    a. I will first add UI components for connecting to CB.
    b. Then I will add functions in React to call the API to support connecting to CB.
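
The authorisation-URL and token-refresh pieces of step 1 above might look roughly like this. It is a sketch under assumptions only: the authorize endpoint, the `review` scope, and both function names are assumptions to be checked against the CB OAuth documentation, and it is written in Python for illustration while the BB implementation would be in JavaScript.

```python
import time
from urllib.parse import urlencode

# Assumed endpoint, not confirmed by this proposal.
CB_AUTHORIZE_URL = "https://critiquebrainz.org/oauth/authorize"


def build_authorize_url(client_id, redirect_uri, state, scope="review"):
    """Build the URL the user is redirected to in order to grant BB access."""
    params = {
        "response_type": "code",  # standard OAuth2 authorization-code flow
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,  # assumed scope name
        "state": state,  # random value checked again in the callback
    }
    return f"{CB_AUTHORIZE_URL}?{urlencode(params)}"


def token_needs_refresh(expires_at, now=None, leeway=60):
    """Return True if the stored token expires within `leeway` seconds."""
    now = time.time() if now is None else now
    return now >= expires_at - leeway
```

Checking the expiry with a small leeway before each request lets the backend refresh the token ahead of time instead of waiting for CB to reject a request.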

Allow users to post ratings and reviews:

  1. Add functions to post reviews associated with a BBID.

    const submitReviewToCB = async (accessToken, review) => {
        const url = `https://critiquebrainz.org/ws/1/review/`;
        const response = await fetch(url, {
            method: "POST",
            headers: {
                Authorization: `Bearer ${accessToken}`,
                "Content-Type": "application/json;charset=UTF-8",
            },
            body: JSON.stringify({
                is_draft: false,
                entity_id: review.entity_id,
                entity_type: review.entity_type,
                text: review.text,
                license_choice: "CC BY-SA 3.0",
                language: review.languageCode,
                rating: review.rating,
            }),
        });
        return response.json();
    };
    
  2. Add front-end components to allow users to post ratings.

Timeline:

Here is a detailed week-by-week timeline for the 175 hours of the GSoC coding period, which I intend to maintain and follow.

Pre-Community Bonding Period (April):

I plan to work on tickets and features to get more understanding of the workflow, code structure, and related technical skills.

Community Bonding Period (May - June):

I plan to fix existing bugs, help merge pending PRs and close open issues. I will also discuss the roadmap with my mentors, finalise the UI and get a clear view of the plan of action for the project.

Week 1:

I will begin by creating a Postgres database in a docker container and writing scripts to import the BB data dumps.

Week 2-3:

I will write functions to fetch data from the BB database and then implement the search feature.

Week 4:

I’ll update the front-end code in CB to show reviews and other associated information for BB entities. Then I will add support for new entity types in the CB database and update existing functions and features to support them.

Week 5:

I will update the CritiqueBrainz APIs to allow users to get and post BB reviews.

Week 6:

Add backend functionality in BB to get reviews from CB.

Week 7:

Create front-end components to show reviews and ratings in BB.

Week 8:

In BookBrainz, I would implement an OAuth login to CB. (Backend)

Week 9:

I would create front-end components and routes to allow users to connect to CB.

Week 10:

I will add backend functionality to post reviews and ratings in BB.

Week 11:

I will update the frontend to allow users to post reviews and ratings in BB.

Week 12:

Clean up the code and write documentation. Discuss with my mentors and make the relevant changes before the final submission of the work.

Stretch Goals:

  • Expand this functionality for other entity types also.
  • Allow users to view all their reviews and ratings on one page in BB.
  • Unify the reviews where an entity in BB also exists in MB.

Detailed information about yourself

My name is Ansh Goyal. I am currently a freshman pursuing a B.E. at Birla Institute of Technology and Science (BITS), Pilani, India.

Tell us about the computer(s) you have available for working on your SoC project!

I own an Apple MacBook Air with an M1 chip and 8 GB of RAM.

When did you first start programming?

I remember writing my first code in C++ back in my 9th standard.

What type of music do you listen to? (Please list a series of MBIDs as examples.)

I generally listen to rock music, such as that of Linkin Park and Nirvana.

If applying for a BookBrainz project: what type of books do you read?

I enjoy reading works of mystery and thrillers like Panama Red and The Da Vinci Code.

Have you ever used MusicBrainz to tag your files?

Yes, I have! :smiley:

What aspects of the project you’re applying for (e.g., MusicBrainz, AcousticBrainz, etc.) interest you the most?

BookBrainz provides an open and free database and enables avid book readers like me to make full use of collections and retrieve corresponding details of books.

Have you contributed to other Open Source projects? If so, which projects and can we see some of your code?

I am relatively new to open source.

If you have not contributed to open source projects, do you have other code we can look at?

Yeah, over the past year, I’ve played with Python, NodeJS, PostgreSQL and Redis by building some personal and hackathon projects. A few of them are available here: FridgeToTable, Taapmaan.

What sorts of programming projects have you done on your own time?

As a hobbyist, I have done various Python and JavaScript projects.

How much time do you have available, and how would you plan to use it?

I plan to devote 30 hours per week starting in July. My university classes will end in June.

Hi @anshgoyal31, thanks for your proposal.
It looks like you’ve got a pretty good understanding of this task so far.

In some cases I think that it would be good if you added a bit more detail to each of the steps. Not necessarily detail in terms of what functions to add or what changes to make to database tables, but more to indicate that you’re aware of the things that will have to be added or changed at each step.

I was talking with @mr_monkey and we were wondering about the order that you plan to finish tasks.
We think that it’d be great to completely finish a specific functionality rather than do a bit of everything and then possibly run out of time to complete it. We think that it could be broken down like this:

  • Support for BB entities in CB
  • Pages in CB for viewing and reviewing entities
  • API in CB
  • Show CB reviews in BB via the API without authentication (like we do in MusicBrainz)
  • BB Authentication to CB
  • BB Add interface to write reviews

For each task you should add more detail about what it requires. For example, to add a new entity to CB, make a list of each page that you need to add or change on CB, and what information you will put on each page: search for an entity, write a review, API to get data, API to submit a review. How will you test each of these? If the data that you will show on a page is different for each entity type, make sure you list each of them (e.g. is the data for an Author, Edition, and Work the same or different?)

I’d like to see some more discussion with @mr_monkey about which BB entities you are planning on adding. The list that you suggested is different to the list that is on the ideas page. I think that it would be a good idea to pick a single entity type to get end-to-end working. This means that if we run out of time to finish the project, at least we would have one entity working rather than 6 entities, none of which work.

Some feedback on a few specific items in this proposal:

Allow CB to fetch data from the BB database to get the information associated with BBID:

Can you add a bit more detail here, explaining where you plan to add the code to connect to the BB database? This can just be a brief description of what Python package you plan to make, and a small description of the types of methods that you will implement for each entity (we have 2 types of methods for accessing the MB database, bulk lookup and single lookup, so BB will probably need the same).

For MB entities in CB, we have direct DB access to get items, but we use the MusicBrainz API to perform search. Can you describe how you plan to do search for BB items?

I don’t think it’s necessary to add a review_types enum to the CritiqueBrainz database. We already have the entity_types enum, and we can name these in a way that the BookBrainz ones are easily identified.

I’m not as familiar with the BookBrainz code, so I won’t comment too much on this part of the project.
Could you update your timeline to make it a bit clearer which project you will be working on during each week?

I note that you proposed 175 hours over 12 weeks, but you mentioned that you were planning on doing 30+ hours/week. This doesn’t match; can you indicate whether you are planning on doing 175h over 12 weeks, or whether you think it might take more than 175h to complete?

Thanks!

Thanks for your feedback and guidance at all times!

I agree with your opinions and have made the relevant changes to the proposal.

  • Changed the order of tasks to complete the project.
  • Added more information for tasks.
  • Discarded the proposed schema change.
  • Added implementation for search for BB entities.

As discussed on IRC

I will complete the project for the EditionGroup entity first and then finish it for other entities.

I plan to add the code to connect to the BB database directly in the CB codebase. If required later, we can migrate the code into a separate Python package.

I plan to work 30 hours/week as I believe this project requires more time for complete implementation for all entity_types. I plan to work on the stretch goals mentioned after completing the tasks proposed.

Thank you so much for your insights! :smiley:

Hi!
Thanks for updating the proposal, the CB descriptions are much clearer. I have no further feedback on the proposal. Make sure you submit it to the Google platform before the deadline!

I had a discussion with @mr_monkey and we agree with your assessment that the amount of work that you have proposed is closer to 350h. Make sure that you specify this in the official proposal submission.

Good work, and good luck!

I just thought of another possible stretch goal based on a question that you asked a few weeks ago - what if we try and work out how to unify the cases where an author in BB also exists in MB as an artist? BB has relationships to MB, so in the cases that a relationship exists, we could unify these reviews (both on the artist page on CB, and on the author page on BB).

The proposal looks great, thanks!

In general, I’d love to see a bit more detail in the BB section (in particular “Implement OAuth login from BB to CB”), it’s currently a bit thin.
How are we delivering the accessToken to the front-end to allow users to submit reviews?

Any ideas how we would show reviews on BookBrainz? The average rating and write a review button are nicely placed; I’m curious how you would suggest displaying reviews.

Here are a few existing pieces of code that could help make your project easier:
Script to download and import the BookBrainz db: bookbrainz-site/download-import-dump.sh at master · metabrainz/bookbrainz-site · GitHub
And you already found the CB review modal in Listenbrainz: listenbrainz-server/CBReviewModal.tsx at master · metabrainz/listenbrainz-server · GitHub

And finally, as a complement to alastairp’s suggestion, here is an author that is in both databases, with relationships added between each other:

https://bookbrainz.org/author/b2507eee-1391-47c5-93e6-ca972bd8e0e0

@mr_monkey Thank you so much for your feedback! :smiley:

I’ve added more descriptions for the “Implement OAuth login from BB to CB” section. Also, I have updated my UI mockup to include reviews for the entities.

The stretch goal that you and @alastairp suggested looks pretty exciting and would be quite helpful for BB and CB. I have added it to my proposal as well.

If I get it right, this project doesn’t include links to external reviews.
Is there a possibility to add something like this as an additional feature? ->

[Attached screenshot: Vollbildaufzeichnung 16.04.2022 115920]

Hi indy, CritiqueBrainz is specifically for open reviews: ones which have the same licenses that other MetaBrainz data has.
This idea will not really fit into the BB-CB integration project.

BB already allows linking to Goodreads, and eventually, when we have URL relationships, it should be possible to allow links to other review sites as well.

Thx, good enough for me, it’s not really an urgent issue :wink: