GSoC 2018: Reviewing more entities and migration of rating system into CritiqueBrainz

manofcode · March 21, 2018, 7:59pm

Personal Information:

Name : Mratyunjay Soni
IRC Nick : manofcode
E-mail : mratyunjay.soni@gmail.com
Github : https://github.com/man-of-code
LinkedIn : https://www.linkedin.com/in/manofcode
Time Zone : UTC+0530

Project :

Overview:

In reference to ticket CB-270.
Currently, the main task at hand is migrating the ratings from musicbrainz db to critiquebrainz.
In order to accomplish this, support for more and more entities needs to be provided.

The entities that can be reviewed currently are :

Release Group
Event
Place

In my SoC Period, I aim to add support for the entities :

Artists
Recordings
Work
Labels

Reviewing Process after SoC period :

After the support for entities has been provided, migration of ratings can begin.

Challenges:

MusicBrainz identifies its users using string_identifiers, whereas CritiqueBrainz does the same with the help of unique UUIDs.

During the process of migration of ratings, some conflicts can be predicted for sure, for eg. the same user can be present in both the db(s).
A single user might have rated an entity differently in both the databases.
There can be some users, who have rated the entities in musicbrainz, but they do not exist in critiquebrainz.

Goals :

Adding support for Entities :
Provide support for the above mentioned entities i.e add reviewing support and integrate it with the UI.
To implement rating system, reviewing functionality for more and more entities has to be provided.
Adding tests for functions :
To test the efficiency of the functions, test files will be added accordingly.
Migraton of ratings :
The ratings from mbdump/derived can be migrated now and merged with the critiquebrainz.
Change UI:
The UI will be redesigned , in accordance of the discussion with the mentor. Forms for the added entities will be added.
Documentation:
All the changes made during the SoC period, will be documented in an orderly fashion. Also, documentation will be updated accordingly.

Implementation Details:

Writing Database Queries

CritiqueBrainz uses SQLAlchemy ORM to interact with the database. To add support for more and more entities, codebase can be appended with different files for each entity as been done earlier, i.e separate files for each entity like work.py , label.py etc.

Example code for work.py is as follows :

from collections import defaultdict
from mbdata import models
from brainzutils import cache
from critiquebrainz.frontend.external.musicbrainz_db import mb_session, DEFAULT_CACHE_EXPIRATION
from critiquebrainz.frontend.external.musicbrainz_db.utils import get_entities_by_gids
from critiquebrainz.frontend.external.musicbrainz_db.includes import check_includes
from critiquebrainz.frontend.external.musicbrainz_db.serialize import to_dict_works
from critiquebrainz.frontend.external.musicbrainz_db.helpers import get_relationship_info


def get_work_by_id(mbid):
    """Example function to get work with mbid.
    """
    key = cache.gen_key(mbid)
    work = cache.get(key)
    if not work:
        work = _get_work_by_id(mbid)
        cache.set(key=key, val=work, time=DEFAULT_CACHE_EXPIRATION)
    return work


def _get_work_by_id(mbid):
    pass


def fetch_multiple_works(mbids, *, includes=None):
    """Example funtion to get info related to multiple works using their MusicBrainz IDs
    """
    if includes is None:
        includes = []
    includes_data = defaultdict(dict)
    check_includes('work', includes)
    with mb_session() as db:
        query = db.query(models.Work)
        works = get_entities_by_gids(
            query=query,
            entity_type='work',
            mbids=mbids,
        )
        work_ids = [work.id for work in works.values()]

        if 'artist-rels' in includes:
            get_relationship_info(
                db=db,
                target_type='artist',
                source_type='work',
                source_entity_ids=work_ids,
                includes_data=includes_data,
            )
        if 'place-rels' in includes:
            get_relationship_info(
                db=db,
                target_type='place',
                source_type='work',
                source_entity_ids=work_ids,
                includes_data=includes_data,
            )
        if 'series-rels' in includes:
            get_relationship_info(
                db=db,
                target_type='series',
                source_type='work',
                source_entity_ids=work_ids,
                includes_data=includes_data,
            )
        if 'url-rels' in includes:
            get_relationship_info(
                db=db,
                target_type='url',
                source_type='work',
                source_entity_ids=work_ids,
                includes_data=includes_data,
            )
        if 'release-group-rels' in includes:
            get_relationship_info(
                db=db,
                target_type='release_group',
                source_type='work',
                source_entity_ids=work_ids,
                includes_data=includes_data,
            )
return {str(mbid): to_dict_events(works[mbid], includes_data[works[mbid].id]) for mbid in mbids}

The function get_work_by_id fetches the information of the entity from musicbrainz db associated with the provided mbid.The function first generates the cache key of the given mbid and then fetches the entity using the generated cache key.

The function _get_work_by_id invokes the function fetch_multiple_works providing the mbids and the list of information to be included.

The function fetch_multiple_works basically fetches the information of the entity related to multiple works using their mbids. It returns the dictionary of multiple works keyed with the provided mbids.

The above provided file is just an example of how database queries will be handled for each entity.

Also mock changes to be made in the entity.py are :

from critiquebrainz.frontend.external.musicbrainz_db.release_group import fetch_multiple_release_groups, get_release_group_by_id
from critiquebrainz.frontend.external.musicbrainz_db.place import fetch_multiple_places, get_place_by_id
from critiquebrainz.frontend.external.musicbrainz_db.event import fetch_multiple_events, get_event_by_id
from critiquebrainz.frontend.external.musicbrainz_db.artist import fetch_multiple_events, get_artist_by_id
from critiquebrainz.frontend.external.musicbrainz_db.label import fetch_multiple_events, get_label_by_id
from critiquebrainz.frontend.external.musicbrainz_db.work import fetch_multiple_events, get_work_by_id
from critiquebrainz.frontend.external.musicbrainz_db.recording import fetch_multiple_events, get_recording_by_id

def get_multiple_entities(entities):
    """Fetch multiple entities using their MBIDs.
    """
    pass

def get_entity_by_id(id, type='release_group'):
    """A wrapper to call the correct get_*_by_id function."""
    if type == 'release_group':
        entity = get_release_group_by_id(str(id))
    elif type == 'place':
        entity = get_place_by_id(str(id))
    elif type == 'event':
        entity = get_event_by_id(str(id))
    elif type == 'artist':
        entity = get_artist_by_id(str(id))
    elif type == 'work':
        entity = get_work_by_id(str(id))
    elif type == 'label':
        entity = get_label_by_id(str(id))
    elif type == 'recording':
        entity = get_recording_by_id(str(id))
return entity

The whole SoC period will witness changes(example provided above) in many files like /critiquebrainz/db/review.py and many more.

UI change after implementing providing support for reviewing more entities(browsing and artist review to be exact) would sort of look like :

Changes to be made in testing :

To test the above implemented .py files providing support for newly added entities, a test file needs to be made i.e. work_test.py to test work.py , similarly for every other entity.

Example implementation for work_test.py is as follows :

from unittest import TestCase
from unittest.mock import MagicMock
from critiquebrainz.frontend.external.musicbrainz_db import work as mb_work
from critiquebrainz.frontend.external.musicbrainz_db.test_data import atlas, wildest_dreams
from critiquebrainz.frontend.external.musicbrainz_db.tests import setup_cache
import critiquebrainz.frontend.external.musicbrainz_db.utils as mb_utils

class WorkTestCase(TestCase):

    def setUp(self):
        setup_cache()
        mb_work.mb_session = MagicMock()
        self.mock_db = mb_work.mb_session.return_value.__enter__.return_value
        self.work_query = self.mock_db.query.return_value.filter.return_value.all

    def test_get_work_by_id(self):
      """ Example function to test work  """
        self.work_query.return_value = [atlas]
        work = mb_work.get_work_by_id('example_mbid')

    def test_fetch_multiple_works(self):
      """ Example function to test multiple works """

    def test_unknown_work(self):
      """ Example function to test unknown work """
        self.work_query.return_value = []
        mb_utils.reviewed_entities = MagicMock()
        mb_utils.reviewed_entities.return_value = ['example_mbid']
        work = mb_work.get_work_by_id('example_id')
self.assertEqual(work['name'], '[Unknown Work]')

Migrations of Ratings:

After the support for entities required have been provided, the elephant in the room can be addressed, i.e. migration of ratings can be started to work on.

The migration will involve the current technologies, i.e. python and postgreSQL, and port the ratings from mbdumps to critiquebrainz.

The process will be dealing with certain conditions like conflicting users, etc. In order to resolve these conflicts, we can check if the user exists in both of the databases.

If a single user exists in both db(s), we can map the string_identifiers(form MB) to the UUIDs(from CB).
If a single user has rated an entity differently in both the db(s), then the ratings from the CB can be preferred for calculating avg_rating.
If a MB user does not exist in CB, then he/she can be provided a choice to port.

The process can be simplified with the help of this flowchart :

The tables that will be accessed are :

The user <-> entity rating mappings are provided in the *_rating_raw tables.
The aggregate ratings are contained in *_meta tables.

To access these tables, a rating.py file will be added, which will be accessing the db to provide the support for rating system, it will be working on postgresSQL and SQLAlchemy ORM, the ratings along with the reviews will be accessed through the UUIDs involving several functions such as get_by_id, get_count, update, create, delete.

from random import shuffle
from datetime import datetime, timedelta
import uuid
import sqlalchemy
import pycountry
from brainzutils import cache
from critiquebrainz import db
from critiquebrainz.db import (exceptions as db_exceptions,
                               revision as db_revision,
                               users as db_users,
                               avg_rating as db_avg_rating,
                               RATING_SCALE_1_5)
from critiquebrainz.db.user import User

RATING_CACHE_NAMESPACE = "Rating"
DEFAULT_LICENSE_ID = "CC BY-SA 3.0"
DEFAULT_LANG = "en"
ENTITY_TYPES = [
    "event",
    "place",
    "release_group",
    "artist",
    "place",
    "work",
    "label",
]


def get_work_by_id(rating_id):
    """
        Example function to get a rating of the entity work by it's id
    """
    with db.engine.connect() as connection:
        result = connection.execute(sqlalchemy.text("""
          	SELECT work.id,
                work.rating,
                FROM work
          INNER JOIN work_rating_raw wr
                ON work.id = wr.id
        """), {
            "rating_id": rating_id,
        })
    return rating


def get_work_count(*, is_draft=False, is_hidden=False):
    """
        Example function to get a count of ratings in CritiqueBrainz
    """


def update_work_rating(rating_id, *, drafted, text=None, rating=None, license_id=None, language=None, is_draft=None):
    """
        Example function to update a rating of the entity work 
    """


def create_work_rating(*, entity_id, entity_type, user_id, is_draft, text=None, rating=None,
           language=DEFAULT_LANG, license_id=DEFAULT_LICENSE_ID,
           source=None, source_url=None):
    """
        Example function to create a rating of the entity work 
    """

To test the above mentioned functions, a rating_test.py will also be added, which will be working on the critiquebrainz/db to test the efficiency and correctness of the implementation. Functions like test_rating_creation, test_rating_deletion, test_rating_updation will be created.

from critiquebrainz.data.testing import DataTestCase
import critiquebrainz.db.users as db_users
from critiquebrainz.db.user import User
import critiquebrainz.db.rating as db_rating
import critiquebrainz.db.revision as db_revision
import critiquebrainz.db.exceptions as db_exceptions
import critiquebrainz.db.license as db_license

class RatingTestCase(DataTestCase):

    def setUp(self):
        super(ReviewTestCase, self).setUp()

        # Review needs user
        self.user = User(db_users.get_or_create("Tester", new_user_data={
            "display_name": "test user",
        }))
        self.user_2 = User(db_users.get_or_create("Tester 2", new_user_data={
            "display_name": "test user 2",
        }))

        # And license
        self.license = db_license.create(
            id=u'Test',
            full_name=u"Test License",
        )

    def test_review_creation(self):
        """
            Example function to test the creation of the ratings
        """

    def test_review_deletion(self):
        """
            Example function to test the deletion of the ratings
        """

    def test_update(self):
        """
            Example function to test the updated rating of the entities
        """

    def test_reviewed_entities(self):
        """
            Example function to test the rated entities
        """

According to me, the workflow I have proposed shouldn’t require any change in the musicbrainz server, as it is accessing the db independent of the workflow of musicbrainz.

Changes required in Web Services:

Web services is a field which needs to be explored and improved more. 2 ideas off the top of my head are :

Integration of Rating and Reviewing system :
In spite of keeping ratings and reviews apart, a functionality can be added which provides the user to rate the entity along with the review.
Providing follower functionality :
After witnessing many platforms, it can be said for sure that following functionality is quite popular.
There are users who are quite renowned in the world, thus have a huge fan base. So, to provide a follower functionality would definitely level-up the website and attract more traffic.

Obviously, the above ideas will be implemented after thorough discussion with the proposed mentor and the community.

Proposed UI mock-up after implementing the above ideas(following functionality and merging rating and reviewing) :

To provide the above functionalities, there can be 3 types of reviews present afterwards.

Only text
Only rating
Both text and rating.

As the first one has already been implemented, second and third would require some changes in the /critiquebrainz/ws.
After going through the codebase, I observed that the ws code fetches the entire data from db/review.py thus, it would be quite easier to handle as only the newly added entities would require to be appended in the entities list.

Timeline

A broad timeline of my work during summer in a nutshell:

Community Bonding(April 24 - May 13) :
Will start setting/completing up the docker containers for the development environment of CB. Also, discuss design decisions with mentor and discuss UI changes.
Phase I(May 14 - June 11) :
Start working on the support for entities work, artist and labels,i.e. database queries ,tests and UI modification.
Entities will be added one by one , making it much easier to keep the track of the work done.
Phase II(June 12- July 9):
This phase will include the addition of support for the entities recordings according to the designs discussed.
Then, the discussed changes in UI will be implemented and finalized.Also, start working on migration of ratings.
Phase III(July 10 - August 6):
In this period, I plan to complete the migration of the ratings from musicbrainz to critiquebrainz. Now comes the easy part, I’ll be updating the documentation accordingly as well as fixing the bugs, making the project ready for final submission.
After Summer of Code:
Continue contributing on CritiqueBrainz as I have been since February. Will start working on porting the rating system as the requirement of more entities has been fulfilled.

Here is a more detailed week-by-week timeline of 13 week coding period:

Week 1 (May 14 - May 20): Start working on the database queries of the entity artist and its interfacing with the frontend.
Week 2 (May 21 - May 27): Write tests for the newly added entity and fix the bugs.
Week 3 (May 28 - June 3): The addition of the entity work begins here , along with its interfacing.
Week 4 (June 4 - June 10): Write tests for the entity work and finalize it.

First evaluations

Week 5 (June 11 - June 18): If behind on schedule, then catch up.Also, start working on the database queries of the entity labels, and the UI change for the same.
Week 6 (June 18 - June 24): Finalize the support for the entity labels , as well as, start working on the entity recordings.Complete the database queries for the entity recordings as well as fix the bugs.
Week 7 (June 25 - July 1): UI change for the entity recordings will be implemented as well as start working on migration of ratings from musicbrainz db.
Week 8 (July 2 - July 8): Finalize the desing plans for migration of ratings as well as prepare for the evaluations.

Second evaluations

Week 9 (July 9 - July 15): Resume the work on migration of ratings as well as finalize the UI designs.
Week 10 (July 16 - July 22): Finalize the porting of rating system(remove conflicts as well as provide tests).
Week 11 (July 23 - July 29): If behind on stuff, then catch up. Encapsulate the changes made and finalize the project in accordance with the mentor.
Week 12 (July 30 - August 6) CUSHION WEEK : Document the changes made throughout the summer.
Week 13 (August 7 - August 14): PENCILS DOWN WEEK : Finish the documentation and make sure everything is in the right place for making the final submission.

Detailed Information about me

I am a sophomore CS undergrad at the Indian Institute of Information Technology, Una. I came across CritiqueBrainz
while looking for a website with music reviewing functionality.

Question: Tell us about the computer(s) you have available for working on your SoC project?
Answer: Currently I have an Acer Aspire VX15 with Intel i7 7th gen processor and 8 GB RAM, running Linux Mint.

Question: When did you first start programming?
Answer: I have been programming since 11th grade, mostly C++. I picked up Python in my freshman year, working on small projects which can be viewed on my github.

Question: What type of music do you listen to?
Answer: I listen mostly rock music. My Favourite artists and songs are:
Imagine Dragons, ColdPlay, Arijit Singh. Presently top listens are: Whatever it takes, Something just like this.

Question: Have you ever used MusicBrainz to tag your files?
Answer: Yes I use Picard to tag my files.

Question: Have you contributed to other Open Source projects? If so, which projects and can we see some of your code?
Answer: I have contributed to the source code of CritiqueBrainz, and various other projects which can be
found on my GH.

Question: What sorts of programming projects have you done on your own time?
Answer: I have worked on a group music player with peer to peer connectivity and an autocorrect application. Both of these are based on python and are available on my GH.

Question: How much time do you have available, and how would you plan to use it?
Answer: I have holidays during most of the coding period and can work full time (45-50 hrs per
week) on the project.

Question: Do you plan to have a job or study during the summer in conjunction with Summer of Code?
Answer: None, if selected for GSoC.

manofcode · March 21, 2018, 8:00pm

This is an initial draft of my application. Any kind of suggestions and feedback are welcome!

iliekcomputers · March 25, 2018, 6:52pm

Hi @manofcode, thanks for the proposal!

I like the fact that the proposal has implementation details which are good to see! Well done.

I still have some comments on the proposal that you should consider before submitting the final proposal.

Your timeline seems really spread out to me. Do you think that writing tests for one week is reasonable? Maybe we can think of adding more work to the proposal? I would like to see something about importing MusicBrainz ratings into CB, if possible.
The first diagram in the proposal makes little sense, what are you trying to convey with it?
You should have more details in general in the proposal. What new webpages / endpoints will we need to make? Where will changes need to be made in the CB webservice? (you do not the mention the ws at all in the proposal). What changes in the documentation will be needed exactly? (Hint: ws changes )
In the mockup you show a cropped out part of the MusicBrainz webpage, do you think that we should emulate MusicBrainz here? If yes, how? If not, then you should remove that part from the mockup.

In general, I think the timeline is a bit too stretched out and we should consider adding more work to be done, maybe related to porting MusicBrainz ratings (with details about how that work could be done).

Again, thanks for the proposal!

manofcode · March 25, 2018, 7:19pm

Hi @iliekcomputers, thanks for reviewing my initial draft of the proposal, it was much awaited.
I totally agree with you on the stretched timeline, and I was also considering to add the migration of ratings from MusicBrainz to CB.

I’ll work on improving the first diagram.
I’m already halfway through the final draft, will also be adding the details related to ws and will be working on the mockups too.

manofcode · March 26, 2018, 2:27pm

Hi @iliekcomputers, Good Evening!!

I have updated my proposal after considering your suggestions, and it is ready for the final review.

iliekcomputers · March 26, 2018, 6:32pm

The migration of ratings section is a really good thing to see. You should add more details to communicate that you know what exactly this will entail. I would like to see details about how this would be implemented. What scripts would you write? What tables do we need to access? How will we access those tables? How will you test these scripts? Do we need to make changes in musicbrainz server to allow for the workflow that you have proposed? If so, then what changes exactly?

Thanks!

manofcode · March 26, 2018, 6:47pm

I’ll be doing a research on the tables tonight, but currently I don’t have much idea about the scripts that will be needed. Also, as you mentioned on the IRC there won’t be any major changes in the musicbrainz server.