Code Readibility

fannyvieiira · February 5, 2019, 1:17am

Hi, i’m joining the critiquebrainz community. My goal is to delve into the code, because i would like to participate in the gsoc this year.

I’ve gone deeper and studied a lot of the code and some things caught my attention. The first, is that the whole sql code is done manually, but sqlalchemy itself offers built-in functions to manipulate database.

The second, is that the code has some bad smells, if you don’t knows what a bad smell meaning, check this link, it can be useful.In many, code snippets we see code duplication, especially, when the topic is validation, for example:

 if review["is_hidden"]:
raise NotFound("Review has been hidden.")

this verification is done in many function of this file. In my view point, this could be easily avoid if we add a validation layer. The code has also very large methods that do many things, which at times can hamper understanding the main purpose of method, see below:

    filters = []
    filter_data = {}
    if not inc_drafts:
        filters.append("is_draft = :is_draft")
        filter_data["is_draft"] = False

    if not inc_hidden:
        filters.append("is_hidden = :is_hidden")
        filter_data["is_hidden"] = False

    # FILTERING
    if entity_id is not None:
        filters.append("entity_id = :entity_id")
        filter_data["entity_id"] = entity_id

    if entity_type is not None:
        filters.append("entity_type = :entity_type")
        filter_data["entity_type"] = entity_type

    if license_id is not None:
        filters.append("license_id = :license_id")
        filter_data["license_id"] = license_id

    if user_id is not None:
        filters.append("review.user_id = :user_id")
        filter_data["user_id"] = user_id

    if exclude is not None:
        filters.append("review.id NOT IN :exclude")
        filter_data["exclude"] = tuple(exclude)

    if language is not None:
        filters.append("language = :language")
        filter_data["language"] = language

    filterstr = " AND ".join(filters)
    if filterstr:
        filterstr = " WHERE " + filterstr

    query = sqlalchemy.text("""
        SELECT COUNT(*)
          FROM review
            {filterstr}
        """.format(filterstr=filterstr))

    result = connection.execute(query, filter_data)
    count = result.fetchone()[0]
    order_by_clause = str()

    if sort == "popularity":
        order_by_clause = """
            ORDER BY popularity DESC
        """
    elif sort == "published_on":
        order_by_clause = """
            ORDER BY review.published_on DESC
        """
    elif sort == "random":
        order_by_clause = """
            ORDER BY RANDOM()
        """
    # Note that all revisions' votes are considered in this popularity
    query = sqlalchemy.text("""
        SELECT review.id,
               review.entity_id,
               review.entity_type,
               review.edits,
               review.is_draft,
               review.is_hidden,
               review.license_id,
               review.language,
               review.source,
               review.source_url,
               review.user_id,
               "user".display_name,
               "user".show_gravatar,
               "user".is_blocked,
               "user".email,
               "user".created as user_created,
               "user".musicbrainz_id,
               review.published_on,
               MIN(revision.timestamp) as created,
               SUM(
                   CASE WHEN vote='t' THEN 1 ELSE 0 END
               ) AS votes_positive_count,
               SUM(
                   CASE WHEN vote='f' THEN 1 ELSE 0 END
               ) AS votes_negative_count,
               SUM(
                   CASE WHEN vote = 't' THEN 1
                   WHEN vote = 'f' THEN -1 WHEN vote IS NULL THEN 0 END
               ) AS popularity,
               latest_revision.id as latest_revision_id,
               latest_revision.timestamp as latest_revision_timestamp,
               latest_revision.text as text,
               latest_revision.rating as rating,
               license.full_name,
               license.info_url
          FROM review
          JOIN revision ON review.id = revision.review_id
     LEFT JOIN vote ON vote.revision_id = revision.id
          JOIN "user" ON review.user_id = "user".id
          JOIN license ON license.id = license_id
          JOIN (
                revision
                JOIN (
                    SELECT review.id AS review_uuid,
                           MAX(timestamp) AS latest_timestamp
                      FROM review
                      JOIN revision ON review.id = review_id
                  GROUP BY review.id
                  ) AS latest
                  ON latest.review_uuid = revision.review_id
                 AND latest.latest_timestamp = revision.timestamp
               ) AS latest_revision ON review.id = latest_revision.review_id
        {where_clause}
      GROUP BY review.id, latest_revision.id, "user".id, license.id
        {order_by_clause}
         LIMIT :limit
        OFFSET :offset
        """.format(where_clause=filterstr, order_by_clause=order_by_clause))

    filter_data["limit"] = limit
    filter_data["offset"] = offset

    results = connection.execute(query, filter_data)
    rows = results.fetchall()
    rows = [dict(row) for row in rows]

    # Organise last revision info in reviews
    if rows:
        for row in rows:
            row["rating"] = RATING_SCALE_1_5.get(row["rating"])
            row["last_revision"] = {
                "id": row.pop("latest_revision_id"),
                "timestamp": row.pop("latest_revision_timestamp"),
                "text": row["text"],
                "rating": row["rating"],
                "review_id": row["id"],
            }
            row["user"] = User({
                "id": row["user_id"],
                "display_name": row.pop("display_name"),
                "show_gravatar": row.pop("show_gravatar"),
                "is_blocked": row.pop("is_blocked"),
                "musicbrainz_username": row.pop("musicbrainz_id"),
                "email": row.pop("email"),
                "created": row.pop("user_created"),
            })

return rows, count

this can be avoided by using auxiliary functions.

Besides that, methods has large parameters list, inside them is made many checks to know if values are None, to solve this, we use objects to encapsulate the data.

def create(*, entity_id, entity_type, user_id, is_draft, text=None, rating=None,
language=DEFAULT_LANG, license_id=DEFAULT_LICENSE_ID, source=None, source_url=None):

Anyway, I’d really like to help you refactor the code, but I also understand that every problem I’ve cited may have a reason and I’d like to understand it.

reosarevok · February 5, 2019, 9:47am

I think we generally prefer to do SQL directly rather than use libraries on top of it, just because we like to be able to do whatever we want with SQL and we’re comfortable with it. But I haven’t worked on CB myself, so the reasonings might be different project to project

fannyvieiira · February 5, 2019, 11:26pm

Oh yes, i understand!

Freso · February 6, 2019, 2:53pm

Gentlecat (Roman) was the original coder of CB, but I believe it’s under @iliekcomputers’s sphere of responsibility now—although I’m not sure he’ll be able to answer all your questions either.

Also, AFAICT you haven’t already, so you might want to come on IRC (the #metabrainz channel) and introduce yourself there. That’s the main venue for development discussion (outside of actual code, PRs, and tickets), and all current GSoC prospects hang out there, as well as all the main developers of our various projects.

code-master5 · February 8, 2019, 6:49pm

Hi Fanny,

We welcome you to the MetaBrainz community. I appreciate your comments on CB codebase. I also began to contribute to CB only last year but I still think I should say a few things here.
I would suggest you to find more about the aim of the project, what community thinks about it and what are the community’s preferences regarding how things should move on. This can only be done if you try to connect more to other people of the community (joining IRC channel should be the priority here).

Having said that, I would try my best to answer your concerns.

For raw SQL queries, if you had tried to find out a bit more about this, you might’ve found the answer by yourself ( MetaBrainz tends to choose students who can find answers to their own questions). It’s the community’s preference to use raw SQL queries. For example, look here. We prefer more speed, flexibility, and most importantly, freedom in expanding our projects. Moving to raw SQL queries was deliberately done and a LOT of other students like you contributed to make that happen.
For bad smell, I would like to remind you that coding is a forever learning experience. A lot of students like you worked hard to add whatever code we have right now. They obviously never meant to write bad code. We proudly acknowledge that it is the best they could contribute, and we are forever thankful for their contributions, howsoever small they are.
Our GSoC students, and in fact, any contributor for that matter, are very important to us. That includes you too. We do not mean to say that you should not criticize our projects, but, we would be very grateful if you would try a bit more to get involved and help us understand each other better. We love freedom, but we try to abide by our core values, which involves respecting each other and their contributions.

Finally, if you’re really interested to contribute to CB, please go through this document and links given inside it very carefully. I personally would like to suggest you to discuss any such issues on IRC channel where you can propose your ideas and improvements. We assure you that we would discuss them before arriving on any final decision.

We are very excited that GSoC 2019 is going to start. Wish you very good luck for GSoC and your journey into the world of open source and MetaBrainz.

fannyvieiira · February 8, 2019, 7:55pm

Yes, I understood and I support the decision a lot, it was with many errors that I improved the code, I’m sorry if I came across the idea of criticism in a bad sense, it was more to understand how the decisions were made, I love the idea of facilitating for new contributors, mainly because most people feel incapable when in fact it is not. Thanks for the suggestions I will analyze the document, and once again I am sorry if with my comments I annoyed the community.

jesus2099 · February 8, 2019, 11:53pm

Honesty, your post was very clever and thoughtful. I don’t see how it could annoy any sensible person.

You did not critisise, you just suggested good stuff with reasons and suspected there was also good reasons for things to be what they are.

fannyvieiira · February 9, 2019, 1:38am

Thank u very much for the words!

iliekcomputers · February 9, 2019, 2:45pm

@fannyvieiira, Hi!

Thanks a lot for taking the initiative and wanting to contribute to CritiqueBrainz, we really appreciate it. It is really great to see new contributors for any of our projects.

You also make a lot of great points about code smells and similar stuff. We personally prefer to use raw SQL instead of ORM, you can see the history and reasons for that in this ticket.

Also, regarding code smells, thank you for pointing them out! Feel free to open tickets and discuss problems with the code on IRC and open pull requests. We really appreciate any improvements to our code from anyone!

Thanks again!