Hi, i’m joining the critiquebrainz community. My goal is to delve into the code, because i would like to participate in the gsoc this year.
I’ve gone deeper and studied a lot of the code and some things caught my attention. The first, is that the whole sql code is done manually, but sqlalchemy itself offers built-in functions to manipulate database.
The second, is that the code has some bad smells, if you don’t knows what a bad smell meaning, check this link, it can be useful.In many, code snippets we see code duplication, especially, when the topic is validation, for example:
if review["is_hidden"]:
raise NotFound("Review has been hidden.")
this verification is done in many function of this file. In my view point, this could be easily avoid if we add a validation layer. The code has also very large methods that do many things, which at times can hamper understanding the main purpose of method, see below:
filters = []
filter_data = {}
if not inc_drafts:
filters.append("is_draft = :is_draft")
filter_data["is_draft"] = False
if not inc_hidden:
filters.append("is_hidden = :is_hidden")
filter_data["is_hidden"] = False
# FILTERING
if entity_id is not None:
filters.append("entity_id = :entity_id")
filter_data["entity_id"] = entity_id
if entity_type is not None:
filters.append("entity_type = :entity_type")
filter_data["entity_type"] = entity_type
if license_id is not None:
filters.append("license_id = :license_id")
filter_data["license_id"] = license_id
if user_id is not None:
filters.append("review.user_id = :user_id")
filter_data["user_id"] = user_id
if exclude is not None:
filters.append("review.id NOT IN :exclude")
filter_data["exclude"] = tuple(exclude)
if language is not None:
filters.append("language = :language")
filter_data["language"] = language
filterstr = " AND ".join(filters)
if filterstr:
filterstr = " WHERE " + filterstr
query = sqlalchemy.text("""
SELECT COUNT(*)
FROM review
{filterstr}
""".format(filterstr=filterstr))
result = connection.execute(query, filter_data)
count = result.fetchone()[0]
order_by_clause = str()
if sort == "popularity":
order_by_clause = """
ORDER BY popularity DESC
"""
elif sort == "published_on":
order_by_clause = """
ORDER BY review.published_on DESC
"""
elif sort == "random":
order_by_clause = """
ORDER BY RANDOM()
"""
# Note that all revisions' votes are considered in this popularity
query = sqlalchemy.text("""
SELECT review.id,
review.entity_id,
review.entity_type,
review.edits,
review.is_draft,
review.is_hidden,
review.license_id,
review.language,
review.source,
review.source_url,
review.user_id,
"user".display_name,
"user".show_gravatar,
"user".is_blocked,
"user".email,
"user".created as user_created,
"user".musicbrainz_id,
review.published_on,
MIN(revision.timestamp) as created,
SUM(
CASE WHEN vote='t' THEN 1 ELSE 0 END
) AS votes_positive_count,
SUM(
CASE WHEN vote='f' THEN 1 ELSE 0 END
) AS votes_negative_count,
SUM(
CASE WHEN vote = 't' THEN 1
WHEN vote = 'f' THEN -1 WHEN vote IS NULL THEN 0 END
) AS popularity,
latest_revision.id as latest_revision_id,
latest_revision.timestamp as latest_revision_timestamp,
latest_revision.text as text,
latest_revision.rating as rating,
license.full_name,
license.info_url
FROM review
JOIN revision ON review.id = revision.review_id
LEFT JOIN vote ON vote.revision_id = revision.id
JOIN "user" ON review.user_id = "user".id
JOIN license ON license.id = license_id
JOIN (
revision
JOIN (
SELECT review.id AS review_uuid,
MAX(timestamp) AS latest_timestamp
FROM review
JOIN revision ON review.id = review_id
GROUP BY review.id
) AS latest
ON latest.review_uuid = revision.review_id
AND latest.latest_timestamp = revision.timestamp
) AS latest_revision ON review.id = latest_revision.review_id
{where_clause}
GROUP BY review.id, latest_revision.id, "user".id, license.id
{order_by_clause}
LIMIT :limit
OFFSET :offset
""".format(where_clause=filterstr, order_by_clause=order_by_clause))
filter_data["limit"] = limit
filter_data["offset"] = offset
results = connection.execute(query, filter_data)
rows = results.fetchall()
rows = [dict(row) for row in rows]
# Organise last revision info in reviews
if rows:
for row in rows:
row["rating"] = RATING_SCALE_1_5.get(row["rating"])
row["last_revision"] = {
"id": row.pop("latest_revision_id"),
"timestamp": row.pop("latest_revision_timestamp"),
"text": row["text"],
"rating": row["rating"],
"review_id": row["id"],
}
row["user"] = User({
"id": row["user_id"],
"display_name": row.pop("display_name"),
"show_gravatar": row.pop("show_gravatar"),
"is_blocked": row.pop("is_blocked"),
"musicbrainz_username": row.pop("musicbrainz_id"),
"email": row.pop("email"),
"created": row.pop("user_created"),
})
return rows, count
this can be avoided by using auxiliary functions.
Besides that, methods has large parameters list, inside them is made many checks to know if values are None, to solve this, we use objects to encapsulate the data.
def create(*, entity_id, entity_type, user_id, is_draft, text=None, rating=None,
language=DEFAULT_LANG, license_id=DEFAULT_LICENSE_ID, source=None, source_url=None):
Anyway, I’d really like to help you refactor the code, but I also understand that every problem I’ve cited may have a reason and I’d like to understand it.