Development processes for AcousticBrainz

alastairp · April 7, 2016, 8:52am

Now that we have more people working on AcousticBrainz I’ve been thinking about how we can add some structure to the way we plan and develop new features. I’m putting down some ideas here, and hope to get feedback from the community about what we think will work well or not.

Bugs and new feature tickets
We use MetaBrainz Jira for all tickets, in the AcousticBrainz component.
Feature tickets should be fully described. This means that the layout of any websites should be fully explained, either in text, sketches, or mockups. All items/links on the page should have their behaviour described (e.g. “Table columns should be sortable, with the default being descending date”, or “Clicking ‘evaluate’ takes you to a page showing options for evaluation, which are x, y, z” (and then a description of how the next page works))
Implementation details are not as important as behaviour, although if there are some open questions they should also be discussed on the ticket.

Development
Development should only be started once features are agreed on in the jira ticket.
All development must happen on a feature branch. Even if you create a fork and plan to make a pull request, development in your fork must be on a branch and not master.

Code review
We use github pull requests for code review. When creating the PR, you should always include a link to the Jira ticket in the message. On Jira, Resolve the ticket (this puts it into the PR state).
Every change should be OKd by at least one other person. For emergency bug fixes, they should have a PR opened even if the change was previously deployed.
PR discussion should be limited to style and bugs. New functionality that discovered to be needed should make discussion move back to the jira ticket.

Open questions
I have a few questions which I’m not sure about.

Should we have a ticket for every piece of work that we want to work on? Sometimes we find a small bug or think of a new feature that we want to add, and I currently just add it. The first notice that I was doing it comes at the PR stage. Should we require public outlines of all new components?
Work in progress code reviews can be useful to see how a task is progressing and catch problems early, but they might move implementation discussion from jira to the PR. Is this OK?
How strict should be be on commit messages? Should all commits for a ticket include the ticket number, or should we put this in the branch name and always use merge commits?

Please let me know what you think, or if there are any unanswered questions you have!

Deleted_Editor_1240840 · April 7, 2016, 11:38pm

These seem to be too restrictive. If someone (especially not involved in development process) wants a particular feature, do they need to describe all these details of how it needs to work? I think that’s asking too much and not really enforceable.

Does that mean no pushes directly to master? I’d prefer that to reviewing changes when they are ready to be deployed or already are deployed. Even if it’s something small. See teamwork - Good guidelines and practices for mandatory code review - Software Engineering Stack Exchange, for example. I especially like the point about making reviews short. I’d prefer that to having pull requests with features that just get more and more complex and don’t get merged for months. This causes merge conflicts and significantly slows down the development process.

I don’t think that’s necessary. If there’s not enough info in the commit description, then we need to fix that instead.

I think it is. At some point discussing actual code for me is better than just throwing ideas around. Though we probably should decide what to do with pull requests that just get abandoned. I’d prefer not to have them around unless author is still working on them.

Putting a ticket number doesn’t seem that useful unless we squash all commits before merging. For example we have two commits in one PR:

“AB-1234: Fix a bug that caused that bad thing to happen”
“Add some thing that I forgot to do in a previous commit”

This is very similar to what I have in Better dataset validation by gentlecat · Pull Request #166 · metabrainz/acousticbrainz-server · GitHub (there are other examples). It would be nice to squash these two because one just fixes another that wasn’t yet merged.

For example, tools like Gerrit don’t even allow you to have multiple commits in one change (https://gerrit-documentation.storage.googleapis.com/Documentation/2.12.2/user-upload.html#_git_push). You are supposed to amend all updates.

Freso · April 9, 2016, 1:38pm

One benefit of requiring that all (somewhat significant; fixing a typo or removing trailing whitespace might not warrant this) changes is in the ticket tracker, is that you can use them to both generate a changelog for releases but also to keep track of what things you’re targeting for a specific release. E.g., Error - MetaBrainz Tickets

As long as the ticket links to the PR and the PR links to the ticket, I personally don’t see an issue.

Always using merge commits seem like a good road to go down - this also helps keep track of who actually pulled the change in and can contain some additional notes that may not be specific to any one single commit in the branch being merged.

Deleted_Editor_1240840 · April 10, 2016, 6:01am

We don’t do releases though. Just deploy updates occasionally and announce big features on the blog.

This actually reminds me that we should set up proper deployments and testing in Jenkins. I see changes in master that are still not deployed. If something breaks with the next one, it might be harder to find the source of problem. And I don’t see any reason to not deploy them.

This doesn’t require merge commits. Author and commiter can be different people. See github - What is the difference between author and committer in Git? - Stack Overflow for example.

Blog post with some good ideas that I just found: https://blog.spreedly.com/2014/06/24/merge-pull-request-considered-harmful/.

chirlu · April 10, 2016, 6:45pm

Has nothing to do with merge vs. non-merge, or in fact git at all, though; only with limitations in GitHub.

Freso · April 11, 2016, 9:45am

The “occassional deployments” could be considered releases, even if they’re not widely announced.

Yes, but that requires that whomever pulls in the branches actively changes the commits, at which point we might just as well submit patches to a mailing list rather than actual Git objects that are already committed…

chirlu · April 11, 2016, 12:40pm

I don’t see a problem with that, in principle. It’s certainly necessary when the author isn’t able to squash a messy development process into reasonable commits themselves.

alastairp · April 15, 2016, 2:39pm

Thanks for the comments everyone!

This is a really good point. The thing which I want to try and promote is that everyone works in the open so that anyone else can contribute and give feedback. Since MusicBrainz is a very open project I wanted to encourage that also in the development of AcousticBrainz. I agree that we shouldn’t stop anyone from working on a project personally and then presenting it to the community, but people who are working on AB in an “official” capacity (e.g., SoC students or MeB employees) should be planning and developing in the open. How can we describe this better?

Yes, I think so. I said this in the Development section (always work on a feature branch). I want a little bit of flexibility especially for emergency bug fixes, though.

Good point. How can we keep them small? Either start the PR before it’s finished so that we can start discussion, or break the code into small parts. I don’t have a problem with having multiple PRs per jira ticket if it can be split up into small parts

This comes back to the point about main developers being open about the work that they are doing. Again, we shouldn’t stop someone from submitting a PR where they add some extra functionality, or fix some spelling, however if someone comes to us and says “I’m interested in working on a feature which does x”, the first thing we should try and do is get a record of it so that we can start discussing it.

I agree. I don’t think we need to squash before merge, unless the PR author wants to do that as a last step. Perhaps if they’re working on a branch they can squash before they make the PR. I sometimes rebase/squash branches, but perhaps we could decide to stop this behaviour.

Right. I also would argue that you can make this with a git log, though it may be a little more difficult. What I was getting at by making this suggestion was that it would keep everyone else more up to date with exactly what everyone else is working on.

Deleted_Editor_1240840 · April 15, 2016, 5:28pm

Yeah, as long as they are reviewed afterwards as well.

Leftmost_Cat · April 17, 2016, 6:39am

I think a combination of this works well. Aim for small PRs and, if you have to make a larger one, make the PR early in the process as [WIP] to invite comments as you go along.

I tend to think it’s easier to follow a single PR per ticket. (It should even be possible to automatically link PRs to their related ticket.) If a ticket can be split up into small parts, I’d encourage people to create subtasks and do one PR per ticket number. I typically find this easiest to follow.

My own tendency balances a predilection for commits to tell a story and a desire not to rewrite history: I tend to work on a branch locally until I have something I think is at least worthy of getting feedback, then take that feedback and repeat. Between pushes, I’ll often rebase and edit commits to squash fixup commits and try to keep changes self-contained, but once I push, I don’t play with history. This, of course, reflects my own personal views of how best to communicate with someone reading through commit history.