And, it’s still written in Python 2!
It still works. The open issues do not seem to prevent its use, and the CLI client code is pretty straightforward (the core functionality is just a wrapper around the extractor plus a sender).
What puzzles me is why it isn’t parallel and why they’ve used a SQLite database. Extracting the features directly to JSON files, filling in the extractor’s missing SHA info and sending the JSON seems to work just fine (sorry if it crashed something on the server).
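Filling in the missing SHA can be a tiny post-processing step on the extracted JSON. A minimal sketch, assuming the feature files keep the extractor build SHA under `metadata.version.essentia_build_sha` (that key path is my assumption about the layout, not a documented schema):

```python
def fill_missing_sha(features, extractor_sha):
    """Fill in the extractor build SHA when the extractor left it empty.

    Assumption: the SHA lives at metadata -> version -> essentia_build_sha;
    adjust the key path if the real feature files differ.
    """
    version = features.setdefault("metadata", {}).setdefault("version", {})
    if not version.get("essentia_build_sha"):
        version["essentia_build_sha"] = extractor_sha
    return features
```

Existing non-empty values are left alone, so re-running the step over already-fixed files is harmless.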
What AB really needs is some form of manual review of the data, and maybe a BrainzCaptcha to help with machine-learning training (not sure if something like that doesn’t already exist).
Made a few changes here:

- Added a switch for offline processing (pull request 34).
- Added another switch to reprocess failed feature extractions (issue 47).
- Added parallel processing using N-1 threads (issue 37).
- Switched to argparse to provide a help interface (issue 36).
- Added a check for whether the server already has the recording’s features, preventing resubmission (issue 5).
- Removed SQLite and the issues associated with it (13 and 14).
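The N-1 parallel processing change can be as small as a worker pool sized to the core count minus one. A sketch, where `extract_features` is a placeholder for the real per-file work, not the actual client code:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def extract_features(path):
    # Placeholder for invoking the streaming extractor on one file.
    return path, "ok"

def process_all(paths):
    # N-1 workers, leaving one core free for the rest of the system;
    # fall back to 1 worker on single-core machines.
    workers = max(1, (os.cpu_count() or 2) - 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(extract_features, paths))
```

Threads are enough here because the heavy lifting happens in the external extractor process, so the workers mostly wait on I/O.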
Not thoroughly tested, nor did I check whether the metadata respects the server format.
I’ve also completely removed a profile setting used by the extractor without looking into what it does beforehand.
I guess it also needs an API rate limiter.
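A client-side rate limiter can be a simple sliding window over recent request timestamps. A sketch with made-up limits (the real MetaBrainz rate limits may differ); the clock and sleep hooks are injectable only so the logic is testable:

```python
import time

class RateLimiter:
    """Allow at most `max_calls` requests per `period` seconds,
    sleeping before the next call when the window is full."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = []  # timestamps of recent requests

    def wait(self):
        now = self.clock()
        # Drop timestamps that have fallen out of the window.
        self.calls = [t for t in self.calls if now - t < self.period]
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls = [t for t in self.calls if now - t < self.period]
        self.calls.append(now)
```

Call `wait()` immediately before each submission; it returns without sleeping while the window has room.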
I have a working UI, but it is really ugly. It reuses basically everything from the CLI (even the command-line argument parsing). You can’t drag and drop or anything fancy yet; just open the settings menu and add folders/files to process with a folder/file picker.
Using mutagen to check for duplicates before processing anything makes a ton of sense to speed things up (matching track MBID + extractor SHA). Done.
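The duplicate check boils down to keying on the (recording MBID, extractor SHA) pair. A sketch of just that logic; reading the tags (e.g. via mutagen) and fetching the already-submitted set from the server are left out, and the `musicbrainz_trackid` tag name follows the Vorbis-comment convention, which varies per container:

```python
def dedup_key(tags, extractor_sha):
    # `tags` stands in for a mutagen tag mapping, which returns a list
    # of values per tag; "musicbrainz_trackid" historically holds the
    # recording MBID in Vorbis comments.
    values = tags.get("musicbrainz_trackid") or []
    if not values:
        return None
    return (values[0].lower(), extractor_sha)

def files_to_process(files_with_tags, extractor_sha, submitted):
    # Skip files whose (MBID, extractor SHA) pair was already submitted;
    # files without an MBID tag fall through to normal processing.
    todo = []
    for path, tags in files_with_tags:
        key = dedup_key(tags, extractor_sha)
        if key is None or key not in submitted:
            todo.append(path)
    return todo
```

This saves both the local extraction cost and a pointless round trip that the server would reject anyway.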
Update: now with a progress bar + time-to-complete estimate.
Update 2: and with an API rate limit.
Update 3: and PyInstaller specs, an AB icon… I’d say it is feature complete (but still lacking testing around the profile setting).
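The time-to-complete estimate behind the progress bar can be as naive as extrapolating the average per-file time so far. A sketch (not the actual implementation):

```python
def eta_seconds(done, total, elapsed):
    """Naive ETA for a progress display: assume the remaining files
    take as long per file as the ones processed so far.
    Returns None until at least one file has finished."""
    if done == 0:
        return None
    return (total - done) * (elapsed / done)
```

For example, 10 of 40 files done in 50 seconds gives an estimate of 150 seconds remaining.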
Thanks for the questions. As @gabrielcarvfer pointed out, the biggest reason is that for the most part it works. I agree that there are a number of open tickets and unmerged pull requests that are waiting on our review. I’m unsure how many of these issues are actually preventing people from submitting, especially since, according to our statistics, we still see steady submissions to AcousticBrainz. It’s possible that improving these tools further will increase our submissions.
One reason for not picking up these tickets and PRs is that we had planned on releasing a newer version of the AcousticBrainz extractor, and would have made updates to the client at that time. However, due to other tasks getting in the way and slower development on the extractor tool, this task has been pushed back again and again. This is something we need to communicate better, and hopefully in the coming year we can look at making a more concrete development plan that includes these items.
I agree that Python 3 support is a key feature that we should add, and is something that we should do before the new version of the tools. I’ll take a look at this.
The main reason why it’s not parallel is that we made a quick and easy tool to process and submit the content, and there was no need at the time to add this functionality. We needed a good tradeoff between getting data submitted and being able to easily develop and maintain the tool. At the time, adding multithreading was too much effort for what we would get out of it. Having said that, this is a good idea, and we should prioritise it for the next version of the client.
What are your concerns about the SQLite database? This exists purely to verify that files are not submitted multiple times if the tool is run over the same directory multiple times. We wanted to have a quick check to make sure that people didn’t waste time computing stuff twice (because if they tried to submit it we would have rejected it).
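The bookkeeping described here is small: remember which files were already processed so a re-run over the same directory skips them. A sketch of that kind of check using `sqlite3` (the table and column names are my invention, not the client’s actual schema):

```python
import sqlite3

def open_db(path=":memory:"):
    # One row per already-processed file; PRIMARY KEY makes re-inserts cheap.
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS processed (filepath TEXT PRIMARY KEY)")
    return db

def mark_processed(db, filepath):
    db.execute("INSERT OR IGNORE INTO processed VALUES (?)", (filepath,))
    db.commit()

def already_processed(db, filepath):
    cur = db.execute("SELECT 1 FROM processed WHERE filepath = ?", (filepath,))
    return cur.fetchone() is not None
```

Keying on the file path alone is the simplest variant; keying on (MBID, extractor SHA) instead would also survive files being moved around.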
This is a much larger question and could be split out into another thread. We’re interested in automatic review of data: we think that, with some tools we have been working on, we can determine which submissions are bad when a recording has multiple submissions.
Do you have other ideas about review? What data specifically do you mean? For the high-level data, we can definitely do a lot to improve the models that we have. Do you know about our dataset creation tools? We had some ideas about tools for giving feedback (e.g. saying that a prediction is incorrect), but haven’t proceeded on this. If you’re interested then I’d definitely be happy to talk more about it!
Thanks for making these changes and opening the pull request! There are a lot of changes here, so it might take me a while to get through all of them. I’ll send you more comments on the PR.
Not a problem in itself, but also not really necessary. I’ve replaced it by keeping the feature files named after the input file (with ’_.json’ appended), separated into different folders for the different processing states. Not as easy as moving a single file around, but concurrent access is way easier to deal with, and the JSON file is kept for inspection after some kinds of errors.
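The folder-per-state scheme above can be sketched with `pathlib`: the feature JSON keeps the input file’s name with `_.json` appended and is moved between state directories. The directory names here (`pending`, `done`, etc.) are placeholders, not the actual ones:

```python
from pathlib import Path

def feature_path(state_dir: Path, audio_file: Path) -> Path:
    # e.g. pending/song.flac_.json for input song.flac
    return state_dir / (audio_file.name + "_.json")

def move_state(audio_file: Path, src: Path, dst: Path) -> Path:
    # Advance a file's feature JSON from one processing state to another.
    dst.mkdir(parents=True, exist_ok=True)
    old = feature_path(src, audio_file)
    new = feature_path(dst, audio_file)
    old.rename(new)
    return new
```

A crash mid-run leaves each JSON in the folder for its last completed state, which is what makes resuming and inspecting failures straightforward.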
I’ve included mutagen to check whether the MBID and extractor version already have a match on the server (it could probably be more economical with the requests, but it seemed fine for now). It should also save processing time on both ends.
Not really; I was just thinking that every large company seems to use some form of captcha to gather reviews, and it would be nice to have something like that. Yes, I was thinking more about the high-level data.
There is a ticket reporting that AB identified two male rappers as a female singer with a probability (or whatever obscure metric the ML uses) of 86%. Any human with a 5-second snippet could say that it was completely wrong, and a captcha could be used to identify problematic samples such as this one and put them into the training dataset.
It hits the same issue of hosting licensed songs, so no idea how to work around that. Maybe a Spotify/YouTube player, or maybe some benevolent record label could help.