

My initial reaction to this is that I don’t really like it. My preferred approach is to have the pure data model recorded in the database - so there is no difference in that respect. But the ONLY way to work with the data, apart from certain administrative/maintenance functions, should be through the web service. EVERYTHING else works through that. All the business logic associated with the Ontology would be enforced in the web service. I’m far more worried about duplicating the business logic and enforcement of policies in the clients than I am about duplication of the schema. As I understand what you have set up, critical business logic will need to be implemented in the client-side JavaScript. That will inevitably lead to duplication of the business logic in other client libraries or in the clients themselves.

Yeah - rate limits and such are absolutely critical. And yes, we’ll need to support various mechanisms of the OAuth 2.0 protocol including a token/secret approach. I also think that much of the interface will work without authentication. Essentially, unauthenticated connections would be read-only. No edits, no comments, no ratings, no usage data submissions.
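To make the read-only idea concrete, here is a minimal sketch of the check the web service could apply. The operation names are hypothetical (no API surface has been agreed on in this thread), and actual OAuth 2.0 token validation is deliberately left out; the sketch only distinguishes "has a token" from "anonymous".

```python
from typing import Optional

# Hypothetical operation names, for illustration only.
READ_OPERATIONS = {"lookup", "search", "browse"}
WRITE_OPERATIONS = {"edit", "comment", "rate", "submit_usage"}


def is_allowed(operation: str, token: Optional[str]) -> bool:
    """Return True if the request may proceed.

    Anonymous requests (token is None) may only read. Any write needs a
    token. Validating the token itself is out of scope for this sketch.
    """
    if operation in READ_OPERATIONS:
        return True
    if operation in WRITE_OPERATIONS:
        return token is not None
    # Unknown operations are rejected rather than silently allowed.
    return False
```

Keeping this check in the service rather than in each client is the point: a client cannot skip it.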

Speaking of OAuth 2.0. How is it deployed/integrated in existing MetaBrainz servers and services? Is this deployment/integration evolving? What is the end goal?


The mention of free-form relationships so anyone can just add a role doesn’t sit well with me. Even leaving aside the quality problems (misspellings, case differences, …), this is very English-centric. In order for the data to be localizable properly, I think relationships need to be curated properly.


Hmmm. I’m trying to follow this thought in the context of this thread and I don’t see it discussed in the context of VideoBrainz much. Are you bringing in topics that have occurred elsewhere into this discussion - because I’m confused. However, your comment does bring in several topics worthy of discussion. I see issues of editorial rights, schema design approaches, and localization entering in with your comment.

On Editorial rights. I don’t see why every account should have equal access to the database. I believe our purpose is to create a free and open database of video metadata of the highest quality, accuracy, and breadth. If that is the case, we have several problems we need to confront.

First, we need to encourage people to become contributors. The more contributors we have, the more data we will accumulate. The videos I’m most interested in will differ widely from the ones that interest other people. And by pulling from as broad a base as possible we will have a database that spans as many interests and cultures as possible. However, that comes with its own problem. Not all contributors do good work.

And that brings me to the second problem: we need to encourage high quality contributions. We cannot rely entirely on the editorial process to keep the data of high quality. As Oliver Charles pointed out years ago in this blog post, submissions are outpacing the ability of reviewers to look everything over. While the new editing system is designed to make matters better, I highly doubt it will eliminate the problem. We will need a hierarchy of accounts with increasing levels of access to the database. What those tiers are, what rights go with them, and how people move up and down the tiers is not something that I’m ready to discuss. But the existence of different levels of access and editorial rights is, I think, clearly needed.

On Schema Design Approaches. Part of the ontology will be various taxonomies that categorize things. Some of these will occur in relationships. For example, suppose we have a “Movie” entity, and a “Person” entity. We may then have relationships between those two kinds of entities. A specific Movie will have a whole host of Persons who contributed to that Movie in one form or another: the Cast and Crew, to employ the terms typically used. So we’ll have Cast relationships which will also include what “Role” that Actor performed. We’ll also have Crew relationships which will describe the Job that Person had in the production of the Movie. Will we have a semi-fixed taxonomy of jobs? If we do, how is that taxonomy created and maintained? Will it evolve over time? I think that this list will exist, and that it will change over time, and that the process for changing that list will be similar to changing any other part of the schema - in other words - very difficult and with significant impact. Changing any taxonomy that classifies items will imply that all data created using the older version of the list may need to be changed to use the new categorization system.

One approach that we can have to managing such taxonomies is to represent the taxonomy in the database itself. Not as part of the database definition; but in tables of its own. But having those represented in the database does not mean that they are freely editable.
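As a rough illustration of "in tables of its own, but not freely editable", here is a small sketch using SQLite. The table name, columns, and the `is_curator` flag are all assumptions made up for this example; how curator rights would actually be granted is not something settled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The taxonomy lives in ordinary rows, so adding a job is a data change,
# not a schema change. parent_id allows a simple hierarchy.
conn.execute(
    """
    CREATE TABLE job (
        id INTEGER PRIMARY KEY,
        code TEXT NOT NULL UNIQUE,
        parent_id INTEGER REFERENCES job(id)
    )
    """
)


def add_job(conn, code, parent_code=None, *, is_curator=False):
    """Insert a job into the taxonomy; only curators may do so."""
    if not is_curator:
        raise PermissionError("only curators may change the job taxonomy")
    parent_id = None
    if parent_code is not None:
        row = conn.execute(
            "SELECT id FROM job WHERE code = ?", (parent_code,)
        ).fetchone()
        if row is None:
            raise ValueError(f"unknown parent job: {parent_code}")
        parent_id = row[0]
    cursor = conn.execute(
        "INSERT INTO job (code, parent_id) VALUES (?, ?)", (code, parent_id)
    )
    return cursor.lastrowid
```

Ordinary editors would reference rows in `job` from their crew credits but would never call `add_job` themselves.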

On Localization. Localization needs to be a first-tier capability. What I mean by that is that localization of all data should be incorporated in all of our designs from the very beginning. The discussion on instruments with disambiguation comments has been very instructive. As in the case of instruments, job titles in VideoBrainz will need translations. We are in a good position to ask the MusicBrainz folks: “If you could design the schema from scratch now - what would you do differently?” Maintenance of the MusicBrainz schema is significantly burdened by a large existing database and a mature collection of software built up around it. Fundamental changes to the schema represent a HUGE undertaking. VideoBrainz currently doesn’t have that burden, so it has the opportunity to make fundamental changes to the approach.
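One way to picture localized job titles: the data stores a stable job code, and the display label is looked up per locale with a fallback. The codes, the sample translations, and the function name below are illustrative assumptions, not an agreed design.

```python
# Illustrative sample data: job code -> locale -> display label.
JOB_LABELS = {
    "DIRECTOR": {"en": "Director", "de": "Regisseur", "fr": "Réalisateur"},
    "EDITOR": {"en": "Editor", "fr": "Monteur"},
}


def job_label(code: str, locale: str, default_locale: str = "en") -> str:
    """Return the label for a job code in the given locale.

    Falls back to the default locale, and finally to the raw code, so a
    missing translation never hides the credit itself.
    """
    labels = JOB_LABELS.get(code, {})
    if locale in labels:
        return labels[locale]
    return labels.get(default_locale, code)
```

Because credits point at the code rather than at a typed-in string, adding a new translation touches one row and every existing credit picks it up.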


My other reply really went on several tangents and didn’t directly address this comment. I have a couple questions: How is the use of free-form relationships “very English-centric”? How does structure of the schema become language specific?

What do you mean by “free-form relationships”? Where was that discussed? Can you point me to the discussion?


I assume that’s about this section:

That sounds a bit like you want to have the user just write down what the person did rather than pick from a closed set of options, which makes it hard (if not impossible) to translate it simply.

This is a bad idea IMO. MusicBrainz is trying to move away from votes and auto-editorship and towards a Wikipedia-style “just revert errors” philosophy. There are multiple reasons for that, but among them: if you expect most people to add good (or at least not bad) information, putting roadblocks in their way is not ideal. It discourages additions, and the edits that do happen either go unnoticed or are policed too strictly (since some users would rather reject any submission that isn’t perfect, which is clearly problematic because not having any data is worse than having imperfect data).

There are a few things that we do intend to keep limited (who can add new relationship types, probably areas and instruments) but the general idea is that there should be as little a difference as possible between all users, and that we shouldn’t roadblock people (like we currently do with the 7 days voting phase).


I think you read too much into what I said. In fact, you said: “There are a few things that we do intend to keep limited”. So you agree that there are multiple levels of access? In fact, I agree with all that you said. But it’s important to recognize that, however limited, not all editing is equal. Some things must be controlled. I would even go so far as to give certain “moderators” the power to, very judiciously, lock certain entries or to restrict certain editors. It’s a rare occurrence; but sometimes there can be “editing” wars that degrade the database or rogue editors that seem to want to do what they want to do regardless of the rules and guidelines. Having the ability to intervene in such cases is a necessary ability; but should be used as a last resort. As a rule I am an optimist who believes that most of our editors seek the best interests of the community; and that some data is better than no data.


That wasn’t my intent at all. In fact, I believe that we should have a carefully worked out taxonomy of jobs. Regarding the “Roles” - well, the editor would need to provide that. Unless we wanted to have a new entity for “Roles”. That could be interesting because some roles are recurrent; how many “Batmans” have we had? How many “Sherlock Holmes”, or whatever? Or perhaps “Person” should include fictitious persons. But I digress.

On the topic of “Jobs” for crew. There does seem to be a discernible taxonomy. But that taxonomy is itself evolving. I have an idea on how to reconcile the seemingly conflicting desires to control changes to taxonomies such as “Jobs” and to reduce roadblocks to editing as much as possible. My idea is that we should have a taxonomy - with very tight controls on making changes to that taxonomy. But that taxonomy should always include an “Other” category. When a user uses the “Other” crew job, they should also provide (or be able to provide) additional data so that those who update the Jobs taxonomy can use it as input for extending it.
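Here is a small sketch of how the “Other” idea could look. The `Job` members, the `CrewCredit` type, and the `other_suggestions` helper are hypothetical names for illustration. The sketch also assumes the stricter reading, where the extra description is required rather than optional.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class Job(Enum):
    # A tiny stand-in for the curated taxonomy.
    DIRECTOR = "director"
    EDITOR = "editor"
    OTHER = "other"


@dataclass(frozen=True)
class CrewCredit:
    person: str
    video: str
    job: Job
    other_note: Optional[str] = None  # free text, only meaningful for OTHER

    def __post_init__(self):
        # Assumption: choosing OTHER requires a short description.
        if self.job is Job.OTHER and not (self.other_note or "").strip():
            raise ValueError("an 'Other' credit needs a description of the job")


def other_suggestions(credits: Iterable[CrewCredit]) -> Counter:
    """Count the free-text notes on OTHER credits, normalized for case."""
    return Counter(
        c.other_note.strip().lower() for c in credits if c.job is Job.OTHER
    )
```

The people maintaining the taxonomy could review the most frequent notes from `other_suggestions` when deciding which jobs to add next, while editors are never blocked from recording a credit.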


My reaction was based on:

I’m totally with you here. This is exactly the thing I most like about MusicBrainz. I don’t see any problem with this. All of the things you just listed are a variation of a single kind of relationship - associating a person with the video of interest. It’s a simple relationship that states a triple of person, video, and role; i.e. <“Joe Dancer”, “My Dance Video”, “Dancer”>. There will certainly be many discussions on what kinds of relationships there should be; much like there is for MusicBrainz. But in the end, these are just enumerations and not all that difficult to implement and support.

I may have misunderstood that, but it sounded like the job would be just a piece of entered data. For a role that makes sense (but can also be subject to a need for localization, e.g. in the case of children’s films that frequently have dubs and where characters will often have different names than in the original; the Harry Potter films are an especially good example of this). But for jobs that just makes for a messy database.


Having the jobs be a database entity (like instruments and areas) is fine (so no schema change needed for adding them). And having an Other where the UI would enforce the addition of extra information sounds good too (avoids the “add an annotation” solution used by MusicBrainz).


I was simply presenting a conceptual construct without addressing how “Dancer” would be represented. Perhaps I should have said <“Joe Dancer”, “My Dance Video”, DANCER>. The actual mechanism we use to represent jobs in the database wasn’t addressed in that response. I do in fact support the idea of a taxonomy of jobs that is controlled and that supports localization. In fact, my lines “There will certainly be many discussions on what kinds of relationships there should be” and “these are just enumerations” seem to support the idea that these are NOT just user data entries.


I’m in the process of making yet another media streamer/organizer/blah app myself and would love to see a central place to get good JSON metadata from an API. While the big three (themoviedb, tvdb and tvmaze) have decent data they all have their little quirks. I still don’t get what tvdb’s issue is with shows like WWE and the like. If people are providing the data why would you purge it?

Anywho, some of the limits I see using all the providers.

  1. Not as easy to tie cast/crew together via UUID and the like. Therefore I have a lot of “duplicate” rows containing person data. Having 600k rows isn’t a big deal but not having matching IDs makes “also in…” queries problematic.
  2. Lack of sporting data…not really interested in game stats like yards, players, etc. But who played what/when and final score seems reasonable.
  3. Lack of international titles.
  4. Anime…this relates to #3 as well.
  5. Limited localization of returned JSON data.
  6. Ignoring of “internet” based shows/streams.
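To show why point 1 matters, here is a rough sketch of an “also in…” lookup when every person has one stable UUID across all credits. The data shape and function names are assumptions for illustration; none of the providers mentioned above is being modeled here.

```python
import uuid
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Credit = Tuple[uuid.UUID, str]  # (person_id, video_title)


def build_index(credits: Iterable[Credit]) -> Dict[uuid.UUID, Set[str]]:
    """Group video titles by the person's stable ID."""
    index: Dict[uuid.UUID, Set[str]] = defaultdict(set)
    for person_id, title in credits:
        index[person_id].add(title)
    return index


def also_in(
    index: Dict[uuid.UUID, Set[str]], person_id: uuid.UUID, current_title: str
) -> List[str]:
    """Return the person's other titles, sorted, excluding the current one."""
    return sorted(index.get(person_id, set()) - {current_title})
```

With a shared ID this is a single lookup. Without one, the same query turns into fuzzy matching on names across duplicate rows.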


Another good source of data is Wikidata, so putting a useful API on top of it would be another option.


3 posts were merged into an existing topic: ‘brainz’ fantasy list