MovieBrainz?

I’ve got a few bits of input here:

  1. We own the domains moviebrainz.org and videobrainz.org. What are the goals for your project? Which of the two names is more appropriate?

  2. The MetaBrainz team members (especially myself) are on a moratorium for creating new projects. However, this does not apply to new projects being created by the community.

  3. The board does not need to be involved in this decision, especially since this project is in its infancy right now. We’ll need to wait and see if the project really does become mature enough to launch. Many projects that start like this have high goals, but actually seeing the project through to an initial launch is quite a challenge and a lot of people tend to not get that far. This isn’t a reflection on your abilities to deliver, but a cold hard look at the history of people promising things and then not delivering much. So far, BookBrainz is the only real project that came out of the community without direct initial support from MetaBrainz.

  4. Please participate as Freso suggested – if you continue to be part of the project and get to a point where something needs to go into an alpha/beta for a wider audience, we can provide server space and a logo.


Just adding a few thoughts of my own:

  • For all intents and purposes, a video is a video regardless of whether it contains two people having intercourse of the one kind or the other kind, so putting up some kind of barrier as to what can and can’t be added to VideoBrainz based on morals alone is arbitrary and, IMHO, against the “spirit” or “idea” of MetaBrainz (capturing all information!). A discussion similar to this was had not too long ago in another topic here on the forum.
  • There are a variety of “ranking” systems out there. These could well be used to limit information from minors or people who do not wish to see things of various natures. That said, my vision of VideoBrainz is that it contains information, textual information. The title of “Girls Gone Wild” (or whatever) is not by itself NSFW (IMHO at least :slight_smile: ), though the cover art of it would likely be. VideoBrainz is unlikely to get cover art in its first iteration. BookBrainz still doesn’t have anything for covers.
  • I want to be able to add <30 second YouTube clips. One of my favourite YouTubers, frezned, had (he has taken down all his videos :cry:) a lot of videos which were very short, but in some cases very well produced. They would fit well in a series of some form, which the current offerings out there (notably TVDb) are unable to accommodate. I’d also want to be able to capture information about remixes of e.g.,
    https://www.youtube.com/watch?v=EzNhaLUT520
  • I want to be able to catch all the information about videos. Who are the background dancers? Who did their makeup? Who did the choreography? Who were the runners of the set? (Did the runners later “graduate” to more “important” positions on film sets? If we don’t know who the runners were, we’re unable to derive this information.)
  • Related to the above, I also want for VideoBrainz to capture music video information. A lot of information about music videos is not relevant to MusicBrainz but would naturally belong in VideoBrainz - e.g., cinematographers, dancers, choreographers, runners, … (Maybe I already mentioned this elsewhere in the thread - if so, sorry for repeating!)
  • One of the things that make MusicBrainz great, is that MusicBrainz focuses on capturing the data. MusicBrainz does not care about how people use this data, and this usage agnosticism allows us to capture some advanced data that might be hard to translate to a direct use case - but it also means that the data is usable by both Spotify for improving their internal data, by BBC and other radio stations to basically replace their internal databases, by “end-users” looking to tag their files, by music analysts/researchers/academics to do some advanced analyses of music data, etc. Be very wary of linking how you structure the data(base) with your perceived use cases. (This is, IMHO, one of the flaws of TMDb/TVDb - it is, as I see it, very much being modelled for use with Kodi, Plex, and similar media centers, rather than being data-first centric.)

@Freso- that is a lot of thought. In order…

  1. Agree with that statement. As long as the video qualifies as a video by specification, the content, morals, or any other opinions on the actual content should not matter. Just like explicit lyrics in audio: it is labeled as such, it is not excluded.

  2. True. For the purposes I proposed, cover art would be needed, at least as a reference, or the usefulness would be reduced and the project possibly bypassed in favour of the current options. Agree on adult content: the images are more of a concern than the titles.

  3. Clips are way out of scope from my original thought, but can be accommodated. TVDB does not really do this, but IMDB and TMDB do; short film or video flags would apply. A series name can also be applied, like the Star Wars series as an example. Would that satisfy your interest on this point?

  4. That would be great and easy I think. In the “crew” section, or whatever it may be called, you could add as many roles as you like and assign as many people to those roles as you like. We could define core roles, like director, writer, producer, etc, but there does not have to be any limit on who can be added.

  5. Yes, agreed. Although I typically see music videos as a class of their own, separate from video and audio, I think they could fit in just fine. The alternative would be something like… MovieBrainz, TVBrainz, ClipBrainz, VideoBrainz, ShortFilmBrainz, etc. We could separate, or work to be all-inclusive of video. Thinking about BookBrainz, it is only books, not flyers, magazines, comics, etc., correct? I wonder how we decide this. Is BookBrainz a “printed on paper item” listing or a listing of “books”? Would VideoBrainz be a listing of anything with video, or would it use a clearer definition of a video, similar to how MusicBrainz treats bootlegs, where a bootleg could be excluded even though it has full artwork and pressed CDs because it is not really available? What qualifies the video content to be included?

  6. I will be light on this one, because I somewhat disagree on parts. MusicBrainz does not do well with fitting my needs, being a person who collects music, plays music, performs music, etc. It is great as an encyclopedia type reference, but near useless for tagging to me. The reason is that when I make digital files, I do not need or care to tag what CD by what label and what barcode the digital copy came from, I want to tag the “12 track deluxe release”. MB is by far the best I have used, which is why I support it. The reason is not that I can actually use the data, it is that I like the data. I tag myself in MP3TAG, then import and modify it to how MB wants it, then leave it be. Personally, I do not feel the data is “data-centric”. I say this because, for example, if it was data-centric, we would differentiate between an MP3 release and an M4A release. I also think the database structure is missing a tier of data in the release area that moves it away from being data-centric. I feel it is less object orientated than it could be, making it less data-centric than it could be. Just the opinion of one person, not complaining.

I think you are correct, it cannot be designed for one use only, because even that one use will change over time. Just like in MB, the lack of structure for digital releases will rear its ugly head at some point, but it was designed around physical releases, which is clear as the only thing not physical in the medium options is “Digital Media”; the rest is all physical variations. But we would also need to consider that data is only valuable if it can be used. If no one can use the data, demand and draw will be lacking and the marketplace will evolve to satisfy its own needs. Meaning, someone else will provide a solution to meet real needs, even though its quality will likely be lesser. Give and take, the exact reason that TVDB is number 1 among its alternatives… it fits the need of the consumer, but is less than perfect in many ways.

From the BookBrainz front page:

From discussions on IRC and elsewhere, this includes comic books, colouring books, fliers, etc. (There was at one point a great discussion about what the tag line should be, debating on whether to use “book” or “literature”, since neither of the two felt like they fully encapsulated the goal. It seems like they went with “book” in the end, likely for the simplicity of it.)

In the same vein, while MusicBrainz’ primary focus is music, it also deals with audio books and other forms of audio that could arguably be said to not be “music” per se (e.g., recordings of steam trains in Denmark and nature sounds and… whatever the heck this is).


I see. I would second my agreement that VideoBrainz is best option so far given this.

So conceptually speaking, we have AudioBrainz, VideoBrainz and LiteratureBrainz. Given the structure of 2 of the 3, I agree to match it with focus and intent.

Couple things. #1: I’ll try to be there, but I’m on the east coast in the US and I’ll be at work at the time of the meeting. I should be able to participate; but work will have priority. #2: How does a topic get added to the meeting agenda?

I’ve set up a GitHub organization here: https://github.com/videobrainz. @thwaller, @Zastai, and myself are set as owners. It currently has one project under that to aid in the early discussions.

and

These give me a very clear picture of how things should move out. Thanks.

This is very much understood and appreciated. Both @thwaller and myself have indicated that we don’t think we will have the time to bring this to light by ourselves. But, here we are, and currently things are moving ahead as our time and abilities allow. So, we’ll see how far we get. I’m hopeful this idea will see the light of day, and for the time being at least, I’m doing what I can to move the idea forward, as are others.


Sure. The “standard meeting time” was set to accommodate having active community members in both Europe and North America, which means that it’s slightly late for us in Europe, and perhaps a bit too early for the North Americans - but this is still preferable to having it at something like 3 AM in either of the timezones. :slight_smile:

Also, it’d be preferable if you hung around in the channel outside of meetings too. Discussions/talk happens in the channel all week, not just in the ~1 hour of the meeting on Mondays. :stuck_out_tongue:

You simply add it to the IRC channel’s topic. Ask me when you’re in the room, and I’ll help you. :slight_smile:

Totally agree. There are two main reasons for exclusion to my mind:

  1. It isn’t legal. Typically that would apply to items that are copy protected. Very little metadata fits that definition. The one exception might be for certain images.
  2. It doesn’t fit our very broad definition of a video: a recording of visual content, which may also include other content (typically audio). Thus silent movies fit. But soundtracks apart from video don’t fit.

We would be wise to strongly advise that commonly objectionable material should be appropriately labeled so that users can use filters to avoid content that offends them.

I’m not so sure about that. One of the most common reasons for using online movie metadata sites that I have seen is to obtain imagery associated with the movie - fan art, covers, backgrounds, etc. Think about it. VideoBrainz is focused on a very visual medium, unlike many books.

I see where it’s out of scope of your original idea; but I think it very much needs to be in scope with the evolved idea of VideoBrainz. It’s a matter of the taxonomy / categories we need to come up with. Data “about” a clip, from our software/database perspective, is little different than talking about a 3 hour movie. I think we can structure the schema in such a way that is flexible in associated attributes such as credits and such so that adding new categories will require few changes to the schema or code to accommodate it.

I’m totally with you here. This is exactly the thing I most like about MusicBrainz. I don’t see any problem with this. All of the things you just listed are a variation of a single kind of relationship - associating a person with the video of interest. It’s a simple relationship that states <Person, Video, RelationshipType>; i.e. <“Joe Dancer”, “My Dance Video”, “Dancer”>. There will certainly be many discussions on what kinds of relationships there should be; much like there is for MusicBrainz. But in the end, these are just enumerations and not all that difficult to implement and support.
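
A minimal sketch of that triple in Python, purely for illustration. The role list and the helper names (`Relationship`, `credits_for`, `roles_held_by`) are my own placeholders, not an agreed VideoBrainz vocabulary or schema:

```python
from dataclasses import dataclass
from enum import Enum


class RelationshipType(Enum):
    # Hypothetical starter vocabulary; the real list would come out of
    # community discussion, as it does for MusicBrainz.
    DIRECTOR = "Director"
    DANCER = "Dancer"
    CHOREOGRAPHER = "Choreographer"
    RUNNER = "Runner"


@dataclass(frozen=True)
class Relationship:
    """One <Person, Video, RelationshipType> triple."""
    person: str
    video: str
    role: RelationshipType


def credits_for(video: str, relationships: list[Relationship]) -> dict[str, list[str]]:
    """Group the people credited on one video by role name."""
    grouped: dict[str, list[str]] = {}
    for rel in relationships:
        if rel.video == video:
            grouped.setdefault(rel.role.value, []).append(rel.person)
    return grouped


def roles_held_by(person: str, relationships: list[Relationship]) -> set[str]:
    """Every role a person has held across all videos."""
    return {rel.role.value for rel in relationships if rel.person == person}


relationships = [
    Relationship("Joe Dancer", "My Dance Video", RelationshipType.DANCER),
    Relationship("Ann Example", "My Dance Video", RelationshipType.CHOREOGRAPHER),
    Relationship("Joe Dancer", "Another Video", RelationshipType.RUNNER),
]

print(credits_for("My Dance Video", relationships))
print(roles_held_by("Joe Dancer", relationships))
```

Adding a new kind of credit only means adding a value to the enumeration; the triple itself stays the same, which is why questions like "did this runner later move on to other roles?" become a simple lookup.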

I agree and disagree with this. From a data modeling perspective, I agree 100%. However, I think that on the API side of things, particularly in the libraries, we should do things with an eye toward how the data will be used. I think we’ll need to make some accommodations in the schema for efficiency’s sake; but I don’t think we’ll have to make any compromises in the data modeling. My full time job and what I’ve been doing for the past 15+ years is in modeling and simulations. I have built many schemas to model highly complex systems that also have to perform well and support a wide variety of end use cases. I’m sure we can develop a schema that doesn’t compromise the depth and fidelity of data modeling but also supports the dominant use cases well.

But I take your point. The counterpoint, though, is that you can also build a schema that is unnecessarily complicated if you totally ignore the main users of the data. So I think it’s important to consider all sides: the data modeling side, as well as the dominant use cases. I don’t see those as conflicting requirements if we consider them together.

Exactly. However, I think much of what makes TVDB good is that they have a nice clean website, and a wide variety of libraries available that make integration of TVDB with other applications easier. It’s not just the simplicity of their data model.


To add a specific application perspective shared by many… what makes TVDB the best there is currently is the data and number of editors. No other site is able to compete simply due to not having the data. From the point of view of this user trying to make edits in TVDB due to data that is incorrect, slightly off or problematic, that is where the issue is. I will refrain from using specifics here, but most who know what I refer to likely already know the specifics. This is where the Brainz community would come in; those issues do not exist here in my experience.

To me, what makes the Brainz community great is the culture. There is an inherent flaw in any system where the database is open and accepts edits without full and complete supervision. Setting that aside, the Brainz community has good policies and procedures. They may not always work and might sometimes fail to produce accurate results, but as I was told when I started, if something gets messed up, we can always fix it.


You make some excellent points. I think that VideoBrainz will always lag sites such as TVDB for the very reason that those other sites make it so extraordinarily easy for someone to become an editor, and VideoBrainz’s submission process would necessarily be more burdensome. Submitting music data to MusicBrainz is very intimidating to a newcomer and will discourage some people from submitting what might be useful data. But that is the cost of striving for quality, accurate data. We will, I’m sure, be examining closely the processes and styles adopted by VideoBrainz; and perhaps we can make things less intimidating. But we’ll only be able to take it so far since most of the intimidation and difficulty will be an unavoidable consequence of striving for quality and accuracy.

On a side note: I came up with a draft Purpose for VideoBrainz, drawing liberally from the BookBrainz purpose. Let me know what you think (I have it posted on the Wiki in the VideoBrainz GitHub project too):

The Open Video Database

VideoBrainz is a project to create an online database of information about every video ever produced; whether it be a movie, TV show, documentary, music video, vintage newsreel, or your favorite cat video on YouTube. We make all the data that we collect available to the whole world to consume and use as they see fit. Anyone can contribute to VideoBrainz, whether through editing our information, helping out with development, or just spreading the word about our project.


I have a question only on this specific segment. This is not a matter of such a video being out of scope as I mentioned earlier; I see no issue with including video from more than just movies and TV shows as I originally identified. When you look at MusicBrainz, there are criteria for a release to be considered a release. If we allow “your favorite cat video on YouTube”, is that not similar to a situation in MB where such a release would be disallowed?

Now… I will add that a homemade compilation music mix is not original content, while taking a video on your phone is. I do see a difference there. My point is more: do we really want to allow anything and everything that anyone posts on YouTube to be included? Or would we want to have some sort of guideline on what qualifies? For music, for example, a YouTube release could be considered valid simply by the video being uploaded by the artist, someone like Rihanna for example. What if your kid posts a video of him making weird sounds and banging a pot with a stick? Now, this is music, but is it allowed?

If you make something and release it “properly”, e.g. like this, then it applies to MusicBrainz. If you make a mixtape and copy it to a handful of friends, then it’s not applicable. The same thing here (IMHO): if you make a home video and make it public (YouTube, Vimeo, Facebook, …), then it’s applicable for VB, but if you only copy it locally (e.g., via mail, USB-sticks, (private) Dropbox, …), then it’s not.


I’m inclined to say - yes, definitely, such things should be allowed. If someone wants to add metadata to VideoBrainz about their kid making weird sounds and banging on a pot; why should we not allow it? What harm does it do? One response to “What harm does it do?” is that it may clutter the database with data that most people find irrelevant and/or annoying; thus hindering the usefulness and appeal of VideoBrainz. But I think that is resolvable by establishing good categorization and attribution policies together with good search/filter capabilities that employ those category and attribution data.

Exactly. One of the key phrases in the Purpose statement is: “about every video ever produced”. Perhaps that should be changed to: “about every video ever published”?

I need to walk that back a bit. For Movie buffs, and TV Buffs too I suppose, titles are advertised well in advance of actual publication. It seems to me that collecting data about yet-to-be released videos is something we want to support as well. So perhaps the more precise definition is: “every video ever published or with a reasonable expectation of publication”. And publication simply means that the video is made available to a reasonably sized audience/market. Essentially, all except “personal” videos shared with a small circle of friends.

But again I have to ask - what is the harm of even including “personal” videos? I’m not saying that we should go out of our way to support such. But should we go out of our way to exclude them?

I’d like to examine that point a bit more closely. Assume that we can ensure videos are properly categorized and attributed, so that search/filter mechanisms work well. Do the videos entered into the database need to be accessible? Do they need to physically exist? I certainly don’t think we should support things that are purely fictional. But what about old movies that have been lost? What about videos that are not publicly available, but are known about publicly? I think I’m convincing myself that “every video ever produced” is the right phrase. I’m certain the vast majority of entries will be about published items. But why should we exclude videos that have not been published? What is the harm in allowing them?

I think that the categories we devise will have very precise qualification standards. To be a “Movie”, the video must meet the precise definition of a movie. But that doesn’t mean that we can’t have a category: “Other”; which simply means - it doesn’t fit any of the other categories; but it is a video.


I agree that making a GitHub organisation is a good idea. I think the next logical step would be to try to arrange a meeting (or series of meetings) on IRC involving the key players and try to work out your ontology. You could use Doodle to find a time that works well for everybody.

Having “Artist”, “Creator”, “Publisher” and “Label” entities isn’t really optimal. These entity names refer to roles in a relationship with content. Ideally from my point of view we would instead have “Person” and “Collective”, which could then be used for all four of the above, with MBIDs shared between the different Brainz. Similarly, all works could be made database-independent.
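
A rough sketch of what I mean, in Python. Everything here is a made-up illustration (the `Entity` and `RoleLink` names, the project labels, the `Performer` role); it is not an existing schema in any of the Brainz projects:

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class EntityKind(Enum):
    PERSON = "Person"
    COLLECTIVE = "Collective"


@dataclass(frozen=True)
class Entity:
    name: str
    kind: EntityKind
    # One identifier per real-world person or group, reused by every project.
    mbid: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass(frozen=True)
class RoleLink:
    """The role lives on the link to the content, not on the entity."""
    entity: Entity
    project: str  # hypothetical labels such as "MusicBrainz" or "VideoBrainz"
    content: str  # title of the release, book, or video
    role: str     # "Artist", "Creator", "Publisher", "Label", ...


def roles_by_project(entity: Entity, links: list[RoleLink]) -> dict[str, set[str]]:
    """Collect the roles one entity plays, grouped by project."""
    found: dict[str, set[str]] = {}
    for link in links:
        if link.entity.mbid == entity.mbid:
            found.setdefault(link.project, set()).add(link.role)
    return found


band = Entity("Example Band", EntityKind.COLLECTIVE)
links = [
    RoleLink(band, "MusicBrainz", "Example Album", "Artist"),
    RoleLink(band, "VideoBrainz", "Example Concert Film", "Performer"),
]

print(roles_by_project(band, links))
```

The same `mbid` shows up under both projects, so "Artist" and "Performer" become roles the band plays in a relationship rather than separate kinds of entity.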

I think once you’ve got your ontology set out, then you should create an SQL schema, like we have in bookbrainz-sql. This is fairly technology independent, and would help get everybody on the same page. I’d suggest using the same vote-less, revertible revision system as BB, because the MB editing system is moving towards that with more and more auto-edits.

I like the idea of being able to record metadata for any YouTube video. Not so keen on storing information for private videos - if they’re not going to be accessible to anyone else, I don’t see the point in storing the metadata for them for others.

I’d like to aim to share as much code as possible between BookBrainz and VideoBrainz. At the moment, most of the work to turn BB into VB would be in redefining the entity display and editing pages, and the revision display page - maybe a month’s work for someone familiar with React and JavaScript. We could also share some of our data access layer/data models, like editors, revisions, relationships and the gamification system.


I created the GitHub organization the other day: https://github.com/videobrainz.

I’m still very uncomfortable with IRC. Just not my thing. I’ll go there when I must. I logged in today and added VideoBrainz to the weekly topic list, for example. But I prefer other means of communication - strongly. I tend to be much more of a contemplative developer / collaborator. Which is a polite way to say that I think slowly and tend to express my thoughts in depth (some would say all my posts should start with TL;DR;). Chat oriented collaboration environments tend to make me shut up; which might be a good thing - I don’t know.

I tend to agree with you there. I prefer the names “Person” and “Organization”; but I guess “Collective” is ok - but it gives me visions of the Borg. I view Artist, Creator, Publisher, Label, Best-Boy, Actor, etc as part of a vocabulary of roles - a taxonomy.

Yup - I think we’re all on the same page. I need to learn more about the “same vote-less, revertible revision system as BB”. Beyond the “use the source, Luke” approach to learning something - is the design described somewhere?

So we would also exclude data on lost videos (old movies that have been lost due to fire, chemical deterioration, etc.) that are of historical interest? But I’ll come back with my question: what’s the harm? I guarantee that I can put private music in MusicBrainz. I bet you that some folks already have. Consider the following use cases:

  1. Collection Management. MusicBrainz has some support for this. Why not embrace it 100% and let users store information about private videos that are in their collection. So long as it’s labeled/attributed properly - what’s the harm?
  2. Usage data. Lots of folks want to collect data on listening/watching their media. Even if the video isn’t available publicly; someone may want to share the fact that they listened to / watched something they own privately.

There does seem to be a fairly widespread and consistent opinion that private items should not be included. But I really haven’t seen a convincing reason given as to why. So I ask again: what’s the harm?

Seems like a good idea to me. I think our priorities will diverge somewhat. For example, I think associated imagery is more important to VideoBrainz than it is for BookBrainz. But that’s not a bad thing.

I’d like to come to a better understanding of the technology selections that have been made in BB. Python, for example, was described by @Zastai as “a language i hate with a passion”. Personally I love Python and have used it off and on for decades. Personal likes and dislikes have a role in these decisions; but I prefer to make such decisions mostly on the basis of sound technical reasons. So - why have you chosen to use the languages, frameworks, and packages that you have?


I only prefer “Collective” over “Organization” because it seems to work better for describing a band, for instance - a collective can be a collective of musicians or a collective of businessmen in a company, while organization is more suited to the latter.

It’s sort of like NES, but it has diverged a bit. We don’t have any documentation of the exact system we’ve implemented (yet).
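
Until that is written up, the general idea of vote-less, revertible revisions can be sketched roughly like this in Python. This is a generic illustration with invented names (`Revision`, `EntityHistory`), not the actual BookBrainz or NES design:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Revision:
    number: int
    editor: str
    data: dict
    note: str


class EntityHistory:
    """Append-only revision log for a single entity."""

    def __init__(self) -> None:
        self._revisions: list[Revision] = []

    def __len__(self) -> int:
        return len(self._revisions)

    @property
    def current(self) -> Optional[Revision]:
        return self._revisions[-1] if self._revisions else None

    def edit(self, editor: str, data: dict, note: str = "") -> Revision:
        # No voting step: the edit becomes the current state immediately.
        revision = Revision(len(self._revisions) + 1, editor, dict(data), note)
        self._revisions.append(revision)
        return revision

    def revert_to(self, number: int, editor: str) -> Revision:
        if not 1 <= number <= len(self._revisions):
            raise ValueError(f"no revision {number}")
        # A revert is itself a new revision, so nothing is ever erased.
        target = self._revisions[number - 1]
        return self.edit(editor, target.data, note=f"Revert to revision {number}")


history = EntityHistory()
history.edit("alice", {"title": "My Dance Video"})
history.edit("bob", {"title": "My Dnace Video"}, note="typo introduced by mistake")
history.revert_to(1, "carol")
print(history.current)
```

Because a bad edit can be undone by anyone with one more revision, there is less need for a voting queue in front of every change.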

The difference here for me is that lost movies were at one point publicly available, so recording them has historical usefulness. My problem with including private videos is that it pollutes the publicly useful data, potentially making it less useful. For example, if a user searches through the database for a certain phrase, and there are a large number of private videos with titles containing the phrase, the useful public metadata might not be found. And metadata for private content will be of lower quality than the public metadata, since editors won’t be collaborating on improving it. If those matters were solved or made irrelevant, I wouldn’t be against storing private metadata.

We definitely want covers in BB, but the process of getting it set up has stalled a little (@reosarevok - let’s discuss that soon!). I guess VB might also want promotional posters (although BB could use them too), promotional stills, perhaps any bonus concept art included in the release? The big obstacle to getting all of this done is communicating what we want to the Internet Archive and getting stuff set up there.

We aren’t actually using Python any more. Python was my favourite language, and Node.js was what @LeftmostCat favoured for site development. We started out with Node.js for the website, and Python for the web service (flask) and schema definition (SQLAlchemy). The website accessed the data through the Python web service. However, this meant that we ended up maintaining three data model definitions - one for the schema, one for the web service output in Python, and one for the web service output in JavaScript. So for 6 months until April, we changed the site to directly access the database. The schema is now defined using SQL, and we have a JavaScript interface to the database which the site uses directly. I’ve also just started writing a new web API making use of the same JavaScript models.

So now we have the Node.js site, with pages rendered server-side and client-side with the React framework. We use bookshelf.js for our data models, but we’re planning to ditch the ORM and just use the knex query builder and data manipulation functions, since JavaScript ORMs aren’t very good. Unit testing is performed by Mocha, with some browser testing done using Selenium (though this was only added a couple of weeks ago and doesn’t do much yet). We use Redis for storing session data.

The web API will allow applications to use and edit data (an editable web service was one of our key goals with the original Python web service - it worked quite well). In the new web API users will be able to generate a secret API token which will allow them to authenticate with BB via third-party applications and access and modify BB data. We’re going to have per-application rate limits. The web API is being written using the koa.js framework, and I’m taking the opportunity to try out a bunch of new JavaScript features with the hope that some of them can be backported to the site to improve the code there.
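
To make the token and rate-limit idea concrete, here is a small language-neutral sketch written in Python rather than in the koa.js code itself. The function and class names, the sliding-window approach, and the limits are all assumptions for illustration; the real API may work quite differently:

```python
import secrets
import time
from collections import defaultdict, deque
from typing import Optional


def generate_api_token() -> str:
    # 32 random bytes, encoded as URL-safe text.
    return secrets.token_urlsafe(32)


class PerApplicationRateLimiter:
    """Sliding-window limiter that counts requests separately per application."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, app_id: str, now: Optional[float] = None) -> bool:
        # `now` can be passed in explicitly to make the behaviour testable.
        now = time.monotonic() if now is None else now
        hits = self._hits[app_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


token = generate_api_token()
limiter = PerApplicationRateLimiter(max_requests=2, window_seconds=1.0)
for attempt in range(3):
    print(attempt, limiter.allow("example-tagger"))
```

Keying the limiter on the application rather than the user means one misbehaving third-party tool gets throttled without affecting the other applications a user has authorised.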


I mostly agree with this. When you look at the “rules” in MusicBrainz, meaning what the majority of auto-editors enforce, publicly available bootlegs (private), even if available online, sometimes do not qualify as a release. So as not to call anyone out, I will just say I have examples of this being strictly enforced. I have come to learn that this does not matter to me either way. I was more pro private data when I first started, now I am more against private data, even mildly public data…

The mindset I have come to have, based on and molded by mostly auto-editors in MusicBrainz, is to look at the release and ask a few things. First, is, or was, this release available to the public in a manner that would give it distribution? Second, would the adding of this release benefit anyone aside from myself? And lastly, is the data I have to add complete enough that if no one ever is able to add anything to this release, does my add fairly portray a release that can be identified or of any use?

The last question is more open. If all I have is a track list and recording times with nothing else to offer, to me, that is not a release as it is useless. There is no cover, label, specific release info, source, etc. How would anyone ever accurately match this to their content?

I hope I have explained what I have been taught here. Although I have more or less signed onto this logic, I am not dismissive of the idea of going against it. I just wanted to toss this logic out there for all to consider in addition to what @LordSputnik stated above. It is my belief now that a private video (someone’s kid playing, a graduation, people at a shooting range for fun, etc) is no different than a home-made compilation of music.


Thank you. I do think these issues could be resolved; but it would take a lot of time and effort and would introduce concepts that are currently foreign to the Brainz community; and probably would impact the performance of certain queries. I think that to work properly, private metadata would need to appear as if it is not there at all unless you explicitly choose to include it. So far as reviews and editors are concerned; it’s certainly true that the community cannot verify the accuracy of private data. But it can enforce certain standards of quality such as completeness, internal consistency, spelling and grammar, quality of images provided, and proper attribution to a limited extent.
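
As a rough sketch of the "not there at all unless you ask for it" behaviour, in Python. The `VideoEntry` fields and the rule that private entries are only ever shown to their owner are my assumptions for the example, not a settled design:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VideoEntry:
    title: str
    owner: str
    private: bool = False


def search(
    entries: list[VideoEntry],
    phrase: str,
    viewer: Optional[str] = None,
    include_private: bool = False,
) -> list[VideoEntry]:
    """Case-insensitive title search that hides private entries by default."""
    needle = phrase.lower()
    results = []
    for entry in entries:
        if needle not in entry.title.lower():
            continue
        # Private entries appear only when explicitly requested by their owner.
        if entry.private and not (include_private and viewer == entry.owner):
            continue
        results.append(entry)
    return results


entries = [
    VideoEntry("Graduation 2016", owner="alice", private=True),
    VideoEntry("The Graduate", owner="editor1"),
    VideoEntry("Cat video", owner="bob"),
]

print([e.title for e in search(entries, "gradua")])
print([e.title for e in search(entries, "gradua", viewer="alice", include_private=True)])
```

A default search never sees the private entry, so public results are not diluted; the cost is that every query has to carry this extra condition, which is where the performance concern comes from.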

Another approach would be to address the “Collection Management” use case directly. By that I mean that there would be completely separate and simplified tables to handle private data. This data would only be visible to the account that created them and would be provided purely as a service to that user. Cloud storage for personal media metadata, if you will. But that definitely goes counter to the established purpose of the Brainz community. As an aside, I really want to refer to the community as the MediaBrainz community.

So I guess I’ll have to find some other avenue to satisfy the Collection Management use case. I fully expect to maintain the idea of personal collections as MusicBrainz does. But that would still be inadequate because it will lack certain critical features. Together with the Media Player use case; these two use cases are very important to me. I care as much or more about my private media as I do about the public data. I want to collect extensive information about them; just as much as I want to about the public items. I want that data to be on the internet so I can share it with those few people who care (my extended family and friends), and with myself - I love the cloud paradigm. But currently I have few options; and all are inadequate in several ways. Sad Panda :disappointed:

Just wanted to chime in here and say I really liked this idea; the limitations of it make sense as well. (Perhaps this is a subject for another topic?)