MovieBrainz?

Zastai · July 26, 2016, 9:31pm

When I said schema, I meant it more in its general term, not technical specifics like xsd/rng etc. So we agree there - defining the schema/ontology is the major first thing to do. And while it won’t cover everything, it would be nice to cover a lot.

For the cultural thing, I was thinking about looking at things produced by, say, Bollywood. That might suggest categories/relationships/… that would not exist (or only as a very niche thing) in western video.

And I mentioned the web service early because the need for an API was mentioned as a big reason for wanting VB over imdb & co.

Sticking points I see at this point:

film tends to have a huge amount of credits. Do we want everything tracked in VB?
there is going to be significant overlap with musicbrainz for music-related things (dvd/bluray especially). Transclusion of some sort, so that data only needs to be entered once, would be cool. Or, if we keep the schema very similar, imports (and later syncs) may be fairly easy (adding mutual url relationships to shared artists etc).
with bb having creators, mb artists, and vb X (possibly artists too), I’m wondering whether some sort of common people database wouldn’t make sense, to keep the basic details like birthdate etc. in one place and prevent duplicate maintenance. Same would go for areas (and possibly places). CommonBrainz?
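
To make the "CommonBrainz" idea a bit more concrete, here is a rough sketch of how one shared person record could be referenced from several projects. Everything in it (`Person`, `ProjectRole`, `CommonRegistry`, the project codes, the sample person) is made up for illustration and is not an existing MetaBrainz schema or API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Person:
    """Basic details kept once in a hypothetical shared store."""
    person_id: str
    name: str
    birthdate: Optional[date] = None


@dataclass
class ProjectRole:
    """A project-specific view that only points at the shared person."""
    project: str   # assumed codes: "mb", "bb", "vb"
    role: str      # e.g. "artist", "creator"
    person_id: str


class CommonRegistry:
    def __init__(self) -> None:
        self._people: Dict[str, Person] = {}
        self._roles: List[ProjectRole] = []

    def add_person(self, person: Person) -> None:
        self._people[person.person_id] = person

    def link(self, project: str, role: str, person_id: str) -> None:
        # Refuse links to people that are not in the shared store.
        if person_id not in self._people:
            raise KeyError(person_id)
        self._roles.append(ProjectRole(project, role, person_id))

    def person(self, person_id: str) -> Person:
        return self._people[person_id]

    def projects_for(self, person_id: str) -> List[str]:
        # Sorted, de-duplicated list of projects that reference this person.
        return sorted({r.project for r in self._roles if r.person_id == person_id})


registry = CommonRegistry()
registry.add_person(Person("p1", "Example Person", date(1970, 1, 1)))
registry.link("mb", "artist", "p1")
registry.link("vb", "artist", "p1")
```

The birthdate lives in one place, and each project only stores the link plus whatever is specific to it.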

thwaller · July 26, 2016, 9:34pm

Something to consider here is an ongoing discussion in MB, what is and what is not a release and the exclusion of home-made releases. The question I have is this… is that content valuable for people today and in the future? YouTube will have some real content of value, but also some person filming himself jumping off of things. It would need to be determined what qualifies and it will be a grey area, just as a bootleg is in MB.

Sure. I use TMDB and TVDB with trakt.tv and Kodi. First, I need to say I am not at all tied to any of these services except as a user and editor and/or reporting data issues for admins to correct. Trakt.tv is a movie/tv scrobbling and history service. It keeps track of what movies and tv shows you have watched and will keep lists. Lists like a watch list, scary movies list, etc, any way you wish to collect. Kodi is a media player, a bit more elaborate than let’s say VLC, but a media player nonetheless. I use Kodi with trakt.tv in that when I watch a movie on Kodi, it uses the Trakt API to tell Trakt I watched the movie and give it a user rating. That data is then stored on Trakt as well as internally to Kodi.

Hoping all that makes sense, generally speaking, Trakt uses TVDB and TMDB to get its data on the movies and shows it keeps record of. So if I watch “Movie A”, it needs to be referenced in TMDB in order for Trakt to know what it is, so that when it is told I watched it, it knows what I watched. There is the need for a database with complete and correct information. TMDB does well with movies, and to use the terms I use as well as many others, TVDB is used because there is nothing else to use. It may be flawed and many have complaints, but they have the most editors giving them the best overall content there is to use with an open API which is needed.

Another use for TVDB and TMDB is to create nfo files for movies and tv shows. Although nfo files were originally used for warez content, that extension has also been adopted to store metadata on video content. From that file, media players, such as Kodi, can read the file and populate the media player with the details of the file you have included in its library. In the generation of those nfo files, you need sources of data, and TVDB and TMDB are two common and top choices for that. In addition to Kodi pulling data for this purpose, I also use an external media manager called TinyMediaManager. TMM also pulls data in order to populate its metadata fields and TVDB and TMDB are again top options there.

So in summary, the initial need that I was looking to satisfy is a replacement for TVDB at a minimum due to certain issues that exist there. Replacing TMDB was not an intention, but to replace 2 services with 1, that is just more efficient.

Please note that when I say replace, I am not looking to shut other services down. Replace meaning for my uses, and the many others that do the same, replace what I am currently using with a new source of data. This exact discussion has been started by many others in other places, so I know there is an interest in the above outlined usage.

thwaller · July 26, 2016, 9:41pm

We can look to IMDB, and TMDB for resources on how this is handled. I would not think it is worth tracking everything, just key roles.

Just my opinion here, but I think that for example with “Movie A”, there is one main entry for the movie. It can then be selected that it was released on BluRay, DVD, etc. Similar to release events on MB releases. Although I understand the multiple releases used in MB, I do question its real tangible use. Now I need to add… I do have multiple versions of the same physical CD in my collection, as do I have multiple versions of the same DVD in my collection. Simple explanation, the CDs have a different cover and one DVD cover is in Chinese and the other in English. In MB, those are 2 different releases. In MB, do they need to be? Or can it be 2 release events. There is no track list as there is on a CD, so the movie is defined as theatrical, extended, uncut, unrated, etc. Each of those would need distinction. Maybe that is the release versions under the release group of the movie itself?

thwaller · July 26, 2016, 9:43pm

YES! TMDB has an API for non-commercial use. API codes have been known to be canceled for excessive use. TVDB I do not believe has that issue. IMDB is not an open API, so it cannot really be utilized in the manner needed.

Zastai · July 26, 2016, 10:03pm

I would definitely consider the availability of language tracks (audio/subtitle) as an important distinction point. And packaging (plain box, steelbook, etc). And barcodes. And special features (which brings up the question of how to handle “track lists” correctly for dvd/bluray).

I don’t think it makes sense to take it as far as musicbrainz when it comes to different pressings (which might have different media markers), but I would definitely want different editions to be separate entities.
In my case, for some films (animation mostly), I have different editions, to have the flemish and english, or japanese and english editions. I’d want those to be multiple entries in my collection on VB, not one that has multiple release events.

dukeja · July 26, 2016, 10:09pm

I’m not so worried about that because I think it will be limited by what the community of data contributors actually provide. I don’t see us needing to do anything in the schema to have limits. However, we may want attributes in those credit records to assist with the API in providing rich search capabilities, or subcategories of credits. For example, typically for a Movie the Actors are listed first, either in order of appearance, or based upon the importance of the role (leading character, vs supporting); with other major non-acting credits such as Director, Producer, Screenplay, Soundtrack credits. We may want to provide a means to list the credits through various filters. But in general, I’m against limiting the schema from being able to represent credits people want to provide.
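
As a rough illustration of what "attributes in those credit records" could look like, here is a small sketch. The field names (`category`, `billing_order`, `leading`) and the `filter_credits` helper are assumptions made up for this example, not part of any existing schema, and the people are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Credit:
    person: str
    role: str                            # e.g. "actor", "director", "producer"
    category: str                        # assumed grouping: "cast" or "crew"
    billing_order: Optional[int] = None  # lower means listed earlier
    leading: bool = False                # leading vs supporting role


def filter_credits(credits: List[Credit],
                   category: Optional[str] = None,
                   role: Optional[str] = None,
                   leading_only: bool = False) -> List[Credit]:
    """Return the credits matching every filter that was given."""
    selected = [
        c for c in credits
        if (category is None or c.category == category)
        and (role is None or c.role == role)
        and (not leading_only or c.leading)
    ]
    # Credits without a billing order sort after those that have one.
    return sorted(selected,
                  key=lambda c: (c.billing_order is None, c.billing_order or 0))


credits = [
    Credit("Person C", "director", "crew"),
    Credit("Person B", "actor", "cast", billing_order=2),
    Credit("Person A", "actor", "cast", billing_order=1, leading=True),
    Credit("Person D", "gaffer", "crew"),
]

cast = [c.person for c in filter_credits(credits, category="cast")]
leads = [c.person for c in filter_credits(credits, leading_only=True)]
```

Nothing here limits which credits can be stored. The attributes only give an API something to filter and order by.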

Definitely with you on this. As I said before, I’d really like to see this merge eventually. But that may never be. And it may turn out for the best for the Databases to be separate. I’m a big fan of entering data once. What happens here will depend a lot on how we plan to integrate the different databases.

I think this is a great idea. I think there are other areas of commonality as well.

Absolutely. As well as subtitles (by language), special audio tracks (commentary tracks, or tracks for the visually impaired). I don’t think tracks work well for our purposes. The order of the contents of a DVD or Blu-Ray disc have little to do with the experience a user has with them. For those the user normally goes through some kind of menu system built into the DVD/Blu-Ray. Not only that - but think of the case of movies that span discs; or multiple variants of the same movie on a single DVD/Blu-Ray release. For example, just today I was ripping the movie Avatar for my media library. It comes in 3 DVD’s that have 3 variants of the Movie. The Theatrical release, an extended cut, and a special edition cut. In addition, the movies span multiple discs. So I see having Release entities that represent the “Package”. That “Package” would contain media (discs) that in turn contain content. But the Package as a whole contains the Movie. And a DVD Package might contain MANY movies. The other day I was working on a Humphrey Bogart collection with dozens of his movies included in the set.

So I’ve seen cases where one DVD package contains many movies spread across multiple discs - often with a single disc containing multiple movies. And I’ve seen DVD packages that contain one movie in multiple forms that span multiple discs. And that doesn’t even talk about all the other stuff that is often included with the DVD. In the Avatar case the Movie itself (in its three forms) spans Disc 1 and Disc 2. But Disc 3 contains bonus material (and that’s often the language used in the packaging) such as a documentary on “The making of Avatar”; some trailers and other material. For my part, it’s this other material that has caused me to want to move away from My Movies. With the My Movies database I have a really good way to represent all of my Movies and TV shows. But the extra material is left out. I’d really like to have things like “The Making of Avatar” reflected in the database. Because that translates into how I use their service. I no longer view my movies through the “DVD Menu” of the original media. I extract all that material, store it and serve it as I want. But to do that well, I need a way to catalog all the extra material so that my media player can present it to me in a meaningful way.
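
A minimal sketch of the Package, media and content idea, using the Avatar set described above. The class names, the `kind` vocabulary and the exact disc numbers per cut are assumptions for illustration only; the point is just that content is attached to the package and can span discs, rather than being an ordered track list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Content:
    title: str
    kind: str                  # assumed vocabulary: "feature" or "bonus"
    variant: str = ""          # e.g. "theatrical", "extended cut"
    discs: List[int] = field(default_factory=list)  # disc numbers it spans


@dataclass
class Package:
    title: str
    disc_count: int
    contents: List[Content] = field(default_factory=list)

    def on_disc(self, number: int) -> List[Content]:
        """Everything that is (at least partly) on the given disc."""
        return [c for c in self.contents if number in c.discs]

    def by_kind(self, kind: str) -> List[Content]:
        return [c for c in self.contents if c.kind == kind]


avatar = Package("Avatar (3-disc set)", disc_count=3, contents=[
    Content("Avatar", "feature", "theatrical", [1, 2]),
    Content("Avatar", "feature", "extended cut", [1, 2]),
    Content("Avatar", "feature", "special edition cut", [1, 2]),
    Content("The Making of Avatar", "bonus", discs=[3]),
])

features = [c.variant for c in avatar.by_kind("feature")]
disc_three = [c.title for c in avatar.on_disc(3)]
```

A box set with many movies would be the same structure with more `Content` entries, and a media player could list the bonus material by asking for `by_kind("bonus")`.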

dukeja · July 26, 2016, 10:11pm

Absolutely. A well designed open API is essential. Ideally in multiple forms to make adoption of the service easier.

thwaller · July 26, 2016, 10:27pm

I have 2 points on that. First one is that I would agree that there can be as many releases as needed, but a change in the mentality is what I mean there. A release in MB is a FULL release. I am thinking more on the Discogs and AllMusic method where there is a master release, then versions under it. In this way, there would not be duplicate information, like track lists, added, just a more elaborate version of a release event. I am not really aware of any of the current resources going to that level, typically a movie is indexed as a movie, the barcode and all of that is not considered. Release country to country is a release event with a date, certification and type.

Second is that on a digital level, most of that does not even matter. If I have the movie in a language, I have that movie in that language. So, although I do respect the details noted in the prior, for my purposes, all I need is the movie to be listed. Meaning, I can look and see “Movie A” and say… yes I have it and yes I watched it. It does not matter for the purpose I mentioned whether or not the movie has barcode 123 or 456, or came in a DVD case or jewel case, all that matters is “Movie A”. Having said that, I am not at all disagreeing that more detail would make the implementation here stand out, as long as there would be consideration for those who want to use it for purposes that do not want that detail. Example, maybe one could reference the “release group” for those purposes and not a specific release under it.

That is one area that MB fails for half of what I use it for. If I am making a playlist for an event, I do not need to be worried about a label or country or what sticker is where, I just need to know “Song A”. Personally, if I made it for myself, I would make a release group for “CD1” separate for a 12 track standard issue and a 15 track deluxe issue. Under each group would be the different metadata for the barcode and such, but the meat of the release would be in the group and the release under it would be an enhanced release event if that makes sense. Just an idea that may accommodate all users.

thwaller · July 26, 2016, 10:31pm

To further elaborate on my prior post, in the MB example, that concept would solve the M4A vs MP3 digital release issue. I could look under the 12 track standard issue and see it was released digitally. Look at that, it would say available in M4A 256, MP3 320 and MP3 192 … then state the references as to where they are. There is no need to have multiple duplicates differing only in source and file type (or barcode) when the music itself is the same… same recordings, same order, etc. The track list and the meat is just kept at the high level.

Maybe that mentality can be used here?

dukeja · July 26, 2016, 10:33pm

thwaller:

Second is that on a digital level, most of that does not even matter. If I have the movie in a language, I have that movie in that language. So, although I do respect the details noted in the prior, for my purposes, all I need is the movie to be listed. Meaning, I can look and see “Movie A” and say… yes I have it and yes I watched it. It does not matter for the purpose I mentioned whether or not the movie has barcode 123 or 456, or came in a DVD case or jewel case, all that matters is “Movie A”. Having said that, I am not at all disagreeing that more detail would make the implementation here stand out, as long as there would be consideration for those who want to use it for purposes that do not want that detail. Example, maybe one could reference the “release group” for those purposes and not a specific release under it.

From my experiences with My Movies - having a barcode is an excellent way to find the metadata that matches what I have; if what I have is a DVD. If all I have is an MP4 file of the movie; then I would want to look up the metadata about that Movie - not for the DVD. If you follow what I’m saying. By modeling things properly in the database, it makes the job of finding what you’re looking for work better. That, and some kind of “disc signature” method as well.

I should also mention - my digital content retains all the various language tracks of the original. So it’s not correct to say that my digital version of “Avatar” is the english one. It actually retains all of the languages of the original DVD. But from a practical basis, I do take your point. But how do you add to your collection? How do you match up your collection with data in our service? I think we’ll need that extra detail to be able to find things from multiple different approaches so that many different cases can be handled.

thwaller · July 26, 2016, 10:45pm

I agree with all you say, your point taken as well. When I add to my collection, I do not keep what I do not need when doing digital. So I keep the movie with only the language and subtitle (if any) that I want.

Next, I will add a sample structure, in simple form. I believe that what you reference would be a level C in that example, the most specific release. What I refer to in my above example would be a level B reference, caring that it is a directors cut for example, but not what country or barcode it had on it.

In your above example, the digital release you have that retains the DVD content exactly is not really a digital release in my opinion, but a copy of a DVD/BluRay, etc. The release I refer to where I strip out unwanted stuff, is not referencing a release at all, thus the level B reference. The level C ones tie it down. To me, a digital release would be what I bought from iTunes for example. So you might get the M4V file with the iTunes extras folder for example.

Looking at the view, you would see this:

Music
A: My Band
B: Release 1 target
B: Release 1 deluxe
B: Release 1 standard
C: Date, country, barcode, language, etc

Same concept adapted for Movies:
A: My Movie
B: Theatrical
B: Extended
B: Directors Cut
C: Language, country, date, certification, etc.

One could look and quickly see what editions of a movie were made, and what releases are available for each.
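
Roughly, the A/B/C levels above could be modeled like this. It is only a sketch: the class names are invented, the barcodes reuse the "123"/"456" placeholders from earlier in the thread, and the country codes are arbitrary.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Release:                 # level C: the most specific release
    country: str
    barcode: Optional[str] = None
    language: Optional[str] = None


@dataclass
class Edition:                 # level B: theatrical, extended, ...
    name: str
    releases: List[Release] = field(default_factory=list)


@dataclass
class Movie:                   # level A: the movie itself
    title: str
    editions: List[Edition] = field(default_factory=list)

    def edition_names(self) -> List[str]:
        return [e.name for e in self.editions]

    def find_by_barcode(self, barcode: str) -> Optional[Tuple[Edition, Release]]:
        """Resolve a barcode (level C) back to its edition (level B)."""
        for edition in self.editions:
            for release in edition.releases:
                if release.barcode == barcode:
                    return edition, release
        return None


movie = Movie("My Movie", editions=[
    Edition("Theatrical", [Release("US", "123", "English")]),
    Edition("Extended", [Release("CN", "456", "Chinese")]),
    Edition("Directors Cut"),
])

names = movie.edition_names()
match = movie.find_by_barcode("456")
```

Someone who only cares that they have the extended version can point at the `Edition`, while someone cataloguing a physical disc can start from the barcode and land on the specific `Release`.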

thwaller · July 26, 2016, 10:51pm

In my thought, release events would have to be rethought. You could tag at the release event level specifically. So for you, you could tag each of those separately, or at the high level, i.e. not separately.

For users, I am sure there are many that want to tag something but do not know or do not care about the specifics; they just want it tagged in general, so they only really need to tag at the deluxe or director’s cut level, not the label and barcode level.

Does that make sense at all?

dukeja · July 27, 2016, 12:14pm

Continuing the discussion from How to start a new project: VideoBrainz:

Ok. Now I am extraordinarily confused. Gonna go away, scratch my head a few thousand times with the hopes of articulating my question better. Because I don’t see it the way you do at all.

thwaller · July 27, 2016, 12:37pm

I found that comment a bit off, for lack of a better way to put it. It is loosely mentioned here, but not very clear. First is to go to the IRC:

It is left to me as well very unclear, but step 1 appears to be go to the IRC and talk about starting a project for VideoBrainz. At this time, going there, we can present the general concept as outlined here. We have the general scope and a list of interested participants. It all needs refining, but we have more than just I got an idea to discuss.

There is also this, where a concept was already started:

dukeja · July 27, 2016, 1:09pm

I think ultimately my confusion comes from the fact that I come from a very different world than many of those who are part of the “MetaBrainz establishment”. In my world there are established processes where project proposals are put together by interested parties and submitted to some power that can decide to allow the project to go forward or not. And if it goes forward there is a discernable lifecycle the project goes through and resources provided.

What are the processes in MetaBrainz? Are there any? Or is it simply to convince those with power over the website to support your idea? And what does “support your idea” mean? I suppose what I want now is a place where ideas can be recorded and refined. Some place to store and refine simple documents (perhaps no more complicated than Wiki pages or a text file in a GitHub project).

I can go ahead and create a GitHub project for VideoBrainz on my own personal space and invite others to join me in refining the ideas there. It would come with issue tracking and a place to experiment with ideas. But is that the right thing to do? Is that the proper thing to do? Are there better ways? How have other projects started? For those who have been through creating MetaBrainz projects; are there lessons learned? What did you do? What went well? What didn’t go so well?

I’m trying to do what’s right and that fits into the culture of MetaBrainz. But I have no idea what that is. Can someone help me understand that and give me guidance? I’ve never used IRC before, and when I visited there yesterday there were very down and dirty discussions concerning builds and such going on. It didn’t feel right to jump in with my questions in the midst of what was going on. Again - I don’t know the culture.

thwaller · July 27, 2016, 1:27pm

I cannot agree with you more. The main reason I have not gone there is simply that IRC is a demotivator for me. In my understanding, we simply go there and toss the idea out and where it currently stands - before going much further. From there, we can see what kind of support is available… from my understanding at least.

For reasons I do not understand, having this conversation on the IRC is different than having it here, I am assuming.

dukeja · July 27, 2016, 3:10pm

I’ve gone out on a limb and created a GitHub organization called videobrainz and have invited @thwaller and @Zastai to it. Nothing is there yet, but I plan to create a design project to hold our early concept development work. If anyone else is interested in joining in the fun please let me know or stop by the organization on GitHub.

https://github.com/videobrainz

Zastai · July 27, 2016, 5:00pm

Have joined. The wiki tab doesn’t seem to be working; I suppose we could start by each having a markdown file in the tree to use for braindumps, but that would unnecessarily pollute the commit history.

dukeja · July 27, 2016, 5:04pm

Try again. I’m still a bit of a newb with regards to managing GitHub projects. I just enabled write access for all members to the project.

Freso · July 28, 2016, 9:19am

Yes, coming on IRC to propose your idea. More specifically, it would be good to have a topic for discussing it at the weekly (Monday) #MetaBrainz meeting. (I also provide notes of these meetings here on the forum: https://community.metabrainz.org/tags/metabrainz-meeting-notes )

It definitely isn’t the worst way to get started. Actually poking around with some code and getting some schemas set up would make it much more apparent to @Rob (and possibly the MetaBrainz board? Not sure if they’re needed to “officially sanction” a new *Brainz project… they’re probably not) whether the project is viable.

@LeftmostCat and @LordSputnik are the leads of BookBrainz, so maybe they have some input here?

What “fits into the culture” would be joining us on IRC and hanging out. Don’t necessarily jump into the discussion right away, esp. if you don’t feel comfortable with it. We, MetaBrainz, would definitely appreciate if all leaders of the MetaBrainz projects were available in at least the MetaBrainz IRC channel. All the MeB employees are there, as well as @alastairp of AcousticBrainz and @LordSputnik and @LeftmostCat of BookBrainz - not to mention other community members, incl. people from other projects/companies (we have someone from Kodi, the lead of FanArt.tv, someone from BBC, and others hanging out in there).

Feel free to poke me anytime on IRC, and I’ll help you get started once I’m around. (Note the “once I’m around” - I’m online on IRC 24/7, but I’m obviously not sitting and staring at IRC all the time - so hang around and I’ll reply when I am.)