[Suggestion] FeedBrainz - A new MetaBrainz data project for feeds (Podcasts/News etc.)

ClipArtJoel · November 1, 2018, 5:10am

I have a suggestion for the MetaBrainz Foundation and any other interested developers. It’s a new project to store RSS feeds for podcasts, or maybe news sites. I call it “FeedBrainz”.

What?
In these early brainstorming stages, I imagine a MusicBrainz-like site that contains podcast data (and news sites, if we are game). We enter the name of the podcast as if it were a release group and fill out a publisher/artist field. We add an official RSS feed link, and we may be able to add secondary or old feeds with a disambiguation attached. For each show/news site, I think we would need hard categories to record whether it is a blog, news, audio or video feed. Later on, it would be cool to have some sort of episode/article scraper similar to gpodder.net.
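A minimal sketch of the per-show record described above, in Python. Every name here (`Show`, `Feed`, `FeedCategory`) is hypothetical; only the four categories and the official/secondary feed split come from the proposal itself.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class FeedCategory(Enum):
    """The proposed 'hard categories' for each show or news site."""
    BLOG = "blog"
    NEWS = "news"
    AUDIO = "audio"
    VIDEO = "video"


@dataclass
class Feed:
    url: str
    disambiguation: str = ""  # e.g. "old feed, before the show changed hosts"


@dataclass
class Show:
    """A podcast or news site, modelled loosely on a MusicBrainz release group."""
    name: str
    publisher: str  # plays the role of the artist/publisher field
    category: FeedCategory
    official_feed: Feed
    secondary_feeds: List[Feed] = field(default_factory=list)

    def all_feed_urls(self) -> List[str]:
        """Official feed first, then any secondary or old feeds."""
        return [self.official_feed.url] + [f.url for f in self.secondary_feeds]


show = Show(
    name="Example Show",
    publisher="Example Network",
    category=FeedCategory.AUDIO,
    official_feed=Feed("https://example.com/feed.xml"),
    secondary_feeds=[Feed("https://old.example.com/rss", "old feed")],
)
print(show.all_feed_urls())
# ['https://example.com/feed.xml', 'https://old.example.com/rss']
```

Keeping old feeds as separate `Feed` entries with a disambiguation, rather than overwriting the URL, is what would let duplicate or dead feeds be merged later instead of lost.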

Why?
Whenever I want a good data project around media, I immediately turn to MetaBrainz. MusicBrainz is such a clean and (very) complete database of music, with very good checks against duplicate items. MB’s UX is easy to navigate and even includes extra features for users.

The fact is, we currently lack a good open-source data project for feeds and podcasts. Below I list the reasons why this project would help, with comparisons to gpodder.net, the current open-source database for podcasts.

Curation: the curated nature of MetaBrainz (MeB) data means duplicate RSS feeds (including the proliferation of dead/old ones) and unnamed shows would be easy to fix.
Reviews: CritiqueBrainz (CB) already has a review system, so we could easily leave reviews for podcasts. We could finally power open-source pod/news catching apps with open-source user reviews! This has been badly missing. If this project happened, we would no longer feel left out when shows ask us to “leave a review on your podcast app”.
Powering open-source news/podcast apps: with data from MeB we could have a flourishing ecosystem of FLOSS applications that handle news or podcasts. These apps could finally have an excellent “discover” tab based on user data in the project.

I don’t expect this idea to move forward quickly, but MeB has impressed me before: ListenBrainz (LB) was released right around when I wanted good open-source listening data. What does everyone think?

–Joel

Freso · November 1, 2018, 9:38am

Audio-based podcasts can already be (and are!) entered into MusicBrainz. Text-based articles etc. would (to my understanding) be very welcome on BookBrainz. Did you look into these options, and do you still think an entirely new project needs to be created?

There is still a lot of discussion about how to handle podcasts in MusicBrainz, though. Maybe you can contribute to that discussion to reach an outcome that works for your use case?

dns_server · November 1, 2018, 10:27pm

MusicBrainz is generally against using bots to automatically add entries to the database.
The general argument is that this would lower the quality and let a lot more junk into the database.

Instead of using MusicBrainz as the database, we could use the MessyBrainz database as the place to store this data and allow extra metadata to be stored there, or create a parallel project that uses MessyBrainz IDs as its identifiers.

I’ve thought about this before and created the below post with my ideas.
FeedBrainz would fit with my proposal.

ClipArtJoel · November 2, 2018, 12:10am

I don’t think MB is a good fit for audio/video podcasts, since any app using such data would be entirely focused on the RSS feeds, and the MusicBrainz database is mostly aimed at music releases. I think a separate data project collecting RSS feeds would be better. Also consider that manually adding weekly shows to a database would become tiresome for a lot of podcasts.

BookBrainz could be a good project for articles, but I think something else could be explored here with an RSS feed database.

Freso · November 2, 2018, 2:02pm

That might have been true prior to the 2011 NGS (Next Generation Schema) release, when MB was still mostly an advanced CDDB replacement. Since NGS, MusicBrainz’s aim has been much more in line with its stated purpose (specifically: to be “the ultimate source of music information”), capturing a lot of things that are not “music releases” (e.g., works and events). It is still mostly about music, but that doesn’t mean we don’t capture other audio-related information too. We’ve been gathering audiobook information for ages, since long before NGS came around.

We are already gathering podcast information and people are not discussing whether to capture this data about them, but rather how.

Aren’t RSS feeds basically just a way to encode series? My position in previous discussions on how (note: not if) to capture podcast information in MusicBrainz has been to enter episodes as stand-alone recordings grouped in Series entities (e.g., Wild Guesses).

I can absolutely follow and agree with this point. (It would be nice if RSS feeds contained good enough metadata that they could be (at least semi‐)automatically imported into MB.)
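If a feed is treated as a series, a semi-automatic import starts by reading the episodes in order. A minimal sketch using only Python’s standard library, assuming an RSS 2.0 feed where every item has a `pubDate` and optionally an `itunes:duration`; `Episode`, `parse_feed` and `duration_to_ms` are hypothetical names, not part of any MetaBrainz tool.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import List, Optional

# Namespace used by the itunes:duration element in most podcast feeds.
ITUNES_NS = "{http://www.itunes.com/dtds/podcast-1.0.dtd}"


@dataclass
class Episode:
    guid: str
    title: str
    published: datetime
    length_ms: Optional[int]   # None when the feed gives no usable duration
    audio_url: Optional[str]   # enclosure URL, if any


def duration_to_ms(text: Optional[str]) -> Optional[int]:
    """Convert an itunes:duration value ("SS", "MM:SS" or "HH:MM:SS") to milliseconds."""
    if not text:
        return None
    try:
        parts = [int(p) for p in text.strip().split(":")]
    except ValueError:
        return None
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + part
    return seconds * 1000


def parse_feed(xml_text: str) -> List[Episode]:
    """Return the feed's episodes oldest first, i.e. in series order."""
    channel = ET.fromstring(xml_text).find("channel")
    episodes = []
    for item in channel.findall("item"):
        title = (item.findtext("title") or "").strip()
        # Assumes every item has a pubDate; dates without a zone are treated as UTC.
        published = parsedate_to_datetime(item.findtext("pubDate"))
        if published.tzinfo is None:
            published = published.replace(tzinfo=timezone.utc)
        enclosure = item.find("enclosure")
        episodes.append(Episode(
            guid=item.findtext("guid") or title,  # fall back to the title
            title=title,
            published=published,
            length_ms=duration_to_ms(item.findtext(ITUNES_NS + "duration")),
            audio_url=enclosure.get("url") if enclosure is not None else None,
        ))
    return sorted(episodes, key=lambda e: e.published)
```

Keying episodes on the `guid` rather than the title is a deliberate choice: titles are sometimes edited after publication, while an RSS `guid` is meant to stay stable.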

I think BB also has or will have the concept of series entities. @mr_monkey may know.

Freso · November 2, 2018, 3:30pm

Sorry if my last reply feels like I’m trying to “shut you down”. That’s not my intention; I’m just failing to see what this new project would bring that we can’t do in existing projects.

Maybe you could describe in more detail exactly what you want from FeedBrainz? What information should it store? And maybe consider what is missing in the current projects (MB, BB) that prevents them from fulfilling this role?

I never intended to claim that MB (or BB) is currently 100% adequate for capturing the information you want to capture, but I do believe (based on your statements so far) that they could become so. They won’t, though, unless we figure out what is needed/missing to get them there… or whether that is even possible at all and a new project is indeed the proper path forward.

aerozol · November 2, 2018, 9:30pm

I’ve added a few podcasts in the past, and I definitely don’t think a new project is required, especially as it would make use of so much of MB’s existing structure (tagging, AcoustID, Picard…) and ongoing improvements. Plus there’s the potential crossover with music podcasts, and the duplication of artist data etc.

What would be awesome is a script that automatically imports a podcast based on its feed! As long as you give the entries a check (mass, purely machine-driven imports are frowned upon, as dns_server says), you’re up and running.

I stopped adding podcasts because it did get tedious; some automation is definitely required to keep it up long-term.

edit: I would put some $ towards this if anybody wanted to take the initiative

ClipArtJoel · November 5, 2018, 10:12am

I think BookBrainz is perfectly OK. Here’s some feedback:

The three different data types (work, edition, edition group) are far less automatic than in MB. With MB I could create a completely new release and have the recordings and the release group made for me. Trying out BB, I had to add a lot of this stuff one by one. I hope that this improves.
Maybe add a web or blog category into the formats. I interpret eBooks as a self-contained file not a webpage.
For works distributed via the web, or republished public-domain works, I should be able to add links like I can on MB. Without that, BB cannot replace my FeedBrainz idea, as FeedBrainz would have direct links in each “work” or “recording”.

I am going to try MB and give feedback. However, I think I should join your style guideline discussion. If someone could bring me up to speed on the problems and the different methods, I would be happy to weigh in.

arturus · November 11, 2018, 2:23am

I’ve written Python scripts to semi-automate adding podcasts before. None of my current scripts are generic enough to be worth sharing, but you can get a pretty long way with release editor seeding, although there are some pain points. An automated way to get the newly added release joined to a series would be a big help for podcasts, as would being able to seed the [none] barcode.
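Release editor seeding works by sending form fields to the release editor, which then opens pre-filled for a human to review. A minimal sketch of that for one episode, written as a local HTML form you open and submit in a browser. The endpoint and field names (`name`, `artist_credit.names.0.name`, `events.0.date.*`, `mediums.0.track.0.*`, `edit_note`) are our reading of the MusicBrainz seeding documentation and should be checked against it; the `type` value is an assumption; `seed_fields` and `seed_form_html` are hypothetical helpers.

```python
import html
from typing import List, Optional, Tuple

# Assumption: endpoint and field names follow the MusicBrainz release editor
# seeding docs as we understand them; verify against the current docs.
SEED_URL = "https://musicbrainz.org/release/add"


def seed_fields(title: str, publisher: str, year: int, month: int, day: int,
                length_ms: Optional[int], source_url: str) -> List[Tuple[str, str]]:
    """Build (name, value) pairs for one episode as a single-track release."""
    fields = [
        ("name", title),
        ("artist_credit.names.0.name", publisher),
        ("type", "Broadcast"),          # assumed accepted type name
        ("status", "official"),
        ("events.0.date.year", str(year)),
        ("events.0.date.month", str(month)),
        ("events.0.date.day", str(day)),
        ("mediums.0.format", "Digital Media"),
        ("mediums.0.track.0.name", title),
        ("edit_note", f"Seeded from podcast feed: {source_url}"),
    ]
    if length_ms is not None:
        fields.append(("mediums.0.track.0.length", str(length_ms)))
    # Series membership and a [none] barcode are not seeded here; they are
    # the pain points mentioned above and still need manual work.
    return fields


def seed_form_html(fields: List[Tuple[str, str]]) -> str:
    """Render an HTML form that POSTs the fields to the release editor for review."""
    inputs = "\n".join(
        f'  <input type="hidden" name="{html.escape(name)}" value="{html.escape(value)}">'
        for name, value in fields
    )
    return (
        f'<form method="post" action="{SEED_URL}">\n'
        f"{inputs}\n"
        '  <button type="submit">Review in release editor</button>\n'
        "</form>\n"
    )
```

Nothing is added to the database until someone reviews the pre-filled editor and enters the release, so the human check stays in the loop.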

One idea I’ve toyed with is making a website where known podcasts could be set up with appropriate customizations for importing. Once set up, episodes not yet added to MB would be displayed, and interested users could click to get a seeded release editor, review the details, and add the episode relatively easily.
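The core of such a site is a comparison between what the feed lists and what MB already has. A minimal sketch, assuming the titles already in MB have been fetched somehow (how, e.g. from the show’s series, is left out); `missing_episodes` and `normalise` are hypothetical helpers, and matching on a loosely normalized title is a simplifying assumption that the per-podcast “customizations” would likely need to refine.

```python
import re
from typing import Iterable, List


def normalise(title: str) -> str:
    """Loose title key: case-insensitive, with runs of whitespace collapsed."""
    return re.sub(r"\s+", " ", title).strip().casefold()


def missing_episodes(feed_titles: Iterable[str], mb_titles: Iterable[str]) -> List[str]:
    """Feed episodes (in feed order) whose titles don't match anything in MB yet."""
    known = {normalise(t) for t in mb_titles}
    return [t for t in feed_titles if normalise(t) not in known]


print(missing_episodes(["Ep 1", "Ep  2", "Ep 3"], ["ep 1", "EP 2"]))
# ['Ep 3']
```

Each title returned would then get its own “add” button leading to a seeded release editor.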

I’ve got enough going on that I’m not up for taking that on right now, but it’s good to know that other people might be interested in that sort of thing.