Import data from Bookogs and Comicogs

Madir · July 22, 2020, 8:30pm

Dear all,

I’m not sure if you are familiar with Bookogs (https://books.discogs.com/) and Comicogs (https://comics.discogs.com/). Together, the two sites had more than 20,000 users, and the community added 150,000 books and 50,000 comics.

Unfortunately, Comicogs will close at the end of the month and Bookogs will close at the end of August. The database dumps (and the images) will be preserved, and I think this is the chance to import them into BookBrainz. The data is available in JSON format. I know the structure of Bookogs and I guess Comicogs will be similar, so I would like to support the import.

It could give BookBrainz a great boost in data, and the number of users might increase if they can continue without losing their submissions.

What do you think?

best regards
Madir

mr_monkey · July 24, 2020, 5:24pm

Hello Madir, and welcome to the community!

I have been catching up on Bookogs and Comicogs. It’s a shame that they are closing down.

However, I think we could transform this into an opportunity for both communities.

BookBrainz is dedicated to open data and will never take it away from its users. MusicBrainz is the best example of that: born of a similar situation, it has been going strong for 20 years.

BookBrainz is still in beta, but moving forward at a good pace.

User collections are being implemented as we speak, which the Bookogs community could be interested in.
I would recommend everyone hold on to their exported collections so we can figure out how to import them.

I’d be interested in discussing what parts are currently missing in BookBrainz in order to import the *ogs databases correctly, and shift priorities wherever needed.

In particular, some work has been done on a database importer, but it is as yet unfinished.
An obvious missing piece of the puzzle is a way to represent series, which will particularly be an issue for Comicogs.

It’s the right time for me to push these two items higher up the priorities list.

All in all, I’ll do whatever I can to help preserve the Bookogs and Comicogs data and welcome their communities, and appreciate your help to achieve that goal!

sound.and.vision · July 25, 2020, 5:17pm

Hi Mr Monkey,

Another fellow migrant from the land of the 'ogs. Been a firm lover of MusicBrainz and so when I saw you had BookBrainz I informed the other refugees of the project!

Hopefully things will go from strength to strength with this project. I am working to get as many of my own contributions previously made to Bookogs over as I can.

My main request would be some way of uploading images or just covers, similar to how we do it on MusicBrainz?

Thanks for taking us in like this!

sound.and.vision

mr_monkey · July 28, 2020, 11:14am

Welcome sound.and.vision!

Thanks for the outreach!

Honestly I find it heartbreaking when a well-meaning volunteer community sees their work disappear before their eyes, and I want to help with the preservation of both the community and of the data.

“Hopefully things will go from strength to strength with this project”

It sure has been the case over the past year or two, and continuously so!

With regard to images, that’s a bit more of a pickle to solve (because we want all the data we store to be CC0), but it is definitely planned. Unfortunately I don’t have an ETA for you.
However, I’m certain we will find a way around the issue, as MusicBrainz did by collaborating with the Internet Archive (https://coverartarchive.org) to host the images.

Madir · July 28, 2020, 4:11pm

Thanks for the positive response. I already wrote some scripts to import the *ogs data into a PostgreSQL database. You can find them here: https://gitlab.com/Madir/ogs-database. The *ogs sites have some more columns or fields at the moment, but I think combining them into an annotation might not be a bad approach. Sooner or later it will be possible to move them to separate fields. Like Comicogs, Bookogs also supports series and an about/subject field, which I think can be pretty helpful for organizing things.
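
As a rough illustration of the "combine into an annotation" idea, here is a minimal Python sketch. The field names (`binding`, `dimensions`, `notes`) and the helper `build_annotation` are made up for the example; they are not taken from the actual Bookogs dump or from the scripts linked above.

```python
def build_annotation(record, mapped_fields):
    """Join every field without a dedicated target column into one annotation text."""
    lines = []
    for key in sorted(record):
        if key in mapped_fields:
            continue  # this field already has its own column on the target side
        value = record[key]
        if value in (None, "", [], {}):
            continue  # nothing worth keeping
        if isinstance(value, list):
            value = ", ".join(str(item) for item in value)
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


# Hypothetical source record; only "title" and "isbn" are assumed to have target fields.
record = {
    "title": "Example Title",
    "isbn": "9780306406157",
    "binding": "Paperback",
    "dimensions": "",
    "notes": ["First thus", "Signed"],
}
annotation = build_annotation(record, mapped_fields={"title", "isbn"})
print(annotation)
```

Keeping the `key: value` form in the annotation should make it easier to split the text back into separate fields later, once those fields exist.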

I also managed to set up a BookBrainz server on a virtual machine. But looking at the BookBrainz database, I’m not really sure I understand all of the tables. Do you have an ER diagram to give me an overview of the database? Would an import run directly on the database (as SQL statements), or would it be a script on the website that imports from a foreign source?

Deleted_Editor_2146972 · July 28, 2020, 6:14pm

Thanks @Madir for all your efforts to preserve the Bookogs and Comicogs data.

mr_monkey · August 4, 2020, 3:34pm

@Madir: Sorry, I forgot to answer your last message.

Unfortunately the schema diagram we currently have available is a little out of date, but more problematically not easy to understand.

I have started writing a document that explains how the system works, a work in progress you can find here: https://docs.google.com/document/d/1rEQj7c3jfW_Wvrukr5M57Oy22LePR2NbdRSquPQYsf8/edit?usp=sharing

One thing of particular interest to you that is not covered by this document, and is unused/unfinished at the moment, is anything regarding the $entity_import tables (in short, a sort of staging area for imports to be added/merged into the regular entities or to be discarded).
See here and here

I hope this helps a bit in getting used to the schema; in any case I am here to answer your questions!

Tsivihcra · August 5, 2020, 6:39pm

I too am here from Bookogs and the related sites. Hoping to figure out how to submit books to the database.

mr_monkey · August 6, 2020, 8:59am

Welcome to the community @Tsivihcra!

sound.and.vision · August 6, 2020, 9:33am

Hello friend!

So far I have been manually adding my original submissions, the general structure is:

Author > Work > Edition, where a work is almost like a Master Release in Discogs terminology - so for example Bram Stoker’s Dracula would be a work, and then various printings of it over the years and in various languages are editions.
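
For anyone who finds code easier to read than prose, the hierarchy could be pictured roughly like this in Python. The class and field names are invented for illustration and do not mirror the actual BookBrainz schema; the two editions are placeholders, not real database entries.

```python
from dataclasses import dataclass, field


@dataclass
class Author:
    name: str


@dataclass
class Work:
    title: str
    authors: list = field(default_factory=list)


@dataclass
class Edition:
    title: str
    work: Work      # the abstract work this printing belongs to
    language: str
    isbn: str = ""  # left empty here; filled in per printing


stoker = Author("Bram Stoker")
dracula = Work("Dracula", authors=[stoker])

# Placeholder editions: different printings/languages all point at the same work.
editions = [
    Edition("Dracula", dracula, "English"),
    Edition("Dracula", dracula, "German"),
]


def editions_of(work, all_editions):
    """Return every edition linked to the given work."""
    return [edition for edition in all_editions if edition.work is work]


print(len(editions_of(dracula, editions)))
```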

There’s a lot of basic information that also needs to be added, such as publishers and editors. There aren’t as many unique fields as there were on Bookogs at the moment, nor any support for images so far (which I hope comes soon). However, the crucial parts are there, such as ISBN identifiers.

BookBrainz is also quite keen to link to other large databases such as Goodreads, Open Library and WorldCat. You can get those identifiers by plopping the ISBN13/ISBN10 code into those websites and copying the edition URL into BookBrainz, which will figure the rest out.
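
If a book only shows the older 10-digit code, the 13-digit form can be derived from it. The sketch below is just the standard ISBN check-digit arithmetic (prefix 978, drop the old check digit, recompute with alternating weights 1 and 3); it is not BookBrainz code, and the function name is my own.

```python
def isbn10_to_isbn13(isbn10):
    """Convert an ISBN-10 to its ISBN-13 equivalent (978 prefix)."""
    chars = isbn10.replace("-", "").replace(" ", "")
    if len(chars) != 10:
        raise ValueError("an ISBN-10 has exactly 10 characters")
    core = "978" + chars[:9]  # the old check digit (possibly 'X') is dropped
    if not core.isdigit():
        raise ValueError("the first nine characters must be digits")
    # Weights alternate 1, 3, 1, 3, ... across the first twelve digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)


print(isbn10_to_isbn13("0-306-40615-2"))  # prints 9780306406157
```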

The community here are top-notch and very helpful with any questions, ideas or feedback.

KR
sound.and.vision

Deleted_Editor_2146972 · August 6, 2020, 9:44am

Welcome @Tsivihcra. It is a challenge swapping platforms, so if you have any questions bring them to the Forum. Although, I see you have made a good start.

Hopefully, BB will attract more ex-Bookogs contributors. I have been actively spreading the word to the people I know from that site and with any luck they will make the move.

Madir · August 6, 2020, 7:36pm

Thank you for the documentation. This makes things clearer to me. It was the part storing the history of the entries, and the concept of sets, that I didn’t understand from looking at the tables. Now I think I’ve got it.

Looking at the import: from what I see, an import always stores data in the regular tables like author_data, work_data, etc., but in addition it has a link_import which connects it to an entity. Am I right, or did I miss something?

What I don’t get at the moment is the author_credit. The document says that it offers a way to use the name as credited on the edition. But on the website an author can only be set for a work, not for an edition. Nor can such variations of the credited name be entered for other credits.

Tsivihcra · August 7, 2020, 1:04am

Thanks @Madir for this topic and thanks to BB for hosting all of us. I see that @hmvh is here as well as many others. For now, I’ve been adding authors and works to get a feel for things, and a few books here and there.

mr_monkey · August 7, 2020, 9:42am

The first part is correct: imports are stored in the same way as regular entities, but what is different is the $entity_header vs $entity_import_header.
The header tables are what allows us to say “Here’s an entity of this specific type, and here’s where you can find the latest revision (current state)”.
By having separate headers, we can identify and mark imports as such on the website. Once validated, we move the $entity_import_header entry into $entity_header and voilà! consider it imported.

The link_import table looks to me like a way to say “this item was imported with website ABC where it had ID #1234 and has been fully imported into this BB entity”.
For example, if we import the Bookogs DB tomorrow (I wish!) and store all that information, we can later re-import the database with images, and know which item has been imported (or not) into which entity.
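
To make that re-import scenario concrete, here is a small in-memory Python sketch. It only illustrates the idea of keeping a (source, original ID) to entity mapping; the class, the method names and the IDs are hypothetical and say nothing about the real link_import columns.

```python
class ImportLinks:
    """Remembers which source item ended up in which BB entity (if any)."""

    def __init__(self):
        self._links = {}  # (source, origin_id) -> entity id, or None if not accepted yet

    def record(self, source, origin_id, entity_id=None):
        self._links[(source, origin_id)] = entity_id

    def entity_for(self, source, origin_id):
        return self._links.get((source, origin_id))


def attach_images(links, source, dump_rows):
    """Second pass over a dump: match images to entities created by the first pass."""
    attached, skipped = [], []
    for row in dump_rows:
        entity_id = links.entity_for(source, row["id"])
        if entity_id is None:
            skipped.append(row["id"])  # unknown item, or import not accepted yet
        else:
            attached.append((entity_id, row["image"]))
    return attached, skipped


links = ImportLinks()
links.record("bookogs", "1234", entity_id="bb-entity-a")  # fully imported
links.record("bookogs", "5678")                           # staged, not yet accepted

rows = [
    {"id": "1234", "image": "cover-1234.jpg"},
    {"id": "5678", "image": "cover-5678.jpg"},
    {"id": "9999", "image": "cover-9999.jpg"},
]
attached, skipped = attach_images(links, "bookogs", rows)
print(attached, skipped)
```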

Regarding author credits: sorry, I forgot to mention that this feature is currently being implemented and is not yet on the website.

For more on this topic, see Author aliases - no method of crediting works under pseudonym/s

Deleted_Editor_2146972 · August 7, 2020, 9:53am

I don’t envy this task and I sincerely hope it does work.

One thing that did occur to me is that on Bookogs the Credits encompassed authors, publishers, printing companies, editors, subjects (music genres, bands, special interest topics, etc.) and book series names. Author pseudonyms were created as separate Credits and only linked to their PANs using hyperlinks in the profile (and that didn’t always happen).

I just wonder how all of this extraneous data is going to be filtered, so that a load of redundant credits are not generated on BB.

mr_monkey · August 7, 2020, 11:51am

The idea is to treat imported entities differently and mark them visibly as imported data.
From there, editors can discard or validate the import, at which point it is created or merged into an existing BB entity and considered a regular entry.

Which means initially there will be plenty of duplicates marked as imported data, but over time imports will be refined and merged manually.
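
The review flow described above (stage, then discard or validate, and on validation either create a new entity or merge into an existing one) could be sketched like this in Python. Everything here is a simplified stand-in: the two dictionaries merely play the role of the $entity_import_header and $entity_header tables, and the class, method names, IDs and sample data are hypothetical.

```python
import itertools


class ImportReview:
    """In-memory stand-in for the import staging area and the regular entities."""

    def __init__(self):
        self.pending = {}   # import_id -> data (role of $entity_import_header)
        self.entities = {}  # entity_id -> data (role of $entity_header)
        self._ids = itertools.count(1)

    def stage(self, import_id, data):
        self.pending[import_id] = dict(data)

    def discard(self, import_id):
        del self.pending[import_id]

    def validate(self, import_id, merge_into=None):
        data = self.pending.pop(import_id)
        if merge_into is None:
            # No existing match: the import becomes a regular entity.
            entity_id = f"entity-{next(self._ids)}"
            self.entities[entity_id] = data
        else:
            # Duplicate of an existing entity: only fill in values it lacks.
            entity_id = merge_into
            existing = self.entities[entity_id]
            for key, value in data.items():
                existing.setdefault(key, value)
        return entity_id


review = ImportReview()
review.stage("bookogs:1", {"name": "Bram Stoker"})
review.stage("bookogs:2", {"name": "Stoker, Bram", "type": "Person"})
review.stage("bookogs:3", {"name": "Horror"})  # a subject credit, not an author

author_id = review.validate("bookogs:1")            # created as a new entity
review.validate("bookogs:2", merge_into=author_id)  # merged into the existing one
review.discard("bookogs:3")                         # rejected by an editor
print(review.entities, review.pending)
```

In this sketch the merge keeps existing values and only adds missing ones, which is one possible policy; a real merge would presumably let the editor choose.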

Deleted_Editor_2146972 · August 7, 2020, 2:08pm

OK that makes sense.

Tsivihcra · August 16, 2020, 11:25am

Thank you for this. Excited to see the future here.