No newer search index data dump since 2021-07-14?

I asked about an incomplete data dump here:

Now I see that on your FTP servers, the search index dump is still from 2021-07-14.
The newest data dump is from 2021-08-14, one month later.

Is this just a replication issue or are you no longer exporting the search index dumps for other reasons?

The search index data dump from 2021-07-14 is still the “latest”.
(The latest fullexport data dump is from yesterday, 2021-08-25).

Any reason why?

Sorry, the exporter’s configuration was not updated correctly after the server migration. Search indexes were built correctly but not exported. This has been fixed, and dumps are already available from the main EU server. Mirrors have just started to synchronize, which should be done soon.

Thanks for that fix @yvanzo

Do you know why the INDEX and DB counts still differ for so many cores with the newest available dumps?

admin/check-search-indexes all

CORE           STATUS  INDEX     DB
editor         OK      0         /0
instrument     OK      1008      /1008
series         --      13930     /13951
place          --      48479     /48524
event          --      53977     /54004
tag            --      98843     /110667
area           OK      118563    /118563
label          --      206949    /207198
cdstub         --      288476    /288470
annotation     --      481061    /438362
work           --      1543614   /1544272
artist         --      1861509   /1863047
release-group  --      2329711   /2332074
release        --      2959583   /2962386
url            --      7988848   /8001129
recording      --      25309209  /25326734

Most of the DB numbers are larger than the INDEX numbers, with the exception of cdstub and annotation (and the OK lines). What could be the reason for this difference?
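
To make the size of each gap easier to see, here is a minimal Python sketch that parses a report in the column layout shown above and computes DB minus INDEX per core. The function names `parse_report` and `gaps` are hypothetical helpers for illustration, not part of the MusicBrainz tooling, and only a few rows of the report are copied in as sample input.

```python
# Sample rows copied from the check-search-indexes output quoted above.
REPORT = """\
CORE           STATUS  INDEX     DB
editor         OK      0         /0
instrument     OK      1008      /1008
series         --      13930     /13951
tag            --      98843     /110667
cdstub         --      288476    /288470
annotation     --      481061    /438362
"""


def parse_report(text):
    """Return a list of (core, status, index_count, db_count) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and the header row.
        if len(parts) != 4 or parts[0] == "CORE":
            continue
        core, status, index_count, db_count = parts
        # The DB column is printed with a leading slash, e.g. "/13951".
        rows.append((core, status, int(index_count), int(db_count.lstrip("/"))))
    return rows


def gaps(rows):
    """Map each out-of-sync core to DB minus INDEX.

    A positive value means the database holds more rows than the index;
    a negative value means the index holds more documents than the database.
    """
    return {core: db - idx for core, _status, idx, db in rows if idx != db}


if __name__ == "__main__":
    for core, gap in sorted(gaps(parse_report(REPORT)).items(), key=lambda kv: kv[1]):
        print(f"{core:<12} {gap:+d}")
```

The sign of the difference separates the two cases in the question: cores where the index lags behind the database, and cores such as cdstub and annotation where the index holds more entries than the database.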

@yvanzo: Could it be that the current search index data dump in

https://mirrors.dotsrc.org/MusicBrainz/data/search-indexes/20211110-041005/

is incomplete? Several of the bigger dump files are missing, such as
recording.tar.zst 2021-11-10 04:47 25G
or
release.tar.zst 2021-11-10 05:01 2.0G
and others do not seem to be replicated yet.

Or am I just too impatient again?

Possibly related to indexed search issues from June that still haven’t been resolved:

Thank you for this hint.

The export dump worked fine (at least the data I need for tagging purposes) until last Sunday.
The replication to the dotsrc.org server seems to have been stuck since then.

@yvanzo Should the mirror

https://mirrors.dotsrc.org/MusicBrainz/data/search-indexes/20211113-041003/

not be fully replicated by Monday noon?
Today, it seems that 5 entities are still not replicated.

Has something changed in this replication process?

md5sum: release.tar.zst: No such file or directory
release.tar.zst: FAILED open or read
md5sum: series.tar.zst: No such file or directory
series.tar.zst: FAILED open or read
md5sum: tag.tar.zst: No such file or directory
tag.tar.zst: FAILED open or read
md5sum: url.tar.zst: No such file or directory
url.tar.zst: FAILED open or read
md5sum: work.tar.zst: No such file or directory
work.tar.zst: FAILED open or read
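
The failures above mean the checksum list names files that are not present locally because the mirror does not hold them yet. A minimal Python sketch for listing such files before running the check is shown here. It assumes a checksum list in the usual md5sum format (hash, whitespace, file name); the sample hashes are placeholders, and the helper names `listed_files` and `missing_files` are hypothetical.

```python
from pathlib import Path

# Placeholder checksum list in md5sum format; the hashes are NOT real values.
CHECKSUMS = """\
00000000000000000000000000000000  artist.tar.zst
00000000000000000000000000000000  release.tar.zst
00000000000000000000000000000000  work.tar.zst
"""


def listed_files(checksum_text):
    """Return the file names from md5sum-style lines.

    Handles both text mode ('<hash>  <name>') and binary mode ('<hash> *<name>').
    """
    names = []
    for line in checksum_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        names.append(parts[1].strip().lstrip("*"))
    return names


def missing_files(checksum_text, directory):
    """Return the listed file names that do not exist in the given directory."""
    directory = Path(directory)
    return [name for name in listed_files(checksum_text)
            if not (directory / name).is_file()]


if __name__ == "__main__":
    # Check the current directory against the sample list.
    for name in missing_files(CHECKSUMS, "."):
        print(f"not replicated yet: {name}")
```

This only reports which files are absent; it does not verify the checksums of the files that are present, which md5sum already does.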

https://mirrors.dotsrc.org/MusicBrainz/data/search-indexes/
currently still has
latest-is-20230506-041002

but
data.metabrainz.org/pub/musicbrainz/data/search-indexes
already has
latest-is-20230510-041003

Is something stuck in the mirroring process?
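
To put a number on how far the mirror lags, here is a small Python sketch that compares the two marker names quoted above. It assumes the suffix after `latest-is-` is a `YYYYMMDD-HHMMSS` timestamp, which matches the dump directory names in this thread but is not documented here; `marker_time` and `mirror_lag` are hypothetical helper names.

```python
from datetime import datetime

PREFIX = "latest-is-"


def marker_time(marker):
    """Parse a marker such as 'latest-is-20230510-041003' into a datetime."""
    if not marker.startswith(PREFIX):
        raise ValueError(f"unexpected marker name: {marker!r}")
    # Assumption: the suffix is a YYYYMMDD-HHMMSS timestamp.
    return datetime.strptime(marker[len(PREFIX):], "%Y%m%d-%H%M%S")


def mirror_lag(main_marker, mirror_marker):
    """Return how far the mirror's latest dump is behind the main server's."""
    return marker_time(main_marker) - marker_time(mirror_marker)


lag = mirror_lag("latest-is-20230510-041003", "latest-is-20230506-041002")
print(lag)  # prints "4 days, 0:00:01"
```

A lag of about four days corresponds to one missed dump, given the two dump dates quoted above.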

Now dotsrc has the latest one too - I expect it gets uploaded to ours, then the others mirror it, so there’s a delay? :slight_smile:

Yep, I can see it now too, thanks.
The mirror servers usually receive the data by Wednesday evening CET :wink: