Can we have more advanced search?

I’ve recently been dragged into a debate about the capitalization style of Dutch release titles. Apparently a lot of titles don’t conform to the Dutch style guide.

I would like to attempt a search for these non-conforming titles, but need advanced search options to do so. For example:

Search for Release title OR Release group title OR work OR recording * where language is Dutch OR where Artist area is Dutch

^ That’s obviously a simplified search query, but maybe we could develop something that approaches this.

It would be even cooler if we could also use regex, so I could do something like [A-Z].*[A-Z] (this would let me filter down to titles that use at least two capitals).
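To illustrate what that pattern would catch if it were applied case-sensitively, here is a small Python sketch. Python’s `re` module stands in for whatever engine the search server uses, and the sample titles are made up for illustration.

```python
import re

# At least two uppercase ASCII letters anywhere in the title.
# re.search looks for the pattern anywhere in the string and is case-sensitive by default.
TWO_CAPITALS = re.compile(r"[A-Z].*[A-Z]")


def has_two_capitals(title: str) -> bool:
    """Return True if the title contains two or more ASCII capitals."""
    return TWO_CAPITALS.search(title) is not None


# Hypothetical sample titles, not taken from the database.
titles = [
    "Het is een nacht",      # Dutch style: only the first word capitalized
    "Het Is Een Nacht",      # English-style title case
    "Laat me",
    "Ik hou van Amsterdam",  # proper noun: legitimately two capitals
]

for title in titles:
    print(has_two_capitals(title), title)
```

Note that a pattern like this also flags titles containing proper nouns, so the results would still need a manual check against the style guide.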

Edit:
Just learned we already have regex. I’m trying to set up an advanced query; help welcome:

/[A-Z].*[A-Z]/

recording country nl
recording tag Dutch
recording tag dutch
recording tag nederlands
recording tag Nederlands

release country nl
release lang nl
release tag Dutch
release tag dutch
release tag nederlands
release tag Nederlands

release group tag Dutch
release group tag dutch
release group tag nederlands
release group tag Nederlands

work lang nl
work tag Dutch
work tag dutch
work tag nederlands
work tag Nederlands
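Each entity type (release, recording, work, ...) has its own search index, so criteria like the ones above would have to be combined separately per index. A small Python helper can assemble such a query string. `build_query` is hypothetical, and the `lang` and `tag` field names are assumptions taken from the list above and the later edits, not checked against the actual schema.

```python
def build_query(title_field: str, regex: str, criteria: dict[str, list[str]]) -> str:
    """Combine a title regex with OR-ed filter clauses into one Lucene query string.

    title_field and the keys of criteria are field names; this hypothetical helper
    does not check that those fields exist in any particular MusicBrainz index.
    """
    # Lucene regexes are written between slashes: field:/pattern/
    regex_clause = f"{title_field}:/{regex}/"
    # One "field:value" clause per value, e.g. "tag:dutch".
    clauses = [f"{field}:{value}" for field, values in criteria.items() for value in values]
    if not clauses:
        return regex_clause
    return f"{regex_clause} AND ({' OR '.join(clauses)})"


# Hypothetical criteria for the work index, based on the list above.
query = build_query(
    "work",
    ".*[A-Z].*[A-Z].*",
    {"lang": ["nld"], "tag": ["dutch", "nederlands"]},
)
print(query)
# work:/.*[A-Z].*[A-Z].*/ AND (lang:nld OR tag:dutch OR tag:nederlands)
```

Values containing spaces would need quoting before being passed in; the sketch leaves that out.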

Edit 2:
I tried this in the advanced search for works and didn’t get the right results yet:
/[A-Z].*[A-Z]/ AND lang:nld

Also, I noticed that if I set the website language to Dutch, I cannot search all of the same indexes as in English; this looks like a bug.

Edit 3:
Getting closer, but the regex isn’t matching as expected yet:
work:/.*[A-Z].*[A-Z].*/ AND lang:nld
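Lucene matches a regular expression against the whole indexed term rather than searching for it somewhere inside the term, which is why padding the pattern with `.*` changes the result. Python’s `re.fullmatch` behaves the same way, so it can be used to compare the two patterns (the sample string is made up). One further caveat, as an assumption not confirmed in this thread: if the title field is split into individual words at index time, each word is matched separately, not the title as a whole.

```python
import re

title = "Het Is Mooi"

# fullmatch: the pattern must cover the entire string, like a Lucene regexp term match.
unpadded = re.fullmatch(r"[A-Z].*[A-Z]", title)
padded = re.fullmatch(r".*[A-Z].*[A-Z].*", title)

print(unpadded is not None)  # False: the string does not end with a capital
print(padded is not None)    # True: the leading and trailing .* absorb the rest
```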

Edit 4:
Now the lang clause gets ignored:
lang:nld AND work:/.*\p{Lu}.*\p{Lu}.*/

Edit 5:
It looks like your regex engine doesn’t understand [A-Z] or \p{Lu}. What’s up with that? Any idea how to work around this?

Edit 6:
lang:nld AND work:/(?-i)[ABCDEFGHIJKLMNOPQRSTUVWXYZ].*[ABCDEFGHIJKLMNOPQRSTUVWXYZ]/

No cigar.


Hi @Stargaz3r, this isn’t currently possible because title search is case-insensitive; see mbsssss/common/fieldtypes.xml at v-2021-05-14 in metabrainz/mbsssss on GitHub.


I figured, haha.
Could we enable mode switching like (?-i)? That would allow more advanced queries, including the one I need here.

There isn’t any support for such a flag in Lucene regular expressions. One solution would be to add another search field, workcase, that preserves the letter-case of the title, or even to preserve the letter-case in the currently case-insensitive field workaccent.
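A toy Python simulation shows why a case-preserving field would help. It is not the real Solr analysis chain: the index entry is just a dict, `index_title` is a hypothetical helper, "work" mimics a field whose values are lowercased at index time, and "workcase" is the proposed field that keeps the original letter-case.

```python
import re


def index_title(title: str) -> dict[str, str]:
    """Hypothetical index entry with a lowercased field and a case-preserving one."""
    return {"work": title.lower(), "workcase": title}


TWO_CAPITALS = re.compile(r".*[A-Z].*[A-Z].*")

doc = index_title("Het Is Een Nacht")

# In this toy model the lowercased field holds no capitals, so the pattern cannot match.
print(TWO_CAPITALS.fullmatch(doc["work"]) is not None)      # False
# Against the case-preserving field it matches as intended.
print(TWO_CAPITALS.fullmatch(doc["workcase"]) is not None)  # True
```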


In the meantime, I made this search in SQL for you. There are currently 3149 works with Dutch lyrics and 2 uppercase letters. See the full list.
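The SQL query itself isn’t reproduced here. As a rough illustration of the filter it describes (Dutch lyrics, at least two uppercase letters), here is a Python sketch over made-up rows. The (title, language code) row layout, the `nld` code, and the helper names are assumptions. Unlike `[A-Z]`, Python’s `str.isupper()` also counts accented capitals such as `É`, which is closer to what `\p{Lu}` was meant to do.

```python
from typing import Iterable


def uppercase_count(title: str) -> int:
    """Count uppercase letters; str.isupper() is Unicode-aware (covers É, Ü, ...)."""
    return sum(1 for ch in title if ch.isupper())


def nonconforming(rows: Iterable[tuple[str, str]], lang: str = "nld") -> list[str]:
    """Return titles in the given language with at least two uppercase letters."""
    return [title for title, language in rows
            if language == lang and uppercase_count(title) >= 2]


# Hypothetical (title, language code) rows, not real database data.
rows = [
    ("Het Is Een Nacht", "nld"),
    ("Het is een nacht", "nld"),
    ("Één Zomer", "nld"),
    ("Yesterday Once More", "eng"),
]

print(nonconforming(rows))  # ['Het Is Een Nacht', 'Één Zomer']
```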
