[Style] Disambiguation Guideline

I’d like to contribute to the BookBrainz style guidelines. I thought I’d start by taking a crack at a guideline for the disambiguation field. There was some discussion on IRC about a process for this; meanwhile I wanted to get the beginning of a draft in front of the contributors, for comment.

(Very) rough draft follows. My examples can use some work.


Disambiguation

The disambiguation field is used to help distinguish ambiguous records. Its value is visible on the record’s page and also appears in search results next to the record’s name.

The disambiguation should be written in English if possible and kept fairly short.

When a disambiguation field should be used

Disambiguation fields of records should be specified when any of the following conditions occur, in any combination:

  1. When two or more records of the same type contain an identical name.

  2. When two or more records of the same type contain names where only articles or short prepositions are added or are different, e.g. “the”, “and”, “for”, etc. Examples:

  • “Enemy Territory” and “In Enemy Territory”
  • “Sands of Time” and “The Sands of Time”
  • “Cause or Effect” and “Cause and Effect”
  • “This and That” and “This for That”
  3. When two or more records of the same type contain names where the differences are singular vs. plural. Examples:
  • “Rude Awakening” and “Rude Awakenings”
  • “The Wild Horse” and “Wild Horses”
  4. When the name of a record contains a single word, or two words where the first word is an article. Examples:
  • “Windows”
  • “The Duel”
  • “An Harmonica”

[Editorial note: My rationale for including 4. above is that: a) it contains very little content (a single word), so a disambiguation can help simply to identify it, and b) it stands a very high probability of collision in the future, and disambiguating now lightens the burden on those entering the same title later, who must decide whether it’s the same as theirs or not.]

  5. When the differences between names are variations in spelling, spacing, hyphenation or the abbreviation of words. Examples:
  • “The Color of Money” and “The Colour of Money”
  • “Starships in Space”, “Star Ships in Space” and “Star-Ships in Space”
  • “Doctor Teeth” and “Dr. Teeth”
  6. When the work is originally in another language, and the name of the work being entered is the translation of the name in its original language.
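
To make the similarity conditions above a little more concrete, here is a rough Python sketch of how two names could be reduced to a comparison key. Everything in it is an assumption for illustration: `MINOR_WORDS`, `comparison_key` and `needs_disambiguation` are hypothetical names, not anything in BookBrainz, and the word list and plural folding only cover the English examples listed. Spelling variants (Color/Colour) and abbreviations (Doctor/Dr.) are deliberately not handled.

```python
import re

# Hypothetical, English-only list of articles and short prepositions/conjunctions.
MINOR_WORDS = {"a", "an", "the", "and", "or", "for", "in"}


def comparison_key(name: str) -> str:
    """Reduce a name to a crude key for spotting confusingly similar names."""
    # Lower-case and turn every run of non-word characters into a space,
    # so "Star-Ships" and "Star Ships" tokenize the same way.
    words = re.sub(r"[^\w]+", " ", name.lower()).split()
    kept = []
    for word in words:
        if word in MINOR_WORDS:
            continue
        # Crude plural folding: drop one trailing "s" from longer words.
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        kept.append(word)
    # Join without spaces so spacing differences disappear as well.
    return "".join(kept)


def needs_disambiguation(first: str, second: str) -> bool:
    """Return True when two names collapse to the same comparison key."""
    return comparison_key(first) == comparison_key(second)
```

With these assumptions, “Sands of Time” and “The Sands of Time” share a key, as do the three “Starships in Space” spellings. An editor would still make the final call.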

When a disambiguation field should not be used

  1. To provide a description of a record that would otherwise not be ambiguous. Use annotations to provide such descriptions.

  2. For records of different types with the same name. There are sufficient cues in the icons next to the name to disambiguate identical names of different types.

Disambiguation field content

For works

  1. Translated work. If a work has been translated from its original language, the disambiguation field should reference the author(s) of the translation, followed by “translation”.
  • “Statesman (Benjamin Jowett translation)”
  • “War and Peace (Louise & Aylmer Maude translation)”
  • “The Art of War (Lionel Giles translation)”
  2. Different authors. If the name of a work is ambiguous, the disambiguation field should contain the name of the author and the work type. Examples:

    • The Apple (O. Henry short story)
    • The Apple (H. G. Wells short story)

For editions and edition groups

For editions and edition groups that require disambiguation, the author name(s) should be listed in the disambiguation field. For example, there are a number of editions and edition groups named “Short Fiction”, each for a different author. Examples:

  • “Short Fiction (Philip K. Dick)”
  • “Short Fiction (P. G. Wodehouse)”

That’s where I am so far. Still to do:

  • add content guidelines for: series, author and publisher
  • get guidance on whether editions and edition groups should combine translations of the same work

Your feedback would be greatly appreciated.


I’ve updated the draft disambiguation style. I’ve also taken the liberty of drafting in a fork of the user guide on Read The Docs here: Disambiguation - BookBrainz User Guide. Feedback and better examples than I’ve cited would be greatly appreciated.


I agree with essentially everything you wrote here, @pbryan, just a couple notes if I haven’t bothered you enough.

Your detailed descriptions are very English-centric. For example, Chinese doesn’t have plurals or articles, but has other features that require a disambiguation (e.g. variant characters or erhua). The same is probably true of every language, and it doesn’t make sense to go to this level of detail for every language — as a guideline. I think the rule should be general: whenever two titles are similar enough to be confused, a disambiguation note should be added. What you wrote is still useful, these are examples, in English, where we should consider the titles are similar enough to warrant a disambiguation. Or am I wrong in thinking the guidelines should apply to all languages? A book DB is necessarily multilingual, translations are very common, maybe most works published are translations. I wouldn’t want to focus only on one language.

I think 4. is quite unnecessary, it’s fine to have one-word titles without disambiguation. But I also don’t feel that’s something worth arguing over.

For translated works you seem to be assuming they always have the same title, but this is not the case. An example I was looking at recently: Eça’s novel O Primo Basílio was published in English translation as Dragon’s Teeth in 1889 and more recently as Cousin Bazilio.

  • Dragon’s Teeth (Mary Jane Serrano translation)
  • Cousin Bazilio (Margaret Jull Costa translation)

This would lead anyone to think these are different novels when, in fact, they are the same work. This isn’t rare. Lu Xun’s short story collection 呐喊 Nahan has been translated as Call to Arms, Cheering from the Sidelines, and Outcry. (These are editions; I’m just mentioning them as an example of different titles for different translations.)

In this case, I use the original title in the disambiguation and shorten the translator’s name to the surname:

  • Dragon’s Teeth (Serrano translation of O Primo Basílio)
  • Cousin Bazilio (Jull Costa translation of O Primo Basílio)

Even if you don’t read the original language or script it’s still clear this is the same work.

  • Call to Arms (Yang & Yang translation of 呐喊)
  • Cheering from the Sidelines (Lyell translation of 呐喊)
  • Outcry (Lovell translation of 呐喊)

(Again, these are editions; I’m just using them as examples of titles with different translations, so I don’t have to go check the titles of the actual short stories, which are also different.)
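
For anyone who prefers to see the pattern spelled out, here is a minimal Python sketch of how such a translation disambiguation could be assembled. The function names `translation_disambiguation` and `display_name` are made up for this example, and joining several translators with “&” simply follows the “Yang & Yang” example above. It is not an agreed format.

```python
from typing import Sequence


def translation_disambiguation(translator_surnames: Sequence[str],
                               original_title: str) -> str:
    """Build '<surnames> translation of <original title>'."""
    # Several translators are joined with " & ", as in "Yang & Yang".
    translators = " & ".join(translator_surnames)
    return f"{translators} translation of {original_title}"


def display_name(title: str, disambiguation: str) -> str:
    """Show a title followed by its disambiguation in parentheses."""
    return f"{title} ({disambiguation})"


print(display_name("Cousin Bazilio",
                   translation_disambiguation(["Jull Costa"], "O Primo Basílio")))
```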

Translations should be so common on BB I think this should be part of the UI: works that are translated should always display the name of the translator and original work, with links to these entities.


I like 4, myself.

There’s an issue in MB with common titles getting messy/used for a bunch of incorrect stuff. When someone finally creates and disambiguates a new entity, they don’t necessarily have the inclination (or the ability to understand the mess/original edit) to disambiguate the existing one as well. Being proactive is a good idea.

I believe that BB should be inherently multilingual. So, I agree, @blackteadarkmatter, it should not just apply to English ambiguities. So, I guess the question is whether we should document rules for identifying ambiguities, or simply allow the editor to decide. If the former, then rules for each language.

I’ve been on the fence about #4. I’ve been following the practice in my editing so far, and have wondered if it’s useful, especially for very unique one-word names. As @aerozol pointed out, in MB we’ve seen one-word titles left ambiguous, so indeed I was trying to be proactive here. I’ll be happy to follow consensus on this point.

I haven’t assumed translated works would have the same name; rather, that translated works should always have disambiguation, regardless of what the original name is. In fact, I’ve seen variations of a work name translated more than once to the same language. I do kind of like your idea of including the original title in the disambiguation though: TranslatedName (AuthorName translation of OriginalWorkName). How does everyone else feel about this?

I too would like to see additional cues in the UI where we can have implicit disambiguation based on information we’re already capturing in the BB data model. So, the current disambiguation guideline really addresses current shortcomings, which one day can be resolved by improvements to the user interface.

Note: I realize I’ve been naive about edition and edition group disambiguation. I’m planning a revision to address this.

Edition groups I have a pretty good handle on I think: disambiguate by author and/or language.

Editions I’m still struggling with. Old titles, like those in the public domain, have had countless editions. In my editing, I haven’t tried to disambiguate them, as you can at least view them listed in the edition group by issue date. But in search results, multiple editions of the same work will stand out without some unique identifier. Imagine a work issued by multiple publishers, on multiple dates, in multiple languages. Thoughts appreciated. :thinking:

I have often thought about this issue and have always come to the conclusion: It is still too early to make final decisions. Looking ahead a bit, here’s what I see:
The “Disambiguation Field” for “Works” will have an automatic disamb tag for authors. So we will only need it if the works of a certain author have the same title.

For different translations there will be sorting algorithms on the “author page” that might look like this: (example: Isaac Asimov’s Foundation):

To enter the original work in the disamb field seems to me not to be necessary. (On the work page the “relation” also shows the original title.)

For editions, editions with different ISBNs and/or different publishers should not need a disamb field entry. For editions without ISBN we will also have images available (they are urgently needed).

Adding and editing Edition Groups is still a total mess. You have to do everything manually and I don’t even want to know how many wrong or duplicate Edition Groups there are by now (do you need a disamb field there too? I honestly don’t know ;-)).

So everything we enter into the disamb fields now has rather a preliminary character and for a detailed discussion it seems to me just too early.


I agree it’s still early. The thing about the guidelines is they’re not written in stone. Many of the disambiguation rules we need are making up for deficiencies in the site, and will likely change as the site changes. It was like this in MB too.

What I’m aiming for is to come up with a repeatable set of rules that at least I can follow and, better still, that we can reach some consensus on. If this doesn’t get ratified as an official guideline, I’ll live with that, for at least we’ll have discussed it.

To that end, I’ve drafted a new version of the guideline, which I’ve dramatically simplified.


Disambiguation

A disambiguation is used to help users distinguish between entities that have identical or confusingly similar names. Disambiguation values are visible in search results as well as entity detail pages. When including a disambiguation, it should be written in English and kept fairly short.

When to use disambiguation

1. If two or more entities of the same type contain an identical name, in most cases a disambiguation is required.

2. If two or more entities of the same type contain names where only grammatical articles, punctuation or plurality differ, a disambiguation is recommended.

When not to use disambiguation

1. Do not use disambiguation to provide a description of an entity that is not ambiguous. Use annotations to provide such descriptions.

2. Do not use disambiguation to distinguish entities of different types. Existing cues in the user interface already handle this.

Disambiguation content

For works

1. Different authors. If the name of a work is ambiguous compared to the work of another author, the disambiguation should contain the name of the author and the work type. Examples:

  • Misery (Anton Chekhov short story)
  • Misery (Stephen King novel)

2. Translated work. If a work has been translated from its original language and is ambiguous with works of the same author, the disambiguation should indicate the language of the translation. Examples:

  • Madame Bovary (French) ← original work
  • Madame Bovary (German translation) ← translated work

If the work has been translated to the same language by different translators, include them in the disambiguation:

  • Madame Bovary (English translation: Alan Russell)
  • Madame Bovary (English translation: Eleanor Marx)

3. Identically named work and type by same author. If an author produces ambiguously named works of identical type, the disambiguation should contain the name of the author, work type and some descriptive text. If no descriptive cues are available, quote the first few words of the text. Examples:

  • Justice (Ambrose Bierce poem “She jilted me”)
  • Justice (Ambrose Bierce poem “Jack Doe met Dick Roe”)

For scenarios where more than one ambiguity type occurs for a work, combine the methods above into the disambiguation.

For editions and edition groups

If an edition or edition group requires disambiguation, the author name(s) should be listed. Examples:

  • Short Fiction (Philip K. Dick)
  • Short Fiction (P. G. Wodehouse)

If editions and edition groups in different languages require disambiguation, append the language to the disambiguation. Examples:

  • Hamlet (William Shakespeare, English)
  • Hamlet (William Shakespeare, German)

If editions are issued by multiple publishers or multiple editions are issued by the same publisher, the edition name can be left ambiguous.

For authors

If an author has an identical name to another author, then the disambiguation should be the years of birth and/or death of each author. In the unlikely event of birth/death date collisions, add descriptive information following the years. If precise years are unknown, indicate circa. Examples:

  • Agatha Christie (1890–1976)
  • Aristotle (384–322 BCE)
  • Geoffrey Chaucer (c. 1340–1400)
  • Andy Weir (1972–)
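
As a rough illustration of the year formats in these examples, here is a small Python sketch. The function name `life_span`, the use of negative integers for BCE years, and the handling of a life that spans BCE and CE are all my own assumptions for the example and not part of the proposed guideline.

```python
from typing import Optional

EN_DASH = "\u2013"


def life_span(birth: int, death: Optional[int] = None, circa: bool = False) -> str:
    """Format birth/death years; negative years stand for BCE (an assumption)."""
    prefix = "c. " if circa else ""
    if death is None:
        # Open-ended range for a living author.
        return f"{prefix}{abs(birth)}{EN_DASH}"
    if birth < 0 and death < 0:
        # Both years BCE: state the era once, after the range.
        return f"{prefix}{abs(birth)}{EN_DASH}{abs(death)} BCE"
    if birth < 0:
        # Life spans the era boundary: label both years.
        return f"{prefix}{abs(birth)} BCE{EN_DASH}{death} CE"
    return f"{prefix}{birth}{EN_DASH}{death}"


print(f"Agatha Christie ({life_span(1890, 1976)})")
```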

For publishers

Provide descriptive information about each ambiguous publisher.


Note, I haven’t (yet) adopted putting the original title in a translation disambiguation; I’m concerned it could grow quite large, and it would only be supplied if it matched another name anyway. So, I’m leaning away from it. Anyone else have an opinion on this?


Looks very good, but is very prescriptive compared to MusicBrainz disambiguation guidelines (these look much more organised than those FYI, nice work!)

For instance, the birth and death dates of authors won’t always be helpful, unless the author is well known. A disambiguation like ‘fantasy author’ might be more useful. Or ‘80-90s fantasy author’. It’s less tidy, but a guideline like ‘keep it short, in English’ might be more appropriate here.

As @indy133 points out, I’m not sure about stuff like putting the author’s name next to works or editions either, as the display/UI should probably display that where necessary, it’s so universally useful. Same with translation language. However something like the translator might be too fringe to be shown in any UI so could be useful.

I’m not a BB editor so this is just food for thought, take it with a grain of salt :slight_smile:


Thanks for working on this, @pbryan!
Of course your guideline, already as-is and even more so once communally refined, is a major improvement over the current lack of one :slight_smile:

I’m not sure about stuff like putting the author’s name next to works or editions either, as the display/UI should probably display that where necessary, it’s so universally useful

I agree with this sentiment, this should be a UI improvement rather than be part of the disambiguation.
Similarly, birth/death dates next to the authors’ names seem less descriptive as a disambiguation compared to “detective novel writer”, for example. In case this does not disambiguate, we could add a general period of activity: “20th century detective novel writer”.
However the dates could be shown in addition to disambiguation where needed in the interface.

However something like the translator might be too fringe to be shown in any UI so could be useful.

I think on the contrary this could be very useful and shouldn’t be hard to achieve in the interface. I often see on other websites the simple “Author Name, Translator Name” as authors of a translated work, which removes all ambiguity (except perhaps the case where a translator did multiple translations of the same work, which is definitely an edge case)


Most of the issues that disambiguation (currently) addresses could be solved with enhancements to the UI to display elements already in the structured data. Should I be aiming for a guideline that works with the way the UI works right now, or would time be better spent focusing on improving the UI?

Based on feedback, I’ve updated the proposed guideline, also viewable as a draft in Read the Docs.


Disambiguation

A disambiguation is used to help users distinguish between entities that have identical or confusingly similar names. Disambiguation values are visible in search results as well as entity detail pages. When including a disambiguation, it should be written in English and kept fairly short.

When to use disambiguation

1. If two or more entities of the same type contain an identical name.

2. If two or more entities of the same type contain names where homonyms, grammatical articles, punctuation or plurality differ.

When not to use disambiguation

1. To provide a description of an entity that is not ambiguous. Use annotations to provide such descriptions.

2. To distinguish entities of different types. Existing cues in the user interface already address this.

Disambiguation content

For works

1. Different authors. If the name of a work is ambiguous with the work of another author, the disambiguation should contain the name of the author and the work type. Examples:

  • Misery (Anton Chekhov short story)
  • Misery (Stephen King novel)

2. Translated work. If a work has been translated from its original language and is ambiguous with the work of the same author, the disambiguation should indicate the language of the translation. Examples:

  • Madame Bovary (French) ← original work
  • Madame Bovary (German translation) ← translated work

If the work has been translated to the same language by different translators, include them in the disambiguation. Example:

  • Madame Bovary (English: Alan Russell)
  • Madame Bovary (English: Eleanor Marx)

3. Identically named work and type by same author. If an author produces ambiguously named works of identical type, the disambiguation should contain the name of the author, work type and some descriptive text. If no descriptive cues are available, quote the first few words of the text. Examples:

  • In Memoriam (Voltairine de Cleyre poem to Dyer D. Lum)
  • In Memoriam (Voltairine de Cleyre poem to Gen. M. M. Trumbull)

  • Justice (Ambrose Bierce poem “Jack Doe met Dick Roe”)
  • Justice (Ambrose Bierce poem “She jilted me”)

For scenarios where more than one type of ambiguity occurs for a work, combine the methods above, shorten wherever practical and separate with commas. Examples:

  • Esau (Philip Kerr novel, English)
  • Esau (Philip Kerr novel, German translation)
  • Esau (Poul Anderson short story)
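
The combination rule for works can be sketched in Python roughly as follows. This is only an illustration under my own assumptions: `work_disambiguation` and its parameters are hypothetical names, and the ordering (author, work type and description first, then the language part after a comma) is inferred from the examples above instead of being specified anywhere.

```python
from typing import Optional


def work_disambiguation(author: Optional[str] = None,
                        work_type: Optional[str] = None,
                        description: Optional[str] = None,
                        language: Optional[str] = None,
                        translator: Optional[str] = None,
                        is_translation: bool = False) -> str:
    """Combine the pieces of a work disambiguation, separated by commas."""
    parts = []
    # Author, work type and descriptive text form one space-separated group.
    identity = " ".join(p for p in (author, work_type, description) if p)
    if identity:
        parts.append(identity)
    if language:
        if translator:
            # Several translations into one language: name the translator.
            parts.append(f"{language}: {translator}")
        elif is_translation:
            parts.append(f"{language} translation")
        else:
            # Original-language work.
            parts.append(language)
    return ", ".join(parts)


print("Esau (" + work_disambiguation("Philip Kerr", "novel",
                                     language="German", is_translation=True) + ")")
```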

For editions and edition groups

If an edition or edition group requires disambiguation by author, the author name(s) should be listed. Examples:

  • Short Fiction (Philip K. Dick)
  • Short Fiction (P. G. Wodehouse)

If an edition or edition group for the same author requires disambiguation by language, append the language to the disambiguation. Example:

  • Hamlet (William Shakespeare, English)
  • Hamlet (William Shakespeare, German)

If editions are issued by multiple publishers, or multiple editions are issued by the same publisher, no further disambiguation is applicable.

For authors

For an ambiguous author, provide a brief description including: genre, language, occupation, nationality, era, etc.

For publishers

For an ambiguous publisher, provide a brief description, such as nationality, speciality, language, era, etc.


I’m glad to see the work that has already been done by @pbryan in just a few days. Some of these guidelines aren’t exactly how I’d handle it, but it works and I’m happy to stand behind it. Maybe doubts/corrections will come to mind as I start to implement it. A couple notes:

@indy133, as I was thinking about this translations issue I reached exactly the same conclusion, the UI should be hierarchical, showing the translations below the original work — and this includes translations of translations, which isn’t uncommon. But we still need to have some standard as more works and editions are added, and I think that’s what @pbryan is trying to do.

I think we need guidelines we can use now. We don’t know when we will have a better UI, as the changes that are needed will take time to implement and BB development is being very slow. But we should keep in mind this is WIP, to be simplified when there is a better UI.

I disagree, a genre is an opinion, but birth/death years are set in stone. It is also standard, it’s how libraries identify authors, so book readers are already used to seeing this.

The translator isn’t “fringe”. The translator is the author of the translation. When you read a translation, the translator wrote every word you are reading, constructed the sentences, the paragraphs. Translators have been fighting for decades to have their names on their book cover (with some success, recently). It’s not vanity, it’s basic recognition. And if you identify a book by its author, you should also identify a translation by its (co-)author: the translator. Although… I am biased, I’m a professional translator.

This used to be easy, just write the country (or the city, at the most) they are based in. But with the recent merges, and conversion of publishing houses into imprints, this is going to be the same undefinable mess as labels in MB. I would add that, like labels, the publisher of an edition should be the imprint, not the company that owns the imprint and may or may not share its name.


I am perfectly happy to use the guidelines as they stand at the end of this conversation and modify them if and when* the UI makes some of these disambiguation needs obsolete.

* “when” because, as @blackteadarkmatter says, it might take a while to agree on and implement the required changes, and “if” because we might find that showing all this disambiguation data everywhere is not practical or desirable compared to a good disambiguation following the guidelines.

I’ve definitely seen subjective disambiguations in the MB database, especially WRT artist, so there’s precedent for at least some opinionated data. Right now I could go either way on this.

It’s already a mess. We have publishers that have undergone multiple name changes over several decades or more of reorganizations, mergers and acquisitions. In order to make sense of it, I think we’d need publishers to have some kind of temporal element, or by policy we’d say that when there’s a merger or acquisition, the old publisher ceases to exist, and a new one with the same name is created.

Quite right. If we think of a very basic example, say, Publisher A (company) was founded in London in 1947 and published books as Publisher A (same name, imprint); in 1997 Publisher B bought A and started publishing books as Publisher A (same name, different company). I think we would agree there are two BB publishers called Publisher A that we could disambiguate as such:

  • Publisher A (1947-1997) [Here we would add London if there is another publisher with a similar name]
  • Publisher A (1997-, Publisher B imprint)

(The company Publisher A could be added but is unnecessary, unless it had other imprints; the one for Publisher B should be added.) The problem is: I’m using a hypothetical example because I couldn’t find a real life one that is this simple…

I think we should keep this in mind, but it’s always going to be very complicated to get this right.


OK, it seems discussion on this proposal has concluded. Based on what has been discussed, I don’t see any outstanding issues that can’t be addressed in later refinements to the guideline.

Here is the latest draft of the guideline, which I can submit as a PR:
https://bookbrainz-user-guide-pbryan.readthedocs.io/en/latest/style/disambiguation/

I think there’s more thinking and discussion to be had around future refinements for authors and publishers. The guideline for these in my draft pretty much represents the current practice by most editors.

Are there any objections to proceeding with this proposal, and continuing to refine it as we go?


The new guidelines for disambiguation are now up on the user guide: Disambiguation - BookBrainz User Guide

Thanks again, @pbryan, for clarifying the guidelines and leading this discussion!

While I’m here, if anyone else is interested in helping improve the guidelines but doesn’t want to have to deal with the technical aspects, don’t hesitate to open a forum post with your draft, which I can eventually format and integrate into the user guide.
