LLM/‘AI’ Code of Conduct/Guidelines

This topic was raised at the last MetaBrainz meeting, 2023-06-26.
This thread will duplicate content from the meeting summary - please keep your discussions on the topic to this thread.


Summary

This topic will be brought up again at the next meeting (2023-07-03), and we are seeking community input before then.

There have been instances of LLM/‘AI’/chatGPT written and/or assisted content in forum posts, CritiqueBrainz reviews, and MusicBrainz annotations.

  • Three examples of how we can deal with this have been given in the meeting notes below, which can be summarised as ‘A: allow no LLM’, ‘B: allow some LLM’, ‘C: allow all LLM, with disclaimers’

The feedback we are after, based on the examples given, is:

  1. Do we address this in the Code of Conduct, or on a per-project basis (CB being the pressing one)
  2. Which of the three options (in the meeting notes) do you think is appropriate
  3. Do you have feedback on the wording of your preferred option

Further discussion is fine, of course! But if you can please address those three points succinctly first, they will be considered in the baseline feedback summary :slight_smile:


Meeting notes

Copied from the 2023-06-26 meeting notes:

Discussion:

This is a summary only! In the order that these comments were brought up. Please read the chat logs for the full discussion.

  • the semantics of using the term ‘LLM’. Is it a future proof term?
  • do we need to cover other stuff, like AI that isn’t a LLM
  • @rob: “…I think we should limit it to CB for now. I feel that applying this rule to everything is a bit overreaching when we dont understand how it might impact non CB projects.”
  • @mr_monkey: “I think AI-assisted writing is an inevitable reality of the near future. Especially relevant for people who want to write in english (for reach) but who are not native speakers. IMO that’s acceptable, so I’m personally more tempted by options C”
  • @reosarevok: “We do have issues in the forums with posts that don’t sound like a real human wrote them” “But we can have that separately in the forum rules”
  • @reosarevok: “We also had a super long, obviously LLM generated edit note in MB recently”
  • discussion around whether we should be using the term AI or LLM [@aerozol note, at the time of this post: I’m not sure if people noticed that I used both in the draft CoC/guidelines - LLM/‘AI’]
  • @atj: Annotation - MusicBrainz - “You should never add copyrighted content copied from other resources, be they online or printed.”
  • @reosarevok: “Yeah, what atj brings up is another issue - who does even own the copyright?”
  • @yvanzo: “But I agree with the general draft otherwise: Do not submit AI-generated content without adding a clear disclaimer.”
  • discussion around whether translating text falls under ‘primarily written by LLM’, or it’s the same as using Google Translate
  • @yvanzo: I think that the upcoming EU regulation goes in the same direction: add watermarking to AI-generated images.
  • @atj: i feel that this opens up MeB to legal risks, obvious mayhem can speak to that, but it seems somewhat similar to the CAA situation
  • @yvanzo: Even if using Google Translate, it would be sane to mention the translation tool being used.
  • @rob is asked if he could ask MeB’s lawyers if anyone knows what to make of the legal issue, and he already knows the answer: “the answer is: no one does. its all too early to tell.”
  • @atj: well in that case i’d say blanket ban and review in a year or something, but it depends on the attitude to risk
  • @reosarevok: I’d personally prefer a ban but I can see the point of allowing it for people who don’t have great English or whatnot (I’d much rather they posted in their own language for us to translate tbh since they might not know the bots are changing their meaning, but)
  • @reosarevok: I think it 100% needs to be disclaimed and at least that much could be part of the general CoC
  • @yvanzo: Ideally I would vote A but I’m not sure we can enforce it, so C would be at least something.

No conclusion, but it’s also mentioned that it’s a lot to take in a meeting session with no prep. It’s agreed to revisit the topic next week, and aerozol is (politely) volunteered to make a forum post and collect community feedback.


3 Likes

made a poll for ease of viewing

  • A: allow no LLM
  • B: allow some LLM
  • C: allow all LLM, with disclaimers


3 Likes

I think a site-wide code of conduct update is overdue in any case (older thread), and would be the best place to address this (although reiterating it in the CB-specific rules doesn’t hurt).

I think the “total ban” (option A) is the right choice. Even if marked with a disclaimer, LLM text has an air of unwavering confidence that makes it easy to overlook such a disclaimer. Additionally, a disclaimer does not prevent the text from being picked up by search engines (or even the local site search) and making their results useless.

Some misc points from the lower notes:

  • The semantics of the term “LLM” might not be perfect, but it’s certainly better than feeding the popular fiction of these tools being Artificially Intelligent
  • I don’t think LLM-assisted writing is an inevitability, or certainly not to the scale some people think it will be. Once the current money-fueled hype wave dies down, more and more people will realize how dangerously incorrect these tools often are.
  • Machine-translating text should not fall under these guidelines, it’s clearly a different intent (and people have managed to internalize a decent amount of distrust around something like Google Translate’s output).

3 Likes

personally as someone who doesn’t use LLM but is very very bad at saying what i mean (autism), i theoretically don’t mind people using AI to help get their wording right, or to help them make sense (so the “allow using as a tool” approach).

but with the legal/copyright concerns along with LLM often being very wrong, i think for now a blanket ban would be good. at least until there’s more precedent for this type of thing.

but in theory if it’s practically unrecognizable as LLM and contains accurate information it’s fine imo. it’s not like we can stop anyone from doing that anyway if it’s truly indistinguishable.

(for clarity, with that i mean “the user changed enough about the LLM text to make it accurate and put it in their own voice”)

i agree with this as well.

hopefully i’m making SOME sense. i’m multitasking at work right now

2 Likes

Hey all,

Allow me to start saying that this post is not using any AI model to fix, improve or write.

As some of you guys know, I do use an AI text corrector/reviser to improve my English communication on my posts at the forum, and yes, I already have received some reactions about it. But I wonder, why?

If I use it to improve the outcome, and make my communication even better, it is a choice that I am making, on my writing process, why this would be an issue? Why the tools I choose to use on my creation process has to be surveilled and scrutinized, please, ask yourself why?

Well, I don’t see the problem, since the result is a better written text, post, review. Honestly, I think is gain/gain for all. But if this is an issue for some people, I don’t mind saying in the beginning of my post: “This post has AI/LLM improvements”, for example. But why and how it change your perception of the idea/suggestion that I am trying to say? contribute? convey?

" Oh no! I can’t accept this idea, this text is way too perfect written!"

Voting against use LLM/AI, this is pretty much a philosophical vote. The way AI is evolving will be impossible to distinguish soon, there is no turning back, close the community to AI is a lost battle.

To wrap up, I am positive that I am one of the reasons this debate is happening, as I am very excited to be part of this community, if you guys want me to stop using AI on my text I will stop, (you guys will have to deal with poorly texts like this one). And every time I post I will wonder “Gosh, why can’t I revise/improve it for them!?” :stuck_out_tongue:

Now, as a humorous last note, in the end of the day, this is a community geared to improve and organize a Data Set, resist to AI here is very ironic and a little amusing, it’s like workers from a meat processing plant, protesting to the world to be vegetarian. :stuck_out_tongue:

My 2 cents

I’ve voted for option B, allow some LLM. I think used as a tool it can be excellent.

My concern with allowing all LLM is that, worst case scenario, it could seriously wreck the database. Here’s an example of what could happen. If bots or people start adding thousands of purely chatGPT reviews, artist annotations, or forum posts, then those parts of MeB become useless to me. I certainly won’t bother writing my own reviews or annotations, if they’re buried in LLM content, disclaimed or not.

Unfortunately this means that we would also have to ban posts that are mainly human written, but are heavily and obviously LLM edited. Whenever we can’t tell the difference between a LLM edit, or something made wholesale by LLM, we would have to err towards blocking it.

It would also be a ‘we’ll know it when we see it’ moderation situation, but I’m okay with that tbh.

I can appreciate that it gives a boost in confidence and can make writing feel better for you - but communication is about the people you’re communicating with too.

If the people you are interacting with are saying something you’re doing is not helping communication, then I would pay attention! The base assumption that you end up with better or perfectly written text is not a given.

For my part, I much prefer this post you’ve made, compared to the obviously chatGPT ones. I won’t go into all the reasons why I think chatGPT is still a bad communicator/editor, but I got much more from your post here than your other ones.

So please don’t assume we all ‘resist AI’ here (though some may do), rather that we take seriously anything that may degrade the quality of our forums/database/projects :slight_smile:

5 Likes

There are many different styles of ChatGPT use and I don’t think there is really a way to cover all uses in one rule.

This is one area I don’t like ChatGPT input. It is not really a review if a bot did it.

On a similar tangent, I get confused with the use of ChatGPT in forums. I’ve seen both here, and more on other forums, cases where a ChatGPT response has been created to answer a technical question. And the answer was so many levels of wrong. It was worse than no reply as it was giving bad advice. That is the kind of input no one needs.

It is just like Google searching a question and posting the first thing that pops up. A good forum user uses personal knowledge to assist. They don’t just post any old unverified stuff.

Take this to a logical conclusion and the forum just fills with bots talking to bots… and the quality of help nose dives.

@Ketaros - I understand your idea of using ChatGPT to improve your reply, but it is weird to read. I much prefer a “warts n all” style of post. In many ways a badly worded question can be clearer. The forum is not a pretty writing competition. Even broken English makes more sense than some ChatGPT. (Your English is certainly better than my Portuguese)

I find ChatGPT reads like some kind of patronising salesman/politician. Totally turns me off wanting to interact with that person.

6 Likes

Please explain what it means.
I know AI is artificial intelligence, I am not sure about Chat GPT (IRC bot?) And no clue about LLM.
I didn’t search.

Update

Ah ok, @elomatreb told me LLM:

And my son told me for ChatGPT (j’ai pété, French for “I farted”).

We can just say text generators, no?

4 Likes

All the “AI” text generators that are relevant to this discussion (e.g. ChatGPT, Bard etc.) are LLMs, and many of the behaviours that people have issues with are inherent due to how the underlying technology works. Therefore I think it’s useful to be specific about the type of technology we are making guidelines about, as it’s quite possible a new text generation technology will appear which has a completely different set of issues.

3 Likes

I think each time that the user is not writing the text themselves, it is the same problem, no?

1 Like

Well,

I believe my main question was not yet addressed…
Why is it a problem in a forum when someone uses a LLM to assist them on their own writing process?

I am not asking about taste here, I prefer your writing as this or that.
I am not asking here if you think the text is better or not.
I am asking in the genuine sense of how the person chooses to do their process.

Keep in mind:

  • Here is not a competition of some kind of Poetry or Literature.
  • Here is not where texts are here to claim property or be commercialized
  • Here is a “conversation” forum

Plz guys, the posts here have the objective of convey an idea, a suggestion, a question, etc… I honestly think, for forum msgs, let the person use whatever they want… seriously… I am against ANY kind of censorship, limitation to the freedom of your choices and expressions, since you’re not hurting anyone in any sense. And in the end of the day, if you don’t like what or HOW the person wrote it, move on…

Ps: Let’s stop also to talk about ChatGPT, for my forum posts I am using Grammarly Go (Beta), where I can choose to rewrite some parts of my text correcting my English, improving assertiveness, mood, etc.
Ps2: I am not using any corrector or AI again. :stuck_out_tongue:

Best.

Because we don’t know if we are talking to a LLM or a person, and we don’t know if we need to double check all the information in the post. Much easier to just ignore the post and look for definite human interaction.

Just like with reviews, if we had an overwhelming amount of (probable) LLM posts, I would stop using the forums.

Grammarly Go is not helping you do this with me, I’m sorry to say. Maybe it works for other people here.

2 Likes

The primary issue I have with LLMs is that they often reproduce text from their training corpus verbatim or close to verbatim, and as a user you have no way of verifying this (AFAIU this is an inherent issue with large language models). Due to the indiscriminate way that LLMs have been trained, companies risk inadvertently violating copyright by hosting LLM generated text, and this is an area that will inevitably result in legal disputes in the near future.

4 Likes

Okay aerozol,

Let me bring you guys a REAL example.
Considering this post mine:
Is Digital Media good name? - #46 by Ketaros

  1. I wrote it in Portuguese, because is where I can organize my ideas perfectly and express better.
  2. I copied this text and I use free AI translator wordvice.ai to translate it to English.
  3. I copied then on Grammarly Go, to bring back the assertiveness I wanted.
  4. After this I did a final revision, changing some bits here and there, and posted.

Please, read the final result, you may say you don’t like how it sounds, you may disagree with the ideas and suggestions there, but saying it’s forbidden, or invalidating the post because my process has used 2 AI tools… or that you are talking to a LLM, I am sorry… that makes no sense… I can’t see why. There is not a single idea, opinion or suggestion there, that is not 100% mine.

Best

2 Likes

I voted for none in particular because I’m concerned about both the accuracy/meaningfulness of LLM content and the potential that it could leave downstream reusers of MetaBrainz data unable or unwilling to work with the content because of the unclear licensing.

Where automated grammar checking, etc. falls isn’t clear to me.

2 Likes

Here are Wikipedia’s draft, still-under-discussion guidelines: Wikipedia:Large language models - Wikipedia

This page in a nutshell: Use of large language models (LLMs) must be rigorously scrutinized, and only editors with substantial prior experience in the intended task are trusted to use them constructively. Repeated LLM misuse is a form of disruptive editing.

9 Likes

My opinions pretty much line up with @mr_monkey’s. I can see LLMs having a similar use case to machine translation tools, so on both as long as a disclaimer is shown I’m fine with it. I can understand fully banning it, but I think that will do more harm than good to those who use it as assistance for human efforts.

5 Likes

Thanks all!

This topic was concluded in the last MetaBrainz meeting, you can view the meeting notes here:

I’m back from my holiday :sun_with_face: and have drafted the guidelines on this ticket:

The guidelines are based on discussion in this thread and in the IRC meetings, trying to keep them as simple and brief as possible. They allow people to use LLM/AI as a tool, but give admin/moderators plenty of leeway to remove obviously LLM/AI written content.

Please put specific wording feedback* on the ticket, but general discussion/ruminations can still be held in this thread.

*please include an explicit reason if you are increasing their specificity/complexity/length, keeping in mind that the aim is easily readable guidelines for users and moderators, rather than a codex of specific rules.

6 Likes

The CritiqueBrainz guidelines, on the About page, have now been updated with the above guideline, with some minor clarity tweaks thanks to @yvanzo:

“Your submissions must be your original work. Do not submit content that you do not hold the copyright to, or is not your own. This includes plagiarism and primarily LLM (Large Language Model) or AI generated content. You may wish to use LLM as a tool, but be aware that obviously LLM generated content will be removed.”

Going forward, this will hopefully allow people to use LLM (“AI”) tools to assist them if they need/prefer, but allow moderators plenty of leeway to remove egregious examples.

2 Likes