oh, i forgot to say, i'm on holiday from the 30th until the 8th, so I won't be around for next week's meeting
riksucks
Sorry for being a bit late. The week is mostly uneventful, but rn setting up vscode_'s branch so that I can review it better. Apart from that nothing to report. Hope u guys are doing well :)
reosarevok
Same :)
Ok. Last call for arsh?
Ok, let's move on for now
We have 10 minutes for LLM policy (aerozol via proxy)
Drafts for discussion, re. more specific rules around LLM.
This has come up because we have had a mini-flood of LLM-written reviews in CB in the last few days. One of the reviewers claims the reviews are mainly his work, but it is impossible to see how much input the user has had into the output. Though I would say they have substantial chatGPT content (when compared to reviews I had chatGPT write for the same artists/albums). So, we have a decision to make.
I have put a set of drafts for each below, with three options: A. No LLM. B. Allow some LLM. C. Allow LLM, with a disclaimer.
NB: These drafts assume that we are happy to allow tracking of LLM data (e.g. LLM composed albums) in the database.
[reo's aside: I can't see any other choice than allowing it if we want to document all music]
- Drafts for MetaBrainz CoC: - [that is, if we decide to set rules for LLM stuff globally]
A. 10. Do not add LLM ('AI') text, including in community chat and notes. Note that we do allow the tracking of LLM data in our databases, for instance LLM generated albums.
B. 10. Do not add LLM ('AI') text, including in community chat and notes. You may wish to use LLM as a tool, but be aware that obviously LLM generated content will be removed. Note that we do allow the tracking of LLM data in our databases, for instance LLM generated albums.
C. 10. Do not add LLM ('AI') text, including in community chat and notes, without adding a disclaimer, e.g. 'Post written/revised by chatGPT'. Note that we do allow the tracking of LLM data in our databases, for instance LLM generated albums.
mayhem
first thought: Is LLM the best way to describe this? is it future proof?
reosarevok
- Drafts for CB guidelines: - [that is, if we decide to do CB only]
A. “Your submissions must be your own original work. Do not submit content that you do not hold the copyright to, or is not your own, including LLM ('AI') generated content.”
B. “Your submissions must be your own original work. Do not submit content that you do not hold the copyright to, or is not your own, including primarily LLM ('AI') generated content. You may wish to use LLM as a tool, but be aware that obviously LLM generated content will be removed.”
C. “Your submissions must be your own original work. Do not submit content that you do not hold the copyright to, or is not your own. Primarily LLM (‘AI’) generated content must have a clear disclaimer at the top, e.g. ‘Review written/revised by chatGPT’”.
Done with the intro :D
mayhem: I dunno what is - it's certainly more future proof than chatgpt, but
yvanzo
First thought on whether or not it is “substantial” generated content: There are tools to check whether content is likely to have been generated by tools.
mayhem
but there will be AIs that generate text that won't be LLMs.
atj
those tools are pretty crap from what i've heard
mayhem
"text generating AI, such as LLMs"?
yvanzo
I’ve heard the opposite
mayhem
second, I think we should limit it to CB for now. I feel that applying this rule to everything is a bit overreaching when we don't understand how it might impact non-CB projects.
akshaaatt
True
monkey
I think AI-assisted writing is an inevitable reality of the near future. Especially relevant for people who want to write in English (for reach) but who are not native speakers. IMO that's acceptable, so I'm personally more tempted by option C
reosarevok
We do have issues in the forums with posts that don't sound like a real human wrote them
But we can have that separately in the forum rules
yvanzo
I’m not sure why specifying LLM rather than AI either.
reosarevok
We also had a super long, obviously LLM generated edit note in MB recently, but
Well, it's about artificial neural networks, which are very much not intelligence :)
monkey: B) was meant as "LLM-assisted"
C) was meant as "anything goes"
monkey
Oh.
reosarevok
Yeah, what atj brings up is another issue - who does even own the copyright?
yvanzo
But I agree with the general draft otherwise: Do not submit AI-generated content without adding a clear disclaimer.
atj
LLMs often reproduce text from their training corpus verbatim or very close to verbatim. and the user has no way to know or verify it.
monkey
If I write in my language and ask an LLM to translate it, is that not "primarily LLM ('AI') generated content"?
yvanzo
I think that the upcoming EU regulation goes in the same direction: add watermarking to AI-generated images.
reosarevok
monkey: That seems the same as using something like Google Translate (but worse)
(worse quality I mean)
So I wouldn't call that primarily "AI"
You wrote the thing, after all
I dunno why someone would use an LLM for the translation, but :)
atj
i feel that this opens up MeB to legal risks, obviously mayhem can speak to that, but it seems somewhat similar to the CAA situation
yvanzo
Even if using Google Translate, it would be sane to mention the translation tool being used.
reosarevok
Agreed, actually, yvanzo :)
I think given the time, we should think about it this week and continue the discussion next meeting
mayhem: can you ask our lawyers if anyone has *any* idea what to make of this copyright-wise?
Or Cory, dunno
mayhem
the answer is: no one does.
it's all too early to tell.
atj
well in that case i'd say blanket ban and review in a year or something, but it depends on the attitude to risk
reosarevok
I'd personally prefer a ban but I can see the point of allowing it for people who don't have great English or whatnot (I'd much rather they posted in their own language for us to translate tbh since they might not know the bots are changing their meaning, but)
mayhem
agreed.
reosarevok
I think it 100% needs to be disclaimed and at least that much could be part of the general CoC
atj
why can't you just use google translate for that?
mayhem
shall we let this sit for a week, ponder and conclude next week?
yvanzo
Ideally I would vote A but I'm not sure we can enforce it, so C would be at least something.
reosarevok
I think that's a good idea
Maybe aerozol can make a forum post with the options too and collect some community input
Let's finish for today and revisit next Monday then :)
Thanks everyone!
</BANG>
yvanzo
B would require being able to draw a line.
reosarevok
We can keep talking on the topic if we want to, anyway :)
yvanzo: that's my main worry, yes. I think the idea of B is "if we can clearly tell it's 'AI' then it's too 'AI'" :p
But that only helps now, better products will be harder to detect
yvanzo
It would be nice to have such notes about agenda topics ahead of meetings.
(when possible)
akshaaatt
I would vote for C as well. Because as yvanzo said, at least it is enforceable.
But as atj mentioned, would that lead to a potential strike?
reosarevok
I mean, I'm not sure tbh it's more enforceable to say "you must disclaim LLMs" than "you cannot use LLMs"
If we cannot tell for A, how do we enforce the disclaimer? :D
yvanzo
C isn’t really enforceable either (we may not be able to detect the lack of disclaimer) but at least it offers the submitter an option to do it right.
reosarevok
yvanzo: that's a fair point, although probably most people don't read the agenda beforehand :)
akshaaatt
True
yvanzo
I have not been able to read the full topic and reactions so far.
akshaaatt
Why don't we make our own AI to detect another AI?
akshaaatt jumps and hides
reosarevok
Well, good thing we have another week to think
Hopefully aerozol can make a forum thread and we can see what people think. atj: re using Google Translate for that... because the hype is elsewhere seems to be the answer tbh
I cannot imagine these tools are anywhere near as good yet
rdswift has quit
yvanzo
Just read it all.
Can start with CB guidelines first and propagate to MeB CoC later on if it makes sense.
atj
I believe Google Translate uses tensors which is a big part of what made LLMs feasible AFAIU
it's another form of neural network
reosarevok
tensors, huh
Tension, apprehension, and dissension have begun
rdswift joined the channel
yvanzo
Also a close current practice: most userscripts already add a signature in MB edit notes.
If you use a tool (of any kind) to generate text, mention it.
reosarevok
(please tell me I'm not the only person here who read The Demolished Man :D)