in #metabrainz

17:47 PM
reosarevok

Either of you around?
17:47 PM
riksucks

Hey!
17:47 PM
atj

oh, i forgot to say, i'm on holiday from the 30th until the 8th, so I won't be around for next weeks meeting
17:48 PM
riksucks

Sorry for being a bit late. The week is mostly uneventful, but rn setting up vscode_'s branch so that I can review it better. Apart from that nothing to report. Hope u guys are doing well :)
17:49 PM
reosarevok

Same :)
17:49 PM
Ok. Last call for arsh?
17:50 PM
Ok, let's move on for now
17:50 PM
We have 10 minutes for LLM policy (aerozol via proxy)
17:50 PM
Drafts for discussion, re. more specific rules around LLM.
17:51 PM
This has come up because we have had a mini-flood of LLM written reviews in CB in the last few days. One of the reviewers claims the reviews are mainly his work, but it is impossible to see how much input the user has had into the output. Though I would say they have substantial chatGPT content (when compared to reviews I had chatGPT write for the same artists/albums). So, we have a decision to make.
17:51 PM
With this having come up in the forums as well this year (https://community.metabrainz.org/t/ravebrainz-a... and https://community.metabrainz.org/t/embracing-th...) we could consider trying to cover all our projects instead of just CB.
17:51 PM
Two options: Cover all our projects in the MeB CoC (https://metabrainz.org/code-of-conduct). Or just address the CB reviews with an update to the CB Guidelines for now (https://critiquebrainz.org/about)
17:51 PM
I have put a set of drafts for each below, with three options: A. No LLM. B. Allow some LLM. C. Allow LLM, with a disclaimer.
17:51 PM
NB: These drafts assume that we are happy to allow tracking of LLM data (e.g. LLM composed albums) in the database.
17:51 PM
[reo's aside: I can't see any other choice than allowing it if we want to document all music]
17:52 PM
- Drafts for MetaBrainz CoC: - [that is, if we decide to set rules for LLM stuff globally]
17:52 PM
A. 10. Do not add LLM ('AI') text, including in community chat and notes. Note that we do allow the tracking of LLM data in our databases, for instance LLM generated albums.
17:52 PM
B. 10. Do not add LLM ('AI') text, including in community chat and notes. You may wish to use LLM as a tool, but be aware that obviously LLM generated content will be removed. Note that we do allow the tracking of LLM data in our databases, for instance LLM generated albums.
17:52 PM
C. 10. Do not add LLM ('AI') text, including in community chat and notes, without adding a disclaimer, e.g. 'Post written/revised by chatGPT'. Note that we do allow the tracking of LLM data in our databases, for instance LLM generated albums.
17:52 PM
mayhem

first thought: Is LLM the best way to describe this? is it future proof?
17:52 PM
reosarevok

- Drafts for CB guidelines: - [that is, if we decide to do CB only]
17:52 PM
A. “Your submissions must be your own original work. Do not submit content that you do not hold the copyright to, or is not your own, including LLM ('AI') generated content.”
17:52 PM
B. “Your submissions must be your own original work. Do not submit content that you do not hold the copyright to, or is not your own, including primarily LLM ('AI') generated content. You may wish to use LLM as a tool, but be aware that obviously LLM generated content will be removed.”
17:52 PM
C. “Your submissions must be your own original work. Do not submit content that you do not hold the copyright to, or is not your own. Primarily LLM (‘AI’) generated content must have a clear disclaimer at the top, e.g. ‘Review written/revised by chatGPT’”.
17:53 PM
Done with the intro :D
17:53 PM
mayhem: I dunno what is - it's certainly more future proof than chatgpt, but
17:53 PM
yvanzo

First thought on whether or not it is “substantial” generated content: There are tools to check whether content is likely to have been generated by tools.
17:53 PM
mayhem

but there will be AIs that generate text that won't be LLMs.
17:53 PM
atj

those tools are pretty crap from what i've heard
17:53 PM
mayhem

"text generating AI, such as LLMs"?
17:54 PM
yvanzo

I’ve heard the opposite
17:54 PM
mayhem

second, I think we should limit it to CB for now. I feel that applying this rule to everything is a bit overreaching when we don't understand how it might impact non-CB projects.
17:55 PM
akshaaatt

True
17:55 PM
monkey

I think AI-assisted writing is an inevitable reality of the near future. Especially relevant for people who want to write in English (for reach) but who are not native speakers. IMO that's acceptable, so I'm personally more tempted by option C
17:55 PM
reosarevok

We do have issues in the forums with posts that don't sound like a real human wrote them
17:55 PM
But we can have that separately in the forum rules
17:56 PM
yvanzo

I’m not sure why we would specify LLM rather than AI either.
17:56 PM
reosarevok

We also had a super long, obviously LLM generated edit note in MB recently, but
17:56 PM
yvanzo: because it's not AI :p
17:56 PM
But ok, I guess we can say "AI" instead
17:57 PM
atj

https://musicbrainz.org/doc/Annotation - "You should never add copyrighted content copied from other resources, be they online or printed."
17:57 PM
yvanzo

https://en.wikipedia.org/wiki/Large_language_model seems to be about AI.
17:57 PM
reosarevok

Well, it's about artificial neural networks, which are very much not intelligence :)
17:57 PM
monkey: B) was meant as "LLM-assisted"
17:57 PM
C) was meant as "anything goes"
17:58 PM
monkey

Oh.
17:58 PM
reosarevok

Yeah, what atj brings up is another issue - who even owns the copyright?
17:58 PM
yvanzo

But I agree with the general draft otherwise: Do not submit AI-generated content without adding a clear disclaimer.
17:58 PM
atj

LLMs often reproduce text from their training corpus verbatim or very close to verbatim. and the user has no way to know or verify it.
17:58 PM
monkey

If I write in my language and ask an LLM to translate it, is that not "primarily LLM ('AI') generated content" ?
17:58 PM
yvanzo

I think that the upcoming EU regulation goes in the same direction: add watermarking to AI-generated images.
17:58 PM
reosarevok

monkey: That seems the same as using something like Google Translate (but worse)
17:58 PM
(worse quality I mean)
17:59 PM
So I wouldn't call that primarily "AI"
17:59 PM
You wrote the thing, after all
17:59 PM
I dunno why someone would use an LLM for the translation, but :)
17:59 PM
atj

i feel that this opens up MeB to legal risks, obviously mayhem can speak to that, but it seems somewhat similar to the CAA situation
17:59 PM
yvanzo

Even if using Google Translate, it would be sane to mention the translation tool being used.
17:59 PM
reosarevok

Agreed, actually, yvanzo :)
18:00 PM
I think given the time, we should think about it this week and continue the discussion next meeting
18:00 PM
mayhem: can you ask our lawyers if anyone has *any* idea what to make of this copyright-wise?
18:00 PM
Or Cory, dunno
18:00 PM
mayhem

the answer is: no one does.
18:00 PM
it's all too early to tell.
18:01 PM
atj

well in that case i'd say blanket ban and review in a year or something, but it depends on the attitude to risk
18:01 PM
reosarevok

I'd personally prefer a ban but I can see the point of allowing it for people who don't have great English or whatnot (I'd much rather they posted in their own language for us to translate tbh since they might not know the bots are changing their meaning, but)
18:01 PM
mayhem

agreed.
18:01 PM
reosarevok

I think it 100% needs to be disclaimed and at least that much could be part of the general CoC
18:01 PM
atj

why can't you just use google translate for that?
18:02 PM
mayhem

shall we let this sit for a week, ponder and conclude next week?
18:02 PM
yvanzo

Ideally I would vote A but I'm not sure we can enforce it, so C would be at least something.
18:02 PM
reosarevok

I think that's a good idea
18:02 PM
Maybe aerozol can make a forum post with the options too and collect some community input
18:02 PM
Let's finish for today and revisit next Monday then :)
18:02 PM
Thanks everyone!
18:02 PM
</BANG>
18:02 PM
yvanzo

B would require being able to draw a line.
18:03 PM
reosarevok

We can keep talking on the topic if we want to, anyway :)
18:03 PM
yvanzo: that's my main worry, yes. I think the idea of B is "if we can clearly tell it's 'AI' then it's too 'AI'" :p
18:03 PM
But that only helps now, better products will be harder to detect
18:03 PM
yvanzo

It would be nice to have such notes about agenda topics ahead of meetings.
18:04 PM
(when possible)
18:04 PM
akshaaatt

I would vote for C as well. Because as yvanzo said, at least it is enforceable.
18:04 PM
But as atj mentioned, would that lead to a potential strike?
18:05 PM
reosarevok

I mean, I'm not sure tbh it's more enforceable to say "you must disclaim LLMs" than "you cannot use LLMs"
18:05 PM
If we cannot tell for A, how do we enforce the disclaimer? :D
18:05 PM
yvanzo

C isn’t really enforceable either (we may not be able to detect the lack of disclaimer) but at least it offers submitters an option to do it right.
18:05 PM
reosarevok

yvanzo: that's a fair point, although probably most people don't read the agenda beforehand :)
18:05 PM
akshaaatt

True
18:05 PM
yvanzo

I have not been able to read the full topic and reactions so far.
18:06 PM
akshaaatt

Why don't we make our own AI to detect another AI?
18:06 PM
akshaaatt jumps and hides
18:07 PM
reosarevok

Well, good thing we have another week to think
18:08 PM
Hopefully aerozol can make a forum thread and we can see what people think. atj: re using Google Translate for that... because the hype is elsewhere seems to be the answer tbh
18:08 PM
I cannot imagine these tools are anywhere near as good yet
18:09 PM
rdswift has quit
18:10 PM
yvanzo

Just read it all.
18:11 PM
Can start with CB guidelines first and propagate to MeB CoC later on if it makes sense.
18:11 PM
atj

I believe Google Translate uses tensors which is a big part of what made LLMs feasible AFAIU
18:11 PM
it's another form of neural network
18:12 PM
reosarevok

tensors, huh
18:12 PM
Tension, apprehension, and dissension have begun
18:12 PM
rdswift joined the channel
18:12 PM
yvanzo

Also a close current practice: most userscripts already add a signature in MB edit notes.
18:13 PM
If you use a tool (of any kind) to generate text, mention it.
18:13 PM
reosarevok

(please tell me I'm not the only person here who read The Demolished Man :D)
18:15 PM
atj

https://usercontent.irccloud-cdn.com/file/K4G6g...
18:15 PM
yvanzo

reosarevok: An alert about topic changes would help with addressing that.
18:16 PM
reosarevok

atj: hah
18:16 PM
atj

the future is here and it's stupid
18:17 PM
yvanzo

A wiki page or a gist linked from the channel topic would be enough to share details of an added agenda item.
18:18 PM
lucifer

atj: i see a VC founder offering millions to the person to build that thing.
18:21 PM
yvanzo

By the way, about AI: https://slashdot.org/story/23/06/20/2044259/
18:24 PM
atj

What's the headline, that URL won't load for me?
18:24 PM
kellnerd

> Christopher Nolan Says AI Dangers Have Been 'Apparent For Years'
18:25 PM
I like both his movies and interviews, by the way
18:25 PM
atj

Ah, Google did give me the right result
18:26 PM
yvanzo

atj: Sorry: https://slashdot.org/story/23/06/20/2044259/chr...
20:29 PM
arsh

Hi mayhem:
20:33 PM
mayhem

hey. missed you at the meeting today...
20:33 PM
arsh

I am sorry about that, I had to be somewhere urgent
20:33 PM
mayhem

remember to send your review to reosarevok@ if you need to miss the meeting.
20:34 PM
arsh

Sure, I will keep that in mind
20:34 PM
mayhem

thx
20:34 PM
arsh

I have made some progress on the project if you have a moment
20:34 PM
mayhem

i do
20:34 PM
arsh

I can tell you where I am at currently
20:34 PM
Ok sure
20:34 PM
mayhem

plz do
20:34 PM
arsh

https://nfgwfd-3000.csb.app/
20:35 PM
So I have made the distances according to score now
20:35 PM
It looks much better and livelier now
20:35 PM
Secondly I added the colors and the shades represent the similarity of the artists to the main artist
20:36 PM
mayhem

it does. and yet it isn't annoyingly jumping around.
20:36 PM
how did trying log2() and sqrt() go?
20:36 PM
reosarevok

Or even better, community-manager@ iirc
20:36 PM
mayhem

ah, noted, reosarevok
20:37 PM
arsh

That didn't work well but I found a way to scale the range of values which works much better
20:37 PM
Basically I take in the score values and scale them up or down into the 100 to 300 range for the graph
20:37 PM
reosarevok

I mean it's also me :D but I think there's more people with access? If not maybe there should be
20:39 PM
mayhem

the graph looks oddly similar from artist to artist -- is there a way to introduce some randomness?
20:39 PM
and what is the scaling that you're using? what is the formula?
20:39 PM
arsh

https://stackoverflow.com/questions/5294955/how...
20:39 PM
Here is what I made use of
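
[Editor's sketch: the rescaling arsh describes above (taking score values and mapping them into a 100 to 300 range) is commonly done with a linear min-max rescale. The code below is a minimal illustration of that idea, not the project's actual code. The function name `rescale`, the sample scores, and the handling of empty or constant input are all assumptions made for this example.]

```python
def rescale(values, new_min=100.0, new_max=300.0):
    """Linearly map values onto [new_min, new_max] (min-max rescaling).

    Formula per value v:
        new_min + (v - lo) * (new_max - new_min) / (hi - lo)
    where lo and hi are the smallest and largest input values.
    """
    if not values:
        # Nothing to scale (assumed behaviour for this sketch).
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # All scores are identical, so there is no spread to preserve;
        # placing everything at the midpoint is an assumption.
        midpoint = (new_min + new_max) / 2
        return [midpoint for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


# Hypothetical similarity scores, for illustration only.
scores = [10, 20, 30]
print(rescale(scores))  # [100.0, 200.0, 300.0]
```

[The smallest score lands on 100 and the largest on 300, with everything else spaced proportionally in between. Whether a higher score should map to a shorter or longer distance in the graph is not stated in the conversation; flipping the direction would only require swapping `new_min` and `new_max`.]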