#metabrainz

      • reosarevok
        Either of you around?
      • riksucks
        Hey!
      • atj
        oh, i forgot to say, i'm on holiday from the 30th until the 8th, so I won't be around for next week's meeting
      • riksucks
        Sorry for being a bit late. The week is mostly uneventful, but rn setting up vscode_'s branch so that I can review it better. Apart from that nothing to report. Hope u guys are doing well :)
      • reosarevok
        Same :)
      • Ok. Last call for arsh?
      • Ok, let's move on for now
      • We have 10 minutes for LLM policy (aerozol via proxy)
      • Drafts for discussion, re. more specific rules around LLM.
      • This has come up because we have had a mini-flood of LLM-written reviews in CB in the last few days. One of the reviewers claims the reviews are mainly his work, but it is impossible to see how much input the user has had into the output. Though I would say they have substantial chatGPT content (when compared to reviews I had chatGPT write for the same artists/albums). So, we have a decision to make.
      • With this having come up in the forums as well this year (https://community.metabrainz.org/t/ravebrainz-a... and https://community.metabrainz.org/t/embracing-th...) we could consider trying to cover all our projects instead of just CB.
      • Two options: Cover all our projects in the MeB CoC (https://metabrainz.org/code-of-conduct). Or just address the CB reviews with an update to the CB Guidelines for now (https://critiquebrainz.org/about)
      • I have put a set of drafts for each below, with three options: A. No LLM. B. Allow some LLM. C. Allow LLM, with a disclaimer.
      • NB: These drafts assume that we are happy to allow tracking of LLM data (e.g. LLM composed albums) in the database.
      • [reo's aside: I can't see any other choice than allowing it if we want to document all music]
      • - Drafts for MetaBrainz CoC: - [that is, if we decide to set rules for LLM stuff globally]
      • A. 10. Do not add LLM ('AI') text, including in community chat and notes. Note that we do allow the tracking of LLM data in our databases, for instance LLM generated albums.
      • B. 10. Do not add LLM ('AI') text, including in community chat and notes. You may wish to use LLM as a tool, but be aware that obviously LLM generated content will be removed. Note that we do allow the tracking of LLM data in our databases, for instance LLM generated albums.
      • C. 10. Do not add LLM ('AI') text, including in community chat and notes, without adding a disclaimer, e.g. 'Post written/revised by chatGPT'. Note that we do allow the tracking of LLM data in our databases, for instance LLM generated albums.
      • mayhem
        first thought: Is LLM the best way to describe this? is it future proof?
      • reosarevok
        - Drafts for CB guidelines: - [that is, if we decide to do CB only]
      • A. “Your submissions must be your own original work. Do not submit content that you do not hold the copyright to, or is not your own, including LLM ('AI') generated content.”
      • B. “Your submissions must be your own original work. Do not submit content that you do not hold the copyright to, or is not your own, including primarily LLM ('AI') generated content. You may wish to use LLM as a tool, but be aware that obviously LLM generated content will be removed.”
      • C. “Your submissions must be your own original work. Do not submit content that you do not hold the copyright to, or is not your own. Primarily LLM (‘AI’) generated content must have a clear disclaimer at the top, e.g. ‘Review written/revised by chatGPT’”.
      • Done with the intro :D
      • mayhem: I dunno what is - it's certainly more future proof than chatgpt, but
      • yvanzo
        First thought on whether or not it is “substantial” generated content: There are tools to check whether content is likely to have been generated by tools.
      • mayhem
        but there will be AIs that generate text that won't be LLMs.
      • atj
        those tools are pretty crap from what i've heard
      • mayhem
        "text generating AI, such as LLMs"?
      • yvanzo
        I’ve heard the opposite
      • mayhem
        second, I think we should limit it to CB for now. I feel that applying this rule to everything is a bit overreaching when we don't understand how it might impact non-CB projects.
      • akshaaatt
        True
      • monkey
        I think AI-assisted writing is an inevitable reality of the near future. Especially relevant for people who want to write in english (for reach) but who are not native speakers. IMO that's acceptable, so I'm personally more tempted by options C
      • reosarevok
        We do have issues in the forums with posts that don't sound like a real human wrote them
      • But we can have that separately in the forum rules
      • yvanzo
        I’m not sure why specifying LLM rather than AI either.
      • reosarevok
        We also had a super long, obviously LLM generated edit note in MB recently, but
      • yvanzo: because it's not AI :p
      • But ok, I guess we can say "AI" instead
      • atj
        https://musicbrainz.org/doc/Annotation - "You should never add copyrighted content copied from other resources, be they online or printed."
      • yvanzo
      • reosarevok
        Well, it's about artificial neural networks, which are very much not intelligence :)
      • monkey: B) was meant as "LLM-assisted"
      • C) was meant as "anything goes"
      • monkey
        Oh.
      • reosarevok
        Yeah, what atj brings up is another issue - who does even own the copyright?
      • yvanzo
        But I agree with the general draft otherwise: Do not submit AI-generated content without adding a clear disclaimer.
      • atj
        LLMs often reproduce text from their training corpus verbatim or very close to verbatim. and the user has no way to know or verify it.
      • monkey
        If I write in my language and ask an LLM to translate it, is that not "primarily LLM ('AI') generated content" ?
      • yvanzo
        I think that the upcoming EU regulation goes in the same direction: add watermarking to AI-generated images.
      • reosarevok
        monkey: That seems the same as using something like Google Translate (but worse)
      • (worse quality I mean)
      • So I wouldn't call that primarily "AI"
      • You wrote the thing, after all
      • I dunno why someone would use an LLM for the translation, but :)
      • atj
        i feel that this opens up MeB to legal risks, obviously mayhem can speak to that, but it seems somewhat similar to the CAA situation
      • yvanzo
        Even if using Google Translate, it would be sane to mention the translation tool being used.
      • reosarevok
        Agreed, actually, yvanzo :)
      • I think given the time, we should think about it this week and continue the discussion next meeting
      • mayhem: can you ask our lawyers if anyone has *any* idea what to make of this copyright-wise?
      • Or Cory, dunno
      • mayhem
        the answer is: no one does.
      • its all too early to tell.
      • atj
        well in that case i'd say blanket ban and review in a year or something, but it depends on the attitude to risk
      • reosarevok
        I'd personally prefer a ban but I can see the point of allowing it for people who don't have great English or whatnot (I'd much rather they posted in their own language for us to translate tbh since they might not know the bots are changing their meaning, but)
      • mayhem
        agreed.
      • reosarevok
        I think it 100% needs to be disclaimed and at least that much could be part of the general CoC
      • atj
        why can't you just use google translate for that?
      • mayhem
        shall we let this sit for a week, ponder and conclude next week?
      • yvanzo
        Ideally I would vote A but I'm not sure we can enforce it, so C would be at least something.
      • reosarevok
        I think that's a good idea
      • Maybe aerozol can make a forum post with the options too and collect some community input
      • Let's finish for today and revisit next Monday then :)
      • Thanks everyone!
      • </BANG>
      • yvanzo
        B would require being able to draw a line.
      • reosarevok
        We can keep talking on the topic if we want to, anyway :)
      • yvanzo: that's my main worry, yes. I think the idea of B is "if we can clearly tell it's 'AI' then it's too 'AI'" :p
      • But that only helps now, better products will be harder to detect
      • yvanzo
        It would be nice to have such notes about agenda topics ahead of meetings.
      • (when possible)
      • akshaaatt
        I would vote for C as well. Because as yvanzo said, at least it is enforceable.
      • But as atj mentioned, would that lead to a potential strike?
      • reosarevok
        I mean, I'm not sure tbh it's more enforceable to say "you must disclaim LLMs" than "you cannot use LLMs"
      • If we cannot tell for A, how do we enforce the disclaimer? :D
      • yvanzo
        C isn’t really enforceable either (we may not be able to detect the lack of disclaimer) but at least it gives submitters an option to do it right.
      • reosarevok
        yvanzo: that's a fair point, although probably most people don't read the agenda beforehand :)
      • akshaaatt
        True
      • yvanzo
        I have not been able to read the full topic and reactions so far.
      • akshaaatt
        Why don't we make our own AI to detect another AI?
      • akshaaatt jumps and hides
      • reosarevok
        Well, good thing we have another week to think
      • Hopefully aerozol can make a forum thread and we can see what people think. atj: re using Google Translate for that... because the hype is elsewhere seems to be the answer tbh
      • I cannot imagine these tools are anywhere as good yet
      • rdswift has quit
      • yvanzo
        Just read it all.
      • Can start with CB guidelines first and propagate to MeB CoC later on if it makes sense.
      • atj
        I believe Google Translate uses tensors which is a big part of what made LLMs feasible AFAIU
      • it's another form of neural network
      • reosarevok
        tensors, huh
      • Tension, apprehension, and dissension have begun
      • rdswift joined the channel
      • yvanzo
        Also a close current practice: most userscripts already add a signature in MB edit notes.
      • If you use a tool (of any kind) to generate text, mention it.
      • reosarevok
        (please tell me I'm not the only person here who read The Demolished Man :D)
      • atj
      • yvanzo
        reosarevok: An alert about topic changes would help with addressing that.
      • reosarevok
        atj: hah
      • atj
        the future is here and it's stupid
      • yvanzo
        A wiki page or a gist linked from the channel topic would be enough to share details of an added agenda item.
      • lucifer
        atj: i see a VC founder offering millions to the person to build that thing.
      • yvanzo
      • atj
        What's the headline, that URL won't load for me?
      • kellnerd
        > Christopher Nolan Says AI Dangers Have Been 'Apparent For Years'
      • I like both his movies and interviews, by the way
      • atj
        Ah, Google did give me the right result
      • yvanzo
      • Shubh has quit
      • Shubh joined the channel
      • Maxr1998_ joined the channel
      • Maxr1998 has quit
      • theflash_ has quit
      • arsh
        Hi mayhem:
      • mayhem
        hey. missed you at the meeting today...
      • arsh
        I am sorry about that, I had to be somewhere urgent
      • mayhem
        remember to send your review to reosarevok@ if you need to miss the meeting.
      • arsh
        Sure I will keep in mind
      • mayhem
        thx
      • arsh
        I have made some progress on the project if you have a moment
      • mayhem
        i do
      • arsh
        I can tell you where I am at currently
      • Ok sure
      • mayhem
        plz do
      • arsh
      • So I have made the distances according to score now
      • It looks much better and lively now
      • Secondly I added the colors and the shades represent the similarity of the artists to the main artist
      • mayhem
        it does. and yet it isnt annoyingly jumping around.
      • how did trying log2() and sqrt() go?
      • reosarevok
        Or even better, community-manager@ iirc
      • mayhem
        ah, noted, reosarevok
      • arsh
        That didn't work well but I found a way to scale range of values which works much better
      • Basically I take in the score values and scale them up or down to 100 to 300 for the graph
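The scaling arsh describes (mapping score values into the 100 to 300 range) reads like a linear min-max rescale. A minimal sketch under that assumption follows; the function name `scale_scores`, the default bounds, and the midpoint fallback are hypothetical and are not taken from arsh's code, which is not preserved in this log. Whether a higher score should map to a shorter or longer distance is also not stated here, so the sketch simply preserves the order of the scores.

```python
def scale_scores(scores, lo=100.0, hi=300.0):
    """Linearly rescale scores into [lo, hi] (min-max scaling).

    Hypothetical helper: the names and defaults are assumptions for illustration.
    """
    if not scores:
        return []
    s_min, s_max = min(scores), max(scores)
    if s_max == s_min:
        # All scores are equal, so there is no spread to map:
        # place every value at the midpoint of the target range.
        return [(lo + hi) / 2 for _ in scores]
    span = s_max - s_min
    # Map s_min to lo and s_max to hi, interpolating linearly in between.
    return [lo + (s - s_min) / span * (hi - lo) for s in scores]
```

With this form the smallest score always lands on 100 and the largest on 300, so graphs for different artists share the same overall size regardless of their raw score ranges.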
      • reosarevok
        I mean it's also me :D but I think there's more people with access? If not maybe there should be
      • mayhem
        the graph looks oddly similar from artist to artist -- is there a way to introduce some randomness?
      • and what is the scaling that you're using? what is the formula?
      • arsh
      • Here is what I made use of