#metabrainz

      • BrainzGit
        [musicbrainz-server] mwiencek merged pull request #2082 (schema-change-2021-q2…mbs-11438-2): MBS-10962, MBS-11438, MBS-11460: Speed up listing artist releases/release groups https://github.com/metabrainz/musicbrainz-serve...
      • Major_Lurker has quit
      • D4RK-PH0ENiX joined the channel
      • d4rkie has quit
      • ephemer0l is now known as GeneralDiscourse
      • thomasross has quit
      • adhi001 joined the channel
      • sumedh joined the channel
      • _lucifer
        alastairp: i am experimenting with setting up a cache for the prod image using the article you mentioned a few days ago. so far it seems just using build kit cuts build time by 30%
      • or maybe not. the build completed 2.5 min earlier, but then it took about that much time to export layers...
      • Freso
        bitmap, yvanzo: https://musicbrainz.org/admin/user/edit/jaovytu ’s e-mail address doesn’t seem to get picked up by https://musicbrainz.org/admin/email-search - it works if entered verbatim (with `\.`s), but that shouldn’t be necessary per MBS-11619 :\
      • BrainzBot
        MBS-11619: Ignore periods and +tags in admin e-mail searches https://tickets.metabrainz.org/browse/MBS-11619
      • Freso is now wondering whether he’s missed more spammer/sockpuppets 😬
      • _lucifer
        alastairp, i think i have figured out why caching isn't working on releases or tags: the actions caches are scoped to branches. my understanding is that each tag is treated as a ref named refs/tags/{tag_name}, so each tag gets treated as a separate branch. different branches cannot access each other's cache, so no cache is found on subsequent tags.
      • but if we re run a job, the same tag gets built again and the cache is hit.
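A rough way to picture the scoping described above, as a simplified model only. It assumes a run can restore caches saved on its own ref or on the default branch, and it ignores the pull request case. The function name and the default ref are hypothetical.

```python
# Simplified model of GitHub Actions cache scoping (assumption: pull
# request base-branch access is ignored, default branch name is made up).
DEFAULT_REF = "refs/heads/master"  # hypothetical default branch


def can_restore(cache_ref: str, run_ref: str, default_ref: str = DEFAULT_REF) -> bool:
    """Return True if a run on run_ref may restore a cache saved on cache_ref."""
    return cache_ref in (run_ref, default_ref)


# A new tag cannot see the cache saved while building the previous tag...
print(can_restore("refs/tags/v1", "refs/tags/v2"))
# ...but re-running the job for the same tag can.
print(can_restore("refs/tags/v1", "refs/tags/v1"))
```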
      • Freso
        ruaok: Ping. List of account names that need listen cleaning in latest MB account admin e-mail.
      • ruaok
        pong. will look after juice of life. thanks!
      • prabal joined the channel
      • jaovyto -- this cluster has a lot more accounts that I can see. you only list two. why is that?
      • Freso
        I only list two? There are 14 on my list in the mail?
      • ruaok
        two under ("*" denotes an account that existed in LB and may need listen cleanup)
      • those are the ones I need to take action on, yes?
      • Freso
        Uh. Maybe GMail is doing weird stuff with text/plain formatting. I’ll send you the list.
      • ruaok
      • is what I see
      • Freso
        All of those are one cluster. I think I may not understand what you mean by that I only list two.
      • ruaok
        ok, I don't see the other variants in the top similar users now. let me proceed with that list and we'll see.
      • you'll need to get him to ok that.
      • Freso
        alastairp: ^ :)
      • _lucifer: Just checking, you’re not using alt. accounts to test on live-LB, right?
      • _lucifer
        Freso: nope
      • Freso
        Alright, good.
      • _lucifer
        i too wonder how I am in top 100, 8 times
      • Freso
        Apparently you listen to similar music as other people. :)
      • ruaok
        _lucifer: I think the current config for similar users is somehow borked. Mr_Monkey and I may attempt to play with that today, to see what the matter is.
      • I'll rerun similar users now (without tweaking the settings).
      • _lucifer
        ruaok, could be. i had looked into the similarity code but didn't find any issues. alastairp had mentioned that he also had some thoughts on improving similarity.
      • Freso
        More than a half million listens gone. 🤌
      • 💋
      • _lucifer
        !m Freso
      • BrainzBot
        You're doing good work, Freso!
      • ruaok
        👏
      • should that report, (which is rather expensive to run on ALL listens) become a regular report?
      • _lucifer
        ruaok, regarding deletion of users, there are two different methods because one deletes the user as well as the listens but the other only deletes listens.
      • Freso
        I think it would be a nice one to have, yeah, but probably doesn’t need to run very frequently.
      • ruaok
        I wonder if there is utility in running that report on the last X years only...
      • Freso: k, I'll see about making that happen.
      • _lucifer: yes. I was deleting listens directly from psql.
      • _lucifer: do you know if it is possible to make the admin view delete the listens as well, or do we need to create something new?
      • _lucifer
        right. i mentioned this because we were wondering last week why there were two different delete methods.
      • ruaok
        oh, actually we all misread that.
      • _lucifer
        just testing that, hence remembered to inform you.
      • ruaok
        at least in the timescale listenstore.
      • one deletes a SINGLE listen, the other deletes ALL listens.
      • so it does make sense. but ts.delete() should be called from the admin delete function.
      • _lucifer
        we already have a delete_user method that is used when the user deletes their accounts. we can just reuse that.
      • ruaok
        let's
      • yvanzo
        Freso: Looks like a bug. Do you need a direct search right now?
      • ruaok
        _lucifer: I'm looking at the output of spark_consumer on lemmy and I don't see any output wrt the calculated users, even though I got the email that they were calculated.
      • no output at all since 03:54. that seems odd and may explain why the user similarities are so borked.
      • Freso
        yvanzo: Nah. Crossreferencing with earlier list seems like I got all of them. If there are any stragglers, they haven’t made much of a splash, so probably not urgent to deal with them. Besides, running it again when it’s been fixed might be good regardless in case they’ve made new accounts by then. :)
      • ruaok
        hmmm. if i change the spammy users report to focus on insert_timestamps rather than listened_at timestamps, new spammers can't get past it by submitting old listen timestamps.
      • that will make this report much more effective.
      • err faster.
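The idea above, sketched with toy data: counting recent listens by the server-side insert time catches a spammer who backdates listened_at. This is only an illustration. The Listen class, the inserted_at field name and the helper are hypothetical, not the actual report code.

```python
from dataclasses import dataclass


@dataclass
class Listen:
    user: str
    listened_at: int  # timestamp claimed by the submitting client
    inserted_at: int  # hypothetical field: timestamp set by the server on insert


def recent_listen_counts(listens, since, key):
    """Count listens per user whose `key` timestamp is at or after `since`."""
    counts = {}
    for listen in listens:
        if getattr(listen, key) >= since:
            counts[listen.user] = counts.get(listen.user, 0) + 1
    return counts


listens = [
    # The spammer submits now (inserted_at=9000) but claims old listen times.
    Listen("spammer", 1000, 9000),
    Listen("spammer", 1001, 9000),
    Listen("spammer", 1002, 9000),
    Listen("regular", 9000, 9001),
]

by_listened = recent_listen_counts(listens, since=8000, key="listened_at")
by_inserted = recent_listen_counts(listens, since=8000, key="inserted_at")
```

Here by_listened only contains the regular user, while by_inserted also counts the three backdated spam listens.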
      • _lucifer
      • yeah, strange. spark side has some logs but lemmy doesn't
      • yvanzo
        Freso: just found the faulty code, seems easy to fix.
      • Freso: it affects any email with more than one period (.) in user info (that is before @).
      • prabal
        Hello everyone
      • yvanzo
        The 'g' flag (for multiple match) was set but probably not correctly.
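For illustration, the kind of normalization MBS-11619 describes (ignore periods and +tags in the part before the @), next to a variant that only strips the first period. This is a sketch in Python, not the server's actual code. Both function names are hypothetical, and the "first period only" behaviour is just an assumed way such a bug could look.

```python
import re


def normalize_email(address: str) -> str:
    """Drop any +tag and every period from the part before the @."""
    local, _, domain = address.rpartition("@")
    local = local.split("+", 1)[0]
    local = local.replace(".", "")
    return f"{local}@{domain}"


def normalize_first_period_only(address: str) -> str:
    """Assumed buggy variant: only the first period is removed."""
    local, _, domain = address.rpartition("@")
    local = local.split("+", 1)[0]
    local = re.sub(r"\.", "", local, count=1)
    return f"{local}@{domain}"


full = normalize_email("j.a.o+spam@example.com")
partial = normalize_first_period_only("j.a.o+spam@example.com")
```

With one period in the local part both variants agree, which would match the report that only addresses with more than one period are affected.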
      • prabal
        my exams are finished. College over :)
      • yvanzo
        Hi prabal, congrats :)
      • prabal
        thankss
      • BrainzGit
        [listenbrainz-server] amCap1712 opened pull request #1450 (master…delete-listens): Delete listens as well when user is deleted using admin console https://github.com/metabrainz/listenbrainz-serv...
      • _lucifer
        ruaok: ^, sweet and short fix. tested locally that it works.
      • ruaok
        is delete_model() some sort of magic thing that gets called if it exists?
      • _lucifer
        when the delete button in the admin console is clicked, delete_model gets called.
      • Mr_Monkey
        Hi prabal ! Glad for you that exams are over
      • *and college over !
      • _lucifer
        the default impl is to delete the associated model, in this case that turns out to be the entry in the users table
      • ruaok
        _lucifer: great. we should remember to ping Freso when we release this, so he can delete users directly then.
      • _lucifer
        👍
      • and indeed it's magic https://flask-admin.readthedocs.io/en/latest/ap... . the documentation just says Delete Model 😞
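A minimal sketch of the override pattern discussed above: the admin view's delete_model hook also clears the user's listens before the default row deletion. To keep it runnable without Flask, ModelViewStandIn is a stand-in for flask_admin's ModelView, and FakeListenStore and UserAdminView are hypothetical names. The real pull request may look different.

```python
class ModelViewStandIn:
    """Stand-in for flask_admin's ModelView, only so the sketch runs alone."""

    def __init__(self, users):
        self.users = users  # maps user name -> row, mimicking the users table

    def delete_model(self, model):
        # Default behaviour: delete only the associated row.
        self.users.pop(model, None)
        return True


class FakeListenStore:
    """Hypothetical listenstore holding listens per user."""

    def __init__(self, listens):
        self.listens = listens

    def delete(self, user):
        # Deletes ALL listens for the user.
        self.listens.pop(user, None)


class UserAdminView(ModelViewStandIn):
    def __init__(self, users, listenstore):
        super().__init__(users)
        self.listenstore = listenstore

    def delete_model(self, model):
        # Remove the listens first, then fall back to the default deletion.
        self.listenstore.delete(model)
        return super().delete_model(model)


users = {"spammer": {"id": 1}, "regular": {"id": 2}}
store = FakeListenStore({"spammer": [1, 2, 3], "regular": [4]})
view = UserAdminView(users, store)
deleted = view.delete_model("spammer")
```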
      • Freso
        prabal: 🥳
      • yvanzo: Ah. That’d do it, yeah. Thanks for prodding at it. :)
      • _lucifer
      • the test description says it tests redis but ls is the timescale listenstore
      • Freso
        Mr_Monkey: Does BB have a (public) API/WS currently?
      • _lucifer
        nvm, i see the get_timestamps method calls the cache underneath
      • Mr_Monkey
        Freso: Public but in alpha version : https://api.test.bookbrainz.org/1/docs/
      • ruaok
        the comment needs improving. let me do that.
      • Freso
        Alright.
      • Mr_Monkey
        And running off of the test database, I'll add
      • Freso
        I’ve been poking at some Calibre plugins a bit, so was considering trying to revive the BB plugin too. 👀
      • Mr_Monkey
        Nice !
      • Let me know if you need anything
      • alastairp
        hi _lucifer, lots of interesting stuff this morning, thanks
      • ruaok
        _lucifer: improved comment pushed.
      • Mr_Monkey: alastairp : this may be of interest to you: https://wise.com/gb/blog/iban-discrimination
      • _lucifer
        :D
      • ruaok
        I'm hoping this is my ticket for us to ditch BBVA who is going to lock our account AGAIN.
      • alastairp
        if buildkit is faster then let's use that!
      • ruaok packs up and heads to the office
      • _lucifer
        ruaok: thanks, another thing i see https://github.com/metabrainz/listenbrainz-serv... can we do min, max in same query?
      • alastairp
        regarding the per-branch cache, is this something that the docker cache action enforces, or something that github actions enforces?
      • _lucifer
        github actions enforces that
      • alastairp
        boo
      • Mr_Monkey
        Interesting, thanks ruaok
      • _lucifer
        build kit is faster to build but it does something called export layers at the end which takes a lot of time
      • making the overall process take almost equal time
      • not sure we could get rid of that. maybe build kit postpones some processing to the end, which is why the build seems faster
      • alastairp
        yeah - buildkit doesn't emit layers at intermediate stages. I guess it does it all at the end
      • here's another option:
      • we already have all of the intermediate layers available somewhere: they were pushed to docker hub the last time we built the production image!
      • _lucifer
        interesting thought, so we could fetch the latest built image before running the action?
      • alastairp
        exactly
      • _lucifer
        how difficult is it to figure out the previous tag? or should we just push twice, once as the tag we want and once as latest?
      • alastairp
        yeah, I was just going to suggest those two options
      • we could get a list of releases (tags) from the github api, and just pull the 2nd most recent one
      • _lucifer
        let's go with push twice first as it seems easier, i think docker is smart enough to not push same layers twice.
      • alastairp
        one other consideration - there is the github container registry too. is it faster to push/pull from there than docker hub? (I have no idea)
      • correct, docker registry will see that they're all the same
      • _lucifer
        github registry, we'll need to set that up first but maybe worth trying.
      • alastairp
        so, let's do your suggestion first
      • see how long the pull is
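The "push twice" plan above could look roughly like the command sequence below: pull latest, build with it as a cache source, then push both the release tag and latest. This is a sketch under assumptions. The helper name and the repository name are hypothetical, and the commands are only assembled, not executed.

```python
def build_commands(repo: str, tag: str) -> list:
    """Return the docker commands for a cached build that is pushed twice."""
    latest = f"{repo}:latest"
    target = f"{repo}:{tag}"
    return [
        # Fetch the previous production image so its layers are available.
        ["docker", "pull", latest],
        # Reuse those layers as a cache source and tag the result twice.
        ["docker", "build", "--cache-from", latest, "-t", target, "-t", latest, "."],
        # Second push only uploads layers the registry does not already have.
        ["docker", "push", target],
        ["docker", "push", latest],
    ]


# "example/listenbrainz" is a placeholder repository name.
commands = build_commands("example/listenbrainz", "v-2021-05-10.0")
```

One thing to check when trying this: with BuildKit enabled, an image may only be usable as a cache source if it was built with inline cache metadata, so the plain pull might not be enough on its own.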
      • _lucifer: here's something else we haven't thought about - not sure how important it is: what's our build process for beta/test? Still do it manually from our local machine?
      • _lucifer
        i think yes, because it'll use different branches and sometimes even headless commits etc.
      • if we could have github run a workflow manually on a commit, then that would be useful.
      • alastairp
        I believe that there are ways of triggering workflows via API, with arguments
      • we could have a bot to do it for us! but I don't think that's super useful right now
      • let's just continue to do it manually
      • _lucifer
        yeah, let's continue manually for the time being and take a look again later
      • alastairp
        _lucifer: I was thinking about LB-895 yesterday
      • BrainzBot
        LB-895: Only run frontend and spark tests if affected files have changed https://tickets.metabrainz.org/browse/LB-895
      • _lucifer
        that's a nice one, i have seen other projects do it.
      • Mr_Monkey
        👍
      • I mean +1
      • Mr_Monkey takes off for OfficeBrainz
      • ruaok
        _lucifer: no idea. let me look at the explain for both. I suppose if it is a full index scan, it should do both.
      • _lucifer
        yeah, i checked it works. but not sure if it worsens the running time.
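For reference, fetching min and max in a single query is plain SQL. The sketch below uses an in-memory SQLite table with hypothetical table and column names. It shows only that the combined query returns both values, and says nothing about how the Timescale planner would cost it, which is what the explain output would answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy schema, not the real listen table.
conn.execute("CREATE TABLE listen (user_name TEXT, listened_at INTEGER)")
conn.executemany(
    "INSERT INTO listen VALUES (?, ?)",
    [("alice", 100), ("alice", 300), ("alice", 200), ("bob", 50)],
)

# One round trip instead of separate MIN and MAX queries.
min_ts, max_ts = conn.execute(
    "SELECT MIN(listened_at), MAX(listened_at) FROM listen WHERE user_name = ?",
    ("alice",),
).fetchone()
conn.close()
```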
      • ruaok
      • alastairp
        _lucifer: we're lucky in that some of our tests clearly affect only some subdirectories. so we can do it with spark and js, but for example I don't think we can split unit/integration
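A small sketch of the selection logic behind LB-895 as described above: spark and frontend tests run only when files under their directories changed, while unit/integration always run because they cannot be split. The directory prefixes and names here are assumptions for illustration, not the repository's actual layout or workflow config.

```python
# Hypothetical mapping from optional test suite to the paths that affect it.
SUITE_PATHS = {
    "spark": ("listenbrainz_spark/",),
    "frontend": ("frontend/",),
}

# Suites that cannot be tied to a subdirectory and therefore always run.
ALWAYS_RUN = {"unit", "integration"}


def suites_to_run(changed_files):
    """Return the set of test suites to run for a list of changed file paths."""
    suites = set(ALWAYS_RUN)
    for suite, prefixes in SUITE_PATHS.items():
        # str.startswith accepts a tuple of prefixes.
        if any(path.startswith(prefixes) for path in changed_files):
            suites.add(suite)
    return suites


selected = suites_to_run(["listenbrainz_spark/stats/user.py", "README.md"])
```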
      • ruaok: I saw those warnings yesterday
      • I can't see a pattern in the timing
      • ruaok
        the spikes happen when the cont agg is updated.
      • alastairp
        ah
      • ruaok
        if a user does a lot of imports/deletes of old data, we get these spikes.