[musicbrainz-server] mwiencek merged pull request #2082 (schema-change-2021-q2…mbs-11438-2): MBS-10962, MBS-11438, MBS-11460: Speed up listing artist releases/release groups https://github.com/metabrainz/musicbrainz-serve...
_lucifer
alastairp: i am experimenting with setting up a cache for the prod image using the article you mentioned a few days ago. so far it seems just using buildkit cuts build time by 30%
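(Editor's note: a minimal sketch of the BuildKit experiment described above, with BuildKit enabled via the DOCKER_BUILDKIT environment variable. The image name is only an example, not the actual prod image name.)

```python
import os
import subprocess

def build_prod_image(tag, image="metabrainz/listenbrainz"):
    """Build the prod image with BuildKit enabled (illustrative names only)."""
    # Same docker build command, but run with DOCKER_BUILDKIT=1 so the
    # BuildKit backend is used instead of the classic builder.
    env = dict(os.environ, DOCKER_BUILDKIT="1")
    subprocess.run(
        ["docker", "build", "-t", f"{image}:{tag}", "."],
        env=env,
        check=True,
    )
```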
Freso is now wondering whether he’s missed more spammer/sockpuppets 😬
_lucifer
alastairp, i think i have figured out why caching isn't working on releases or tags: the actions caches are scoped to branches. my understanding is that each tag is a ref named refs/tags/{tag_name}, so each tag gets treated as a separate branch. different branches cannot access each other's cache, so no cache is found on subsequent tags.
but if we re-run a job, the same tag gets built again and the cache is hit.
All of those are one cluster. I think I may not understand what you mean when you say that I only list two.
ruaok
ok, I don't see the other variants in the top similar users now. let me proceed with that list and we'll see.
you'll need to get him to ok that.
Freso
alastairp: ^ :)
_lucifer: Just checking, you’re not using alt. accounts to test on live-LB, right?
_lucifer
Freso: nope
Freso
Alright, good.
_lucifer
i too wonder how I am in the top 100 eight times
Freso
Apparently you listen to similar music as other people. :)
ruaok
_lucifer: I think the current config for similar users is somehow borked. Mr_Monkey and I may attempt to play with it today, to see what the matter is.
I'll rerun similar users now (without tweaking the settings).
_lucifer
ruaok, could be. i had looked into the similarity code but didn't find any issues. alastairp had mentioned that he also had some thoughts on improving similarity.
Freso
More than a half million listens gone. 🤌
💋
_lucifer
!m Freso
BrainzBot
You're doing good work, Freso!
ruaok
👏
should that report (which is rather expensive to run on ALL listens) become a regular report?
_lucifer
ruaok, regarding deletion of users, there are two different methods because one deletes the user as well as the listens, while the other only deletes the listens.
Freso
I think it would be a nice one to have, yeah, but probably doesn’t need to run very frequently.
ruaok
I wonder if there is utility in running that report on the last X years only...
Freso: k, I'll see about making that happen.
_lucifer: yes. I was deleting listens directly from psql.
_lucifer: do you know if it is possible to make the admin view delete the listens as well, or do we need to create something new?
_lucifer
right. i mentioned this because we were wondering last week why there were two different delete methods.
ruaok
oh, actually we all misread that.
_lucifer
just testing that, hence i remembered to inform you.
ruaok
at least in the timescale listenstore.
one deletes a SINGLE listen, the other deletes ALL listens.
so it does make sense. but ts.delete() should be called from the admin delete function.
_lucifer
we already have a delete_user method that is used when a user deletes their account. we can just reuse that.
ruaok
let's
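(Editor's note: a minimal sketch of the idea, assuming hypothetical names; the admin delete hook reuses the same listen-deletion call that runs when a user deletes their own account, so the listens and the user record go away together.)

```python
def admin_delete_user(user, timescale_listenstore, db_user):
    """Delete a user from the admin view, including all of their listens."""
    # Remove every listen for this user from the timescale listenstore
    # (the ts.delete() call mentioned above).
    timescale_listenstore.delete(user.musicbrainz_id)
    # Then remove the user record itself.
    db_user.delete(user.id)
```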
yvanzo
Freso: Looks like a bug. Do you need a direct search right now?
ruaok
_lucifer: I'm looking at the output of spark_consumer on lemmy and I don't see any output wrt the calculated users, even though I got the email that they were calculated.
no output at all since 03:54. that seems odd and may explain why the user similarities are so borked.
Freso
yvanzo: Nah. Cross-referencing with the earlier list, it seems like I got all of them. If there are any stragglers, they haven’t made much of a splash, so probably not urgent to deal with them. Besides, running it again when it’s been fixed might be good regardless in case they’ve made new accounts by then. :)
ruaok
hmmm. if i change the spammy users report to focus on insert_timestamps rather than listened_at timestamps, new spammers can't get past it by submitting old listen timestamps.
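(Editor's note: a rough sketch of the report keyed on insert timestamps, ranking users by how many listens they inserted recently regardless of the listened_at values they submitted. The table and column names (listen, created, user_name) are assumptions, not the exact ListenBrainz schema.)

```python
from datetime import datetime, timedelta

import sqlalchemy

SPAMMY_USERS_SQL = sqlalchemy.text("""
    SELECT user_name
         , COUNT(*) AS listens_inserted
      FROM listen
     WHERE created >= :since      -- insert timestamp, not listened_at
  GROUP BY user_name
  ORDER BY listens_inserted DESC
     LIMIT :limit
""")


def spammy_users_report(connection, days=7, limit=100):
    """Return the users who inserted the most listens in the last `days` days."""
    since = datetime.utcnow() - timedelta(days=days)
    rows = connection.execute(SPAMMY_USERS_SQL, {"since": since, "limit": limit})
    return rows.fetchall()
```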
regarding the per-branch cache, is this something that the docker cache action enforces, or something that github actions enforces?
_lucifer
github actions enforces that
alastairp
boo
Mr_Monkey
Interesting, thanks ruaok
_lucifer
buildkit is faster to build, but it does something called exporting layers at the end, which takes a lot of time
making the overall process take almost the same time
not sure if we could get rid of that. maybe buildkit postpones some processing to the end, which is why the build seems faster
alastairp
yeah - buildkit doesn't emit layers at intermediate stages. I guess it does it all at the end
here's another option:
we already have all of the intermediate layers available somewhere: they were pushed to docker hub the last time we built the production image!
_lucifer
interesting thought, so we could fetch the latest built image before running the action?
alastairp
exactly
_lucifer
how difficult is it to figure out the previous tag? or should we just push twice, once as the tag we want and once as latest?
alastairp
yeah, I was just going to suggest those two options
we could get a list of releases (tags) from the github api, and just pull the 2nd most recent one
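(Editor's note: a sketch of the "previous tag" option, listing the repository's releases via the GitHub API and taking the second most recent one as the image to pull for the build cache. The repo slug is only an example.)

```python
import requests

def previous_release_tag(repo="metabrainz/listenbrainz-server"):
    """Return the tag of the second most recent GitHub release, if any."""
    resp = requests.get(
        f"https://api.github.com/repos/{repo}/releases",
        headers={"Accept": "application/vnd.github.v3+json"},
        timeout=30,
    )
    resp.raise_for_status()
    releases = resp.json()  # most recent first
    return releases[1]["tag_name"] if len(releases) > 1 else None
```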
_lucifer
let's go with push twice first as it seems easier. i think docker is smart enough to not push the same layers twice.
alastairp
one other consideration - there is the github container registry too. is it faster to push/pull from there than docker hub? (I have no idea)
correct, the docker registry will see that they're all the same
_lucifer
for the github registry we'll need to set that up first, but it might be worth trying.
alastairp
so, let's do your suggestion first
see how long the pull is
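(Editor's note: a sketch of the flow settled on above, with an example image name: pull the previously published image, let the build reuse its layers, then push the result both under the release tag and as "latest" so the next build can use it as its cache source.)

```python
import subprocess

IMAGE = "metabrainz/listenbrainz"  # example image name


def build_with_previous_image_as_cache(tag):
    """Pull the last published image, reuse its layers, push tag and latest."""
    # Make the last published layers available locally.
    subprocess.run(["docker", "pull", f"{IMAGE}:latest"], check=True)
    # Build, allowing layer reuse from "latest"; the inline-cache build arg is
    # needed so a BuildKit-built image carries cache metadata for next time.
    subprocess.run(
        ["docker", "build",
         "--build-arg", "BUILDKIT_INLINE_CACHE=1",
         "--cache-from", f"{IMAGE}:latest",
         "-t", f"{IMAGE}:{tag}", "-t", f"{IMAGE}:latest", "."],
        check=True,
    )
    # Push twice; layers already present in the registry are not re-uploaded.
    subprocess.run(["docker", "push", f"{IMAGE}:{tag}"], check=True)
    subprocess.run(["docker", "push", f"{IMAGE}:latest"], check=True)
```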
_lucifer: here's something else we haven't thought about - not sure how important it is: what's our build process for beta/test? Still do it manually from our local machine?
_lucifer
i think yes, because it'll use different branches and sometimes even headless commits etc.
if we could have github run a workflow manually on a commit, then that would be useful.
alastairp
I believe that there are ways of triggering workflows via API, with arguments
we could have a bot to do it for us! but I don't think that's super useful right now
let's just continue to do it manually
_lucifer
yeah, let's continue manually for the time being and take a look again later
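(Editor's note: a sketch of the API-triggered build mentioned above, using GitHub's workflow_dispatch endpoint to start a workflow on a chosen ref with inputs. The repo slug, workflow file name, and input names are made up for illustration.)

```python
import os

import requests

def trigger_image_build(ref, tag, repo="metabrainz/listenbrainz-server",
                        workflow="build-prod-image.yml"):
    """Start a workflow_dispatch run for the given ref with a tag input."""
    resp = requests.post(
        f"https://api.github.com/repos/{repo}/actions/workflows/{workflow}/dispatches",
        headers={
            "Accept": "application/vnd.github.v3+json",
            "Authorization": f"token {os.environ['GITHUB_TOKEN']}",
        },
        json={"ref": ref, "inputs": {"tag": tag}},
        timeout=30,
    )
    resp.raise_for_status()  # GitHub returns 204 No Content on success
```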
alastairp
_lucifer: we're lucky in that some of our tests clearly affect only some subdirectories. so we can do it with spark and js, but for example I don't think we can split unit/integration
ruaok: I saw those warnings yesterday
I can't see a pattern in the timing
ruaok
the spikes happen when the cont agg is updated.
alastairp
ah
ruaok
if a user does a lot of imports/deletes of old data, we get these spikes.