[musicbrainz-server] mwiencek merged pull request #2082 (schema-change-2021-q2…mbs-11438-2): MBS-10962, MBS-11438, MBS-11460: Speed up listing artist releases/release groups https://github.com/metabrainz/musicbrainz-serve...
_lucifer
alastairp: i am experimenting with setting up a cache for the prod image using the article you mentioned a few days ago. so far it seems just using buildkit cuts build time by 30%
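(Editor's note: a minimal sketch of the BuildKit experiment described above, with BuildKit enabled via the DOCKER_BUILDKIT environment variable. The image name is only an example, not the actual prod image name.)

```python
import os
import subprocess

def build_prod_image(tag, image="metabrainz/listenbrainz"):
    """Build the prod image with BuildKit enabled (illustrative names only)."""
    # Same docker build command, but run with DOCKER_BUILDKIT=1 so the
    # BuildKit backend is used instead of the classic builder.
    env = dict(os.environ, DOCKER_BUILDKIT="1")
    subprocess.run(
        ["docker", "build", "-t", f"{image}:{tag}", "."],
        env=env,
        check=True,
    )
```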
Freso is now wondering whether he’s missed more spammer/sockpuppets 😬
_lucifer
alastairp, i think i have figured out why caching isn't working on releases or tags: the actions caches are scoped to branches. my understanding is that each tag is a ref named refs/tags/{tag_name}, so each tag gets treated as a separate branch. different branches cannot access each other's cache, so no cache is found on subsequent tags.
but if we re-run a job, the same tag gets built again and the cache is hit.
All of those are one cluster. I think I may not understand what you mean when you say that I only list two.
ruaok
ok, I don't see the other variants in the top similar users now. let me proceed with that list and we'll see.
you'll need to get him to ok that.
Freso
alastairp: ^ :)
_lucifer: Just checking, you’re not using alt. accounts to test on live-LB, right?
_lucifer
Freso: nope
Freso
Alright, good.
_lucifer
i too wonder how I am in the top 100 eight times
Freso
Apparently you listen to similar music as other people. :)
ruaok
_lucifer: I think the current config for similar users is somehow borked. Mr_Monkey and I may attempt to play with it today, to see what the matter is.
I'll rerun similar users now (without tweaking the settings).
_lucifer
ruaok, could be. i had looked into the similarity code but didn't find any issues. alastairp had mentioned that he also had some thoughts on improving similarity.
Freso
More than a half million listens gone. 🤌
💋
_lucifer
!m Freso
BrainzBot
You're doing good work, Freso!
ruaok
👏
should that report (which is rather expensive to run on ALL listens) become a regular report?
_lucifer
ruaok, regarding deletion of users, there are two different methods because one deletes the user as well as the listens, while the other only deletes the listens.
Freso
I think it would be a nice one to have, yeah, but probably doesn’t need to run very frequently.
ruaok
I wonder if there is utility in running that report on the last X years only...
Freso: k, I'll see about making that happen.
_lucifer: yes. I was deleting listens directly from psql.
_lucifer: do you know if it is possible to make the admin view delete the listens as well, or do we need to create something new?
_lucifer
right. i mentioned this because we were wondering last week why there were two different delete methods.
ruaok
oh, actually we all misread that.
_lucifer
just testing that, hence i remembered to inform you.
ruaok
at least in the timescale listenstore.
one deletes a SINGLE listen, the other deletes ALL listens.
so it does make sense. but ts.delete() should be called from the admin delete function.
_lucifer
we already have a delete_user method that is used when a user deletes their account. we can just reuse that.
ruaok
let's
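(Editor's note: a minimal sketch of the idea, assuming hypothetical names; the admin delete hook reuses the same listen-deletion call that runs when a user deletes their own account, so the listens and the user record go away together.)

```python
def admin_delete_user(user, timescale_listenstore, db_user):
    """Delete a user from the admin view, including all of their listens."""
    # Remove every listen for this user from the timescale listenstore
    # (the ts.delete() call mentioned above).
    timescale_listenstore.delete(user.musicbrainz_id)
    # Then remove the user record itself.
    db_user.delete(user.id)
```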
yvanzo
Freso: Looks like a bug. Do you need a direct search right now?
ruaok
_lucifer: I'm looking at the output of spark_consumer on lemmy and I don't see any output wrt the calculated users, even though I got the email that they were calculated.
no output at all since 03:54. that seems odd and may explain why the user similarities are so borked.
Freso
yvanzo: Nah. Cross-referencing with the earlier list, it seems like I got all of them. If there are any stragglers, they haven’t made much of a splash, so probably not urgent to deal with them. Besides, running it again when it’s been fixed might be good regardless in case they’ve made new accounts by then. :)
ruaok
hmmm. if i change the spammy users report to focus on insert_timestamps rather than listened_at timestamps, new spammers can't get past it by submitting old listen timestamps.
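(Editor's note: a rough sketch of the report keyed on insert timestamps, ranking users by how many listens they inserted recently regardless of the listened_at values they submitted. The table and column names (listen, created, user_name) are assumptions, not the exact ListenBrainz schema.)

```python
from datetime import datetime, timedelta

import sqlalchemy

SPAMMY_USERS_SQL = sqlalchemy.text("""
    SELECT user_name
         , COUNT(*) AS listens_inserted
      FROM listen
     WHERE created >= :since      -- insert timestamp, not listened_at
  GROUP BY user_name
  ORDER BY listens_inserted DESC
     LIMIT :limit
""")


def spammy_users_report(connection, days=7, limit=100):
    """Return the users who inserted the most listens in the last `days` days."""
    since = datetime.utcnow() - timedelta(days=days)
    rows = connection.execute(SPAMMY_USERS_SQL, {"since": since, "limit": limit})
    return rows.fetchall()
```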
regarding the per-branch cache, is this something that the docker cache action enforces, or something that github actions enforces?
_lucifer
github actions enforces that
alastairp
boo
Mr_Monkey
Interesting, thanks ruaok
_lucifer
buildkit is faster to build, but it does something called exporting layers at the end, which takes a lot of time
making the overall process take almost the same time
not sure if we could get rid of that. maybe buildkit postpones some processing to the end, which is why the build seems faster
alastairp
yeah - buildkit doesn't emit layers at intermediate stages. I guess it does it all at the end
here's another option:
we already have all of the intermediate layers available somewhere: they were pushed to docker hub the last time we built the production image!
_lucifer
interesting thought, so we could fetch the latest built image before running the action?
alastairp
exactly
_lucifer
how difficult is it to figure out the previous tag? or should we just push twice, once as the tag we want and once as latest?
alastairp
yeah, I was just going to suggest those two options
we could get a list of releases (tags) from the github api, and just pull the 2nd most recent one
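(Editor's note: a sketch of the "previous tag" option, listing the repository's releases via the GitHub API and taking the second most recent one as the image to pull for the build cache. The repo slug is only an example.)

```python
import requests

def previous_release_tag(repo="metabrainz/listenbrainz-server"):
    """Return the tag of the second most recent GitHub release, if any."""
    resp = requests.get(
        f"https://api.github.com/repos/{repo}/releases",
        headers={"Accept": "application/vnd.github.v3+json"},
        timeout=30,
    )
    resp.raise_for_status()
    releases = resp.json()  # most recent first
    return releases[1]["tag_name"] if len(releases) > 1 else None
```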
_lucifer
let's go with push twice first as it seems easier. i think docker is smart enough to not push the same layers twice.
alastairp
one other consideration - there is the github container registry too. is it faster to push/pull from there than docker hub? (I have no idea)
correct, the docker registry will see that they're all the same
_lucifer
for the github registry we'll need to set that up first, but it might be worth trying.
alastairp
so, let's do your suggestion first
see how long the pull is
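(Editor's note: a sketch of the flow settled on above, with an example image name: pull the previously published image, let the build reuse its layers, then push the result both under the release tag and as "latest" so the next build can use it as its cache source.)

```python
import subprocess

IMAGE = "metabrainz/listenbrainz"  # example image name


def build_with_previous_image_as_cache(tag):
    """Pull the last published image, reuse its layers, push tag and latest."""
    # Make the last published layers available locally.
    subprocess.run(["docker", "pull", f"{IMAGE}:latest"], check=True)
    # Build, allowing layer reuse from "latest"; the inline-cache build arg is
    # needed so a BuildKit-built image carries cache metadata for next time.
    subprocess.run(
        ["docker", "build",
         "--build-arg", "BUILDKIT_INLINE_CACHE=1",
         "--cache-from", f"{IMAGE}:latest",
         "-t", f"{IMAGE}:{tag}", "-t", f"{IMAGE}:latest", "."],
        check=True,
    )
    # Push twice; layers already present in the registry are not re-uploaded.
    subprocess.run(["docker", "push", f"{IMAGE}:{tag}"], check=True)
    subprocess.run(["docker", "push", f"{IMAGE}:latest"], check=True)
```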
_lucifer: here's something else we haven't thought about - not sure how important it is: what's our build process for beta/test? Still do it manually from our local machine?
_lucifer
i think yes, because it'll use different branches and sometimes even headless commits etc.
if we could have github run a workflow manually on a commit, then that would be useful.
alastairp
I believe that there are ways of triggering workflows via API, with arguments
we could have a bot to do it for us! but I don't think that's super useful right now
let's just continue to do it manually
_lucifer
yeah, let's continue manually for the time being and take a look again later
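(Editor's note: a sketch of the API-triggered build mentioned above, using GitHub's workflow_dispatch endpoint to start a workflow on a chosen ref with inputs. The repo slug, workflow file name, and input names are made up for illustration.)

```python
import os

import requests

def trigger_image_build(ref, tag, repo="metabrainz/listenbrainz-server",
                        workflow="build-prod-image.yml"):
    """Start a workflow_dispatch run for the given ref with a tag input."""
    resp = requests.post(
        f"https://api.github.com/repos/{repo}/actions/workflows/{workflow}/dispatches",
        headers={
            "Accept": "application/vnd.github.v3+json",
            "Authorization": f"token {os.environ['GITHUB_TOKEN']}",
        },
        json={"ref": ref, "inputs": {"tag": tag}},
        timeout=30,
    )
    resp.raise_for_status()  # GitHub returns 204 No Content on success
```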
alastairp
_lucifer: we're lucky in that some of our tests clearly affect only some subdirectories. so we can do it with spark and js, but for example I don't think we can split unit/integration
ruaok: I saw those warnings yesterday
I can't see a pattern in the timing
ruaok
the spikes happen when the cont agg is updated.
alastairp
ah
ruaok
if a user does a lot of imports/deletes of old data, we get these spikes.