[musicbrainz-server] mwiencek merged pull request #2082 (schema-change-2021-q2…mbs-11438-2): MBS-10962, MBS-11438, MBS-11460: Speed up listing artist releases/release groups https://github.com/metabrainz/musicbrainz-server/…
2021-05-12 13212, 2021
Major_Lurker has quit
2021-05-12 13230, 2021
D4RK-PH0ENiX joined the channel
2021-05-12 13243, 2021
d4rkie has quit
2021-05-12 13244, 2021
ephemer0l is now known as GeneralDiscourse
2021-05-12 13206, 2021
thomasross has quit
2021-05-12 13236, 2021
adhi001 joined the channel
2021-05-12 13219, 2021
sumedh joined the channel
2021-05-12 13258, 2021
_lucifer
alastairp: i am experimenting with setting up a cache for the prod image using the article you mentioned a few days ago. so far it seems that just using buildkit cuts build time by 30%
Freso is now wondering whether he’s missed more spammers/sockpuppets 😬
2021-05-12 13248, 2021
_lucifer
alastairp, i think i have figured out why caching isn't working on releases or tags: the actions caches are scoped to branches. my understanding is that each tag is a ref named refs/tags/{tag_name}, so each tag gets treated as a separate branch. different branches cannot access each other's cache, so no cache is found on subsequent tags.
2021-05-12 13241, 2021
_lucifer
but if we re-run a job, the same tag gets built again and the cache is hit.
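A minimal sketch of the scoping rule described above, as a simplified model rather than GitHub's actual implementation. It assumes a run can restore a cache created on its own ref or on the default branch (as the GitHub documentation describes) and ignores pull-request base branches. The function name and the `refs/heads/master` default are illustrative assumptions.

```python
def can_restore(cache_ref: str, run_ref: str,
                default_ref: str = "refs/heads/master") -> bool:
    """Simplified model: a run on `run_ref` may restore a cache created on
    `cache_ref` only if it is the same ref or the (assumed) default branch."""
    return cache_ref == run_ref or cache_ref == default_ref


# A cache written while building one tag is invisible to the next tag...
first_tag_to_second = can_restore("refs/tags/v1", "refs/tags/v2")
# ...but re-running the job for the same tag hits the cache.
same_tag_rerun = can_restore("refs/tags/v1", "refs/tags/v1")
```

Under this model every new tag starts cold, which matches the behaviour seen on release builds.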
All of those are one cluster. I think I may not understand what you mean by saying that I only list two.
2021-05-12 13216, 2021
ruaok
ok, I don't see the other variants in the top similar users now. let me proceed with that list and we'll see.
2021-05-12 13217, 2021
ruaok
you'll need to get him to ok that.
2021-05-12 13238, 2021
Freso
alastairp: ^ :)
2021-05-12 13210, 2021
Freso
_lucifer: Just checking, you’re not using alt. accounts to test on live-LB, right?
2021-05-12 13224, 2021
_lucifer
Freso: nope
2021-05-12 13233, 2021
Freso
Alright, good.
2021-05-12 13244, 2021
_lucifer
i too wonder how I am in the top 100 8 times
2021-05-12 13227, 2021
Freso
Apparently you listen to similar music as other people. :)
2021-05-12 13214, 2021
ruaok
_lucifer: I think the current config for similar users is somehow borked. Mr_Monkey and I may attempt to play with that today, to see what the matter is.
I'll rerun similar users now (without tweaking the settings).
2021-05-12 13202, 2021
_lucifer
ruaok, could be. i had looked into the similarity code but didn't find any issues. alastairp had mentioned that he also had some thoughts on improving similarity.
2021-05-12 13226, 2021
Freso
More than a half million listens gone. 🤌
2021-05-12 13239, 2021
Freso
💋
2021-05-12 13248, 2021
_lucifer
!m Freso
2021-05-12 13248, 2021
BrainzBot
You're doing good work, Freso!
2021-05-12 13251, 2021
ruaok
👏
2021-05-12 13215, 2021
ruaok
should that report (which is rather expensive to run on ALL listens) become a regular report?
2021-05-12 13249, 2021
_lucifer
ruaok, regarding deletion of users, there are two different methods because one deletes the user as well as the listens but the other only deletes listens.
2021-05-12 13252, 2021
Freso
I think it would be a nice one to have, yeah, but probably doesn’t need to run very frequently.
2021-05-12 13253, 2021
ruaok
I wonder if there is utility in running that report on the last X years only...
2021-05-12 13214, 2021
ruaok
Freso: k, I'll see about making that happen.
2021-05-12 13232, 2021
ruaok
_lucifer: yes. I was deleting listens directly from psql.
2021-05-12 13214, 2021
ruaok
_lucifer: do you know if it is possible to make the admin view delete the listens as well, or do we need to create something new?
2021-05-12 13218, 2021
_lucifer
right. i mentioned this because we were wondering last week why there were two different delete methods.
2021-05-12 13233, 2021
ruaok
oh, actually we all misread that.
2021-05-12 13238, 2021
_lucifer
just testing that, hence remembered to inform you.
2021-05-12 13246, 2021
ruaok
at least in the timescale listenstore.
2021-05-12 13259, 2021
ruaok
one deletes a SINGLE listen, the other deletes ALL listens.
2021-05-12 13222, 2021
ruaok
so it does make sense. but ts.delete() should be called from the admin delete function.
2021-05-12 13224, 2021
_lucifer
we already have a delete_user method that is used when the user deletes their accounts. we can just reuse that.
2021-05-12 13236, 2021
ruaok
let's
2021-05-12 13223, 2021
yvanzo
Freso: Looks like a bug. Do you need a direct search right now?
2021-05-12 13245, 2021
ruaok
_lucifer: I'm looking at the output of spark_consumer on lemmy and I don't see any output wrt the calculated users, even though I got the email that they were calculated.
2021-05-12 13223, 2021
ruaok
no output at all since 03:54. that seems odd and may explain why the user similarities are so borked.
2021-05-12 13255, 2021
Freso
yvanzo: Nah. Cross-referencing with the earlier list, it seems like I got all of them. If there are any stragglers, they haven’t made much of a splash, so probably not urgent to deal with them. Besides, running it again when it’s been fixed might be good regardless in case they’ve made new accounts by then. :)
2021-05-12 13232, 2021
ruaok
hmmm. if i change the spammy users report to focus on insert_timestamps rather than listened_at timestamps, new spammers can't get past it by submitting old listen timestamps.
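A rough sketch of why the choice of timestamp matters for that report. The record layout and the field names `listened_at` and `inserted_at` are assumptions for illustration, not the actual listenstore schema, and the data is made up.

```python
from collections import Counter
from datetime import datetime


def listens_per_user(listens, since, timestamp_field):
    """Count listens per user whose chosen timestamp is at or after `since`."""
    counts = Counter()
    for listen in listens:
        if listen[timestamp_field] >= since:
            counts[listen["user"]] += 1
    return counts


# Hypothetical data: "spammer" submits today but backdates listened_at.
listens = [
    {"user": "spammer", "listened_at": datetime(2005, 1, 1),
     "inserted_at": datetime(2021, 5, 12)},
    {"user": "spammer", "listened_at": datetime(2005, 1, 2),
     "inserted_at": datetime(2021, 5, 12)},
    {"user": "regular", "listened_at": datetime(2021, 5, 11),
     "inserted_at": datetime(2021, 5, 11)},
]
since = datetime(2021, 5, 1)

by_listened = listens_per_user(listens, since, "listened_at")  # misses the spammer
by_inserted = listens_per_user(listens, since, "inserted_at")  # catches the spammer
```

Filtering on the insert time also bounds how much data a recurring report has to scan, since only recently written rows are considered.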
regarding the per-branch cache, is this something that the docker cache action enforces, or something that github actions enforces?
2021-05-12 13223, 2021
_lucifer
github actions enforces that
2021-05-12 13229, 2021
alastairp
boo
2021-05-12 13232, 2021
Mr_Monkey
Interesting, thanks ruaok
2021-05-12 13200, 2021
_lucifer
buildkit is faster to build, but it does something called exporting layers at the end, which takes a lot of time
2021-05-12 13212, 2021
_lucifer
making the overall process take almost equal time
2021-05-12 13207, 2021
_lucifer
not sure if we could get rid of that. maybe buildkit postpones some processing to the end, which is why the build seems faster
2021-05-12 13252, 2021
alastairp
yeah - buildkit doesn't emit layers at intermediate stages. I guess it does it all at the end
2021-05-12 13256, 2021
alastairp
here's another option:
2021-05-12 13221, 2021
alastairp
we already have all of the intermediate layers available somewhere: they were pushed to docker hub the last time we built the production image!
2021-05-12 13205, 2021
_lucifer
interesting thought, so we could fetch the latest built image before running the action?
2021-05-12 13213, 2021
alastairp
exactly
2021-05-12 13232, 2021
_lucifer
how difficult is it to figure out the previous tag? or should we just push twice, once as the tag we want and once as latest?
2021-05-12 13243, 2021
alastairp
yeah, I was just going to suggest those two options
2021-05-12 13211, 2021
alastairp
we could get a list of releases (tags) from the github api, and just pull the 2nd most recent one
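A sketch of the "2nd most recent release" option, assuming the list has already been fetched from the GitHub releases endpoint and parsed from JSON. `tag_name` and `published_at` are fields of the release objects in that API. The helper name and sample tags are made up, and the sketch assumes the release being built is already in the list.

```python
def previous_release_tag(releases):
    """Return the tag of the second most recent published release, or None.

    `releases` is a list of dicts shaped like GitHub release objects.
    Drafts (published_at is None) are skipped. The ISO 8601 UTC timestamps
    sort correctly as plain strings.
    """
    published = [r for r in releases if r.get("published_at")]
    ordered = sorted(published, key=lambda r: r["published_at"], reverse=True)
    return ordered[1]["tag_name"] if len(ordered) > 1 else None


# Hypothetical sample data.
sample = [
    {"tag_name": "v-2021-05-10.0", "published_at": "2021-05-10T09:00:00Z"},
    {"tag_name": "v-2021-05-12.0", "published_at": "2021-05-12T10:00:00Z"},
    {"tag_name": "draft", "published_at": None},
]
previous = previous_release_tag(sample)
```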
2021-05-12 13244, 2021
_lucifer
let's go with pushing twice first as it seems easier. i think docker is smart enough not to push the same layers twice.
2021-05-12 13248, 2021
alastairp
one other consideration - there is the github container registry too. is it faster to push/pull from there than docker hub? (I have no idea)
2021-05-12 13258, 2021
alastairp
correct, docker registry will see that they're all the same
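The "push twice" plan could look roughly like the sketch below. It only assembles the docker CLI invocations (`pull`, `build --cache-from`, `push`) and does not run them. The image name `example/app` and the helper name are placeholders, not the project's real values.

```python
def build_commands(image, tag, context="."):
    """Return the docker commands for a build that reuses layers from the
    previously pushed `latest` image and then pushes both tags."""
    latest = f"{image}:latest"
    versioned = f"{image}:{tag}"
    return [
        # Fetch the layers from the last production build.
        ["docker", "pull", latest],
        # Build once, reusing those layers, and apply both tags.
        ["docker", "build", "--cache-from", latest,
         "-t", versioned, "-t", latest, context],
        # The second push only uploads layers the registry does not have yet.
        ["docker", "push", versioned],
        ["docker", "push", latest],
    ]


commands = build_commands("example/app", "v-2021-05-12.0")
```

Each entry can be handed to `subprocess.run(cmd, check=True)`. With BuildKit enabled, a pulled image is only usable through `--cache-from` if it was built with inline cache metadata, so that would need checking separately.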
2021-05-12 13239, 2021
_lucifer
github registry, we'll need to set that up first but maybe worth trying.
2021-05-12 13211, 2021
alastairp
so, let's do your suggestion first
2021-05-12 13215, 2021
alastairp
see how long the pull is
2021-05-12 13237, 2021
alastairp
_lucifer: here's something else we haven't thought about - not sure how important it is: what's our build process for beta/test? Still do it manually from our local machine?
2021-05-12 13221, 2021
_lucifer
i think yes, because it'll use different branches and sometimes even headless commits etc.
2021-05-12 13236, 2021
_lucifer
if we could have github run a workflow manually on a commit, then that would be useful.
2021-05-12 13234, 2021
alastairp
I believe that there are ways of triggering workflows via API, with arguments
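One such way is the REST endpoint for workflow dispatch events, which accepts a `ref` and optional `inputs`. The sketch below only builds the request and does not send it. The owner, repository, workflow file name, input name, and token are placeholders, and the target workflow would need a `workflow_dispatch` trigger.

```python
import json
import urllib.request


def dispatch_request(owner, repo, workflow, ref, inputs, token):
    """Build (but do not send) a POST request that triggers a workflow run."""
    url = (f"https://api.github.com/repos/{owner}/{repo}"
           f"/actions/workflows/{workflow}/dispatches")
    body = json.dumps({"ref": ref, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Placeholder values for illustration only.
req = dispatch_request("example-org", "example-repo", "build.yml",
                       "master", {"environment": "beta"}, "TOKEN")
```

Sending it would be `urllib.request.urlopen(req)`. As far as the API documentation goes, `ref` has to be a branch or tag name, so building from an arbitrary commit would still need a branch or tag pointing at it.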
2021-05-12 13255, 2021
alastairp
we could have a bot to do it for us! but I don't think that's super useful right now
2021-05-12 13200, 2021
alastairp
let's just continue to do it manually
2021-05-12 13222, 2021
_lucifer
yeah, let's continue manually for the time being and take a look again later
_lucifer: we're lucky in that some of our tests clearly affect only some subdirectories. so we can do it with spark and js, but for example I don't think we can split unit/integration
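A small sketch of that idea: pick test suites from the paths a change touches. The directory prefixes and suite names are hypothetical, and anything outside them falls back to one combined suite, since unit and integration tests cannot be separated by path.

```python
# Hypothetical directory prefixes; the real repository layout may differ.
SUITE_PREFIXES = {
    "spark": "spark/",
    "js": "frontend/",
}


def suites_for(changed_files):
    """Return the set of test suites to run for a list of changed paths."""
    suites = set()
    for path in changed_files:
        for suite, prefix in SUITE_PREFIXES.items():
            if path.startswith(prefix):
                suites.add(suite)
                break
        else:
            # No known prefix matched: run the combined unit + integration suite.
            suites.add("python")
    return suites


only_spark = suites_for(["spark/stats/user.py"])
mixed = suites_for(["frontend/src/App.tsx", "server/views.py"])
```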
2021-05-12 13218, 2021
alastairp
ruaok: I saw those warnings yesterday
2021-05-12 13240, 2021
alastairp
I can't see a pattern in the timing
2021-05-12 13240, 2021
ruaok
the spikes happen when the cont agg is updated.
2021-05-12 13244, 2021
alastairp
ah
2021-05-12 13203, 2021
ruaok
if a user does a lot of imports/deletes of old data, we get these spikes.