suvid: Yes, you will need to remove any buttons to play music and other actions relating to BP, for which you can search for the `useBrainzPlayerDispatch` hook across the codebase.
2025-02-24 05504, 2025
monkey[m]
You should also ideally make sure that the BrainzPlayer component is not loaded at all if disabled (i.e. not just hiding it, but not loading it at all)
further working on eliminating spark's reliance on full dumps, there are two approaches I have been looking into. The first approach is to keep using dumps, but only incremental dumps; this is easier to implement. The only issue is that currently the incremental dump after a full dump includes only the listens created since that full dump. I could change it to include the listens since the last incremental dump (add a column in the data_dump table to
2025-02-24 05549, 2025
lucifer[m]
recognise which dump is incremental and which is full). And incremental dumps can be made to run concurrently with full dumps so that we don't lag when full dumps are being generated. (currently incremental dumps wait for full dumps to finish).
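The incremental-to-incremental windowing described above could be sketched roughly as follows; the `data_dump` row shape and the `dump_type` column values here are assumptions for illustration, not the actual LB schema:

```python
from datetime import datetime

def next_incremental_window(dumps, now):
    """Pick the created-timestamp range for the next incremental dump.

    `dumps` models rows of a hypothetical data_dump table, each a dict
    with 'dump_type' ('full' or 'incremental') and 'created'. The next
    incremental starts where the last incremental ended, so incrementals
    can run concurrently with a full dump instead of waiting for it.
    """
    incrementals = [d for d in dumps if d["dump_type"] == "incremental"]
    if incrementals:
        start = max(d["created"] for d in incrementals)
    else:
        # no incremental exists yet: fall back to the most recent dump
        start = max(d["created"] for d in dumps)
    return start, now
```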
2025-02-24 05504, 2025
suvid[m]
Hey... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/wDXsWdVgIzwSCItyKbinxqLA>)
2025-02-24 05511, 2025
suvid[m] uploaded an image: (294KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/yfZgZUVeycwCNkQQZrvZQZhw/image.png >
2025-02-24 05503, 2025
lucifer[m]
However, this breaks the current sequencing of listens import. For example, at the moment we can import a full dump + all incremental dumps and have all the listens of LB. With the proposed change, the incremental dumps would duplicate some listens with the full dump, and the onus would be on the user of the full dumps to deduplicate them, or they could use just the full dump listens and wait for the next one to avoid duplicates.
2025-02-24 05529, 2025
lucifer[m]
what do you think about this approach?
2025-02-24 05546, 2025
mayhem[m]
could we generate full dumps from the incremental dumps??
2025-02-24 05504, 2025
mayhem[m]
that changes the whole equation, no?
2025-02-24 05536, 2025
lucifer[m]
right so you mean end the full dump at the last incremental dump instead of the current time?
2025-02-24 05508, 2025
mayhem[m]
sort-of.
2025-02-24 05539, 2025
mayhem[m]
first, lets elevate the idea of an incremental dump to a first class concept, rather than it being lower than the full dump.
2025-02-24 05547, 2025
mayhem[m]
lets forget about full dumps for now.
2025-02-24 05518, 2025
mayhem[m]
you fix the incremental dumps as you described above - from incremental to incremental.
2025-02-24 05540, 2025
mayhem[m]
and those are the only dumps the LB infra does directly.
2025-02-24 05520, 2025
mayhem[m]
to make a full dump, we take the last full dump, add all the incrementals, and then zip the result up into the newest full dump.
2025-02-24 05541, 2025
mayhem[m]
we could do this last step on some VM that we spin up for this task.
2025-02-24 05546, 2025
lucifer[m]
how do you get the first full dump?
2025-02-24 05557, 2025
petitminion has quit
2025-02-24 05502, 2025
mayhem[m]
clearly we need to create a "make a dump NOW" script.
2025-02-24 05512, 2025
mayhem[m]
but we don't plan to use it very often.
2025-02-24 05548, 2025
mayhem[m]
the key cool thing here is that the full dumps are no longer running on core infra
2025-02-24 05502, 2025
lucifer[m]
yes makes sense. there are two issues however.
2025-02-24 05553, 2025
lucifer[m]
the first full dump and incremental dump might possibly still have duplicates, because full dumps filter on listened_at and incremental dumps on the created timestamp. so if you import listens with a listened_at in the past you could end up with duplicates in case of unfortunate timing. but i guess we could solve it by adding an additional created filter to the full dump generation (the first one). for subsequent ones, just incremental dumps.
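The unfortunate-timing case can be made concrete with a small sketch (all timestamps and field names are illustrative, not real LB data): a listen backfilled with an old `listened_at` lands both in a full dump, which filters on `listened_at`, and in an incremental, which filters on `created`, and an extra `created` filter on the full dump avoids the duplicate:

```python
from datetime import datetime

# A backfilled listen: submitted recently, but listened to years ago.
listen = {"listened_at": datetime(2021, 6, 1), "created": datetime(2025, 2, 20)}

full_cutoff = datetime(2025, 2, 22)                          # full dump: listened_at filter
inc_window = (datetime(2025, 2, 16), datetime(2025, 2, 24))  # incremental: created filter

in_full = listen["listened_at"] <= full_cutoff
in_inc = inc_window[0] <= listen["created"] < inc_window[1]
# in_full and in_inc are both True here: the listen is duplicated.

# Proposed fix: the (first) full dump also filters on created, so it only
# contains listens created before the incremental window begins.
in_full_fixed = in_full and listen["created"] < inc_window[0]
# in_full_fixed is False: the duplicate is gone.
```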
2025-02-24 05534, 2025
lucifer[m]
i just realised that this created filter would solve the issue i was stuck on in the current infra fwiw.
2025-02-24 05554, 2025
mayhem[m]
can we redefine the full dumps so they are derived from the incremental semantics?
2025-02-24 05511, 2025
lucifer[m]
right so on that note, the issue would be deleted listens.
2025-02-24 05537, 2025
mayhem[m]
our nemesis.
2025-02-24 05545, 2025
lucifer[m]
fwiw, i am doing something similar in spark anyway. a full dump exists, a bunch of incremental dumps come in. combine both and load listens for stats generation, filtering out deleted listens every time we load this combination.
2025-02-24 05503, 2025
mayhem[m]
I think this is where my thinking is coming from.
2025-02-24 05520, 2025
lucifer[m]
after N days, take the combined listens remove the deleted listens from this and rewrite them to disk.
2025-02-24 05544, 2025
lucifer[m]
(this step is pending implementation)
2025-02-24 05512, 2025
mayhem[m]
this could be done during the full dump generation,
2025-02-24 05535, 2025
lucifer[m]
right. however, the issue with the rewrite is that for spark we store the listens somewhat optimally in hdfs; doing this with a bunch of tar files on a VM is not going to be efficient.
2025-02-24 05547, 2025
lucifer[m]
but then what if we use spark for full dumps?
2025-02-24 05504, 2025
mayhem[m]
oh weird.
2025-02-24 05534, 2025
mayhem[m]
can we imagine what the impact on the cluster would be?
2025-02-24 05540, 2025
lucifer[m]
yeah i am just wondering how do you do the combination step to remove the deleted listens efficiently.
2025-02-24 05545, 2025
mayhem[m]
would that be offline or can we pull just from hdfs?
2025-02-24 05509, 2025
lucifer[m]
not sure what you mean by offline?
2025-02-24 05524, 2025
lucifer[m]
oh do you mean if the cluster will become unavailable during this step.
2025-02-24 05537, 2025
mayhem[m]
would the cluster be unavailable for regular tasks?
2025-02-24 05538, 2025
lucifer[m]
i think yeah, but how long is the question. if it's less than 8 hours i think we could schedule it well enough.
2025-02-24 05512, 2025
mayhem[m]
overall, this doesn't feel great to me. the spark cluster is meant to be "disposable" and this feels like a departure from that
2025-02-24 05523, 2025
lucifer[m]
yeah fair point.
2025-02-24 05546, 2025
lucifer[m]
we could add a LB db replica and let dumps run off of it.
2025-02-24 05513, 2025
mayhem[m]
lucifer[m]: poor use of resources.
2025-02-24 05554, 2025
mayhem[m]
is the deletion of listens from the incremental dumps the sticking point in the idea of making full dumps from incrementals?
2025-02-24 05558, 2025
mayhem[m]
that doesn't seem that hard.
2025-02-24 05500, 2025
lucifer[m]
i think a LB replica would be good in general, but yes, just for dumps it's not optimal.
2025-02-24 05519, 2025
mayhem[m]
you start with a list of listens to be deleted and a pile of incrementals.
2025-02-24 05519, 2025
lucifer[m]
deletion of listens need to happen from the full dump as well.
2025-02-24 05543, 2025
mayhem[m]
for each incremental, look at each listen. is it deleted? yes: skip; no: write it to the full dump.
2025-02-24 05544, 2025
lucifer[m]
if listens were to be deleted only from incrementals that wouldn't be too big of a deal.
2025-02-24 05525, 2025
mayhem[m]
this VM could continually update incrementals removing listens from it too.
2025-02-24 05523, 2025
lucifer[m]
mayhem[m]: yes but how do you delete listens from the full dump.
2025-02-24 05513, 2025
monkey[m]
<suvid[m]> "Hey..." <- suvid: I think that's a great step 1. The improvement I have in mind is to avoid loading any of the BrainzPlayer code in the first place if the user has deactivated it, saving on time and data usage.
2025-02-24 05513, 2025
monkey[m]
The files to look at are frontend/js/src/index.tsx, where the call to `getRoutes` (frontend/js/src/routes/routes.tsx) should have a new argument that can be trickled down to the Layout component (in the same way that Layout has a `withProtectedRoutes` prop, it should have a new `withBrainzPlayer` that defaults to true) at frontend/js/src/layout/index.tsx
2025-02-24 05513, 2025
monkey[m]
In frontend/js/src/index.tsx as well the BrainzPlayerContext should be rendered conditionally.
2025-02-24 05518, 2025
lucifer[m]
say you have a full dump till 15th feb and you are adding incrementals from 16th to 24th to it. and there are listen deletions that happened today for listens from say 2023.
2025-02-24 05528, 2025
lucifer[m]
those listens are already in the full dump.
2025-02-24 05541, 2025
mayhem[m]
lucifer[m]: since the full dump is made from incremental dumps that have listens deleted, it should be no problem, right?
2025-02-24 05512, 2025
mayhem[m]
ah -- I am going back to my idea of making full dumps from incrementals.
2025-02-24 05527, 2025
monkey[m]
Basically, seeing where BrainzPlayer and its BrainzPlayerContext are rendered, and working your way down to avoid rendering the components
2025-02-24 05546, 2025
petitminion joined the channel
2025-02-24 05557, 2025
lucifer[m]
yes i am on that idea too. but doesn't your idea start off with the full dump that was created the last time, and just add the incrementals generated since then to it?
2025-02-24 05530, 2025
mayhem[m]
we would have to start with a clean dump that has no deleted listens in it, for starters.
2025-02-24 05559, 2025
lucifer[m]
Full Dump until 15 Feb, 2024 + Incremental Dumps from 16th to 24th = Full dump until 24th Feb
2025-02-24 05507, 2025
lucifer[m]
do i understand your idea right?
2025-02-24 05509, 2025
mayhem[m]
yes
2025-02-24 05539, 2025
lucifer[m]
so now if i deleted some listens from 2021 on Feb 23rd, those listens would be in the starter full dump?
2025-02-24 05541, 2025
mayhem[m]
except it won't line up on date lines, but on when the dump started/terminated. still no real change in logic.
2025-02-24 05503, 2025
lucifer[m]
yup, date lines are just for example's sake.
2025-02-24 05519, 2025
mayhem[m]
yes, they would be in the full starter dump.
2025-02-24 05534, 2025
mayhem[m]
but the key is to not have them in the NEXT full dump, right?
2025-02-24 05549, 2025
lucifer[m]
yes. but how do you do that?
2025-02-24 05557, 2025
lucifer[m]
do you process the full starter dump too?
2025-02-24 05513, 2025
lucifer[m]
i was under the impression your plan meant to copy it as is.
2025-02-24 05538, 2025
mayhem[m]
that is the full dump making process: collect listens to be deleted and all the incremental dumps and then filter inc dumps into one giant clean full dump.
2025-02-24 05556, 2025
mayhem[m]
all inc dumps since last full dump, I should say
2025-02-24 05514, 2025
suvid[m]
<monkey[m]> "suvid: I think that's a great..." <- oh ok
2025-02-24 05514, 2025
suvid[m]
i'll look into it
2025-02-24 05514, 2025
suvid[m]
in the meantime, i have also pushed a commit which changes ListenCard to remove the play button, as well as the "add to queue" and "play next" options from the 3-dot menu
2025-02-24 05522, 2025
mayhem[m]
* then filter out deleted listens from the inc dumps, * combine inc dumps into one
2025-02-24 05539, 2025
lucifer[m]
right just to be clear you would read the existing full dump and process it too to delete the listens as needed?
2025-02-24 05501, 2025
mayhem[m]
lucifer[m]: I wasn't suggesting that no. is that needed?
2025-02-24 05521, 2025
suvid[m] uploaded an image: (331KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/QNAoBxHEEsqGUOrrQSOcSPMX/image.png >
2025-02-24 05522, 2025
suvid[m]
so it looks like this now :)
2025-02-24 05559, 2025
lucifer[m]
mayhem: hmm, but then you can't filter out all the deleted listens. you need to process your starter full dump to delete listens; if you just look at incremental dumps to delete listens it won't always work.
2025-02-24 05511, 2025
mayhem[m]
yes, you're right. filter deleted listens from last full and all incs to make a new one.
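The process agreed on here (filter deleted listens from the last full dump plus all incrementals since it, and write one clean new full dump) could be sketched like this; listens are modeled as plain dicts and all names are illustrative, not the real dump format:

```python
def build_full_dump(last_full, incrementals, deleted_ids):
    """Sketch: stream the previous full dump and every incremental since
    it, drop any listen whose id is in the set of deleted listens, and
    emit one clean new full dump. In practice the sources would be dump
    archives on disk, not in-memory lists.
    """
    seen = set()
    new_full = []
    for source in [last_full, *incrementals]:
        for listen in source:
            if listen["id"] in deleted_ids:
                continue            # is it deleted? yes: skip
            if listen["id"] in seen:
                continue            # drop duplicates across sources
            seen.add(listen["id"])
            new_full.append(listen)  # no: write to the new full dump
    return new_full
```

Since each source is only read once and the output written once, this is the kind of single-threaded, disk-heavy job that could run away from core infra, as discussed below.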
2025-02-24 05527, 2025
mayhem[m]
that will still be faster than a full dump off our infra.
2025-02-24 05556, 2025
mayhem[m]
and if it runs on a one-off VM, we don't care how long it takes. 3 days? fine -- that is our schedule then
2025-02-24 05534, 2025
lucifer[m]
sure i just wanted to be clear on the idea.
2025-02-24 05529, 2025
petitminion has quit
2025-02-24 05542, 2025
mayhem[m]
but does the rest of it check out?
2025-02-24 05557, 2025
lucifer[m]
yes it does.
2025-02-24 05501, 2025
mayhem[m]
overall I quite like this approach.
2025-02-24 05509, 2025
mayhem[m]
full dumps far away from prod systems == great.
2025-02-24 05519, 2025
lucifer[m]
i don't like the idea of a one-off VM though, because we don't usually do that sort of provisioning, and it would likely be some work to add it. not that it's not doable.
2025-02-24 05538, 2025
mayhem[m]
fair.
2025-02-24 05511, 2025
mayhem[m]
if we construct it carefully and have it consume only one thread, it still won't take forever. it will just be disk heavy.
2025-02-24 05520, 2025
mayhem[m]
we'd have to find a machine where we can do that.
2025-02-24 05513, 2025
lucifer[m]
one more thing: we can only move the listens dump away this way. the rest of the things still need to be dumped from prod.
2025-02-24 05545, 2025
mayhem[m]
yes, but those dumps take how long? measured in minutes, not hours/days?
2025-02-24 05536, 2025
lucifer[m]
i think stats dump takes 6 hours or more.
2025-02-24 05518, 2025
lucifer[m]
but i don't think that's an issue still.
2025-02-24 05521, 2025
mayhem[m]
most of that isn't on PG, right?
2025-02-24 05526, 2025
lucifer[m]
yes
2025-02-24 05530, 2025
mayhem[m]
perfect.
2025-02-24 05548, 2025
mayhem[m]
I think this will be a great improvement for our infra, honestly.