suvid: Yes, you will need to remove any buttons to play music and other actions relating to BP, for which you can search for the `useBrainzPlayerDispatch` hook across the codebase.
2025-02-24 05504, 2025
monkey[m]
You should also ideally make sure that the BrainzPlayer component is not loaded at all if disabled (i.e. not just hiding it, but not loading it at all)
further working on eliminating spark's reliance on full dumps, there are two approaches I have been looking into. The first approach is to keep using dumps, but only incremental dumps; this is easier to implement. The only issue is that currently the incremental dump after a full dump includes only the listens created since that full dump. I could change it to include the listens since the last incremental dump (add a column in the data_dump table to
2025-02-24 05549, 2025
lucifer[m]
recognise which dump is incremental and which is full). And incremental dumps can be made to run concurrently with full dumps so that we don't lag when full dumps are being generated. (currently incremental dumps wait for full dumps to finish).
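The incremental-to-incremental windowing described above could be sketched roughly as follows; the `data_dump` row shape and the `dump_type` column values here are assumptions for illustration, not the actual LB schema:

```python
from datetime import datetime

def next_incremental_window(dumps, now):
    """Pick the created-timestamp range for the next incremental dump.

    `dumps` models rows of a hypothetical data_dump table, each a dict
    with 'dump_type' ('full' or 'incremental') and 'created'. The next
    incremental starts where the last incremental ended, so incrementals
    can run concurrently with a full dump instead of waiting for it.
    """
    incrementals = [d for d in dumps if d["dump_type"] == "incremental"]
    if incrementals:
        start = max(d["created"] for d in incrementals)
    else:
        # no incremental exists yet: fall back to the most recent dump
        start = max(d["created"] for d in dumps)
    return start, now
```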
2025-02-24 05504, 2025
suvid[m]
Hey... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/wDXsWdVgIzwSCItyKbinxqLA>)
2025-02-24 05511, 2025
suvid[m] uploaded an image: (294KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/yfZgZUVeycwCNkQQZrvZQZhw/image.png >
2025-02-24 05503, 2025
lucifer[m]
However, this breaks the current sequencing of listens import. For example, at the moment we can import a full dump + all incremental dumps and have all the listens of LB. With the proposed change, the incremental dumps would duplicate some listens with the full dump, and the onus would be on the user of the full dumps to deduplicate them, or they could use just the full dump listens and wait for the next one to avoid duplicates.
2025-02-24 05529, 2025
lucifer[m]
what do you think about this approach?
2025-02-24 05546, 2025
mayhem[m]
could we generate full dumps from the incremental dumps??
2025-02-24 05504, 2025
mayhem[m]
that changes the whole equation, no?
2025-02-24 05536, 2025
lucifer[m]
right so you mean end the full dump at the last incremental dump instead of the current time?
2025-02-24 05508, 2025
mayhem[m]
sort-of.
2025-02-24 05539, 2025
mayhem[m]
first, lets elevate the idea of an incremental dump to a first class concept, rather than it being lower than the full dump.
2025-02-24 05547, 2025
mayhem[m]
lets forget about full dumps for now.
2025-02-24 05518, 2025
mayhem[m]
you fix the incremental dumps as you described above - from incremental to incremental.
2025-02-24 05540, 2025
mayhem[m]
and those are the only dumps the LB infra does directly.
2025-02-24 05520, 2025
mayhem[m]
to make a full dump, we take the last full dump, add all the incrementals, and then zip the result up into the newest full dump.
2025-02-24 05541, 2025
mayhem[m]
we could do this last step on some VM that we spin up for this task.
2025-02-24 05546, 2025
lucifer[m]
how do you get the first full dump?
2025-02-24 05557, 2025
petitminion has quit
2025-02-24 05502, 2025
mayhem[m]
clearly we need to create a "make a dump NOW" script.
2025-02-24 05512, 2025
mayhem[m]
but we don't plan to use it very often.
2025-02-24 05548, 2025
mayhem[m]
the key cool thing here is that the full dumps are no longer running on core infra
2025-02-24 05502, 2025
lucifer[m]
yes makes sense. there are two issues however.
2025-02-24 05553, 2025
lucifer[m]
the first full dump and incremental dump might possibly still have duplicates, because full dumps filter on listened_at and incremental dumps on the created timestamp. so if you import listens with a listened_at in the past you could end up with duplicates in case of unfortunate timing. but i guess we could solve it by adding an additional created filter to the full dump generation (the first one). for subsequent ones, just incremental dumps.
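The unfortunate-timing case can be made concrete with a small sketch (all timestamps and field names are illustrative, not real LB data): a listen backfilled with an old `listened_at` lands both in a full dump, which filters on `listened_at`, and in an incremental, which filters on `created`, and an extra `created` filter on the full dump avoids the duplicate:

```python
from datetime import datetime

# A backfilled listen: submitted recently, but listened to years ago.
listen = {"listened_at": datetime(2021, 6, 1), "created": datetime(2025, 2, 20)}

full_cutoff = datetime(2025, 2, 22)                          # full dump: listened_at filter
inc_window = (datetime(2025, 2, 16), datetime(2025, 2, 24))  # incremental: created filter

in_full = listen["listened_at"] <= full_cutoff
in_inc = inc_window[0] <= listen["created"] < inc_window[1]
# in_full and in_inc are both True here: the listen is duplicated.

# Proposed fix: the (first) full dump also filters on created, so it only
# contains listens created before the incremental window begins.
in_full_fixed = in_full and listen["created"] < inc_window[0]
# in_full_fixed is False: the duplicate is gone.
```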
2025-02-24 05534, 2025
lucifer[m]
i just realised that this created filter would solve the issue i was stuck on in the current infra fwiw.
2025-02-24 05554, 2025
mayhem[m]
can we redefine the full dumps so they are derived from the incremental semantics?
2025-02-24 05511, 2025
lucifer[m]
right so on that note, the issue would be deleted listens.
2025-02-24 05537, 2025
mayhem[m]
our nemesis.
2025-02-24 05545, 2025
lucifer[m]
fwiw, i am doing something similar in spark anyway. a full dump exists, a bunch of incremental dumps come in. combine both and load listens for stats generation, filtering out deleted listens every time we load this combination.
2025-02-24 05503, 2025
mayhem[m]
I think this is where my thinking is coming from.
2025-02-24 05520, 2025
lucifer[m]
after N days, take the combined listens remove the deleted listens from this and rewrite them to disk.
2025-02-24 05544, 2025
lucifer[m]
(this step is pending implementation)
2025-02-24 05512, 2025
mayhem[m]
this could be done during the full dump generation,
2025-02-24 05535, 2025
lucifer[m]
right. however, the issue with the rewrite is that for spark we store the listens somewhat optimally in hdfs; doing this with a bunch of tar files on a VM is not going to be efficient.
2025-02-24 05547, 2025
lucifer[m]
but then what if we use spark for full dumps?
2025-02-24 05504, 2025
mayhem[m]
oh weird.
2025-02-24 05534, 2025
mayhem[m]
can we imagine what the impact on the cluster would be?
2025-02-24 05540, 2025
lucifer[m]
yeah i am just wondering how do you do the combination step to remove the deleted listens efficiently.
2025-02-24 05545, 2025
mayhem[m]
would that be offline or can we pull just from hdfs?
2025-02-24 05509, 2025
lucifer[m]
not sure what you mean by offline?
2025-02-24 05524, 2025
lucifer[m]
oh do you mean if the cluster will become unavailable during this step.
2025-02-24 05537, 2025
mayhem[m]
would the cluster be unavailable for regular tasks?
2025-02-24 05538, 2025
lucifer[m]
i think yeah, but how long is the question. if it's less than 8 hours i think we could schedule it well enough.
2025-02-24 05512, 2025
mayhem[m]
overall, this doesn't feel great to me. the spark cluster is meant to be "disposable" and this feels like a departure from that
2025-02-24 05523, 2025
lucifer[m]
yeah fair point.
2025-02-24 05546, 2025
lucifer[m]
we could add a LB db replica and let dumps run off of it.
2025-02-24 05513, 2025
mayhem[m]
lucifer[m]: poor use of resources.
2025-02-24 05554, 2025
mayhem[m]
is the deletion of listens from the incremental dumps the sticking point in the idea of making full dumps from incrementals?
2025-02-24 05558, 2025
mayhem[m]
that doesn't seem that hard.
2025-02-24 05500, 2025
lucifer[m]
i think a LB replica would be good in general, but yes, just for dumps it's not optimal.
2025-02-24 05519, 2025
mayhem[m]
you start with a list of listens to be deleted and a pile of incrementals.
2025-02-24 05519, 2025
lucifer[m]
deletion of listens need to happen from the full dump as well.
2025-02-24 05543, 2025
mayhem[m]
for each incremental, look at each listen. is it deleted? yes: skip; no: write it to the full dump.
2025-02-24 05544, 2025
lucifer[m]
if listens were to be deleted only from incrementals that wouldn't be too big of a deal.
2025-02-24 05525, 2025
mayhem[m]
this VM could continually update incrementals removing listens from it too.
2025-02-24 05523, 2025
lucifer[m]
mayhem[m]: yes but how do you delete listens from the full dump.
2025-02-24 05513, 2025
monkey[m]
<suvid[m]> "Hey..." <- suvid: I think that's a great step 1. The improvement I have in mind is to avoid loading any of the BrainzPlayer code in the first place if the user has deactivated it, saving on time and data usage.
2025-02-24 05513, 2025
monkey[m]
The files to look at are frontend/js/src/index.tsx, where the call to `getRoutes` (frontend/js/src/routes/routes.tsx) should have a new argument that can be trickled down to the Layout component (in the same way that Layout has a `withProtectedRoutes` prop, it should have a new `withBrainzPlayer` that defaults to true) at frontend/js/src/layout/index.tsx
2025-02-24 05513, 2025
monkey[m]
In frontend/js/src/index.tsx as well the BrainzPlayerContext should be rendered conditionally.
2025-02-24 05518, 2025
lucifer[m]
say you have a full dump till 15th feb and you are adding incrementals from 16th to 24th to it. and there are listen deletions that happened today for listens from say 2023.
2025-02-24 05528, 2025
lucifer[m]
those listens are already in the full dump.
2025-02-24 05541, 2025
mayhem[m]
lucifer[m]: since the full dump is made from incremental dumps that have listens deleted, it should be no problem, right?
2025-02-24 05512, 2025
mayhem[m]
ah -- I am going back to my idea of making full dumps from incrementals.
2025-02-24 05527, 2025
monkey[m]
Basically, seeing where BrainzPlayer and its BrainzPlayerContext are rendered, and working your way down to avoid rendering the components
2025-02-24 05546, 2025
petitminion joined the channel
2025-02-24 05557, 2025
lucifer[m]
yes i am on that idea too. but doesn't your idea start off with the full dump that was created the last time, and just add the incrementals generated since then to it?
2025-02-24 05530, 2025
mayhem[m]
we would have to start with a clean dump that has no deleted listens in it, for starters.
2025-02-24 05559, 2025
lucifer[m]
Full Dump until 15 Feb, 2024 + Incremental Dumps from 16th to 24th = Full dump until 24th Feb
2025-02-24 05507, 2025
lucifer[m]
do i understand your idea right?
2025-02-24 05509, 2025
mayhem[m]
yes
2025-02-24 05539, 2025
lucifer[m]
so now if i deleted some listens from 2021 on Feb 23rd, those listens would be in the starter full dump?
2025-02-24 05541, 2025
mayhem[m]
except it won't line up on date lines, but on when the dump started/terminated. still no real change in logic.
2025-02-24 05503, 2025
lucifer[m]
yup, date lines are just for example's sake.
2025-02-24 05519, 2025
mayhem[m]
yes, they would be in the full starter dump.
2025-02-24 05534, 2025
mayhem[m]
but the key is to not have them in the NEXT full dump, right?
2025-02-24 05549, 2025
lucifer[m]
yes. but how do you do that?
2025-02-24 05557, 2025
lucifer[m]
do you process the full starter dump too?
2025-02-24 05513, 2025
lucifer[m]
i was under the impression your plan meant to copy it as is.
2025-02-24 05538, 2025
mayhem[m]
that is the full dump making process: collect listens to be deleted and all the incremental dumps and then filter inc dumps into one giant clean full dump.
2025-02-24 05556, 2025
mayhem[m]
all inc dumps since last full dump, I should say
2025-02-24 05514, 2025
suvid[m]
<monkey[m]> "suvid: I think that's a great..." <- oh ok
2025-02-24 05514, 2025
suvid[m]
i'll look into it
2025-02-24 05514, 2025
suvid[m]
in the meantime, i have also pushed a commit which changes ListenCard to remove the play button, as well as the "add to queue" and "play next" options from the 3-dot menu
2025-02-24 05522, 2025
mayhem[m]
* then filter out deleted listens from the inc dumps, * combine inc dumps into one
2025-02-24 05539, 2025
lucifer[m]
right just to be clear you would read the existing full dump and process it too to delete the listens as needed?
2025-02-24 05501, 2025
mayhem[m]
lucifer[m]: I wasn't suggesting that no. is that needed?
2025-02-24 05521, 2025
suvid[m] uploaded an image: (331KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/QNAoBxHEEsqGUOrrQSOcSPMX/image.png >
2025-02-24 05522, 2025
suvid[m]
so it looks like this now :)
2025-02-24 05559, 2025
lucifer[m]
mayhem: hmm, but then you can't filter out all the deleted listens. you need to process your starter full dump to delete listens; if you just look at incremental dumps to delete listens it won't always work.
2025-02-24 05511, 2025
mayhem[m]
yes, you're right. filter deleted listens from last full and all incs to make a new one.
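The process agreed on here (filter deleted listens from the last full dump plus all incrementals since it, and write one clean new full dump) could be sketched like this; listens are modeled as plain dicts and all names are illustrative, not the real dump format:

```python
def build_full_dump(last_full, incrementals, deleted_ids):
    """Sketch: stream the previous full dump and every incremental since
    it, drop any listen whose id is in the set of deleted listens, and
    emit one clean new full dump. In practice the sources would be dump
    archives on disk, not in-memory lists.
    """
    seen = set()
    new_full = []
    for source in [last_full, *incrementals]:
        for listen in source:
            if listen["id"] in deleted_ids:
                continue            # is it deleted? yes: skip
            if listen["id"] in seen:
                continue            # drop duplicates across sources
            seen.add(listen["id"])
            new_full.append(listen)  # no: write to the new full dump
    return new_full
```

Since each source is only read once and the output written once, this is the kind of single-threaded, disk-heavy job that could run away from core infra, as discussed below.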
2025-02-24 05527, 2025
mayhem[m]
that will still be faster than a full dump off our infra.
2025-02-24 05556, 2025
mayhem[m]
and if it runs on a one-off VM, we don't care how long it takes. 3 days? fine -- that is our schedule then
2025-02-24 05534, 2025
lucifer[m]
sure i just wanted to be clear on the idea.
2025-02-24 05529, 2025
petitminion has quit
2025-02-24 05542, 2025
mayhem[m]
but does the rest of it check out?
2025-02-24 05557, 2025
lucifer[m]
yes it does.
2025-02-24 05501, 2025
mayhem[m]
overall I quite like this approach.
2025-02-24 05509, 2025
mayhem[m]
full dumps far away from prod systems == great.
2025-02-24 05519, 2025
lucifer[m]
i don't like the idea of a one-off VM though, because we don't usually do that sort of provisioning, and it would likely be some work to add it. not that it's not doable.
2025-02-24 05538, 2025
mayhem[m]
fair.
2025-02-24 05511, 2025
mayhem[m]
if we construct it carefully and have it consume only one thread, it still won't take forever. it will just be disk heavy.
2025-02-24 05520, 2025
mayhem[m]
we'd have to find a machine where we can do that.
2025-02-24 05513, 2025
lucifer[m]
one more thing: we can only move the listens dump away this way. the rest of the things still need to be dumped from prod.
2025-02-24 05545, 2025
mayhem[m]
yes, but those dumps take how long? measured in minutes, not hours/days?
2025-02-24 05536, 2025
lucifer[m]
i think stats dump takes 6 hours or more.
2025-02-24 05518, 2025
lucifer[m]
but i don't think that's an issue still.
2025-02-24 05521, 2025
mayhem[m]
most of that isn't on PG, right?
2025-02-24 05526, 2025
lucifer[m]
yes
2025-02-24 05530, 2025
mayhem[m]
perfect.
2025-02-24 05548, 2025
mayhem[m]
I think this will be a great improvement for our infra, honestly.