if we had the disk space, I would say, screw it, just write a duplicate table.
2021-07-05 18650, 2021
ruaok
and then we can swap over in an instant.
2021-07-05 18606, 2021
ruaok
but I don't think we can get away with that.
2021-07-05 18622, 2021
kinduff
alrighty, thank you lucifer, reosarevok, ruaok
2021-07-05 18638, 2021
alastairp
and even then, I suspect that recreating the continuous aggregate on a new datetime column at the same time that we have the one on the timestamp column will cause disk problems too
2021-07-05 18650, 2021
ruaok
we wouldn't need that.
2021-07-05 18600, 2021
ruaok
the new table would not be accessed until the actual switchover.
2021-07-05 18626, 2021
ruaok
we could take the listenstore offline for an hour or so during the switchover, no big deal.
2021-07-05 18644, 2021
alastairp
for both reads and writes?
2021-07-05 18600, 2021
alastairp
and then during that time drop the existing aggregate and recreate it on the datetime column?
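For reference, the drop-and-recreate step being discussed could look roughly like the sketch below, using standard TimescaleDB syntax. The view name, column name, and aggregate query are invented stand-ins, not the real ListenBrainz schema:

```python
# Hypothetical sketch of recreating the continuous aggregate on the new
# datetime column; "listen_count_30day" and "listened_at_dt" are made up.
import psycopg2

conn = psycopg2.connect("dbname=listenbrainz")
conn.autocommit = True  # continuous aggregates can't be created inside a transaction
with conn.cursor() as cur:
    cur.execute("DROP MATERIALIZED VIEW IF EXISTS listen_count_30day")
    cur.execute("""
        CREATE MATERIALIZED VIEW listen_count_30day
        WITH (timescaledb.continuous) AS
        SELECT time_bucket(INTERVAL '30 days', listened_at_dt) AS bucket,
               count(*) AS listen_count
        FROM listen
        GROUP BY bucket
    """)
```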
2021-07-05 18631, 2021
ruaok
yes
2021-07-05 18612, 2021
alastairp
I'm happy to try it, though this task has now ballooned in size a bit
2021-07-05 18649, 2021
ruaok
could we run a faily simple trial as a proof of concept?
2021-07-05 18604, 2021
ruaok
fairly == not fail-y
2021-07-05 18631, 2021
ruaok
create table, start copying old rows, monitor at 10% and extrapolate how much disk space it would really take.
2021-07-05 18640, 2021
ruaok
and only if it looks doable do we proceed with this approach.
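A minimal sketch of that trial, assuming psycopg2 and a reachable database; "listen_trial" and all column names below are hypothetical stand-ins for the real listen schema:

```python
import psycopg2

conn = psycopg2.connect("dbname=listenbrainz")  # assumed DSN
with conn, conn.cursor() as cur:
    # New table keyed on a real timestamptz instead of an integer epoch.
    cur.execute("""
        CREATE TABLE listen_trial (
            listened_at TIMESTAMPTZ NOT NULL,
            user_id     INTEGER NOT NULL,
            data        JSONB
        )
    """)
    # Copy roughly 10% of the old rows, converting the epoch column.
    cur.execute("""
        INSERT INTO listen_trial (listened_at, user_id, data)
        SELECT to_timestamp(listened_at), user_id, data
        FROM listen TABLESAMPLE SYSTEM (10)
    """)
    # Measure the sample's on-disk size and extrapolate to 100%.
    cur.execute("SELECT pg_total_relation_size('listen_trial')")
    print(f"estimated full size: {cur.fetchone()[0] * 10 / 1e9:.1f} GB")
```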
2021-07-05 18642, 2021
alastairp
the only reason to do both columns at the same time would be to avoid 2 schema changes, are we worried about that? (I'm not)
2021-07-05 18601, 2021
alastairp
sure, let me put together a quick PR for that
2021-07-05 18603, 2021
ruaok
i'm not.
2021-07-05 18618, 2021
ruaok
but, I am worried about an exclusive table lock during the ALTER TABLE command.
2021-07-05 18631, 2021
ruaok
and we have no idea how long ALTER TABLE will run for
2021-07-05 18639, 2021
alastairp
I believe that add column default null in postgres now no longer needs a lock
2021-07-05 18651, 2021
alastairp
however, moving to not null may need one
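A sketch of the two schema steps under discussion, with a `lock_timeout` guard so a surprise exclusive lock aborts the statement instead of stalling the listenstore. Table and column names are hypothetical:

```python
import psycopg2

conn = psycopg2.connect("dbname=listenbrainz")

# Step 1: catalog-only change; a nullable column with no default needs
# no table rewrite, so the exclusive lock is held only momentarily.
with conn, conn.cursor() as cur:
    cur.execute("SET lock_timeout = '5s'")
    cur.execute("ALTER TABLE listen ADD COLUMN listened_at_dt TIMESTAMPTZ DEFAULT NULL")

# Step 2 (after backfilling): SET NOT NULL scans the whole table to
# validate, holding an exclusive lock while it does -- the part to time.
with conn, conn.cursor() as cur:
    cur.execute("SET lock_timeout = '5s'")
    cur.execute("ALTER TABLE listen ALTER COLUMN listened_at_dt SET NOT NULL")
```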
2021-07-05 18631, 2021
ruaok
if we can avoid a table lock we should use your solution. clearly simpler.
2021-07-05 18609, 2021
alastairp
OK, I'll do the following: 1) PR for moving to user_id, 2) PR for testing the change to a date time field - by making a new table and copying 10%, 3) verify that adding a column doesn't need a lock and check the time that adding a not null constraint requires
2021-07-05 18622, 2021
alastairp
thanks for the discussion
2021-07-05 18653, 2021
ruaok
1) does not need a table lock either?
2021-07-05 18606, 2021
alastairp
for adding the column, no
2021-07-05 18617, 2021
alastairp
but now I'm doubting the change of the constraint
2021-07-05 18631, 2021
ruaok
let's do 3) first.
2021-07-05 18644, 2021
ruaok
because that will really inform 1 and 2
2021-07-05 18647, 2021
alastairp
perfect, let me finish my db import and I'll test that
wow. I just ran a query on timescale to extract spotify recording IDs from the new mapping.
2021-07-05 18637, 2021
ruaok
anyone wanna guess how many rows it has?
2021-07-05 18643, 2021
reosarevok
9387579!
2021-07-05 18649, 2021
reosarevok
(I might have generated a random number)
2021-07-05 18601, 2021
ruaok
you might be withing 20%
2021-07-05 18603, 2021
ruaok
-g
2021-07-05 18611, 2021
ruaok
11M rows!
2021-07-05 18617, 2021
ruaok
but, those are mapped against MSIDs.
2021-07-05 18625, 2021
ruaok
meaning it contains loads of dupes
2021-07-05 18658, 2021
ruaok
1.4M unique recordings.
2021-07-05 18621, 2021
ruaok
lucifer: alastairp : quick ponderance.... for the parquet based LB dumps intended to be imported into spark...
2021-07-05 18606, 2021
ruaok
those are mostly intended as internal use. does it make sense to spend all that time XZ compressing them just to move them to another server at the same datacenter?
2021-07-05 18611, 2021
ruaok
(or cluster of datacenters)
2021-07-05 18618, 2021
ruaok
I'm inclined to not compress at ALL.
2021-07-05 18604, 2021
lucifer
sure i think we can get away without compressing to xz.
2021-07-05 18627, 2021
lucifer
also, parquet files by default use "snappy" compression iirc so it might already be comparable to compressed json anyway.
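This is easy to confirm with pyarrow, which does default to snappy; a tiny sketch with made-up data and file names:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"user_id": [1, 2, 3], "recording": ["a", "b", "c"]})

pq.write_table(table, "listens.parquet")                          # snappy by default
pq.write_table(table, "listens.raw.parquet", compression="none")  # no compression
```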
2021-07-05 18656, 2021
ruaok
oh. well, that makes everything easier then.
2021-07-05 18601, 2021
ruaok
was it 64MB chunks?
2021-07-05 18605, 2021
ruaok
I wonder how to estimate that.
2021-07-05 18635, 2021
lucifer
128 MB chunks, or a little less than that.
2021-07-05 18644, 2021
ruaok
k.
2021-07-05 18600, 2021
ruaok
if there is compression in the mix, I'll have to play with it to see if I can get close without going over.
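One way to "get close without going over" is trial and error: write a sample file, measure its compressed size on disk, and extrapolate rows per file. A rough sketch, with a hypothetical `rows_per_file` helper and placeholder paths:

```python
import os
import pyarrow as pa
import pyarrow.parquet as pq

TARGET_BYTES = 128 * 1024 * 1024  # aim just under the 128 MB chunk size

def rows_per_file(sample_table: pa.Table, path: str = "/tmp/sample.parquet") -> int:
    """Write a sample, measure its compressed size, and extrapolate."""
    pq.write_table(sample_table, path)  # snappy by default
    bytes_per_row = os.path.getsize(path) / sample_table.num_rows
    # ~5% headroom so compression variance doesn't push a file over target
    return int(TARGET_BYTES * 0.95 / bytes_per_row)

# e.g. rows_per_file(pq.read_table("listens.parquet").slice(0, 100_000))
```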
2021-07-05 18635, 2021
ruaok laughs at the thought that his first hard drive was 30MB large
MBS-11767: Track-level artists that differ from the release artist are no longer shown on multi-disc releases that aren't fully loaded
2021-07-05 18648, 2021
reosarevok
I took a quick look but I'm not sure why medium is not being detected as changed by useMemo
2021-07-05 18641, 2021
Sophist-UK joined the channel
2021-07-05 18641, 2021
Sophist-UK has quit
2021-07-05 18641, 2021
Sophist-UK joined the channel
2021-07-05 18638, 2021
Sophist_UK has quit
2021-07-05 18643, 2021
lucifer
ruaok: re lb public dumps, iiuc the `user` table schema of the public dumps is incorrect. we only import that table when there is no private dump, so when we try to import the public dump on its own we get an error.
ruaok: sorry, I had to pop out. did lucifer answer your question about public dumps?
2021-07-05 18617, 2021
akshaaatt[m]
<ruaok "https://juliareda.eu/2021/07/git"> This is really interesting!
2021-07-05 18630, 2021
ruaok
alastairp: y
2021-07-05 18639, 2021
alastairp
outsidecontext_: that's an oversight
2021-07-05 18600, 2021
akshaaatt[m]
I really wish it were a free plugin though. Anyway, open-source plugins similar to this will surface eventually.
2021-07-05 18612, 2021
alastairp
or more specifically, we made a list of things that we thought people might want to select, and that wasn't in our initial
2021-07-05 18615, 2021
alastairp
list
2021-07-05 18622, 2021
lucifer
akshaaatt[m]: i think we can use the MB android app page but that means we also have to maintain it in two places. let's finalize the details of the readme and see how we want to do it.
2021-07-05 18639, 2021
akshaaatt[m]
Okaaayyy boss!
2021-07-05 18602, 2021
lucifer
alastairp: sklearn training now takes ~5m after fixing the groundtruth path.
2021-07-05 18613, 2021
alastairp
🎉
2021-07-05 18631, 2021
lucifer
time to move to next step now :D
2021-07-05 18645, 2021
alastairp
did you find the model file?
2021-07-05 18620, 2021
lucifer
we have a lot of files in the sklearn dataset directory. checking for the pkl file.
also looked into the failed status stuff: we have it already, but we aren't catching all exceptions, so sometimes the status does not get updated.
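A generic sketch of the fix being described: catch every exception so the job is always marked failed. `set_job_status` is a hypothetical stand-in for the real database update, not the actual evaluation code:

```python
def set_job_status(job_id, status, error=None):
    print(job_id, status, error or "")  # placeholder for the real UPDATE

def run_training_job(job_id, train):
    try:
        train()
        set_job_status(job_id, "completed")
    except Exception as exc:  # deliberately broad: any failure must be recorded
        set_job_status(job_id, "failed", error=str(exc))
        raise
```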
2021-07-05 18602, 2021
BrainzGit
[acousticbrainz-server] 14alastair opened pull request #405 (03master…AB-460-chords_changes_rate): AB-460: Add tonal.chords_changes_rate to allowed lowlevel features https://github.com/metabrainz/acousticbrainz-serv…
2021-07-05 18617, 2021
alastairp
yes, I saw that. I think there's even a TODO saying to catch more exceptions, right? :)
2021-07-05 18644, 2021
Lotheric has quit
2021-07-05 18644, 2021
lucifer
yup, how poetic.
2021-07-05 18606, 2021
outsidecontext_
alastairp: thanks, makes sense. So I could submit a PR to add this
2021-07-05 18614, 2021
alastairp
outsidecontext_: ^ I just did :)
2021-07-05 18613, 2021
outsidecontext_
Thanks!
2021-07-05 18639, 2021
alastairp
I'm just finishing up a few other features that I hope to merge soon, so expect to see this available some time this week
2021-07-05 18639, 2021
akshaaatt[m]
<BrainzGit "[musicbrainz-android] akshaaatt "> lucifer: Changes made ✌️