if we had the disk space, I'd say screw it, just write a duplicate table.
and then we can swap over in an instant.
but I don't think we can get away with that.
kinduff
alrighty, thank you lucifer, reosarevok, ruaok
alastairp
and even then, I suspect that recreating the continuous aggregate on a new datetime column while we still have the one on the timestamp column will cause disk problems too
ruaok
we wouldn't need that.
the new table would not be accessed until the actual switchover.
we could take the listenstore offline for an hour or so during the switchover, no big deal.
alastairp
for both reads and writes?
and then during that time drop the existing aggregate and recreate it on the datetime column?
ruaok
yes
alastairp
I'm happy to try it, though this task has now ballooned in size a bit
ruaok
could we run a fairly simple trial as a proof of concept?
create the table, start copying old rows, monitor until about 10% is copied, and extrapolate how much disk space it would really take.
and only if it looks doable do we proceed with this approach.
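A minimal sketch of that trial, assuming psycopg2 and hypothetical names (listen for the existing table, listen_copy_test for the duplicate, a placeholder DSN); the real ListenBrainz schema and connection details will differ:

    import psycopg2

    conn = psycopg2.connect("dbname=listenbrainz")  # hypothetical DSN
    with conn.cursor() as cur:
        # Create an empty duplicate with the same structure and indexes.
        cur.execute("CREATE TABLE listen_copy_test (LIKE listen INCLUDING ALL);")
        # Copy roughly 10% of the rows as the sample.
        cur.execute("INSERT INTO listen_copy_test SELECT * FROM listen TABLESAMPLE SYSTEM (10);")
        conn.commit()
        # Measure the sample's on-disk size and extrapolate to the full table.
        cur.execute("SELECT pg_total_relation_size('listen_copy_test');")
        sample_bytes = cur.fetchone()[0]
        print("extrapolated full size: %.1f GB" % (sample_bytes * 10 / 1024 ** 3))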
alastairp
the only reason to do both columns at the same time would be to avoid 2 schema changes; are we worried about that? (I'm not)
sure, let me put together a quick PR for that
ruaok
i'm not.
but, I am worried about an exclusive table lock during the ALTER TABLE command.
and we have no idea how long ALTER TABLE will run for
alastairp
I believe that add column default null in postgres now no longer needs a lock
however, moving to not null may need one
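For reference, a sketch of the statements being discussed, using a hypothetical column name (listened_at_dt) and a placeholder DSN; the comments reflect stock Postgres locking behaviour (11/12+) and are worth re-verifying against the version actually running:

    import psycopg2

    conn = psycopg2.connect("dbname=listenbrainz")  # hypothetical DSN
    conn.autocommit = True  # run each ALTER in its own short transaction
    with conn.cursor() as cur:
        # Adding a nullable column with no default is a metadata-only change:
        # it takes a lock, but only briefly, since no table rewrite happens.
        cur.execute("ALTER TABLE listen ADD COLUMN listened_at_dt TIMESTAMPTZ;")

        # SET NOT NULL normally scans the whole table under an exclusive lock.
        # On PG 12+ the scan is skipped if a validated CHECK constraint already
        # proves the column is non-null, and VALIDATE takes a much weaker lock.
        # (This step assumes the new column has been backfilled first.)
        cur.execute("ALTER TABLE listen ADD CONSTRAINT listened_at_dt_nn "
                    "CHECK (listened_at_dt IS NOT NULL) NOT VALID;")
        cur.execute("ALTER TABLE listen VALIDATE CONSTRAINT listened_at_dt_nn;")
        cur.execute("ALTER TABLE listen ALTER COLUMN listened_at_dt SET NOT NULL;")
        cur.execute("ALTER TABLE listen DROP CONSTRAINT listened_at_dt_nn;")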
ruaok
if we can avoid a table lock we should use your solution. clearly simpler.
alastairp
OK, I'll do the following: 1) PR for moving to user_id, 2) PR for testing the change to a datetime field - by making a new table and copying 10%, 3) verify that adding a column doesn't need a lock and check the time that adding a not null constraint requires
thanks for the discussion
ruaok
1) does not need a table lock either?
alastairp
for adding the column, no
but now I'm doubting the change of the constraint
ruaok
let's do 3) first.
because that will really inform 1 and 2
alastairp
perfect, let me finish my db import and I'll test that
wow. I just ran a query on timescale to extract spotify recording IDs from the new mapping.
anyone wanna guess how many rows it has?
reosarevok
9387579!
(I might have generated a random number)
ruaok
you might be within 20%
11M rows!
but, those are mapped against MSIDs.
meaning it contains loads of dupes
1.4M unique recordings.
lucifer: alastairp : quick ponderance.... for the parquet based LB dumps intended to be imported into spark...
those are mostly intended as internal use. does it make sense to spend all that time xz-compressing them just to move them to another server in the same datacenter?
(or cluster of datacenters)
I'm inclined to not compress at ALL.
lucifer
sure i think we can get away without compressing to xz.
also, parquet files by default use "snappy" compression iirc so it might already be comparable to compressed json anyway.
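A small sketch of what choosing the codec looks like on the Spark side, assuming a DataFrame of listens read from a hypothetical JSON path; not the actual dump code:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("lb-dump-sketch").getOrCreate()
    listens_df = spark.read.json("/data/listens.json")  # hypothetical input

    # Spark writes parquet with snappy compression by default, so the output is
    # already much smaller than the raw JSON without any extra xz pass.
    listens_df.write.parquet("/data/dump/snappy", compression="snappy")

    # Compression can also be switched off entirely for same-datacenter copies.
    listens_df.write.parquet("/data/dump/uncompressed", compression="none")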
ruaok
oh. well, that makes everything easier then.
was it 64MB chunks?
I wonder how to estimate that.
lucifer
128 MB chunks, or a little less than that.
ruaok
k.
if there is compression in the mix, I'll have to play with it to see if I can get close without going over.
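One way to stay under the target with compression in the mix is to write a fixed-size sample, measure the bytes per row on disk, and cap the rows per output file; a rough sketch, reusing the same hypothetical listens_df input as above and Linux du for the size check:

    import subprocess
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("lb-chunk-sketch").getOrCreate()
    listens_df = spark.read.json("/data/listens.json")  # hypothetical input

    SAMPLE_ROWS = 1_000_000
    TARGET_BYTES = 128 * 1024 * 1024  # 128 MB chunks

    # Write a fixed-size sample and measure what it costs on disk.
    listens_df.limit(SAMPLE_ROWS).coalesce(1).write.parquet("/data/dump/sample")
    sample_bytes = int(subprocess.check_output(["du", "-sb", "/data/dump/sample"]).split()[0])

    # Leave a ~10% margin so variation in row size doesn't push a file over.
    rows_per_file = int(TARGET_BYTES * 0.9 / (sample_bytes / SAMPLE_ROWS))

    # Cap the rows written to each parquet file at that estimate.
    listens_df.write.option("maxRecordsPerFile", rows_per_file).parquet("/data/dump/chunks")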
ruaok laughs at the thought that his first hard drive was 30MB large
MBS-11767: Track-level artists that differ from the release artist are no longer shown on multi-disc releases that aren't fully loaded
reosarevok
I took a quick look but I'm not sure why medium is not being detected as changed by useMemo
Sophist-UK joined the channel
Sophist-UK has quit
Sophist-UK joined the channel
Sophist_UK has quit
lucifer
ruaok: re lb public dumps, iiuc the `user` table schema of the public dumps is incorrect. we only import that table when there is no private dump, so when we try to import the public dump on its own we get an error.
alastairp
ruaok: sorry, I had to pop out. did lucifer answer your question about public dumps?
akshaaatt[m]
<ruaok "https://juliareda.eu/2021/07/git"> This is really interesting!
ruaok
alastairp: y
alastairp
outsidecontext_: that's an oversight
akshaaatt[m]
I really wish it were a free plugin though. Anyway, open-source plugins similar to this will float eventually.
alastairp
or more specifically, we made a list of things that we thought people might want to select, and that wasn't in our initial list
lucifer
akshaaatt[m]: i think we can use the MB android app page, but that means we also have to maintain it in two places. let's finalize the details of the readme and see how we want to do it.
akshaaatt[m]
Okaaayyy boss!
lucifer
alastairp: sklearn training now takes ~5m after fixing the groundtruth path.
alastairp
🎉
lucifer
time to move to next step now :D
alastairp
did you find the model file?
lucifer
we have a lot of files in the sklearn dataset directory. checking for the pkl file.
also looked into the failed-status stuff: we have it already, but we aren't catching all exceptions, so sometimes the status doesn't get updated.
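A minimal sketch of the catch-everything idea, with hypothetical helpers (run_training, set_eval_status) standing in for the real AcousticBrainz code:

    import logging
    import traceback

    logger = logging.getLogger(__name__)

    def evaluate_dataset(eval_id):
        # run_training / set_eval_status are hypothetical stand-ins here.
        try:
            run_training(eval_id)
            set_eval_status(eval_id, "done")
        except Exception:
            # Catch everything, not just the expected errors, so the job can
            # never stay stuck in "running" when training blows up.
            logger.error("evaluation %s failed:\n%s", eval_id, traceback.format_exc())
            set_eval_status(eval_id, "failed")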
BrainzGit
[acousticbrainz-server] 14alastair opened pull request #405 (03master…AB-460-chords_changes_rate): AB-460: Add tonal.chords_changes_rate to allowed lowlevel features https://github.com/metabrainz/acousticbrainz-se...
alastairp
yes, I saw that. I think there's even a TODO saying to catch more exceptions, right? :)
Lotheric has quit
lucifer
yup, how poetic.
outsidecontext_
alastairp: thanks, makes sense. So I could submit a PR to add this
alastairp
outsidecontext_: ^ I just did :)
outsidecontext_
Thanks!
alastairp
I'm just finishing up a few other features that I hope to merge soon, so expect to see this available some time this week
akshaaatt[m]
<BrainzGit "[musicbrainz-android] akshaaatt "> lucifer: Changes made ✌️