if we had the disk space, I'd say screw it, just write a duplicate table.
and then we can swap over in an instant.
but I don't think we can get away with that.
kinduff
alrighty, thank you lucifer, reosarevok, ruaok
alastairp
and even then, I suspect that recreating the continuous aggregate on a new datetime column while we still have the one on the timestamp column will cause disk problems too
ruaok
we wouldn't need that.
the new table would not be accessed until the actual switchover.
we could take the listenstore offline for an hour or so during the switchover, no big deal.
alastairp
for both reads and writes?
and then during that time drop the existing aggregate and recreate it on the datetime column?
ruaok
yes
alastairp
I'm happy to try it, though this task has now ballooned in size a bit
ruaok
could we run a fairly simple trial as a proof of concept?
create the table, start copying old rows, monitor until about 10% is copied, and extrapolate how much disk space it would really take.
and only if it looks doable do we proceed with this approach.
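A minimal sketch of that trial, assuming psycopg2 and hypothetical names (listen for the existing table, listen_copy_test for the duplicate, a placeholder DSN); the real ListenBrainz schema and connection details will differ:

    import psycopg2

    conn = psycopg2.connect("dbname=listenbrainz")  # hypothetical DSN
    with conn.cursor() as cur:
        # Create an empty duplicate with the same structure and indexes.
        cur.execute("CREATE TABLE listen_copy_test (LIKE listen INCLUDING ALL);")
        # Copy roughly 10% of the rows as the sample.
        cur.execute("INSERT INTO listen_copy_test SELECT * FROM listen TABLESAMPLE SYSTEM (10);")
        conn.commit()
        # Measure the sample's on-disk size and extrapolate to the full table.
        cur.execute("SELECT pg_total_relation_size('listen_copy_test');")
        sample_bytes = cur.fetchone()[0]
        print("extrapolated full size: %.1f GB" % (sample_bytes * 10 / 1024 ** 3))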
alastairp
the only reason to do both columns at the same time would be to avoid 2 schema changes; are we worried about that? (I'm not)
sure, let me put together a quick PR for that
ruaok
i'm not.
but, I am worried about an exclusive table lock during the ALTER TABLE command.
and we have no idea how long ALTER TABLE will run for
alastairp
I believe that add column default null in postgres now no longer needs a lock
however, moving to not null may need one
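For reference, a sketch of the statements being discussed, using a hypothetical column name (listened_at_dt) and a placeholder DSN; the comments reflect stock Postgres locking behaviour (11/12+) and are worth re-verifying against the version actually running:

    import psycopg2

    conn = psycopg2.connect("dbname=listenbrainz")  # hypothetical DSN
    conn.autocommit = True  # run each ALTER in its own short transaction
    with conn.cursor() as cur:
        # Adding a nullable column with no default is a metadata-only change:
        # it takes a lock, but only briefly, since no table rewrite happens.
        cur.execute("ALTER TABLE listen ADD COLUMN listened_at_dt TIMESTAMPTZ;")

        # SET NOT NULL normally scans the whole table under an exclusive lock.
        # On PG 12+ the scan is skipped if a validated CHECK constraint already
        # proves the column is non-null, and VALIDATE takes a much weaker lock.
        # (This step assumes the new column has been backfilled first.)
        cur.execute("ALTER TABLE listen ADD CONSTRAINT listened_at_dt_nn "
                    "CHECK (listened_at_dt IS NOT NULL) NOT VALID;")
        cur.execute("ALTER TABLE listen VALIDATE CONSTRAINT listened_at_dt_nn;")
        cur.execute("ALTER TABLE listen ALTER COLUMN listened_at_dt SET NOT NULL;")
        cur.execute("ALTER TABLE listen DROP CONSTRAINT listened_at_dt_nn;")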
ruaok
if we can avoid a table lock we should use your solution. clearly simpler.
alastairp
OK, I'll do the following: 1) PR for moving to user_id, 2) PR for testing the change to a datetime field - by making a new table and copying 10%, 3) verify that adding a column doesn't need a lock and check the time that adding a not null constraint requires
thanks for the discussion
ruaok
1) does not need a table lock either?
alastairp
for adding the column, no
but now I'm doubting the change of the constraint
ruaok
let's do 3) first.
because that will really inform 1 and 2
alastairp
perfect, let me finish my db import and I'll test that
wow. I just ran a query on timescale to extract spotify recording IDs from the new mapping.
anyone wanna guess how many rows it has?
reosarevok
9387579!
(I might have generated a random number)
ruaok
you might be within 20%
11M rows!
but, those are mapped against MSIDs.
meaning it contains loads of dupes
1.4M unique recordings.
lucifer: alastairp : quick ponderance.... for the parquet based LB dumps intended to be imported into spark...
those are mostly intended as internal use. does it make sense to spend all that time xz-compressing them just to move them to another server in the same datacenter?
(or cluster of datacenters)
I'm inclined to not compress at ALL.
lucifer
sure i think we can get away without compressing to xz.
also, parquet files by default use "snappy" compression iirc so it might already be comparable to compressed json anyway.
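A small sketch of what choosing the codec looks like on the Spark side, assuming a DataFrame of listens read from a hypothetical JSON path; not the actual dump code:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("lb-dump-sketch").getOrCreate()
    listens_df = spark.read.json("/data/listens.json")  # hypothetical input

    # Spark writes parquet with snappy compression by default, so the output is
    # already much smaller than the raw JSON without any extra xz pass.
    listens_df.write.parquet("/data/dump/snappy", compression="snappy")

    # Compression can also be switched off entirely for same-datacenter copies.
    listens_df.write.parquet("/data/dump/uncompressed", compression="none")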
ruaok
oh. well, that makes everything easier then.
was it 64MB chunks?
I wonder how to estimate that.
lucifer
128 MB chunks, or a little less than that.
ruaok
k.
if there is compression in the mix, I'll have to play with it to see if I can get close without going over.
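One way to stay under the target with compression in the mix is to write a fixed-size sample, measure the bytes per row on disk, and cap the rows per output file; a rough sketch, reusing the same hypothetical listens_df input as above and Linux du for the size check:

    import subprocess
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("lb-chunk-sketch").getOrCreate()
    listens_df = spark.read.json("/data/listens.json")  # hypothetical input

    SAMPLE_ROWS = 1_000_000
    TARGET_BYTES = 128 * 1024 * 1024  # 128 MB chunks

    # Write a fixed-size sample and measure what it costs on disk.
    listens_df.limit(SAMPLE_ROWS).coalesce(1).write.parquet("/data/dump/sample")
    sample_bytes = int(subprocess.check_output(["du", "-sb", "/data/dump/sample"]).split()[0])

    # Leave a ~10% margin so variation in row size doesn't push a file over.
    rows_per_file = int(TARGET_BYTES * 0.9 / (sample_bytes / SAMPLE_ROWS))

    # Cap the rows written to each parquet file at that estimate.
    listens_df.write.option("maxRecordsPerFile", rows_per_file).parquet("/data/dump/chunks")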
ruaok laughs at the thought that his first hard drive was 30MB large
MBS-11767: Track-level artists that differ from the release artist are no longer shown on multi-disc releases that aren't fully loaded
reosarevok
I took a quick look but I'm not sure why medium is not being detected as changed by useMemo
Sophist-UK joined the channel
Sophist-UK has quit
Sophist-UK joined the channel
Sophist_UK has quit
lucifer
ruaok: re lb public dumps, iiuc the `user` table schema of the public dumps is incorrect. we only import that table when there is no private dump, so when we try to import the public dump on its own we get an error.
alastairp
ruaok: sorry, I had to pop out. did lucifer answer your question about public dumps?
akshaaatt[m]
<ruaok "https://juliareda.eu/2021/07/git"> This is really interesting!
ruaok
alastairp: y
alastairp
outsidecontext_: that's an oversight
akshaaatt[m]
I really wish it were a free plugin though. Anyway, open-source plugins similar to this will float eventually.
alastairp
or more specifically, we made a list of things that we thought people might want to select, and that wasn't in our initial list
lucifer
akshaaatt[m]: i think we can use the MB android app page, but that means we also have to maintain it in two places. let's finalize the details of the readme and see how we want to do it.
akshaaatt[m]
Okaaayyy boss!
lucifer
alastairp: sklearn training now takes ~5m after fixing the groundtruth path.
alastairp
🎉
lucifer
time to move to next step now :D
alastairp
did you find the model file?
lucifer
we have a lot of files in the sklearn dataset directory. checking for the pkl file.
also looked into the failed-status stuff: we have it already, but we aren't catching all exceptions, so sometimes the status doesn't get updated.
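A minimal sketch of the catch-everything idea, with hypothetical helpers (run_training, set_eval_status) standing in for the real AcousticBrainz code:

    import logging
    import traceback

    logger = logging.getLogger(__name__)

    def evaluate_dataset(eval_id):
        # run_training / set_eval_status are hypothetical stand-ins here.
        try:
            run_training(eval_id)
            set_eval_status(eval_id, "done")
        except Exception:
            # Catch everything, not just the expected errors, so the job can
            # never stay stuck in "running" when training blows up.
            logger.error("evaluation %s failed:\n%s", eval_id, traceback.format_exc())
            set_eval_status(eval_id, "failed")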
BrainzGit
[acousticbrainz-server] 14alastair opened pull request #405 (03master…AB-460-chords_changes_rate): AB-460: Add tonal.chords_changes_rate to allowed lowlevel features https://github.com/metabrainz/acousticbrainz-se...
alastairp
yes, I saw that. I think there's even a TODO saying to catch more exceptions, right? :)
Lotheric has quit
lucifer
yup, how poetic.
outsidecontext_
alastairp: thanks, makes sense. So I could submit a PR to add this
alastairp
outsidecontext_: ^ I just did :)
outsidecontext_
Thanks!
alastairp
I'm just finishing up a few other features that I hope to merge soon, so expect to see this available some time this week
akshaaatt[m]
<BrainzGit "[musicbrainz-android] akshaaatt "> lucifer: Changes made ✌️