if we had the disk space, I would say, screw it, just write a duplicate table.
2021-07-05 18650, 2021
ruaok
and then we can swap over in an instant.
2021-07-05 18606, 2021
ruaok
but I don't think we can get away with that.
2021-07-05 18622, 2021
kinduff
alrighty, thank you lucifer, reosarevok, ruaok
2021-07-05 18638, 2021
alastairp
and even then, I suspect that recreating the continuous aggregate on a new datetime column at the same time that we have the one on the timestamp column will cause disk problems too
2021-07-05 18650, 2021
ruaok
we wouldn't need that.
2021-07-05 18600, 2021
ruaok
the new table would not be accessed until the actual switchover.
2021-07-05 18626, 2021
ruaok
we could take the listenstore offline for an hour or so during the switchover, no big deal.
2021-07-05 18644, 2021
alastairp
for both reads and writes?
2021-07-05 18600, 2021
alastairp
and then during that time drop the existing aggregate and recreate it on the datetime column?
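For reference, the drop-and-recreate step being discussed could look roughly like the sketch below, using standard TimescaleDB syntax. The view name, column name, and aggregate query are invented stand-ins, not the real ListenBrainz schema:

```python
# Hypothetical sketch of recreating the continuous aggregate on the new
# datetime column; "listen_count_30day" and "listened_at_dt" are made up.
import psycopg2

conn = psycopg2.connect("dbname=listenbrainz")
conn.autocommit = True  # continuous aggregates can't be created inside a transaction
with conn.cursor() as cur:
    cur.execute("DROP MATERIALIZED VIEW IF EXISTS listen_count_30day")
    cur.execute("""
        CREATE MATERIALIZED VIEW listen_count_30day
        WITH (timescaledb.continuous) AS
        SELECT time_bucket(INTERVAL '30 days', listened_at_dt) AS bucket,
               count(*) AS listen_count
        FROM listen
        GROUP BY bucket
    """)
```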
2021-07-05 18631, 2021
ruaok
yes
2021-07-05 18612, 2021
alastairp
I'm happy to try it, though this task has now ballooned in size a bit
2021-07-05 18649, 2021
ruaok
could we run a faily simple trial as a proof of concept?
2021-07-05 18604, 2021
ruaok
fairly == not fail-y
2021-07-05 18631, 2021
ruaok
create table, start copying old rows, monitor at 10% and extrapolate how much disk space it would really take.
2021-07-05 18640, 2021
ruaok
and only if it looks doable do we proceed with this approach.
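A minimal sketch of that trial, assuming psycopg2 and a reachable database; "listen_trial" and all column names below are hypothetical stand-ins for the real listen schema:

```python
import psycopg2

conn = psycopg2.connect("dbname=listenbrainz")  # assumed DSN
with conn, conn.cursor() as cur:
    # New table keyed on a real timestamptz instead of an integer epoch.
    cur.execute("""
        CREATE TABLE listen_trial (
            listened_at TIMESTAMPTZ NOT NULL,
            user_id     INTEGER NOT NULL,
            data        JSONB
        )
    """)
    # Copy roughly 10% of the old rows, converting the epoch column.
    cur.execute("""
        INSERT INTO listen_trial (listened_at, user_id, data)
        SELECT to_timestamp(listened_at), user_id, data
        FROM listen TABLESAMPLE SYSTEM (10)
    """)
    # Measure the sample's on-disk size and extrapolate to 100%.
    cur.execute("SELECT pg_total_relation_size('listen_trial')")
    print(f"estimated full size: {cur.fetchone()[0] * 10 / 1e9:.1f} GB")
```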
2021-07-05 18642, 2021
alastairp
the only reason to do both columns at the same time would be to avoid 2 schema changes, are we worried about that? (I'm not)
2021-07-05 18601, 2021
alastairp
sure, let me put together a quick PR for that
2021-07-05 18603, 2021
ruaok
i'm not.
2021-07-05 18618, 2021
ruaok
but, I am worried about an exclusive table lock during the ALTER TABLE command.
2021-07-05 18631, 2021
ruaok
and we have no idea how long ALTER TABLE will run for
2021-07-05 18639, 2021
alastairp
I believe that add column default null in postgres now no longer needs a lock
2021-07-05 18651, 2021
alastairp
however, moving to not null may need one
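A sketch of the two schema steps under discussion, with a `lock_timeout` guard so a surprise exclusive lock aborts the statement instead of stalling the listenstore. Table and column names are hypothetical:

```python
import psycopg2

conn = psycopg2.connect("dbname=listenbrainz")

# Step 1: catalog-only change; a nullable column with no default needs
# no table rewrite, so the exclusive lock is held only momentarily.
with conn, conn.cursor() as cur:
    cur.execute("SET lock_timeout = '5s'")
    cur.execute("ALTER TABLE listen ADD COLUMN listened_at_dt TIMESTAMPTZ DEFAULT NULL")

# Step 2 (after backfilling): SET NOT NULL scans the whole table to
# validate, holding an exclusive lock while it does -- the part to time.
with conn, conn.cursor() as cur:
    cur.execute("SET lock_timeout = '5s'")
    cur.execute("ALTER TABLE listen ALTER COLUMN listened_at_dt SET NOT NULL")
```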
2021-07-05 18631, 2021
ruaok
if we can avoid a table lock we should use your solution. clearly simpler.
2021-07-05 18609, 2021
alastairp
OK, I'll do the following: 1) PR for moving to user_id, 2) PR for testing the change to a date time field - by making a new table and copying 10%, 3) verify that adding a column doesn't need a lock and check the time that adding a not null constraint requires
2021-07-05 18622, 2021
alastairp
thanks for the discussion
2021-07-05 18653, 2021
ruaok
1) does not need a table lock either?
2021-07-05 18606, 2021
alastairp
for adding the column, no
2021-07-05 18617, 2021
alastairp
but now I'm doubting the change of the constraint
2021-07-05 18631, 2021
ruaok
let's do 3) first.
2021-07-05 18644, 2021
ruaok
because that will really inform 1 and 2
2021-07-05 18647, 2021
alastairp
perfect, let me finish my db import and I'll test that
wow. I just ran a query on timescale to extract spotify recording IDs from the new mapping.
2021-07-05 18637, 2021
ruaok
anyone wanna guess how many rows it has?
2021-07-05 18643, 2021
reosarevok
9387579!
2021-07-05 18649, 2021
reosarevok
(I might have generated a random number)
2021-07-05 18601, 2021
ruaok
you might be withing 20%
2021-07-05 18603, 2021
ruaok
-g
2021-07-05 18611, 2021
ruaok
11M rows!
2021-07-05 18617, 2021
ruaok
but, those are mapped against MSIDs.
2021-07-05 18625, 2021
ruaok
meaning it contains loads of dupes
2021-07-05 18658, 2021
ruaok
1.4M unique recordings.
2021-07-05 18621, 2021
ruaok
lucifer: alastairp : quick ponderance.... for the parquet based LB dumps intended to be imported into spark...
2021-07-05 18606, 2021
ruaok
those are mostly intended as internal use. does it make sense to spend all that time XZ compressing them just to move them to another server at the same datacenter?
2021-07-05 18611, 2021
ruaok
(or cluster of datacenters)
2021-07-05 18618, 2021
ruaok
I'm inclined to not compress at ALL.
2021-07-05 18604, 2021
lucifer
sure i think we can get away without compressing to xz.
2021-07-05 18627, 2021
lucifer
also, parquet files by default use "snappy" compression iirc so it might already be comparable to compressed json anyway.
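This is easy to confirm with pyarrow, which does default to snappy; a tiny sketch with made-up data and file names:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"user_id": [1, 2, 3], "recording": ["a", "b", "c"]})

pq.write_table(table, "listens.parquet")                          # snappy by default
pq.write_table(table, "listens.raw.parquet", compression="none")  # no compression
```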
2021-07-05 18656, 2021
ruaok
oh. well, that makes everything easier then.
2021-07-05 18601, 2021
ruaok
was it 64MB chunks?
2021-07-05 18605, 2021
ruaok
I wonder how to estimate that.
2021-07-05 18635, 2021
lucifer
128 MB chunks, or a little less than that.
2021-07-05 18644, 2021
ruaok
k.
2021-07-05 18600, 2021
ruaok
if there is compression in the mix, I'll have to play with it to see if I can get close without going over.
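One way to "get close without going over" is trial and error: write a sample file, measure its compressed size on disk, and extrapolate rows per file. A rough sketch, with a hypothetical `rows_per_file` helper and placeholder paths:

```python
import os
import pyarrow as pa
import pyarrow.parquet as pq

TARGET_BYTES = 128 * 1024 * 1024  # aim just under the 128 MB chunk size

def rows_per_file(sample_table: pa.Table, path: str = "/tmp/sample.parquet") -> int:
    """Write a sample, measure its compressed size, and extrapolate."""
    pq.write_table(sample_table, path)  # snappy by default
    bytes_per_row = os.path.getsize(path) / sample_table.num_rows
    # ~5% headroom so compression variance doesn't push a file over target
    return int(TARGET_BYTES * 0.95 / bytes_per_row)

# e.g. rows_per_file(pq.read_table("listens.parquet").slice(0, 100_000))
```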
2021-07-05 18635, 2021
ruaok laughs at the thought that his first hard drive was 30MB large
MBS-11767: Track-level artists that differ from the release artist are no longer shown on multi-disc releases that aren't fully loaded
2021-07-05 18648, 2021
reosarevok
I took a quick look but I'm not sure why medium is not being detected as changed by useMemo
2021-07-05 18641, 2021
Sophist-UK joined the channel
2021-07-05 18641, 2021
Sophist-UK has quit
2021-07-05 18641, 2021
Sophist-UK joined the channel
2021-07-05 18638, 2021
Sophist_UK has quit
2021-07-05 18643, 2021
lucifer
ruaok: re lb public dumps, iiuc the `user` table schema of the public dumps is incorrect. we only import that table when there is no private dump, so when we try to import the public dump on its own we get an error.
ruaok: sorry, I had to pop out. did lucifer answer your question about public dumps?
2021-07-05 18617, 2021
akshaaatt[m]
<ruaok "https://juliareda.eu/2021/07/git"> This is really interesting!
2021-07-05 18630, 2021
ruaok
alastairp: y
2021-07-05 18639, 2021
alastairp
outsidecontext_: that's an oversight
2021-07-05 18600, 2021
akshaaatt[m]
I really wish it were a free plugin though. Anyway, open-source plugins similar to this will surface eventually.
2021-07-05 18612, 2021
alastairp
or more specifically, we made a list of things that we thought people might want to select, and that wasn't in our initial
2021-07-05 18615, 2021
alastairp
list
2021-07-05 18622, 2021
lucifer
akshaaatt[m]: i think we can use the MB android app page but that means we also have to maintain it in two places. let's finalize the details of the readme and see how we want to do it.
2021-07-05 18639, 2021
akshaaatt[m]
Okaaayyy boss!
2021-07-05 18602, 2021
lucifer
alastairp: sklearn training now takes ~5m after fixing the groundtruth path.
2021-07-05 18613, 2021
alastairp
🎉
2021-07-05 18631, 2021
lucifer
time to move to next step now :D
2021-07-05 18645, 2021
alastairp
did you find the model file?
2021-07-05 18620, 2021
lucifer
we have a lot of files in the sklearn dataset directory. checking for the pkl file.
also looked into the failed status stuff: we have it already, but we aren't catching all exceptions, so sometimes the status does not get updated.
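A generic sketch of the fix being described: catch every exception so the job is always marked failed. `set_job_status` is a hypothetical stand-in for the real database update, not the actual evaluation code:

```python
def set_job_status(job_id, status, error=None):
    print(job_id, status, error or "")  # placeholder for the real UPDATE

def run_training_job(job_id, train):
    try:
        train()
        set_job_status(job_id, "completed")
    except Exception as exc:  # deliberately broad: any failure must be recorded
        set_job_status(job_id, "failed", error=str(exc))
        raise
```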
2021-07-05 18602, 2021
BrainzGit
[acousticbrainz-server] 14alastair opened pull request #405 (03master…AB-460-chords_changes_rate): AB-460: Add tonal.chords_changes_rate to allowed lowlevel features https://github.com/metabrainz/acousticbrainz-serv…
2021-07-05 18617, 2021
alastairp
yes, I saw that. I think there's even a TODO saying to catch more exceptions, right? :)
2021-07-05 18644, 2021
Lotheric has quit
2021-07-05 18644, 2021
lucifer
yup, how poetic.
2021-07-05 18606, 2021
outsidecontext_
alastairp: thanks, makes sense. So I could submit a PR to add this
2021-07-05 18614, 2021
alastairp
outsidecontext_: ^ I just did :)
2021-07-05 18613, 2021
outsidecontext_
Thanks!
2021-07-05 18639, 2021
alastairp
I'm just finishing up a few other features that I hope to merge soon, so expect to see this available some time this week
2021-07-05 18639, 2021
akshaaatt[m]
<BrainzGit "[musicbrainz-android] akshaaatt "> lucifer: Changes made ✌️