yvanzo: moin! Anything special for this docker release?
hi reosarevok: yes, I drafted release notes but have to make some improvements.
Ok :) I'll start the prod release (won't be around in the evening probably)
but we can look at that bit later
oh hey reo ˆ__ˆ
lucifer: on TS the listened_at_track_name_user_id_ndx_listen index was created live and we didn't decide at the time if we wanted to keep it, yes?
because if that is so then PR 2042 makes sense. :)
mayhem: yes it was created live. we needed it to keep the on conflict clauses working.
still need to figure out how many dupes there are in the db and how to delete those.
there is dup detection and removal code in the MBID mapping stuff, you can take a look at it.
to use it for TS, I think we would have to do it on a set of chunks at the same time
well, one at a time, once the new index is in place.
we can't create the index without deleting the dupes first.
why not delete the dups?
ah no, i mean we should delete the dupes. i misunderstood your message as saying to create the index first and delete afterwards
that would be ideal, but not possible.
we will have the problem that new dups can be created while we are deleting the old ones.
but I wonder if we can make the script that deletes dups work on ranges or the whole listen table.
then we do a month or so at a time and then once that is done, we try to create the index.
if that fails, we delete dups across the whole table.
but I doubt that would work, so we might end up chasing our tail on this one.
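(A rough sketch of the range-based pass being discussed, under stated assumptions: `listened_at` is a unix timestamp, a "month" is approximated as 30 days, and `dedup_range` is a hypothetical placeholder for the actual dedup script, which isn't shown.)

```python
# Hedged sketch: walk listened_at in ~month-sized chunks so the dedup
# script works on one range at a time instead of the whole listen table.
MONTH = 30 * 24 * 3600  # approximate month, in seconds

def chunk_ranges(start_ts, end_ts, chunk=MONTH):
    """Yield half-open (lo, hi) unix-timestamp ranges covering [start_ts, end_ts)."""
    lo = start_ts
    while lo < end_ts:
        hi = min(lo + chunk, end_ts)
        yield lo, hi
        lo = hi

def dedup_range(lo, hi):
    # placeholder for the real dedup step over lo <= listened_at < hi
    pass

for lo, hi in chunk_ranges(1_500_000_000, 1_500_000_000 + 3 * MONTH):
    dedup_range(lo, hi)
```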
i think dup deletion should be fast enough that we can stop ts writer while the script runs.
I really doubt that.
i see, let's see how fast it goes on one chunk and then decide what to do accordingly.
well, if we do it in python then maybe. but pure SQL, I think that is going to OOM
hmm, don't think it should OOM but yeah, really can't say without trying
if we fetch all the tracks ordered by listened_at and the other dedup fields and then slowly delete all the dups, that could work. it might be fast enough for the second pass to run with the TS writer stopped.
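(A minimal, self-contained sketch of that dedup pass: keep one row per dedup group and delete the rest. It uses an in-memory SQLite table as a stand-in for the TS listen table; the schema, the column names, and the choice of (user_id, listened_at, track_name) as the dedup key are assumptions for illustration only.)

```python
# Hedged sketch of duplicate deletion: for every (user_id, listened_at,
# track_name) group, keep the row with the smallest id and delete the rest.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE listen (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        listened_at INTEGER,
        track_name TEXT
    )
""")
conn.executemany(
    "INSERT INTO listen (user_id, listened_at, track_name) VALUES (?, ?, ?)",
    [
        (1, 1000, "Song A"),
        (1, 1000, "Song A"),  # duplicate of the row above
        (1, 2000, "Song B"),
        (2, 1000, "Song A"),
    ],
)

# Delete every row that is not the first (lowest id) of its dedup group.
conn.execute("""
    DELETE FROM listen
     WHERE id NOT IN (
        SELECT MIN(id)
          FROM listen
         GROUP BY user_id, listened_at, track_name
     )
""")
remaining = conn.execute("SELECT COUNT(*) FROM listen").fetchone()[0]
print(remaining)  # 3
```

On a real Postgres/Timescale table this would need to be constrained to a listened_at range per the chunking discussed above, but the keep-MIN(id)-per-group shape is the same.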
alastairp: also, you mentioned the part about making a csv with the following columns: mlhd_recording_mbid, mlhd_artist_mbid, mlhd_recording_name, mlhd_artist_name, mb_recording_artist_credit, mb_artist_mbids, mb_canonical_recording_mbid
TBH I am still a bit confused about this one. Maybe breaking it down into some macro steps could help :)
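(For reference, a tiny sketch of just the CSV header from that message; the column names are copied from the chat above, and everything else, including how the rows would eventually be filled in, is an assumption.)

```python
# Hedged sketch: write the header of the CSV alastairp described.
import csv
import io

COLUMNS = [
    "mlhd_recording_mbid",
    "mlhd_artist_mbid",
    "mlhd_recording_name",
    "mlhd_artist_name",
    "mb_recording_artist_credit",
    "mb_artist_mbids",
    "mb_canonical_recording_mbid",
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=COLUMNS)
writer.writeheader()
# each later row would presumably be one MLHD entry matched against MB data,
# e.g. writer.writerow({"mlhd_recording_mbid": ..., ...})
header = buf.getvalue().strip()
```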