#metabrainz

      • lucifer
        reosarevok: oh which one?
      • 2022-03-25 08409, 2022

      • lucifer
        rmq issues? i can help look
      • 2022-03-25 08438, 2022

      • reosarevok
        The one about also adding those mbids we're removing from the main table with the dedup to the canonical table
      • 2022-03-25 08455, 2022

      • reosarevok
        lucifer: did you see the sir error I pasted earlier?
      • 2022-03-25 08406, 2022

      • lucifer
        ah yes, once we have inspected the data to find out what all is there after that
      • 2022-03-25 08423, 2022

      • lucifer
        no, i'll look now.
      • 2022-03-25 08400, 2022

      • reosarevok
        lucifer: just tons and tons of
      • 2022-03-25 08403, 2022

      • reosarevok
      • 2022-03-25 08406, 2022

      • reosarevok
        Still happening now
      • 2022-03-25 08423, 2022

      • reosarevok
        (with ssh pink docker logs --follow --since="$(date -d '5 minutes ago' -u +%FT%TZ)" --timestamps sir-prod)
      • 2022-03-25 08434, 2022

      • lucifer
        i see, looking
      • 2022-03-25 08438, 2022

      • reosarevok
        Thanks!
      • 2022-03-25 08441, 2022

      • reosarevok
        mayhem: so, for the dedup: I assume we *still* want to consider "Foo" by "X & Z" and "Foo" by "X feat. Z" the same even if the AC differs, right?
      • 2022-03-25 08431, 2022

      • reosarevok
        (arguably, we should be looking at the artists inside the artist_credit_name and ignoring the name, but I guess that might not work for your use case)
      • 2022-03-25 08404, 2022

      • reosarevok
        Hmm, actually, I guess artist_credit_name would leave "feat" in the decoded name
      • 2022-03-25 08425, 2022

      • reosarevok
        So those two wouldn't clash, just something like "X & Z" vs "X, Z" vs "X / Z" :)
      • 2022-03-25 08448, 2022
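
The join-phrase point above can be sketched as follows. This is a minimal illustration, assuming the lookup key is built by lowercasing the credit and title and dropping everything except letters and digits; `make_lookup` is a hypothetical name, not the mapping code's actual function.

```python
import re


def make_lookup(artist_credit_name: str, recording_name: str) -> str:
    """Hypothetical normaliser: lowercase, then strip punctuation and spaces."""
    combined = f"{artist_credit_name}{recording_name}".lower()
    # [\W_]+ matches any run of characters that are not letters or digits.
    return re.sub(r"[\W_]+", "", combined)


credits = ["X & Z", "X, Z", "X / Z", "X feat. Z"]
keys = {credit: make_lookup(credit, "Foo") for credit in credits}
for credit, key in keys.items():
    print(f"{credit!r} -> {key}")
```

Under that assumption the first three credits collapse to the same key, while "X feat. Z" keeps "feat" in the key and so does not clash with them.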

      • mayhem
        hang on. back in 5.
      • 2022-03-25 08428, 2022

      • reosarevok
        Ok!
      • 2022-03-25 08411, 2022

      • lucifer
        reosarevok: the error message is not very helpful but rmq-clash logs say client closed connection unexpectedly.
      • 2022-03-25 08443, 2022

      • lucifer
        huh a blame game, sir logs say server did it. rmq logs say client did it.
      • 2022-03-25 08414, 2022

      • lucifer
        do you know if it's safe to restart sir?
      • 2022-03-25 08436, 2022

      • PrathameshG
        lucifer: Got it.
      • 2022-03-25 08441, 2022

      • lucifer
        this error message has appeared multiple times, but less frequently, in the past 4 months according to sentry.
      • 2022-03-25 08403, 2022

      • reosarevok
        lucifer: should be, I'll do it
      • 2022-03-25 08408, 2022

      • lucifer
        👍
      • 2022-03-25 08417, 2022

      • reosarevok
      • 2022-03-25 08410, 2022

      • reosarevok
        tailing the logs now
      • 2022-03-25 08453, 2022

      • reosarevok
        No more errors and it seems like it's starting to decrease... hopefully that was all there is to it then?
      • 2022-03-25 08459, 2022

      • reosarevok
        But keeping an eye
      • 2022-03-25 08406, 2022

      • lucifer
        maybe some missed error handling in sir. it disconnected temporarily due to some reason but then didn't reconnect while the rest of code assumed it did.
      • 2022-03-25 08433, 2022

      • alastairp
        morning
      • 2022-03-25 08454, 2022

      • lucifer
        🤦 i forgot to rebuild the mapping container so it used the wrong commit :( building again
      • 2022-03-25 08412, 2022

      • alastairp
        I'm without internet at home, I'll hang around working offline, and try and jump in on tethering every now and again until things are back to normal
      • 2022-03-25 08415, 2022

      • lucifer
        morning!
      • 2022-03-25 08441, 2022

      • reosarevok
        Hmm, the queues are actually slowly rising again, but no errors. Maybe it's just the issue where if it's too high it doesn't come back down on its own. I'll try the whole saving-the-messages thing yvanzo documented
      • 2022-03-25 08404, 2022

      • d4rkie joined the channel
      • 2022-03-25 08439, 2022

      • reosarevok
        All saved, let's see what happens now
      • 2022-03-25 08412, 2022

      • mayhem returns after a surprise visit from a friend
      • 2022-03-25 08455, 2022

      • reosarevok
        Oh no, unexpected socialising
      • 2022-03-25 08403, 2022

      • reosarevok shudders :D
      • 2022-03-25 08420, 2022

      • reosarevok
        Ok, sir seems to be working fine and processing messages again
      • 2022-03-25 08432, 2022

      • reosarevok
        Will start queuing the saved messages in small batches
      • 2022-03-25 08417, 2022
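
Requeuing saved messages in small batches could look roughly like the sketch below. It is only an illustration and does not reflect the procedure yvanzo documented; `batched`, `requeue`, the `publish` callback, and the batch size are all hypothetical.

```python
import time
from typing import Callable, Iterable, Iterator, List


def batched(messages: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` messages."""
    batch: List[str] = []
    for message in messages:
        batch.append(message)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


def requeue(messages: Iterable[str], publish: Callable[[str], None],
            size: int = 100, pause: float = 0.0) -> int:
    """Publish messages batch by batch, pausing between batches."""
    sent = 0
    for batch in batched(messages, size):
        for message in batch:
            publish(message)  # hypothetical: hand the message back to the queue
            sent += 1
        time.sleep(pause)  # give the consumer time to drain the queue
    return sent


saved = [f"message-{i}" for i in range(7)]
published: List[str] = []
count = requeue(saved, published.append, size=3)
print(count, [len(b) for b in batched(saved, 3)])
```

The pause between batches is the knob that matters here: it keeps the queue from climbing back to the level where it stops draining on its own.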

      • mayhem
        do we have examples for two recordings that are in conflict I can look at?
      • 2022-03-25 08421, 2022

      • PrathameshG has quit
      • 2022-03-25 08411, 2022

      • mayhem
        the vacuum analyze on bono finished. but swapping the new mb_metadata_cache table into production at gaga didn't finish. odd.
      • 2022-03-25 08433, 2022

      • atj
        reosarevok: hello
      • 2022-03-25 08440, 2022

      • mayhem
        ah! stopping listenbrainz-web-test allowed it to finish.
      • 2022-03-25 08456, 2022

      • reosarevok
        atj: hi! seemingly not actually a rabbitmq issue after all, so no need anymore :)
      • 2022-03-25 08425, 2022

      • reosarevok
        mayhem: lucifer mentioned that https://beta.musicbrainz.org/recording/0007534c-d… did not get listed
      • 2022-03-25 08417, 2022

      • reosarevok
      • 2022-03-25 08456, 2022

      • mayhem
        ah, I see. would it be cheeky to call that a data problem?
      • 2022-03-25 08405, 2022

      • reosarevok
        Yes
      • 2022-03-25 08417, 2022

      • reosarevok
        I mean, if it's really dropping 3 million recordings, yes :D
      • 2022-03-25 08418, 2022

      • mayhem
        "David Guetta,JD Davis"
      • 2022-03-25 08439, 2022

      • reosarevok
        Well, if it was "David Guetta, JD Davis" with a space you'd get the same :)
      • 2022-03-25 08440, 2022

      • mayhem
        "David Guetta & J.D. Davis"
      • 2022-03-25 08404, 2022

      • mayhem
        and this isn't an argument for merging this stuff into a single AC?
      • 2022-03-25 08408, 2022

      • reosarevok
        If one release prints it one way and the other the other way, it's correct to have it like that (probably with a space after the comma though)
      • 2022-03-25 08424, 2022

      • mayhem
        ah yes, fair.
      • 2022-03-25 08424, 2022

      • reosarevok
        You can't, acs explicitly need to have the same credit, join phrases, etc
      • 2022-03-25 08459, 2022

      • mayhem
        ok.
      • 2022-03-25 08403, 2022

      • lucifer
        for our purposes, we'd mark one as canonical and redirect all others to it though
      • 2022-03-25 08405, 2022

      • mayhem
        I really have no idea how to resolve this.
      • 2022-03-25 08406, 2022

      • reosarevok
        I did suggest that an option might be to not use ACs but AC artist MBIDs for deduping
      • 2022-03-25 08417, 2022

      • reosarevok
        But that might not work for matching to messybrainz :)
      • 2022-03-25 08427, 2022

      • reosarevok
        I think the least bad option is what lucifer said
      • 2022-03-25 08408, 2022

      • mayhem
        that could work if the underlying audio is the same track. does that appear to be the case?
      • 2022-03-25 08414, 2022

      • mayhem
        no
      • 2022-03-25 08421, 2022

      • lucifer
        uh yeah in this case no.
      • 2022-03-25 08425, 2022

      • reosarevok
        You just take all the MBIDs for a specific combined_lookup and throw it into canonical_recording
      • 2022-03-25 08435, 2022

      • lucifer
        consider track length too?
      • 2022-03-25 08449, 2022

      • reosarevok
        I mean, you'll already be conflating actually-different-recordings anyway
      • 2022-03-25 08454, 2022

      • mayhem
        track length would open a greater can of worms.
      • 2022-03-25 08401, 2022

      • lucifer
        yeah indeed
      • 2022-03-25 08409, 2022

      • lucifer
        not to mention that we don't have it for most listens
      • 2022-03-25 08416, 2022

      • reosarevok
        AFAICT, you're already merging live and studio versions if they have the same title + ac, no
      • 2022-03-25 08419, 2022

      • reosarevok
        ?
      • 2022-03-25 08429, 2022

      • mayhem
        yeah
      • 2022-03-25 08438, 2022

      • reosarevok
        So it doesn't seem any different to me
      • 2022-03-25 08452, 2022

      • mayhem
        and there are a lot of liberties that have been taken here in order to get a decent mapping.
      • 2022-03-25 08459, 2022

      • reosarevok
        As I said, yes, there's a small chance you'll conflate a track with a very common name by two different artists with the same name
      • 2022-03-25 08407, 2022

      • reosarevok
        But it seems about as minor as the punctuation-only issue tbh
      • 2022-03-25 08418, 2022

      • mayhem
        "You just take all the MBIDs for a specific combined_lookup and throw it into canonical_recording"
      • 2022-03-25 08423, 2022

      • mayhem
        how do you feel about that lucifer ?
      • 2022-03-25 08440, 2022

      • lucifer
        i guess for the automatic mapper continue to do this. but in future let users override mapping for specific listens.
      • 2022-03-25 08418, 2022

      • reosarevok
        There's two ways to do that cleanly, a) you specifically exclude the *first* mbid and only throw the others in or b) (probably easier) you literally throw all into canonical_recording at first, then remove any mbids from canonical_recording that already appear on the main table
      • 2022-03-25 08432, 2022

      • reosarevok
        (since you're also maybe going to get some dupes that *are* just dupes)
      • 2022-03-25 08401, 2022

      • lucifer
        mayhem: yeah i agree with that unless it is entirely different artists and recordings.
      • 2022-03-25 08408, 2022

      • mayhem
        it would be great if we could get rid of the dedup step at the end and have the alg produce data without dups.
      • 2022-03-25 08429, 2022

      • lucifer
        reosarevok: do you know of an example of that? like the Prodigy one you mentioned
      • 2022-03-25 08417, 2022

      • reosarevok
        lucifer: an example where there's two different acs, but the same artist?
      • 2022-03-25 08446, 2022

      • lucifer
        uh no, different ac different artist but after removing punctuation it becomes the same.
      • 2022-03-25 08400, 2022

      • reosarevok
        Oh
      • 2022-03-25 08427, 2022

      • reosarevok
        Well, it's easier to find ones with different join phrases probably
      • 2022-03-25 08435, 2022

      • lucifer
        but again it's likely to be an edge case so i am in favor of letting users handle that.
      • 2022-03-25 08449, 2022

      • reosarevok
      • 2022-03-25 08400, 2022

      • lucifer
        here's the best match we could find, if you don't like it feel free to change it.
      • 2022-03-25 08410, 2022

      • reosarevok
        If it's an edge case then it can't be the cause for a 3 million recording difference? :)
      • 2022-03-25 08429, 2022

      • reosarevok
        But yes, in general, "match as best you can but allow to change it" seems sensible to me
      • 2022-03-25 08430, 2022

      • mayhem
        lucifer: I see you commented out the dedup step for canonical recordings as well. just for testing or was there a solid reason for that?
      • 2022-03-25 08405, 2022

      • reosarevok
        Unrelatedly, I'm slowly requeuing all those sir messages, seems like all is good
      • 2022-03-25 08406, 2022

      • lucifer
        mayhem: testing to confirm that dedup is also not removing rows unexpectedly.
      • 2022-03-25 08412, 2022

      • mayhem
        ok.
      • 2022-03-25 08428, 2022

      • mayhem
        then I think we should change the dedup step to insert found rows into canonical_recordings
      • 2022-03-25 08445, 2022

      • mayhem
        it is much harder to do this earlier since we process data AC by AC.
      • 2022-03-25 08423, 2022

      • lucifer
        reosarevok: oh yes, for different join phrases i say redirect. that's not the edge case i am talking about. it's the different-artist-before-punctuation, same-after one that i called an edge case.
      • 2022-03-25 08424, 2022

      • mayhem
        s/DELETE FROM/ INSERT INTO/
      • 2022-03-25 08430, 2022

      • lucifer
        yeah makes sense
      • 2022-03-25 08437, 2022

      • lucifer
        insert into followed by a delete.
      • 2022-03-25 08442, 2022

      • mayhem
        yes.
      • 2022-03-25 08406, 2022
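
The "insert into followed by a delete" idea can be sketched like this. It is a minimal sketch using an in-memory SQLite database so it runs standalone; the table and column names (`mapping`, `canonical_recording`, `combined_lookup`) and the sample MBIDs are assumptions for illustration, not the real schema. The row with the lowest id per `combined_lookup` is kept, and the others are recorded as redirects to it before being deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mapping (
        id INTEGER PRIMARY KEY, recording_mbid TEXT, combined_lookup TEXT);
    CREATE TABLE canonical_recording (
        recording_mbid TEXT, canonical_recording_mbid TEXT);
    INSERT INTO mapping (recording_mbid, combined_lookup) VALUES
        ('mbid-1', 'xzfoo'), ('mbid-2', 'xzfoo'),
        ('mbid-3', 'xzfoo'), ('mbid-4', 'ybar');
""")

with conn:  # both statements commit together or not at all
    # Step 1: record every duplicate as a redirect to the row that is kept.
    conn.execute("""
        INSERT INTO canonical_recording (recording_mbid, canonical_recording_mbid)
        SELECT dup.recording_mbid, keep.recording_mbid
          FROM mapping dup
          JOIN mapping keep ON keep.combined_lookup = dup.combined_lookup
         WHERE keep.id = (SELECT MIN(id) FROM mapping
                           WHERE combined_lookup = dup.combined_lookup)
           AND dup.id <> keep.id
    """)
    # Step 2: only then remove the duplicates from the main table.
    conn.execute("""
        DELETE FROM mapping
         WHERE id NOT IN (SELECT MIN(id) FROM mapping GROUP BY combined_lookup)
    """)

print(conn.execute("SELECT * FROM canonical_recording").fetchall())
print(conn.execute("SELECT recording_mbid FROM mapping").fetchall())
```

Running the insert before the delete inside one transaction means no MBID is dropped without a redirect being stored. PostgreSQL additionally allows a `DELETE ... RETURNING` inside a `WITH` clause to feed an `INSERT` in a single statement; SQLite does not, which is why the sketch uses two statements.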

      • mayhem
        my schedule is discombobulated for the next 5 hours. if you're free to take a stab at it, please do lucifer .
      • 2022-03-25 08418, 2022

      • mayhem
        I'll be back this afternoon to continue working on this.
      • 2022-03-25 08420, 2022

      • lucifer
        somewhat related canonical recordings and mbid mapping need to be in the same db for that. can't use --timescale.
      • 2022-03-25 08420, 2022

      • reosarevok
        You can probably do the delete + insert in the same query
      • 2022-03-25 08435, 2022

      • mayhem
        mb_metadata_cache, however, looks good now. caa_ids are present.
      • 2022-03-25 08415, 2022

      • mayhem
        lucifer: then, lets make it so that either all or none of the produced tables are stored in TS.
      • 2022-03-25 08419, 2022

      • mayhem
        that should work, no?
      • 2022-03-25 08423, 2022

      • lucifer
        yes
      • 2022-03-25 08432, 2022

      • mayhem
        great.
      • 2022-03-25 08447, 2022

      • mayhem
        reosarevok: thanks for all your help. I knew this would be easier with you helping.
      • 2022-03-25 08453, 2022

      • lucifer
        yes probably, but 2 data-modifying statements in 1 cte may be asking for problems, since pg doesn't mandate which order those will run in.
      • 2022-03-25 08454, 2022

      • reosarevok
      • 2022-03-25 08459, 2022

      • reosarevok
        Since you're just running raw sql and you're probably on psql 10+ (I hope?) that is likely to be of use
      • 2022-03-25 08403, 2022

      • lucifer
        ah i see, that can work. i was thinking select, insert, delete which is problematic. this probably not.
      • 2022-03-25 08413, 2022

      • lucifer
        yup this is MB db so 12.
      • 2022-03-25 08416, 2022

      • lucifer
        thanks!
      • 2022-03-25 08454, 2022

      • reosarevok
        We actually do this in MB in one place
      • 2022-03-25 08455, 2022

      • yvanzo
        O’Moin
      • 2022-03-25 08415, 2022

      • reosarevok
      • 2022-03-25 08420, 2022

      • reosarevok
        But keep in mind the comment there, lucifer
      • 2022-03-25 08433, 2022

      • reosarevok
        In case it's relevant to your issue
      • 2022-03-25 08455, 2022

      • reosarevok
        I guess since you want all of them it might not be :)
      • 2022-03-25 08459, 2022

      • reosarevok
        moin yvanzo!
      • 2022-03-25 08409, 2022

      • lucifer
        ah thanks. yes i think not.
      • 2022-03-25 08411, 2022

      • reosarevok
        Your sir recovery thing seems to be working great, is the good news
      • 2022-03-25 08425, 2022

      • reosarevok
        Do we know of an error where sir fails to reconnect to rabbitmq?
      • 2022-03-25 08435, 2022

      • CatQuest
        there must be more then 218 artists
      • 2022-03-25 08451, 2022

      • CatQuest
        tk
      • 2022-03-25 08456, 2022

      • CatQuest
        *than
      • 2022-03-25 08400, 2022

      • reosarevok
        CatQuest: That *only* have punctuation or spaces, mind, not that have that + letters and numbers
      • 2022-03-25 08402, 2022

      • CatQuest
      • 2022-03-25 08406, 2022

      • CatQuest
        hmm
      • 2022-03-25 08421, 2022

      • reosarevok
        So ones like "O" wouldn't count
      • 2022-03-25 08426, 2022

      • CatQuest
        can you figure out if any of those 218 don't have tag "sillyname" :D
      • 2022-03-25 08437, 2022

      • reosarevok
        haha
      • 2022-03-25 08438, 2022

      • reosarevok
        I can
      • 2022-03-25 08449, 2022

      • CatQuest
        ;O
      • 2022-03-25 08440, 2022

      • atj
        what machines were the errors occurring on? connections were between rabbitmq and sir right?
      • 2022-03-25 08445, 2022

      • CatQuest
        but also https://beta.musicbrainz.org/release/1494935a-92f… is good and i'd still wanna scrob that
      • 2022-03-25 08433, 2022

      • CatQuest