#metabrainz

      • MRiddickW joined the channel
      • gcrkrause has quit
      • gcrkrause joined the channel
      • -- BotBot disconnected, possible missing messages --
      • BrainzBot joined the channel
      • gavinatkinson has quit
      • pprkut has quit
      • rdrg109 has quit
      • bitmap has quit
      • atj has quit
      • atj joined the channel
      • reosarevok
        yvanzo: will release prod, since the only known bug is "timelines load very slowly" which seems like it's fine to fix for next release
      • Anything special about this docker release?
      • BrainzGit
        [musicbrainz-server] 14reosarevok merged pull request #2332 (03beta…MBS-12082-tidal-favicon): MBS-12082: Move Tidal favicon to external-favicons https://github.com/metabrainz/musicbrainz-serve...
      • MRiddickW has quit
      • dseomn has quit
      • reosarevok
        Updating beta
      • dseomn joined the channel
      • Updating prod
      • zas
        I love the smell of upgrades in the morning.... Moooin reosarevok
      • reosarevok
        Did the alerts wake you up? :D
      • zas
        yup ;)
      • but that's my time (and yours it seems)
      • reosarevok
        We should probably try to get https://github.com/metabrainz/musicbrainz-serve... in for next time :)
      • pprkut joined the channel
      • alastairp
        hello
      • 4pm today for more gaga stuff, right?
      • ruaok
        mooin!
      • yes, gaga stuff 4pm.
      • reosarevok
        yvanzo: blog post ready to review, draft release of docker ready, didn't tag docker yet in case we have something else to add, although I'm expecting not since there's nothing new in master
      • Do let me know
      • zas: see support about ip addresses
      • zas
        I'll check
      • ruaok
        lucifer: that list of releases looks very promising -- I really want to listen to it.
      • lucifer
        nice :D
      • ruaok
        is there any way to get this data into a dataset hoster or something that is playable?
      • I think I will embed BP into data set hoster once that is possible.
      • lucifer
        i think i can get this on LB prod this week.
      • ruaok
        another thought on this: would it be possible to differentiate on album type?
      • lucifer
        once this data is in LB db we can present it in any way we want.
      • can you elaborate on that?
      • ruaok
        I care less about singles releases or EPs, but albums, those I really want to know about
      • lucifer
        i see. yes should be possible i think.
      • do you mean to just select albums or include album type in the final json?
      • *just select albums before in the sql query itself
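The "just select albums in the SQL query itself" idea could be sketched as below. The join path follows the MusicBrainz schema (`release` → `release_group` → `release_group_primary_type`), but the query as a whole is hypothetical, not the actual code being discussed; selecting `rgpt.name` also covers the "include album type in the final json" variant:

```python
# Hedged sketch of filtering to albums in SQL. Table and column names
# follow the MusicBrainz schema; the surrounding query is hypothetical.
ALBUMS_ONLY_SQL = """
    SELECT rel.gid   AS release_mbid
         , rgpt.name AS album_type
      FROM release rel
      JOIN release_group rg
        ON rel.release_group = rg.id
      JOIN release_group_primary_type rgpt
        ON rg.type = rgpt.id
     WHERE rgpt.name = 'Album'
"""
```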
      • ruaok
        I think selecting the column is a good start.
      • then the question is: how do we present this?
      • do we want to make a list of releases and a "selected recordings from this list of releases" playlist
      • ?
      • I can make the latter happen if you can get this data into the data set hoster.
      • reosarevok
        Keep in mind EPs very often are also new music that doesn't end up in albums, so I'd want to be able to at least choose to see those :)
      • lucifer
        we'll get a release mbid as the output. show the users the list of releases. if they click one, load the recordings from MB WS and feed it to BP. thoughts?
      • much like how huesound actually works: clicking a color gets release MBIDs, then we fetch the recordings from the db.
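The huesound-style flow described above (release MBID → recordings from the MusicBrainz web service → feed to BP) might look roughly like this sketch. `fetch_release` and `extract_recordings` are hypothetical helper names, and rate limiting and error handling are omitted:

```python
import json
import urllib.request

MB_WS = "https://musicbrainz.org/ws/2"

def fetch_release(release_mbid: str) -> dict:
    # inc=recordings makes the release lookup embed each track's recording.
    url = f"{MB_WS}/release/{release_mbid}?inc=recordings&fmt=json"
    req = urllib.request.Request(
        url, headers={"User-Agent": "lb-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

def extract_recordings(release_json: dict) -> list:
    # Flatten media -> tracks -> recording into {mbid, title} dicts
    # suitable for handing to a player.
    return [
        {"mbid": t["recording"]["id"], "title": t["recording"]["title"]}
        for medium in release_json.get("media", [])
        for t in medium.get("tracks", [])
    ]
```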
      • ruaok
        reosarevok: claro, I just want to be able to skew it.
      • lucifer: yes, I think that is good.
      • lucifer
        another possibility is a playlist of random recordings from these releases.
      • ruaok
        that means we need to write such an endpoint, which shouldn't be hard.
      • lucifer
        indeed.
      • ruaok
        lucifer: the random recordings is what I was suggesting for a playlist. we could do both.
      • lucifer
        sounds good to me.
      • ruaok
        a lot of my top discoveries of 2021 haven't been matched to MBIDs.
      • so now I am trying to make sure that they exist in MB and am working on a "retry no_match" pass for mbid mapping entries.
      • lucifer
        ah, that's not good. matching is a key part of much of the stuff we are doing.
      • ruaok
        so we can tell people to go and get their missing albums in before we run the data for real.
      • lucifer
        noice!
      • ruaok
        I really need to make the mapping match this album: https://musicbrainz.org/release/87f10206-f3fe-4...
      • it's a great case of bad data in MB.
      • I thought it should already match this -- not sure why not. I'll check.
      • lucifer
        do you have a sample listen that is not matching this?
      • ruaok
        not to hand, no.
      • see one track not matched near the 6s. "the river bend"
      • lucifer
        i see.
      • the one in MB uses feat. while the listen has `,`
      • do we ignore join phrases while trying to match?
      • ruaok
        no
      • lucifer
        thoughts on doing that?
      • ruaok
        but we attempt to find common endings on tracks like "feat. XXXX"
      • and if we remove them and get a match, it's a low- or medium-quality match.
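The "common endings on tracks" stripping described above could look something like this minimal sketch; `strip_feat_suffix` is a hypothetical name, not the mapper's actual function:

```python
import re

# Hypothetical sketch: strip a trailing "feat. <artist>" clause (bare or
# bracketed) from a track name before retrying the match. Not the
# mapper's actual code.
FEAT_RE = re.compile(r"\s*[\(\[]?\bfeat\.?\s+.*$", re.IGNORECASE)

def strip_feat_suffix(track_name: str) -> str:
    return FEAT_RE.sub("", track_name).strip()
```

If the stripped form matches where the original did not, the mapper would then record it as a lower-quality match, as described above.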
      • lucifer
        makes sense
      • ruaok
        well, I wouldn't spend a lot of time thinking about join phrases. the extra guff is usually search-engine stop words that get filtered out.
      • lucifer
        ruaok: running the steps the mbid mapper goes through manually, i see typesense find the mbid match but evaluate_hit discards it.
      • ruaok
        sounds about right.
      • picking the right threshold for such an operation is tricky to say the least.
      • lucifer
      • this is the original comparison without detuning
      • ruaok
        yeah.
      • I wonder if there should be a detune that removes everything past the first comma.
      • from the artist name.
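A minimal sketch of the proposed comma detune; `detune_artist_comma` is a hypothetical name:

```python
# Sketch of the proposed detuning: keep only the part of the artist name
# before the first comma, falling back to the full name if that part
# would be empty. Hypothetical helper, not the mapper's actual code.
def detune_artist_comma(artist: str) -> str:
    head = artist.split(",", 1)[0].strip()
    return head if head else artist
```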
      • lucifer
        uh, that looks wrong. i probably messed something up; the artist should not be The River Bend.
      • i'll retry more carefully but is this code working as expected https://github.com/metabrainz/listenbrainz-serv...
      • detune_query_string('The River Bend') returns ''
      • ruaok
        we need to add a lot more detunings, I think.
      • lucifer
      • should it just return the whole thing back if there's no cruft?
      • ruaok
        it should return the whole thing if there is no cruft, returning "" is wrong
      • well, but clearly that is what the code does. let me look closer.
      • lucifer
        👍
      • ruaok
        yeah, that seems wrong.
      • it should return the whole string. time to add more detailed test cases that catch this.
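One way the fixed behavior could look (return the input unchanged when there is no cruft, never `''`). This is a hedged reconstruction, not the real `detune_query_string` in listenbrainz-server:

```python
import re

# Hedged reconstruction of the bug above: a detuner that strips trailing
# bracketed cruft, with an explicit fallback to the original string when
# stripping leaves nothing, instead of returning ''.
CRUFT_RE = re.compile(r"\s*[\(\[].*?[\)\]]\s*$")

def detune_query_string(text: str) -> str:
    stripped = CRUFT_RE.sub("", text).strip()
    # Guard against the "returns ''" bug for inputs like 'The River Bend'.
    return stripped if stripped else text
```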
      • lucifer
        cool, let's fix that bug. it should hopefully improve some matches.
      • ruaok
        possibly.
      • I'll dive into that once the migration doc for the afternoon is worked up.
      • alastairp
        I'm copying a 500gb file over a usb2 hard drive enclosure
      • I now remember that usb2 max speed is 400mbps
      • my home internet connection is faster
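A back-of-the-envelope check of the copy above, using the ~400 Mbit/s figure quoted in the chat (USB 2.0's nominal signalling rate is 480 Mbit/s, and sustained real-world throughput is lower still):

```python
# Rough transfer-time estimate for the 500 GB copy mentioned above,
# using the ~400 Mbit/s figure quoted in the chat.
file_gb = 500     # file size in gigabytes
link_mbps = 400   # assumed link speed in megabits per second
seconds = file_gb * 8 * 1000 / link_mbps  # GB -> gigabits -> megabits
hours = seconds / 3600
print(f"~{hours:.1f} hours")  # ~2.8 hours at that rate
```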
      • lucifer
        another thing i was thinking: we detune only query strings, not the mb data (makes sense because mb data is mostly correct), but Levenshtein distance is symmetric so it should not matter anyway, right?
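The symmetry point is easy to verify with a textbook edit-distance implementation (illustrative only; the mapper presumably uses a library):

```python
def levenshtein(a: str, b: str) -> int:
    # Textbook dynamic-programming edit distance, included only to show
    # that the measure is symmetric.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]
```

Since `levenshtein(a, b) == levenshtein(b, a)`, detuning only the query side changes which string pairs get compared, not the distance of any given pair.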
      • ruaok
        lucifer: if you have more examples that the mapping should match, but doesn't, let me know. I'll add them to the tests
      • lucifer
        sure will do
      • ruaok
        detuning MB data is intended to catch the subculture example
      • I think that is still valid.
      • lucifer
        uh right, i got confused.
      • ruaok
        alastairp: lucifer zas : how do we feel about doing backups before today's migration? In theory today's operations are a little less error-prone, but still. what should our level of backups be today?
      • lucifer: this shit does get confusing.
      • lucifer
        thinking again. yes, i think another step that detunes the MB data after the detuned query string has no match, and assigns a low match quality in that case, sounds good.
      • indeed 😓
      • alastairp
        ruaok: incremental backup and playlist/mapping backup took less than 5 minutes, I don't have a problem doing them again just in case
      • ruaok
        let's gather more cases that didn't match and see if we can make a prioritized list of things to add
      • lucifer
        +1 on both
      • ruaok
        lucifer: we didn't happen to run a full dump last night, did we?
      • lucifer
        let me check
      • alastairp
        oh, or more specifically, there isn't a full dump running right now, is there?
      • ruaok
        even better question.
      • zas
        ruaok: we should be ready for the worst, so back up whatever is needed
      • lucifer
        uh no..
      • we are in the middle of one
      • ruaok
        zas: yeah, ok.
      • lucifer
      • won't be done anytime soon.
      • ruaok
        2016? 4 hours? hmm. could be possible.
      • "won't be done" is different from "done dumping from gaga".
      • if we're done dumping from gaga, we can proceed.
      • (and onto compressing dumps and cp'ing them)
      • lucifer
        yeah, i checked the dump timings from the run on the 11th. it's like
      • and this is the normal listen dump. after this spark listen dump will start
      • ruaok
        crap
      • delay by a day or abort dumps?
      • do we have full incremental dumps since the last full dump?
      • lucifer
        yes
      • ruaok
        (read: could we recover from this, by dumping out incremental dumps and backing up the other tables?)
      • lucifer
        i'd suggest we abort the dumps. deploy alastairp's no listen flag PR. dump all tables except listens fully and do an inc listen dump.
      • ruaok
        I'm ok with that.
      • alastairp
        lucifer: good idea
      • ruaok
        alastairp: ?
      • lucifer
        ok let's do that then.
      • alastairp
        I tested a full end-to-end dump + import of listens, user table, playlist table, mappings table (didn't test stats, but nothing changed there)
      • ruaok
        is that PR misnamed?
      • does it create private and public dumps?