#metabrainz

      • rdswift
        phibs, is this on a Windows system?
      • 2019-10-14 28747, 2019

      • phibs
        yeah
      • 2019-10-14 28758, 2019

      • rdswift
        I'm not 100% sure, but perhaps Picard is changing the ellipsis into three periods before it gets into the renaming.
      • 2019-10-14 28743, 2019

      • rdswift
        You might try either moving the replacement to a script in the scripting section rather than the renaming section, or
      • 2019-10-14 28703, 2019

      • phibs
        I tried those too :(
      • 2019-10-14 28709, 2019

      • phibs
        yeah this is from the scripting section
      • 2019-10-14 28725, 2019

      • rdswift
        try replacing three periods rather than the ellipsis.
      • 2019-10-14 28733, 2019

      • phibs
        yeah it did not match those either
      • 2019-10-14 28736, 2019

      • phibs
        regular or regex
      • 2019-10-14 28713, 2019

      • rdswift
        Well, I'm out of ideas. Sorry I couldn't help.
      • 2019-10-14 28742, 2019

      • phibs
        np
      • 2019-10-14 28757, 2019

      • rdswift
        Is it showing up as an ellipsis in the filename on Windows?
      • 2019-10-14 28749, 2019

      • rdswift
        or as three periods?
      • 2019-10-14 28717, 2019

      • rdswift
        I wonder if there is more than one character code that displays as an ellipsis, and you're replacing a different code than is in the string? That's a "Hail Mary" for sure. ;-)
      • 2019-10-14 28709, 2019

      • phibs
        looks like an ellipsis
      • 2019-10-14 28711, 2019

      • phibs
        esp when I copy/paste
      • 2019-10-14 28728, 2019

      • phibs
        Yeah i'm trying to just remove or replace w/ space
      • 2019-10-14 28734, 2019

      • phibs
        it doesn't detect the search pattern
      • 2019-10-14 28729, 2019

      • chaban has quit
      • 2019-10-14 28749, 2019

      • chaban joined the channel
      • 2019-10-14 28724, 2019

      • kepstin
        phibs: file naming pattern can't change tags, so that doesn't make sense in the file naming pattern
      • 2019-10-14 28755, 2019

      • kepstin
        what you could do is take the %title% in your file naming pattern and turn it into $replace(%title%,…,wat) maybe?
      • 2019-10-14 28733, 2019

      • kepstin
        or even just wrap the entire file naming pattern in a $replace call (just put the whole existing file naming pattern as the first arg)
      • 2019-10-14 28702, 2019

      • pristine__
        Moin
      • 2019-10-14 28710, 2019

      • phibs
        kepstin: I mean it definitely works if I do simple works
      • 2019-10-14 28712, 2019

      • phibs
        words *
      • 2019-10-14 28714, 2019

      • phibs
        the -> something else
      • 2019-10-14 28718, 2019

      • phibs
        so it should be able to strip the ellipsis
      • 2019-10-14 28759, 2019

      • Rotab has quit
      • 2019-10-14 28730, 2019

      • P23 has quit
      • 2019-10-14 28719, 2019

      • Pac23 joined the channel
      • 2019-10-14 28705, 2019

      • yvanzo
        mo’’in’
      • 2019-10-14 28704, 2019

      • outsidecontext
        phibs: The ellipsis replacement definitely works for me with your script and e.g. the second track on https://musicbrainz.org/release/da53e497-8c61-4d4… . But a few ideas:
      • 2019-10-14 28734, 2019

      • outsidecontext
        - Options > Metadata > "Convert Unicode punctuation characters to ASCII" might be enabled
      • 2019-10-14 28734, 2019

      • outsidecontext
        - Is some other script doing a conversion?
      • 2019-10-14 28734, 2019

      • outsidecontext
        - Any plugin that might change the title?
      • 2019-10-14 28700, 2019

      • Mineo
        also, if you do `$replace(%title%, …, wat)`, the extra space before `…` _does_ matter
      • 2019-10-14 28745, 2019
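      (Editorial aside: the "different character code" and "extra space" ideas above can be checked outside Picard. The following is a minimal Python sketch, not Picard scripting; the title string and the `describe` helper are hypothetical and only illustrate why look-alike search strings fail to match.)

```python
import unicodedata

# Hypothetical title containing the single character U+2026.
title = "Wait\u2026 What"


def describe(text):
    """Return (code point, Unicode name) for every non-ASCII character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


# Shows which character is really in the string.
print(describe(title))

# Three ASCII periods are a different string from U+2026: no match.
print(title.replace("...", "") == title)

# A leading space in the search string also prevents a match here.
print(title.replace(" \u2026", "") == title)

# Searching for the exact code point works.
print(title.replace("\u2026", ""))

# Compatibility normalization turns U+2026 into three periods, which is
# one way a title could change before a replacement ever sees it.
print(unicodedata.normalize("NFKC", title))
```

      Dumping the code points of the actual tag value is a quick way to tell whether the string holds U+2026, three periods, or some other look-alike character.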

      • Pac23 has quit
      • 2019-10-14 28757, 2019

      • adhawkins has quit
      • 2019-10-14 28755, 2019

      • Pac23 joined the channel
      • 2019-10-14 28741, 2019

      • adhawkins joined the channel
      • 2019-10-14 28701, 2019

      • Gazooo has quit
      • 2019-10-14 28746, 2019

      • Gazooo joined the channel
      • 2019-10-14 28718, 2019

      • travis-ci joined the channel
      • 2019-10-14 28718, 2019

      • travis-ci
        [picard:master@e16c7c4 - build #150] CI passed! (https://travis-ci.org/phw/picard/builds/597527216)
      • 2019-10-14 28718, 2019

      • travis-ci has left the channel
      • 2019-10-14 28751, 2019

      • travis-ci joined the channel
      • 2019-10-14 28751, 2019

      • travis-ci
        metabrainz/picard#5113 (picard-2.2.3 - 4cfa625 : Philipp Wolfer): The build passed.
      • 2019-10-14 28751, 2019

      • travis-ci has left the channel
      • 2019-10-14 28738, 2019

      • ruaok
        moooin!
      • 2019-10-14 28745, 2019

      • ruaok
        pristine__: I've got some time to chat, finally.
      • 2019-10-14 28738, 2019

      • pristine__
        Thanks
      • 2019-10-14 28748, 2019

      • pristine__
        A sec
      • 2019-10-14 28701, 2019

      • alastairp
        hi ruaok
      • 2019-10-14 28716, 2019

      • pristine__
        Hi alastairp
      • 2019-10-14 28726, 2019

      • alastairp
        to confirm - did we set a time tomorrow? I only have a small window of about 2h
      • 2019-10-14 28732, 2019

      • alastairp
        hi pristine__
      • 2019-10-14 28744, 2019

      • alastairp
        I saw that you wanted to talk with me - I'll be free on Thursday and Friday
      • 2019-10-14 28755, 2019

      • pristine__
        alastairp: it is the same discussion i wanted to have with ruaok for like 5 min, but yeah we can have a follow up whenever you are free :)
      • 2019-10-14 28709, 2019

      • alastairp
        I'll keep an eye on the conversation and make any comments if I have time
      • 2019-10-14 28754, 2019

      • pristine__
      • 2019-10-14 28706, 2019

      • ruaok got distracted
      • 2019-10-14 28709, 2019

      • ruaok
        gimme 2 minutes.
      • 2019-10-14 28716, 2019

      • pristine__
        all the queries should have same count
      • 2019-10-14 28719, 2019

      • ruaok
        alastairp: what is your timeframe?
      • 2019-10-14 28731, 2019

      • pristine__
        but they don't have the same count
      • 2019-10-14 28751, 2019

      • pristine__
        I realised after a lot of querying that it was because of this
      • 2019-10-14 28714, 2019

      • pristine__
      • 2019-10-14 28720, 2019

      • pristine__
        the last dataframe
      • 2019-10-14 28726, 2019

      • alastairp
        ruaok: I'm free 12-2, morning meeting starts at 10, so likely won't run until 12, but I have another at 2 that I have to be back for
      • 2019-10-14 28743, 2019

      • pristine__
        Messybrainz is case insensitive, it was unknown to me
      • 2019-10-14 28748, 2019

      • pristine__
        now, the concern is
      • 2019-10-14 28757, 2019

      • pristine__
        a sec
      • 2019-10-14 28703, 2019

      • alastairp
        it's only 15 minutes for me on the bike between offices, so I think we should still have plenty of time
      • 2019-10-14 28740, 2019

      • pristine__
      • 2019-10-14 28743, 2019

      • pristine__
        this
      • 2019-10-14 28756, 2019

      • pristine__
        rn, we use this query to get recordings_df
      • 2019-10-14 28719, 2019

      • pristine__
        basically distinct recording_msids and corresponding col
      • 2019-10-14 28754, 2019

      • pristine__
        in this query we add a recording_id col
      • 2019-10-14 28703, 2019

      • pristine__
        which is supposed to be a primary key.
      • 2019-10-14 28725, 2019

      • pristine__
        but now that I realise that Messybrainz is not case sensitive
      • 2019-10-14 28731, 2019

      • pristine__
        I queried the data
      • 2019-10-14 28732, 2019

      • pristine__
        and
      • 2019-10-14 28705, 2019

      • pristine__
        with 182986 listens
      • 2019-10-14 28725, 2019

      • ruaok
        12:15 at the office then, alastairp ?
      • 2019-10-14 28727, 2019

      • pristine__
        we should have 119929 distinct recording_msids
      • 2019-10-14 28755, 2019

      • pristine__
        but we have 129124 distinct recording_msids
      • 2019-10-14 28708, 2019

      • alastairp
        ruaok: sounds good. thanks
      • 2019-10-14 28711, 2019

      • pristine__
        so around 600 are duplicates
      • 2019-10-14 28723, 2019

      • pristine__
        it is because of the query we use
      • 2019-10-14 28744, 2019

      • pristine__
        the ideal way is to only fetch the distinct recording_msids
      • 2019-10-14 28749, 2019

      • pristine__
        and then perform a join
      • 2019-10-14 28758, 2019

      • pristine__
        which can be expensive
      • 2019-10-14 28705, 2019

      • ruaok starts digesting pristine__'s thread
      • 2019-10-14 28705, 2019

      • pristine__
        as I see....
      • 2019-10-14 28729, 2019

      • pristine__
        Spark needs a lot of optimization in joins
      • 2019-10-14 28740, 2019

      • pristine__
        but
      • 2019-10-14 28704, 2019

      • pristine__
        primary key has to be distinct, according to basic norms of dbms...
      • 2019-10-14 28722, 2019

      • pristine__
        so I think we should do something about it
      • 2019-10-14 28731, 2019

      • pristine__
        we can optimize the join later
      • 2019-10-14 28749, 2019

      • pristine__
        but data accuracy is a must
      • 2019-10-14 28752, 2019
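      (Editorial aside: a minimal sketch in plain Python, not Spark, of the problem described above. It assumes that the same recording_msid can arrive with differently cased text columns; the sample rows, the column layout, and the `one_row_per_msid` helper are hypothetical.)

```python
# Hypothetical rows: (recording_msid, track_name, artist_name).
rows = [
    ("msid-1", "Hey Jude", "The Beatles"),
    ("msid-1", "hey jude", "the beatles"),   # same msid, different case
    ("msid-2", "Yesterday", "The Beatles"),
    ("msid-2", "Yesterday", "The Beatles"),  # exact duplicate
]

# DISTINCT over all columns keeps both casings of msid-1.
distinct_rows = set(rows)

# DISTINCT over the msid alone gives the count a primary key needs.
distinct_msids = {msid for msid, _, _ in rows}

print(len(distinct_rows), len(distinct_msids))


def one_row_per_msid(rows):
    """Keep the first row seen for each recording_msid."""
    first_seen = {}
    for row in rows:
        first_seen.setdefault(row[0], row)
    return list(first_seen.values())


# Assign recording_id only after collapsing to one row per msid,
# so that each id corresponds to exactly one msid.
recordings = [
    (recording_id, *row)
    for recording_id, row in enumerate(one_row_per_msid(rows), start=1)
]
print(recordings)
```

      The gap between the two counts is the number of msids that would otherwise receive more than one recording_id.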

      • pristine__
        .
      • 2019-10-14 28720, 2019

      • ruaok
        so, lots of things to address here.
      • 2019-10-14 28725, 2019

      • pristine__
        :)
      • 2019-10-14 28752, 2019

      • ruaok
        first off, messybrainz is intended to be messy. it has tons of duplicates and loads of crap.
      • 2019-10-14 28738, 2019

      • ruaok
        which is why it is key to use MBIDs, rather than MSIDs, for the collab filtering stuff -- as we discussed over the summer.
      • 2019-10-14 28700, 2019

      • pristine__
        we will need the mapping for that. true
      • 2019-10-14 28705, 2019

      • ruaok
        which is why the MSID <-> MBID mapping is so important.
      • 2019-10-14 28735, 2019

      • ruaok
        and you were given the instructions to proceed with these flimsy assumptions, hoping that we would have a mapping before too long that would allow us to swap out the MSIDs for MBIDs.
      • 2019-10-14 28724, 2019

      • ruaok
        so, how does that change this conversation?
      • 2019-10-14 28719, 2019

      • ruaok
        instead of worrying about querying MSIDs in the best way possible, we should instead think about how to translate from MSID -> MBID as one of the precursor steps to the CF algorithm.
      • 2019-10-14 28722, 2019

      • pristine__
        I know. Just wanted to tell you what all we need to look into. We discussed that we need a mapping, yup, so here are the cases that are a part of it. Thought it would be good to tell you the specifics :)
      • 2019-10-14 28746, 2019

      • ruaok
        thank you for that -- those specifics give me nightmares. :)
      • 2019-10-14 28759, 2019

      • pristine__
        oh. sorry :)
      • 2019-10-14 28706, 2019

      • pristine__
        umm...
      • 2019-10-14 28712, 2019

      • pristine__
        so ya...the mapping
      • 2019-10-14 28715, 2019

      • ruaok
        not your fault -- this is just the nature of messybrainz. :)
      • 2019-10-14 28731, 2019

      • alastairp
        out of interest, how much data in listenbrainz comes with some other id (spotify or mbid?)
      • 2019-10-14 28732, 2019

      • ruaok
        what I can do is work on a mapping this week that would be very... preliminary.
      • 2019-10-14 28744, 2019

      • pristine__
        once we have it, I can start on it.
      • 2019-10-14 28746, 2019

      • ruaok
        alastairp: I don't have an answer for you yet.
      • 2019-10-14 28750, 2019

      • pristine__
        no hurry :)
      • 2019-10-14 28753, 2019

      • ruaok
        pristine__: exactly.
      • 2019-10-14 28709, 2019

      • ruaok
        so, for now assume that you will get three tables:
      • 2019-10-14 28728, 2019

      • ruaok
        msid_mbid_artist_mapping (msid, mbid)
      • 2019-10-14 28736, 2019

      • pristine__
        I was just querying data to understand the specifics. MBIDs are too few. yeah
      • 2019-10-14 28740, 2019

      • ruaok
        msid_mbid_recording_mapping (msid, mbid)
      • 2019-10-14 28752, 2019

      • ruaok
        and then one for release, I think.
      • 2019-10-14 28704, 2019

      • pristine__
        yup
      • 2019-10-14 28727, 2019

      • pristine__
        sounds good.
      • 2019-10-14 28728, 2019

      • ruaok
        so, can you start working on a new translation layer as another pre-process step to the CF alg?
      • 2019-10-14 28712, 2019
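      (Editorial aside: a minimal sketch of what such a translation step might look like, in plain Python rather than Spark. It assumes the recording mapping arrives as (msid, mbid) pairs, as in the table outline above; the `translate_listens` helper, the listen layout, and all identifiers in the sample data are hypothetical placeholders.)

```python
def translate_listens(listens, recording_mapping):
    """Swap recording MSIDs for MBIDs, dropping listens with no mapping.

    listens: iterable of (user_name, recording_msid)
    recording_mapping: dict of recording_msid -> recording_mbid
    Returns (translated_listens, number_of_unmapped_listens).
    """
    translated = []
    unmapped = 0
    for user_name, msid in listens:
        mbid = recording_mapping.get(msid)
        if mbid is None:
            # A sparse mapping means some listens cannot be translated yet.
            unmapped += 1
            continue
        translated.append((user_name, mbid))
    return translated, unmapped


# Placeholder rows standing in for msid_mbid_recording_mapping (msid, mbid).
mapping_rows = [
    ("msid-a", "mbid-1"),
    ("msid-b", "mbid-1"),  # two messy MSIDs resolving to one MBID
    ("msid-c", "mbid-2"),
]
recording_mapping = dict(mapping_rows)

listens = [
    ("alice", "msid-a"),
    ("bob", "msid-b"),
    ("carol", "msid-c"),
    ("dave", "msid-unknown"),
]

translated, unmapped = translate_listens(listens, recording_mapping)
print(translated)
print(unmapped)
```

      Several MSIDs collapsing onto one MBID is what removes the duplicates, while unmapped listens are simply dropped until the mapping covers them.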

      • pristine__
        ummm.....yes. I was working on some cleaning stuff so that new people don't have a hard time, but yeah can start with it
      • 2019-10-14 28744, 2019

      • pristine__
        and hopefully will start with the tests soon :)
      • 2019-10-14 28749, 2019

      • pristine__
        so..
      • 2019-10-14 28734, 2019

      • pristine__
        I won't open any PR on correcting the primary key stuff
      • 2019-10-14 28748, 2019

      • pristine__
        mapping will solve the problem :)
      • 2019-10-14 28733, 2019

      • ruaok
        well, it will make things worse before it makes them better, but yes this is the right path forward.
      • 2019-10-14 28757, 2019

      • pristine__
        lol. I understand.
      • 2019-10-14 28716, 2019

      • pristine__
        but the mapping can and would give us better results
      • 2019-10-14 28744, 2019

      • pristine__
        like for instance we will not be using *artist-name* anymore.
      • 2019-10-14 28748, 2019

      • pristine__
        so ya....nice
      • 2019-10-14 28753, 2019

      • ruaok
        it *should*. but the mapping will be sparse initially, so fewer results.
      • 2019-10-14 28712, 2019

      • ruaok
        yes, exactly. something else, not your doing, will use artist names.
      • 2019-10-14 28726, 2019

      • ruaok
        and the blame will go on my shoulders. something that reosarevok can pick on me for the next decade. :)
      • 2019-10-14 28757, 2019

      • pristine__
        If we get *nice* fewer results, that will be very good news and will pave the path for the rest of it.
      • 2019-10-14 28700, 2019

      • pristine__
        lol
      • 2019-10-14 28711, 2019

      • reosarevok
        Yaaaaaay
      • 2019-10-14 28736, 2019

      • zas
        I'm upgrading discourse