#metabrainz

      • rdswift
        phibs, is this on a Windows system?
      • phibs
        yeah
      • rdswift
        I'm not 100% sure, but perhaps Picard is changing the ellipsis into three periods before it gets into the renaming.
      • You might try either moving the replacement to a script in the scripting section rather than the renaming section, or
      • phibs
        I tried those too :(
      • yeah this is from the scripting section
      • rdswift
        Try replacing three periods rather than the ellipsis.
      • phibs
        yeah it did not match those either
      • regular or regex
      • rdswift
        Well, I'm out of ideas. Sorry I couldn't help.
      • phibs
        np
      • rdswift
        Is it showing up as an ellipsis in the filename on Windows?
      • or as three periods?
      • I wonder if there is more than one character code that displays as an ellipsis, and you're replacing a different code than is in the string? That's a "Hail Mary" for sure. ;-)
      • phibs
        looks like an ellipsis
      • esp when I copy/paste
      • Yeah i'm trying to just remove or replace w/ space
      • it doesn't detect the search pattern
      • chaban has quit
      • chaban joined the channel
      • kepstin
        phibs: file naming pattern can't change tags, so that doesn't make sense in the file naming pattern
      • what you could do is take the %title% in your file naming pattern and turn it into $replace(%title%,…,wat) maybe?
      • or even just wrap the entire file naming pattern in a $replace call (just put the whole existing file naming pattern as the first arg)
      • pristine__
        Moin
      • phibs
        kepstin: I mean it definitely works if I do simple works
      • words *
      • the -> something else
      • so it should be able to strip the ellipsis
      • Rotab has quit
      • P23 has quit
      • Pac23 joined the channel
      • yvanzo
        mo’’in’
      • outsidecontext
        phibs: The ellipsis replacement definitely works for me with your script and e.g. the second track on https://musicbrainz.org/release/da53e497-8c61-4... . But a few ideas:
      • - Options > Metadata > "Convert Unicode punctuation characters to ASCII" might be enabled
      • - Is some other script doing a conversion?
      • - Any plugin that might change the title?
      • Mineo
        also, if you do `$replace(%title%, …, wat)`, the extra space before `…` _does_ matter
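The two failure modes raised above (a different character than the one being searched for, and a stray space in the search string) can be checked outside Picard. A minimal Python sketch, in which `str.replace` only stands in for Picard's `$replace` and the track title is made up for illustration:

```python
import unicodedata

ELLIPSIS = "\u2026"  # HORIZONTAL ELLIPSIS, a single code point


def describe(text):
    """List (character, code point, Unicode name) for every non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


title = "Wait for It" + ELLIPSIS  # hypothetical title containing U+2026
ascii_title = "Wait for It..."    # same title after conversion to three periods

# Shows which non-ASCII characters are really in the string.
print(describe(title))

# Exact match: the ellipsis is removed.
print(title.replace(ELLIPSIS, ""))

# Leading space in the search string: nothing matches, title is unchanged.
print(title.replace(" " + ELLIPSIS, ""))

# Title already converted to ASCII: searching for U+2026 finds nothing.
print(ascii_title.replace(ELLIPSIS, ""))
```

If the title reaching the script has already been converted to three periods, for example by the "Convert Unicode punctuation characters to ASCII" option mentioned above, a search for `…` has nothing to match.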
      • Pac23 has quit
      • adhawkins has quit
      • Pac23 joined the channel
      • adhawkins joined the channel
      • Gazooo has quit
      • Gazooo joined the channel
      • travis-ci joined the channel
      • travis-ci
        [picard:master@e16c7c4 - build #150] CI passed! (https://travis-ci.org/phw/picard/builds/597527216)
      • travis-ci has left the channel
      • travis-ci joined the channel
      • metabrainz/picard#5113 (picard-2.2.3 - 4cfa625 : Philipp Wolfer): The build passed.
      • travis-ci has left the channel
      • ruaok
        moooin!
      • pristine__: I've got some time to chat, finally.
      • pristine__
        Thanks
      • A sec
      • alastairp
        hi ruaok
      • pristine__
        Hi alastairp
      • alastairp
        to confirm - did we set a time tomorrow? I only have a small window of about 2h
      • hi pristine__
      • I saw that you wanted to talk with me - I'll be free on Thursday and Friday
      • pristine__
        alastairp: it is the same discussion i wanted to have with ruaok for like 5 min, but yeah we can have a follow up whenever you are free :)
      • alastairp
        I'll keep an eye on the conversation and make any comments if I have time
      • pristine__
      • ruaok got distracted
      • ruaok
        gimme 2 minutes.
      • pristine__
        all the queries should have same count
      • ruaok
        alastairp: what is your timeframe?
      • pristine__
        but they don't have the same count
      • I realised after a lot of querying that it was because of this
      • the last dataframe
      • alastairp
        ruaok: I'm free 12-2, morning meeting starts at 10, so likely won't run until 12, but I have another at 2 that I have to be back for
      • pristine__
        MessyBrainz is case-insensitive; I didn't know that
      • now, the concern is
      • a sec
      • alastairp
        it's only 15 minutes for me on the bike between offices, so I think we should still have plenty of time
      • pristine__
      • this
      • right now, we use this query to get recordings_df
      • basically distinct recording_msids and corresponding col
      • in this query we add a recording_id col
      • which is supposed to be a primary key.
      • but now that I realise MessyBrainz is not case-sensitive
      • I queried the data
      • and
      • with 182986 listens
      • ruaok
        12:15 at the office then, alastairp ?
      • pristine__
        we should have 119929 distinct recording_msids
      • but we have 129124 distinct recording_msids
      • alastairp
        ruaok: sounds good. thanks
      • pristine__
        so around 600 are duplicates
      • it is because of the query we use
      • the ideal way is to only fetch the distinct recording_msids
      • and then perform a join
      • which can be expensive
      • ruaok starts digesting pristine__'s thread
      • as I see....
      • Spark needs a lot of optimization for joins
      • but
      • primary key has to be distinct, according to basic norms of dbms...
      • so I think we should do something about it
      • we can optimize the join later
      • but data accuracy is a must
      • .
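A toy illustration of the duplicate problem described in this thread, in plain Python rather than Spark. It assumes the duplicates arise because one recording_msid can arrive with differently cased metadata, so taking distinct rows over the MSID plus its other columns yields more rows than there are distinct MSIDs. The rows and the track_name column are hypothetical:

```python
# Hypothetical listens as (recording_msid, track_name) pairs.
listens = [
    ("msid-1", "Hey Jude"),
    ("msid-1", "hey jude"),   # same MSID, different casing
    ("msid-2", "Yesterday"),
    ("msid-2", "Yesterday"),  # exact repeat
]

# DISTINCT over every selected column keeps both casings of msid-1.
distinct_rows = set(listens)

# DISTINCT over the MSID alone gives the count a primary key needs.
distinct_msids = {msid for msid, _ in listens}

# One row per MSID: keep the first metadata seen for each MSID,
# then number the rows to get a unique recording_id.
first_seen = {}
for msid, track_name in listens:
    first_seen.setdefault(msid, track_name)

recordings_df = [
    (recording_id, msid, track_name)
    for recording_id, (msid, track_name) in enumerate(first_seen.items(), start=1)
]

print(len(distinct_rows), len(distinct_msids))
print(recordings_df)
```

Keeping the first row per MSID is only one possible rule. The join suggested in the thread would reach the same one-row-per-MSID shape at the cost of an extra pass over the data.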
      • ruaok
        so, lots of things to address here.
      • pristine__
        :)
      • ruaok
        first off, messybrainz is intended to be messy. it has tons of duplicates and loads of crap.
      • which is why it is key to use MBIDs, rather than MSIDs for the collab filtering stuff -- as we discussed over the summer.
      • pristine__
        we will need the mapping for that. true
      • ruaok
        which is why the MSID <-> MBID mapping is so important.
      • and you were given the instructions to proceed with these flimsy assumptions, hoping that we would have a mapping before too long that would allow us to swap out the MSIDs for MBIDs.
      • so, how does that change this conversation?
      • instead of worrying about querying MSIDs in the best way possible, we should instead think about how to translate from MSID -> MBID as one of the precursor steps to the CF algorithm.
      • pristine__
        I know. Just wanted to tell you everything we need to look into. We discussed that we need a mapping, yup, so here are the cases that are a part of it. Thought it would be good to tell you the specifics :)
      • ruaok
        thank you for that -- those specifics give me nightmares. :)
      • pristine__
        oh. sorry :)
      • umm...
      • so ya...the mapping
      • ruaok
        not your fault -- this is just the nature of messybrainz. :)
      • alastairp
        out of interest, how much data in listenbrainz comes with some other id (spotify or mbid?)
      • ruaok
        what I can do is work on a mapping this week that would be very... preliminary.
      • pristine__
        once we have it, I can start on it.
      • ruaok
        alastairp: I don't have an answer for you yet.
      • pristine__
        no hurry :)
      • ruaok
        pristine__: exactly.
      • so, for now assume that you will get three tables:
      • msid_mbid_artist_mapping (msid, mbid)
      • pristine__
        I was just querying data to understand the specifics. There are too few MBIDs, yeah
      • ruaok
        msid_mbid_recording_mapping (msid, mbid)
      • and then one for release, I think.
      • pristine__
        yup
      • sounds good.
      • ruaok
        so, can you start working on a new translation layer as another pre-process step to the CF alg?
      • pristine__
        ummm.....yes. I was working on some cleaning stuff so that new people don't have a hard time, but yeah can start with it
      • and hopefully will start with the tests soon :)
      • so..
      • I won't open any PR on correcting the primary key stuff
      • mapping will solve the problem :)
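A rough sketch of what such a translation step could look like, in plain Python. The table name msid_mbid_recording_mapping and its (msid, mbid) columns come from the discussion above; the IDs, the listen rows, and the `translate` helper are hypothetical, and dropping unmapped listens is an assumption about how a sparse mapping would be handled:

```python
# (msid, mbid) pairs as a lookup table; the IDs are placeholders.
msid_mbid_recording_mapping = {
    "msid-1": "mbid-a",
    "msid-3": "mbid-c",
}

# Hypothetical listens as (user_name, recording_msid) pairs.
listens = [
    ("alice", "msid-1"),
    ("alice", "msid-2"),  # no mapping yet
    ("bob", "msid-3"),
    ("bob", "msid-1"),
]


def translate(listens, mapping):
    """Swap each MSID for its MBID; drop listens without a mapping (inner join)."""
    return [(user, mapping[msid]) for user, msid in listens if msid in mapping]


translated = translate(listens, msid_mbid_recording_mapping)

print(translated)
print(f"{len(translated)} of {len(listens)} listens kept")
```

With a sparse mapping the output shrinks, since every listen whose MSID has no MBID is dropped before the CF step sees it.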
      • ruaok
        well, it will make things worse before it makes them better, but yes this is the right path forward.
      • pristine__
        lol. I understand.
      • but the mapping can and would give us better results
      • like for instance we will not be using *artist-name* anymore.
      • so ya....nice
      • ruaok
        it *should*. but the mapping will be sparse initially, so fewer results.
      • yes, exactly. something else, not your doing, will use artist names.
      • and the blame will go on my shoulders. something that reosarevok can pick on me for the next decade. :)
      • pristine__
        If we get *nice* fewer results, that will be very good news and will pave the way for the rest of it.
      • lol
      • reosarevok
        Yaaaaaay
      • zas
        I'm upgrading discourse