I'm not 100% sure, but perhaps Picard is changing the ellipsis into three periods before it gets to the renaming.
You might try either moving the replacement to a script in the scripting section rather than the renaming section, or
phibs
I tried those too :(
yeah this is from the scripting section
rdswift
try replacing three periods rather than the ellipsis.
phibs
yeah it did not match those either
regular or regex
rdswift
Well, I'm out of ideas. Sorry I couldn't help.
phibs
np
rdswift
Is it showing up as an ellipsis in the filename on Windows?
or as three periods?
I wonder if there is more than one character code that displays as an ellipsis, and you're replacing a different code than the one in the string? That's a "Hail Mary" for sure. ;-)
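One way to check the "different character code" idea is to print the code point of every character in the title. The sketch below is in Python rather than Picard's scripting language. The names `ELLIPSIS_LIKE`, `describe` and `normalize_ellipsis` are hypothetical, and the list of lookalike characters is illustrative, not exhaustive.

```python
import unicodedata

# Characters that can render like an ellipsis (illustrative, not exhaustive).
ELLIPSIS_LIKE = (
    "\u2026",  # HORIZONTAL ELLIPSIS
    "\u22ef",  # MIDLINE HORIZONTAL ELLIPSIS
)


def describe(text):
    """Return (char, code point, Unicode name) for each character in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]


def normalize_ellipsis(text, replacement=" "):
    """Replace ellipsis lookalikes and a literal '...' with replacement."""
    for ch in ELLIPSIS_LIKE:
        text = text.replace(ch, replacement)
    return text.replace("...", replacement)


if __name__ == "__main__":
    # Made-up title; paste the real one here to see which code is in it.
    for row in describe("Wait\u2026"):
        print(row)
```

If the last character is reported as something other than U+2026, the search pattern and the string are using different characters, which would explain why the replacement never matches.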
phibs
looks like an ellipsis
esp when I copy/paste
Yeah i'm trying to just remove or replace w/ space
it doesn't detect the search pattern
kepstin
phibs: file naming pattern can't change tags, so that doesn't make sense in the file naming pattern
what you could do is take the %title% in your file naming pattern and turn it into $replace(%title%,…,wat) maybe?
or even just wrap the entire file naming pattern in a $replace call (just put the whole existing file naming pattern as the first arg)
pristine__
Moin
phibs
kepstin: I mean it definitely works if I do simple words
basically distinct recording_msids and corresponding col
in this query we add a recording_id col
which is supposed to be a primary key.
but now I realise that Messybrainz is not case sensitive
I queried the data
and
with 182986 listens
ruaok
12:15 at the office then, alastairp ?
pristine__
we should have 119929 distinct recording_msids
but we have 129124 distinct recording_msids
alastairp
ruaok: sounds good. thanks
pristine__
so around 600 are duplicates
it is because of the query we use
the ideal way is to only fetch the distinct recording_msids
and then perform a join
which can be expensive
ruaok starts digesting pristine__'s thread
as I see....
Spark needs a lot of optimization in joins
but
primary key has to be distinct, according to basic norms of dbms...
so I think we should do something about it
we can optimize the join later
but data accuracy is a must
.
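The difference between the two approaches pristine__ describes can be sketched in plain Python, without Spark. Everything here is an assumption made for illustration: the sample rows, the `track_name` column, and the function names are hypothetical, and the real query is not shown in the log.

```python
# Hypothetical listens as (recording_msid, track_name); values are made up.
LISTENS = [
    ("msid-1", "Hey Jude"),
    ("msid-1", "hey jude"),  # same MSID, different capitalisation
    ("msid-2", "Yesterday"),
    ("msid-2", "Yesterday"),
]


def distinct_pairs(rows):
    """DISTINCT over (msid, name): one MSID can still appear more than once."""
    return sorted(set(rows))


def distinct_msids_then_join(rows):
    """DISTINCT over msid only, then attach one name per MSID (the 'join')."""
    first_name = {}
    for msid, name in rows:
        # Keep the first name seen for each MSID; later variants are ignored.
        first_name.setdefault(msid, name)
    # recording_id is assigned once per MSID, so it can serve as a key.
    return [
        (recording_id, msid, first_name[msid])
        for recording_id, msid in enumerate(sorted(first_name), start=1)
    ]


if __name__ == "__main__":
    print(distinct_pairs(LISTENS))            # msid-1 appears twice
    print(distinct_msids_then_join(LISTENS))  # one row per MSID
```

The second function does more work per row, which mirrors the concern that the join can be expensive, but it is the only one of the two that yields one `recording_id` per MSID.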
ruaok
so, lots of things to address here.
pristine__
:)
ruaok
first off, messybrainz is intended to be messy. it has tons of duplicates and loads of crap.
which is why it is key to use MBIDs, rather than MSIDs for the collab filtering stuff -- as we discussed over the summer.
pristine__
we will need the mapping for that. true
ruaok
which is why the MSID <-> MBID mapping is so important.
and you were given the instructions to proceed with these flimsy assumptions, hoping that we would have a mapping before too long that would allow us to swap out the MSIDs for MBIDs.
so, how does that change this conversation?
instead of worrying about querying MSIDs in the best way possible, we should instead think about how to translate from MSID -> MBID as one of the precursor steps to the CF algorithm.
pristine__
I know. Just wanted to tell you what we need to look into. We discussed that we need a mapping, yup, so here are the cases that are a part of it. Thought it would be good to tell you the specifics :)
ruaok
thank you for that -- those specifics give me nightmares. :)
pristine__
oh. sorry :)
umm...
so ya...the mapping
ruaok
not your fault -- this is just the nature of messybrainz. :)
alastairp
out of interest, how much data in listenbrainz comes with some other id (spotify or mbid?)
ruaok
what I can do is work on a mapping this week that would be very... preliminary.
pristine__
once we have it, I can start on it.
ruaok
alastairp: I don't have an answer for you yet.
pristine__
no hurry :)
ruaok
pristine__: exactly.
so, for now assume that you will get three tables:
msid_mbid_artist_mapping (msid, mbid)
pristine__
I was just querying data to understand the specifics. There are too few MBIDs. yeah
ruaok
msid_mbid_recording_mapping (msid, mbid)
and then one for release, I think.
pristine__
yup
sounds good.
ruaok
so, can you start working on a new translation layer as another pre-process step to the CF alg?
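A minimal sketch of such a translation step, assuming the recording mapping arrives as (msid, mbid) pairs as described above. The function name `translate_msids`, the (user, msid) shape of a listen, and the choice to drop unmapped listens are all assumptions, not something decided in the log.

```python
def translate_msids(listens, recording_mapping):
    """Swap MSIDs for MBIDs; listens with no mapping are dropped and counted.

    listens: iterable of (user, recording_msid) tuples (hypothetical shape).
    recording_mapping: dict of msid -> mbid, e.g. built from the rows of
        msid_mbid_recording_mapping (msid, mbid).
    """
    translated = []
    unmapped = 0
    for user, msid in listens:
        mbid = recording_mapping.get(msid)
        if mbid is None:
            # A sparse mapping means some listens cannot be translated yet.
            unmapped += 1
            continue
        translated.append((user, mbid))
    return translated, unmapped


if __name__ == "__main__":
    # Made-up identifiers for illustration only.
    mapping = {"msid-1": "mbid-a", "msid-2": "mbid-a"}
    listens = [("alice", "msid-1"), ("bob", "msid-2"), ("bob", "msid-3")]
    print(translate_msids(listens, mapping))
```

Because several MSIDs can map to the same MBID, duplicates collapse naturally after translation. The unmapped count gives a rough measure of how sparse the mapping still is.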
pristine__
ummm.....yes. I was working on some cleaning stuff so that new people don't have a hard time, but yeah can start with it
and hopefully will start with the tests soon :)
so..
I won't open any PR on correcting the primary key stuff
mapping will solve the problem :)
ruaok
well, it will make things worse before it makes them better, but yes this is the right path forward.
pristine__
lol. I understand.
but the mapping can and would give us better results
like for instance we will not be using *artist-name* anymore.
so ya....nice
ruaok
it *should*. but the mapping will be sparse initially, so fewer results.
yes, exactly. something else, not your doing, will use artist names.
and the blame will fall on my shoulders. something that reosarevok can pick on me for the next decade. :)
pristine__
If we get *nice* fewer results, that will be very good news and will pave the path for the rest of it.