I'm not 100% sure, but perhaps Picard is changing the ellipsis into three periods before it gets to the renaming.
2019-10-14 28743, 2019
rdswift
You might try either moving the replacement to a script in the scripting section rather than the renaming section, or
2019-10-14 28703, 2019
phibs
I tried those too :(
2019-10-14 28709, 2019
phibs
yeah this is from the scripting section
2019-10-14 28725, 2019
rdswift
try replacing three periods rather than the ellipsis.
2019-10-14 28733, 2019
phibs
yeah it did not match those either
2019-10-14 28736, 2019
phibs
regular or regex
2019-10-14 28713, 2019
rdswift
Well, I'm out of ideas. Sorry I couldn't help.
2019-10-14 28742, 2019
phibs
np
2019-10-14 28757, 2019
rdswift
Is it showing up as an ellipsis in the filename on Windows?
2019-10-14 28749, 2019
rdswift
or as three periods?
2019-10-14 28717, 2019
rdswift
I wonder if there is more than one character code that displays as an ellipsis, and you're replacing a different code than the one in the string? That's a "Hail Mary" for sure. ;-)
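One way to check rdswift's guess is to look at the code points in the string. The sketch below is Python, not Picard script, and the helper names `describe` and `strip_ellipsis` are made up for illustration. It shows that the single ellipsis character (U+2026) and three periods are different strings, so a search for one will not match the other.

```python
import unicodedata

ELLIPSIS = "\u2026"   # one character: HORIZONTAL ELLIPSIS
THREE_DOTS = "..."    # three separate FULL STOP characters


def describe(text):
    """List the code point and Unicode name of every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


def strip_ellipsis(text, replacement=" "):
    """Replace either spelling of the ellipsis with the replacement string."""
    # Fold the single-character form into three periods first,
    # so that one replace call covers both spellings.
    return text.replace(ELLIPSIS, THREE_DOTS).replace(THREE_DOTS, replacement)


if __name__ == "__main__":
    print(describe(ELLIPSIS))
    print(describe(THREE_DOTS))
    print(ELLIPSIS == THREE_DOTS)
```

Pasting the title from the tag into `describe` would show which spelling is actually stored there.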
2019-10-14 28709, 2019
phibs
looks like an ellipsis
2019-10-14 28711, 2019
phibs
esp when I copy/paste
2019-10-14 28728, 2019
phibs
Yeah, I'm trying to just remove it or replace it w/ a space
2019-10-14 28734, 2019
phibs
it doesn't detect the search pattern
2019-10-14 28729, 2019
chaban has quit
2019-10-14 28749, 2019
chaban joined the channel
2019-10-14 28724, 2019
kepstin
phibs: file naming pattern can't change tags, so that doesn't make sense in the file naming pattern
2019-10-14 28755, 2019
kepstin
what you could do is take the %title% in your file naming pattern and turn it into $replace(%title%,…,wat) maybe?
2019-10-14 28733, 2019
kepstin
or even just wrap the entire file naming pattern in a $replace call (just put the whole existing file naming pattern as the first arg)
2019-10-14 28702, 2019
pristine__
Moin
2019-10-14 28710, 2019
phibs
kepstin: I mean it definitely works if I do simple words
basically distinct recording_msids and corresponding col
2019-10-14 28754, 2019
pristine__
in this query we add a recording_id col
2019-10-14 28703, 2019
pristine__
which is supposed to be a primary key.
2019-10-14 28725, 2019
pristine__
but now I realise that MessyBrainz is not case sensitive
2019-10-14 28731, 2019
pristine__
I queried the data
2019-10-14 28732, 2019
pristine__
and
2019-10-14 28705, 2019
pristine__
with 182986 listens
2019-10-14 28725, 2019
ruaok
12:15 at the office then, alastairp ?
2019-10-14 28727, 2019
pristine__
we should have 119929 distinct recording_msids
2019-10-14 28755, 2019
pristine__
but we have 129124 distinct recording_msids
2019-10-14 28708, 2019
alastairp
ruaok: sounds good. thanks
2019-10-14 28711, 2019
pristine__
so around 600 are duplicates
2019-10-14 28723, 2019
pristine__
it is because of the query we use
2019-10-14 28744, 2019
pristine__
the ideal way is to only fetch the distinct recording_msids
2019-10-14 28749, 2019
pristine__
and then perform a join
2019-10-14 28758, 2019
pristine__
which can be expensive
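A minimal sketch of the approach pristine__ describes, in plain Python rather than Spark. It rests on one reading of the thread: that the same recording_msid can appear with differently cased names, so taking distinct rows yields more entries than taking distinct MSIDs. The sample rows and the function names are hypothetical, not taken from the ListenBrainz code.

```python
# Hypothetical rows (recording_msid, track_name); illustrative only, not real data.
listens = [
    ("msid-a", "Hello World"),
    ("msid-a", "hello world"),   # same MSID, different casing
    ("msid-b", "Other Song"),
    ("msid-a", "Hello World"),   # exact repeat
]


def distinct_rows(rows):
    """Distinct over the whole row: case variants survive as separate rows."""
    return set(rows)


def distinct_msids(rows):
    """Distinct over the MSID alone: one entry per recording_msid."""
    return {msid for msid, _ in rows}


def assign_recording_ids(rows):
    """Give each MSID one recording_id, then join one representative name back on."""
    first_name = {}
    for msid, name in rows:
        first_name.setdefault(msid, name)  # keep the first name seen per MSID
    return {
        msid: (recording_id, first_name[msid])
        for recording_id, msid in enumerate(sorted(first_name))
    }


if __name__ == "__main__":
    print(len(distinct_rows(listens)), len(distinct_msids(listens)))
    print(assign_recording_ids(listens))
```

With ids assigned per MSID, the recording_id column is unique by construction; the cost is the extra lookup (the join) to bring the other columns back.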
2019-10-14 28705, 2019
ruaok starts digesting pristine__'s thread
2019-10-14 28705, 2019
pristine__
as I see....
2019-10-14 28729, 2019
pristine__
Spark needs a lot of optimization in joins
2019-10-14 28740, 2019
pristine__
but
2019-10-14 28704, 2019
pristine__
primary key has to be distinct, according to basic norms of dbms...
2019-10-14 28722, 2019
pristine__
so I think we should do something about it
2019-10-14 28731, 2019
pristine__
we can optimize the join later
2019-10-14 28749, 2019
pristine__
but data accuracy is a must
2019-10-14 28752, 2019
pristine__
.
2019-10-14 28720, 2019
ruaok
so, lots of things to address here.
2019-10-14 28725, 2019
pristine__
:)
2019-10-14 28752, 2019
ruaok
first off, messybrainz is intended to be messy. it has tons of duplicates and loads of crap.
2019-10-14 28738, 2019
ruaok
which is why it is key to use MBIDs, rather than MSIDs for the collab filtering stuff -- as we discussed over the summer.
2019-10-14 28700, 2019
pristine__
we will need the mapping for that. true
2019-10-14 28705, 2019
ruaok
which is why the MSID <-> MBID mapping is so important.
2019-10-14 28735, 2019
ruaok
and you were given the instructions to proceed with these flimsy assumptions, hoping that we would have a mapping before too long that would allow us to swap out the MSIDs for MBIDs.
2019-10-14 28724, 2019
ruaok
so, how does that change this conversation?
2019-10-14 28719, 2019
ruaok
instead of worrying about querying MSIDs in the best way possible, we should instead think about how to translate from MSID -> MBID as one of the precursor steps to the CF algorithm.
2019-10-14 28722, 2019
pristine__
I know. Just wanted to tell you what all we need to look at. We discussed that we need a mapping, yup, so here are the cases that are a part of it. Thought it would be good to tell you the specifics :)
2019-10-14 28746, 2019
ruaok
thank you for that -- those specifics give me nightmares. :)
2019-10-14 28759, 2019
pristine__
oh. sorry :)
2019-10-14 28706, 2019
pristine__
umm...
2019-10-14 28712, 2019
pristine__
so ya...the mapping
2019-10-14 28715, 2019
ruaok
not your fault -- this is just the nature of messybrainz. :)
2019-10-14 28731, 2019
alastairp
out of interest, how much data in listenbrainz comes with some other id (spotify or mbid?)
2019-10-14 28732, 2019
ruaok
what I can do is work on a mapping this week that would be very... preliminary.
2019-10-14 28744, 2019
pristine__
once we have it, I can start on it.
2019-10-14 28746, 2019
ruaok
alastairp: I don't have an answer for you yet.
2019-10-14 28750, 2019
pristine__
no hurry :)
2019-10-14 28753, 2019
ruaok
pristine__: exactly.
2019-10-14 28709, 2019
ruaok
so, for now assume that you will get three tables:
2019-10-14 28728, 2019
ruaok
msid_mbid_artist_mapping (msid, mbid)
2019-10-14 28736, 2019
pristine__
I was just querying data to understand the specifics. There are too few MBIDs. yeah
2019-10-14 28740, 2019
ruaok
msid_mbid_recording_mapping (msid, mbid)
2019-10-14 28752, 2019
ruaok
and then one for release, I think.
2019-10-14 28704, 2019
pristine__
yup
2019-10-14 28727, 2019
pristine__
sounds good.
2019-10-14 28728, 2019
ruaok
so, can you start working on a new translation layer as another pre-process step to the CF alg?
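A rough sketch of what such a translation step could look like, in plain Python rather than Spark. It assumes the recording mapping arrives as (msid, mbid) pairs, as in the msid_mbid_recording_mapping table named above; the listen layout (user, recording_msid) and the function name `translate_listens` are hypothetical.

```python
def translate_listens(listens, recording_mapping):
    """Swap recording MSIDs for MBIDs before the CF step.

    listens: iterable of (user, recording_msid) pairs (hypothetical layout).
    recording_mapping: dict of msid -> mbid, e.g. built from the
        msid_mbid_recording_mapping (msid, mbid) table.
    Returns (translated, unmapped) so that dropped listens can be counted.
    """
    translated, unmapped = [], []
    for user, msid in listens:
        mbid = recording_mapping.get(msid)
        if mbid is None:
            unmapped.append((user, msid))    # no mapping yet: leave out of CF input
        else:
            translated.append((user, mbid))
    return translated, unmapped


if __name__ == "__main__":
    mapping = {"msid-1": "mbid-x"}           # made-up identifiers
    listens = [("alice", "msid-1"), ("bob", "msid-2")]
    print(translate_listens(listens, mapping))
```

Returning the unmapped listens separately makes it easy to measure how much data a sparse mapping leaves out.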
2019-10-14 28712, 2019
pristine__
ummm.....yes. I was working on some cleaning stuff so that new people don't have a hard time, but yeah can start with it
2019-10-14 28744, 2019
pristine__
and hopefully will start with the tests soon :)
2019-10-14 28749, 2019
pristine__
so..
2019-10-14 28734, 2019
pristine__
I won't open any PR on correcting the primary key stuff
2019-10-14 28748, 2019
pristine__
mapping will solve the problem :)
2019-10-14 28733, 2019
ruaok
well, it will make things worse before it makes them better, but yes this is the right path forward.
2019-10-14 28757, 2019
pristine__
lol. I understand.
2019-10-14 28716, 2019
pristine__
but the mapping can and would give us better results
2019-10-14 28744, 2019
pristine__
like for instance we will not be using *artist-name* anymore.
2019-10-14 28748, 2019
pristine__
so ya....nice
2019-10-14 28753, 2019
ruaok
it *should*. but the mapping will be sparse initially, so fewer results.
2019-10-14 28712, 2019
ruaok
yes, exactly. something else, not your doing, will use artist names.
2019-10-14 28726, 2019
ruaok
and the blame will go on my shoulder. something that reosarevok can pick on me for the next decade. :)
2019-10-14 28757, 2019
pristine__
If we get *nice* fewer results, that will be very good news and will pave the way for the rest of it.