in #metabrainz

1:26 AM
rdswift

phibs, is this on a Windows system?
1:26 AM
phibs

yeah
1:27 AM
rdswift

I'm not 100% sure, but perhaps Picard is changing the ellipsis into three periods before it gets to the renaming.
1:28 AM
You might try either moving the replacement to a script in the scripting section rather than the renaming section, or
1:29 AM
phibs

I tried those too :(
1:29 AM
yeah this is from the scripting section
1:29 AM
rdswift

try replacing three periods rather than the ellipsis.
1:29 AM
phibs

yeah it did not match those either
1:29 AM
regular or regex
1:30 AM
rdswift

Well, I'm out of ideas. Sorry I couldn't help.
1:30 AM
phibs

np
1:30 AM
rdswift

Is it showing up as an ellipsis in the filename on Windows?
1:31 AM
or as three periods?
1:34 AM
I wonder if there is more than one character code that displays as an ellipsis, and you're replacing a different code than the one in the string? That's a "Hail Mary" for sure. ;-)
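The idea that more than one code point can render as an ellipsis can be checked outside Picard. A minimal Python sketch (not part of Picard; the helper name `describe` is made up for illustration) that lists the code points in a string copied from a title or filename:

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


# Strings that can all look like "an ellipsis" on screen.
candidates = {
    "single character": "\u2026",
    "three periods": "...",
    "midline ellipsis": "\u22ef",
}

for label, text in candidates.items():
    print(label, describe(text))
```

If the pasted title shows a different code point than the one typed into the script, the search text will never match.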
1:39 AM
phibs

looks like an ellipsis
1:39 AM
esp when I copy/paste
1:39 AM
Yeah i'm trying to just remove or replace w/ space
1:39 AM
it doesn't detect the search pattern
4:18 AM
kepstin

phibs: file naming pattern can't change tags, so that doesn't make sense in the file naming pattern
4:18 AM
what you could do is take the %title% in your file naming pattern and turn it into $replace(%title%,…,wat) maybe?
4:20 AM
or even just wrap the entire file naming pattern in a $replace call (just put the whole existing file naming pattern as the first arg)
4:26 AM
pristine__

Moin
4:37 AM
phibs

kepstin: I mean it definitely works if I do simple works
4:37 AM
words *
4:37 AM
the -> something else
4:37 AM
so it should be able to strip the ellipsis
6:13 AM
yvanzo

mo’’in’
7:35 AM
outsidecontext

phibs: The ellipsis replacement definitely works for me with your script and e.g. the second track on https://musicbrainz.org/release/da53e497-8c61-4... . But a few ideas:
7:37 AM
- Options > Metadata > "Convert Unicode punctuation characters to ASCII" might be enabled
7:37 AM
- Is some other script doing a conversion?
7:37 AM
- Any plugin that might change the title?
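If a setting like the first one in the list above is enabled, the title would no longer contain the single ellipsis character by the time a script sees it. A rough Python sketch of that kind of conversion (the mapping table and function name are assumptions for illustration, not Picard's actual code):

```python
# Hypothetical subset of a punctuation-to-ASCII table.
PUNCTUATION_TO_ASCII = {
    "\u2026": "...",  # horizontal ellipsis -> three periods
    "\u2019": "'",    # right single quotation mark -> apostrophe
    "\u2013": "-",    # en dash -> hyphen
}


def to_ascii_punctuation(text):
    """Replace each mapped Unicode punctuation character with its ASCII form."""
    return "".join(PUNCTUATION_TO_ASCII.get(ch, ch) for ch in text)


title = "Wait\u2026 What"
converted = to_ascii_punctuation(title)

# After conversion a search for the single ellipsis character finds nothing,
# while a search for three periods does.
print("\u2026" in converted)
print("..." in converted)
```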
7:47 AM
Mineo

also, if you do `$replace(%title%, …, wat)`, the extra space before `…` _does_ matter
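Mineo's point can be mimicked with plain string replacement in Python. This is only an analogy for a literal search-and-replace; it assumes, as stated above, that the space after the comma becomes part of the search text:

```python
title = "Wait\u2026 What"

# Search text without a leading space: matches the ellipsis in the title.
without_space = title.replace("\u2026", "wat")
print(without_space)

# Search text with a leading space: no match here, because the title has
# no space directly before the ellipsis, so the title comes back unchanged.
with_space = title.replace(" \u2026", "wat")
print(with_space)
```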
9:19 AM
travis-ci joined the channel
9:19 AM
travis-ci

[picard:master@e16c7c4 - build #150] CI passed! (https://travis-ci.org/phw/picard/builds/597527216)
9:19 AM
travis-ci has left the channel
10:19 AM
travis-ci joined the channel
10:19 AM
metabrainz/picard#5113 (picard-2.2.3 - 4cfa625 : Philipp Wolfer): The build passed.
10:19 AM
Change view : https://github.com/metabrainz/picard/compare/f9...
10:19 AM
Build details : https://travis-ci.org/metabrainz/picard/builds/...
10:19 AM
travis-ci has left the channel
10:35 AM
ruaok

moooin!
10:35 AM
pristine__: I've got some time to chat, finally.
10:38 AM
pristine__

Thanks
10:38 AM
A sec
10:39 AM
alastairp

hi ruaok
10:39 AM
pristine__

Hi alastairp
10:39 AM
alastairp

to confirm - did we set a time tomorrow? I only have a small window of about 2h
10:39 AM
hi pristine__
10:39 AM
I saw that you wanted to talk with me - I'll be free on Thursday and Friday
10:40 AM
pristine__

alastairp: it is the same discussion i wanted to have with ruaok for like 5 min, but yeah we can have a follow up whenever you are free :)
10:43 AM
alastairp

I'll keep an eye on the conversation and make any comments if I have time
10:43 AM
pristine__

ruaok: https://gist.github.com/vansika/7ac1acfda613e64...
10:44 AM
ruaok got distracted
10:44 AM
ruaok

gimme 2 minutes.
10:44 AM
pristine__

all the queries should have same count
10:44 AM
ruaok

alastairp: what is your timeframe?
10:44 AM
pristine__

but they don't have the same count
10:44 AM
I realised after a lot of querying that it was because of this
10:45 AM
https://gist.github.com/vansika/e811498ad9040bc...
10:45 AM
the last dataframe
10:45 AM
alastairp

ruaok: I'm free 12-2, morning meeting starts at 10, so likely won't run until 12, but I have another at 2 that I have to be back for
10:45 AM
pristine__

Messybrainz is case insensitive, it was unknown to me
10:45 AM
now, the concern is
10:45 AM
a sec
10:46 AM
alastairp

it's only 15 minutes for me on the bike between offices, so I think we should still have plenty of time
10:46 AM
pristine__

https://github.com/metabrainz/listenbrainz-labs...
10:46 AM
this
10:46 AM
rn, we use this query to get recordings_df
10:47 AM
basically distinct recording_msids and corresponding col
10:47 AM
in this query we add a recording_id col
10:48 AM
which is supposed to be a primary key.
10:48 AM
but now that I realise that Messybrainz is not case sensitive
10:48 AM
I queried the data
10:48 AM
and
10:49 AM
with 182986 listens
10:49 AM
ruaok

12:15 at the office then, alastairp ?
10:49 AM
pristine__

we should have 119929 distinct recording_msids
10:49 AM
but we have 129124 distinct recording_msids
10:50 AM
alastairp

ruaok: sounds good. thanks
10:50 AM
pristine__

so around 600 are duplicates
10:50 AM
it is because of the query we use
10:50 AM
the ideal way is to only fetch the distinct recording_msids
10:50 AM
and then perform a join
10:50 AM
which can be expensive
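The difference between the two approaches can be shown on toy data. This is a plain-Python sketch, not the Spark query from the repository; `track_name` and `artist_name` are assumed column names, and the rows are invented:

```python
# Toy listens: the same recording_msid appears with different capitalisation,
# which is the situation described for case-insensitive MessyBrainz lookups.
listens = [
    {"recording_msid": "msid-1", "track_name": "Hello", "artist_name": "Adele"},
    {"recording_msid": "msid-1", "track_name": "hello", "artist_name": "adele"},
    {"recording_msid": "msid-2", "track_name": "Creep", "artist_name": "Radiohead"},
]

# Distinct over all columns: msid-1 survives twice, so an id assigned per row
# would not be unique per recording_msid.
distinct_rows = {
    (row["recording_msid"], row["track_name"], row["artist_name"])
    for row in listens
}

# Distinct over recording_msid only, then join back to pick one row of
# metadata per msid (here: the first one seen).
distinct_msids = sorted({row["recording_msid"] for row in listens})
first_seen = {}
for row in listens:
    first_seen.setdefault(row["recording_msid"], row)

recordings = [
    {"recording_id": i, **first_seen[msid]}
    for i, msid in enumerate(distinct_msids, start=1)
]

print(len(distinct_rows), len(recordings))
```

The second form gives one `recording_id` per `recording_msid`, at the cost of the extra join step.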
10:51 AM
ruaok starts digesting pristine__'s thread
10:51 AM
as I see....
10:51 AM
Spark needs a lot of optimization in joins
10:51 AM
but
10:52 AM
primary key has to be distinct, according to basic norms of dbms...
10:52 AM
so I think we should do something about it
10:52 AM
we can optimize the join later
10:52 AM
but data accuracy is a must
10:52 AM
.
10:53 AM
ruaok

so, lots of things to address here.
10:53 AM
pristine__

:)
10:53 AM
ruaok

first off, messybrainz is intended to be messy. it has tons of duplicates and loads of crap.
10:54 AM
which is why it is key to use MBIDs, rather than MSIDs, for the collab filtering stuff -- as we discussed over the summer.
10:55 AM
pristine__

we will need the mapping for that. true
10:55 AM
ruaok

which is why the MSID <-> MBID mapping is so important.
10:55 AM
and you were given the instructions to proceed with these flimsy assumptions, hoping that we would have a mapping before too long that would allow us to swap out the MSIDs for MBIDs.
10:56 AM
so, how does that change this conversation?
10:57 AM
instead of worrying about querying MSIDs in the best way possible, we should instead think about how to translate from MSID -> MBID as one of the precursor steps to the CF algorithm.
10:57 AM
pristine__

I know. Just wanted to tell you what all we need to look at. We discussed that we need a mapping, yup, so here are the cases that are a part of it. Thought it would be good to tell you the specifics :)
10:57 AM
ruaok

thank you for that -- those specifics give me nightmares. :)
10:57 AM
pristine__

oh. sorry :)
10:58 AM
umm...
10:58 AM
so ya...the mapping
10:58 AM
ruaok

not your fault -- this is just the nature of messybrainz. :)
10:58 AM
alastairp

out of interest, how much data in listenbrainz comes with some other id (spotify or mbid?)
10:58 AM
ruaok

what I can do is work on a mapping this week that would be very... preliminary.
10:58 AM
pristine__

once we have it, I can start on it.
10:58 AM
ruaok

alastairp: I don't have an answer for you yet.
10:58 AM
pristine__

no hurry :)
10:58 AM
ruaok

pristine__: exactly.
10:59 AM
so, for now assume that you will get three tables:
10:59 AM
msid_mbid_artist_mapping (msid, mbid)
10:59 AM
pristine__

I was just querying data to understand the specifics. MBIDs are too few. yeah
10:59 AM
ruaok

msid_mbid_recording_mapping (msid, mbid)
10:59 AM
and then one for release, I think.
11:00 AM
pristine__

yup
11:00 AM
sounds good.
11:00 AM
ruaok

so, can you start working on a new translation layer as another pre-process step to the CF alg?
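One way such a translation step could look, as a plain-Python sketch rather than the eventual Spark code. The table name `msid_mbid_recording_mapping` comes from the conversation; the ids, the `translate` helper, and the choice to drop listens with no mapping are assumptions:

```python
# Hypothetical contents of msid_mbid_recording_mapping (msid, mbid).
recording_mapping = {
    "msid-1": "mbid-a",
    "msid-2": "mbid-a",  # two messy ids can point at the same recording
}

listens = [
    {"user": "u1", "recording_msid": "msid-1"},
    {"user": "u1", "recording_msid": "msid-2"},
    {"user": "u2", "recording_msid": "msid-3"},  # not in the mapping yet
]


def translate(listens, mapping):
    """Swap recording_msid for recording_mbid; drop listens with no mapping."""
    translated = []
    for listen in listens:
        mbid = mapping.get(listen["recording_msid"])
        if mbid is not None:
            translated.append({"user": listen["user"], "recording_mbid": mbid})
    return translated


translated = translate(listens, recording_mapping)
print(translated)
```

With a sparse mapping, fewer listens come out of this step than go in, but the ones that remain share one id per recording.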
11:01 AM
pristine__

ummm.....yes. I was working on some cleaning stuff so that new people don't have a hard time, but yeah can start with it
11:01 AM
and hopefully will start with the tests soon :)
11:01 AM
so..
11:02 AM
I won't open any PR on correcting the primary key stuff
11:02 AM
mapping will solve the problem :)
11:03 AM
ruaok

well, it will make things worse before it makes them better, but yes this is the right path forward.
11:03 AM
pristine__

lol. I understand.
11:04 AM
but the mapping can and would give us better results
11:04 AM
like for instance we will not be using *artist-name* anymore.
11:04 AM
so ya....nice
11:04 AM
ruaok

it *should*. but the mapping will be sparse initially, so fewer results.
11:05 AM
yes, exactly. something else, not your doing, will use artist names.
11:05 AM
and the blame will go on my shoulder. something that reosarevok can pick on me for the next decade. :)
11:05 AM
pristine__

If we get *nice* fewer results, that will be very good news and will pave the way for the rest of it.
11:06 AM
lol
11:06 AM
reosarevok

Yaaaaaay
11:06 AM
zas

I'm upgrading discourse