#metabrainz

1:26 AM
rdswift

phibs, is this on a Windows system?

2019-10-14

1:26 AM
phibs

yeah

1:27 AM
rdswift

I'm not 100% sure, but perhaps Picard is changing the ellipsis into three periods before it gets to the renaming.

1:28 AM
rdswift

You might try either moving the replacement to a script in the scripting section rather than the renaming section, or

1:29 AM
phibs

I tried those too :(

1:29 AM
phibs

yeah this is from the scripting section

1:29 AM
rdswift

try replacing three periods rather than the ellipsis.

1:29 AM
phibs

yeah it did not match those either

1:29 AM
phibs

regular or regex

1:30 AM
rdswift

Well, I'm out of ideas. Sorry I couldn't help.

1:30 AM
phibs

np

1:30 AM
rdswift

Is it showing up as an ellipsis in the filename on Windows?

1:31 AM
rdswift

or as three periods?

1:34 AM
rdswift

I wonder if there is more than one character code that displays as an ellipsis, and you're replacing a different code than the one in the string? That's a "Hail Mary" for sure. ;-)
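
A quick way to test rdswift's idea: the single character U+2026 HORIZONTAL ELLIPSIS, the lookalike U+22EF MIDLINE HORIZONTAL ELLIPSIS and three ASCII full stops are different strings, so a replacement written with one never matches another. The Python sketch below is not part of Picard; `describe_code_points` is a hypothetical helper and the title is made up.

```python
import unicodedata
from typing import List


def describe_code_points(text: str) -> List[str]:
    """Return 'U+XXXX NAME' for every character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


ellipsis_char = "\u2026"  # HORIZONTAL ELLIPSIS, the single-character "…"
midline_char = "\u22ef"   # MIDLINE HORIZONTAL ELLIPSIS, another lookalike
three_periods = "..."     # three ASCII full stops

print(ellipsis_char == three_periods)  # False: they only look alike
for candidate in (ellipsis_char, midline_char, three_periods):
    print(describe_code_points(candidate))
# ['U+2026 HORIZONTAL ELLIPSIS']
# ['U+22EF MIDLINE HORIZONTAL ELLIPSIS']
# ['U+002E FULL STOP', 'U+002E FULL STOP', 'U+002E FULL STOP']

# A literal replacement only fires when the exact code points are present.
title = "Wait" + ellipsis_char
print(repr(title.replace(three_periods, " ")))  # 'Wait…' (no match)
print(repr(title.replace(ellipsis_char, " ")))  # 'Wait '
```

Pasting the real title into `describe_code_points` should show which code point is actually in the string.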

1:39 AM
phibs

looks like an ellipsis

1:39 AM
phibs

esp when I copy/paste

1:39 AM
phibs

Yeah, I'm trying to just remove it or replace it w/ a space

1:39 AM
phibs

it doesn't detect the search pattern

2:48 AM
chaban has quit

2:49 AM
chaban joined the channel

4:18 AM
kepstin

phibs: file naming pattern can't change tags, so that doesn't make sense in the file naming pattern

4:18 AM
kepstin

what you could do is take the %title% in your file naming pattern and turn it into $replace(%title%,…,wat) maybe?

4:20 AM
kepstin

or even just wrap the entire file naming pattern in a $replace call (just put the whole existing file naming pattern as the first arg)
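
Kepstin's point can be modelled roughly in Python: the naming pattern only produces a path, so wrapping the whole pattern in one replacement cleans every part of that path while the tag values stay as they are. This is an illustration, not Picard's renderer; the three-tag pattern, `render_filename` and the tag values are hypothetical.

```python
from string import Template

# Hypothetical naming pattern, standing in for %albumartist%/%album%/%title%.
PATTERN = Template("$albumartist/$album/$title")


def render_filename(tags: dict) -> str:
    """Fill the pattern, then run one replacement over the whole rendered path."""
    rendered = PATTERN.substitute(tags)
    # Comparable to wrapping the entire pattern in a single $replace(...) call.
    return rendered.replace("\u2026", "")


tags = {"albumartist": "Band\u2026", "album": "Live", "title": "Wait\u2026"}
print(render_filename(tags))  # Band/Live/Wait
print(tags["title"])          # Wait… (the tag value itself is unchanged)
```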

4:26 AM
pristine__

Moin

4:37 AM
phibs

kepstin: I mean it definitely works if I do simple works

4:37 AM
phibs

words *

4:37 AM
phibs

the -> something else

4:37 AM
phibs

so it should be able to strip the ellipsis

5:30 AM
Rotab has quit

5:31 AM
P23 has quit

5:36 AM
Pac23 joined the channel

6:13 AM
yvanzo

mo’’in’

7:35 AM
outsidecontext

phibs: The ellipsis replacement definitely works for me with your script and e.g. the second track on https://musicbrainz.org/release/da53e497-8c61-4d4… . But a few ideas:

7:37 AM
outsidecontext

- Options > Metadata > "Convert Unicode punctuation characters to ASCII" might be enabled

7:37 AM
outsidecontext

- Is some other script doing a conversion?

7:37 AM
outsidecontext

- Any plugin that might change the title?
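
Outsidecontext's first suggestion would explain the symptom if the conversion runs before the user script, which is assumed here: the script then receives three periods, and a `$replace` aimed at `…` finds nothing. The Python model below uses a hypothetical two-entry table and stand-in functions; it is not Picard's actual conversion code.

```python
# Hypothetical subset of a Unicode-to-ASCII punctuation table (illustrative only).
ASCII_PUNCTUATION = {
    "\u2026": "...",  # HORIZONTAL ELLIPSIS -> three full stops
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK -> apostrophe
}


def to_ascii_punctuation(text: str) -> str:
    """Replace each mapped Unicode punctuation character with its ASCII form."""
    return "".join(ASCII_PUNCTUATION.get(ch, ch) for ch in text)


def run_user_script(title: str) -> str:
    """Stand-in for a tagger script that replaces the ellipsis with a space."""
    return title.replace("\u2026", " ")


original = "Don\u2019t Stop\u2026"

# Conversion off: the script sees U+2026 and replaces it.
print(repr(run_user_script(original)))                        # 'Don’t Stop '
# Conversion on (assumed to run first): U+2026 is already gone.
print(repr(run_user_script(to_ascii_punctuation(original))))  # "Don't Stop..."
```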

7:47 AM
Mineo

also, if you do `$replace(%title%, …, wat)`, the extra space before `…` _does_ matter
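
Mineo's note implies Picard does not trim the text after each comma, so `$replace(%title%, …, wat)` searches for a space followed by `…`. A minimal Python analogy with a made-up title; `split_args` is a hypothetical naive splitter (no nesting or escapes), not Picard's parser.

```python
def split_args(arg_text: str) -> list:
    """Split a simple argument list on commas, keeping any whitespace as typed."""
    return arg_text.split(",")


title = "Wait\u2026"  # made-up title, no space before the ellipsis

for call_args in ("%title%,\u2026,wat", "%title%, \u2026, wat"):
    _, search, replacement = split_args(call_args)
    print(repr(search), "->", repr(title.replace(search, replacement)))
# '…' -> 'Waitwat'
# ' …' -> 'Wait…'
```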

8:38 AM
Pac23 has quit

8:39 AM
adhawkins has quit

8:40 AM
Pac23 joined the channel

8:55 AM
adhawkins joined the channel

9:05 AM
Gazooo has quit

9:07 AM
Gazooo joined the channel

9:19 AM
travis-ci joined the channel

9:19 AM
travis-ci

[picard:master@e16c7c4 - build #150] CI passed! (https://travis-ci.org/phw/picard/builds/597527216)

9:19 AM
travis-ci has left the channel

10:19 AM
travis-ci joined the channel

10:19 AM
travis-ci

metabrainz/picard#5113 (picard-2.2.3 - 4cfa625 : Philipp Wolfer): The build passed.

10:19 AM
travis-ci

Change view : https://github.com/metabrainz/picard/compare/f903…

10:19 AM
travis-ci

Build details : https://travis-ci.org/metabrainz/picard/builds/59…

10:19 AM
travis-ci has left the channel

10:35 AM
ruaok

moooin!

10:35 AM
ruaok

pristine__: I've got some time to chat, finally.

10:38 AM
pristine__

Thanks

10:38 AM
pristine__

A sec

10:39 AM
alastairp

hi ruaok

10:39 AM
pristine__

Hi alastairp

10:39 AM
alastairp

to confirm - did we set a time tomorrow? I only have a small window of about 2h

10:39 AM
alastairp

hi pristine__

10:39 AM
alastairp

I saw that you wanted to talk with me - I'll be free on Thursday and Friday

10:40 AM
pristine__

alastairp: it is the same discussion i wanted to have with ruaok for like 5 min, but yeah we can have a follow up whenever you are free :)

10:43 AM
alastairp

I'll keep an eye on the conversation and make any comments if I have time

10:43 AM
pristine__

ruaok: https://gist.github.com/vansika/7ac1acfda613e6425…

10:44 AM
ruaok got distracted

10:44 AM
ruaok

gimme 2 minutes.

10:44 AM
pristine__

all the queries should have the same count

10:44 AM
ruaok

alastairp: what is your timeframe?

10:44 AM
pristine__

but they don't have the same count

10:44 AM
pristine__

I realised after a lot of querying that it was because of this

10:45 AM
pristine__

https://gist.github.com/vansika/e811498ad9040bc21…

10:45 AM
pristine__

the last dataframe

10:45 AM
alastairp

ruaok: I'm free 12-2, morning meeting starts at 10, so likely won't run until 12, but I have another at 2 that I have to be back for

10:45 AM
pristine__

Messybrainz is case insensitive, it was unknown to me

10:45 AM
pristine__

now, the concern is

10:45 AM
pristine__

a sec

10:46 AM
alastairp

it's only 15 minutes for me on the bike between offices, so I think we should still have plenty of time

10:46 AM
pristine__

https://github.com/metabrainz/listenbrainz-labs/b…

10:46 AM
pristine__

this

10:46 AM
pristine__

rn, we use this query to get recordings_df

10:47 AM
pristine__

basically distinct recording_msids and corresponding col

10:47 AM
pristine__

in this query we add a recording_id col

10:48 AM
pristine__

which is supposed to be a primary key.

10:48 AM
pristine__

but now when I realise that Messybrainz is not case sensitive

10:48 AM
pristine__

I queried the data

10:48 AM
pristine__

and

10:49 AM
pristine__

with 182986 listens

10:49 AM
ruaok

12:15 at the office then, alastairp ?

10:49 AM
pristine__

we should have 119929 distinct recording_msids

10:49 AM
pristine__

but we have 129124 distinct recording_msids

10:50 AM
alastairp

ruaok: sounds good. thanks

10:50 AM
pristine__

so around 600 are duplicates

10:50 AM
pristine__

it is because of the query we use
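
One way the distinct count can exceed the number of MSIDs, assuming (as pristine__ found) that MessyBrainz gives submissions differing only in case the same MSID while the listens keep the submitted text: a DISTINCT over the MSID plus its text columns keeps one row per spelling. A plain-Python model with made-up rows; it does not reproduce the actual listenbrainz-labs query.

```python
# Made-up listens: (recording_msid, track_name, artist_name).
# "a1" was submitted with two casings that MessyBrainz maps to one MSID.
listens = [
    ("a1", "Hello World", "Some Artist"),
    ("a1", "hello world", "some artist"),
    ("a1", "Hello World", "Some Artist"),
    ("b2", "Other Song", "Other Artist"),
]

# Like SELECT DISTINCT recording_msid, track_name, artist_name
distinct_rows = set(listens)
# Like SELECT DISTINCT recording_msid
distinct_msids = {msid for msid, _, _ in listens}

print(len(distinct_rows))   # 3: "a1" appears twice, once per casing
print(len(distinct_msids))  # 2
```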

10:50 AM
pristine__

the ideal way is to only fetch the distinct recording_msids

10:50 AM
pristine__

and then perform a join

10:50 AM
pristine__

which can be expensive
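
The approach pristine__ outlines (distinct MSIDs first, number them, then join back for the remaining columns) can be sketched in plain Python. Keeping the first-seen spelling per MSID is an arbitrary assumption, and `build_recordings` is a hypothetical name; in Spark the lookup in step 2 is the join whose cost is the concern.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, str]  # (recording_msid, track_name, artist_name)


def build_recordings(listens: List[Row]) -> List[Tuple[int, Row]]:
    """Assign one recording_id per distinct MSID, then attach one representative row."""
    # Step 1: distinct MSIDs only, numbered in first-seen order (the primary key).
    recording_ids: Dict[str, int] = {}
    for msid, _, _ in listens:
        recording_ids.setdefault(msid, len(recording_ids))

    # Step 2: the "join" back to the listens, keeping the first row seen per MSID.
    representative: Dict[str, Row] = {}
    for row in listens:
        representative.setdefault(row[0], row)

    return [(rid, representative[msid]) for msid, rid in recording_ids.items()]


listens = [
    ("a1", "Hello World", "Some Artist"),
    ("a1", "hello world", "some artist"),
    ("b2", "Other Song", "Other Artist"),
]
recordings = build_recordings(listens)
ids = [rid for rid, _ in recordings]
assert len(ids) == len(set(ids))  # recording_id is unique, as a primary key must be
print(recordings)
# [(0, ('a1', 'Hello World', 'Some Artist')), (1, ('b2', 'Other Song', 'Other Artist'))]
```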

10:51 AM
ruaok starts digesting pristine__'s thread

10:51 AM
pristine__

as I see....

10:51 AM
pristine__

Spark needs a lot of optimization in joins

10:51 AM
pristine__

but

10:52 AM
pristine__

primary key has to be distinct, according to basic norms of dbms...

10:52 AM
pristine__

so I think we should do something about it

10:52 AM
pristine__

we can optimize the join later

10:52 AM
pristine__

but data accuracy is a must

10:52 AM
pristine__

.

10:53 AM
ruaok

so, lots of things to address here.

10:53 AM
pristine__

:)

10:53 AM
ruaok

first off, messybrainz is intended to be messy. it has tons of duplicates and loads of crap.

10:54 AM
ruaok

which is why it is key to use MBIDs, rather than MSIDs, for the collab filtering stuff -- as we discussed over the summer.

10:55 AM
pristine__

we will need the mapping for that. true

10:55 AM
ruaok

which is why the MSID <-> MBID mapping is so important.

10:55 AM
ruaok

and you were given the instructions to proceed with these flimsy assumptions, hoping that we would have a mapping before too long that would allow us to swap out the MSIDs for MBIDs.

10:56 AM
ruaok

so, how does that change this conversation?

10:57 AM
ruaok

instead of worrying about querying MSIDs in the best way possible, we should instead think about how to translate from MSID -> MBID as one of the precursor steps to the CF algorithm.

10:57 AM
pristine__

I know. Just wanted to tell you everything we need to look into. We discussed that we need a mapping, yup, so here are the cases that are a part of it. Thought it would be good to tell you the specifics :)

10:57 AM
ruaok

thank you for that -- those specifics give me nightmares. :)

10:57 AM
pristine__

oh. sorry :)

10:58 AM
pristine__

umm...

10:58 AM
pristine__

so ya...the mapping

10:58 AM
ruaok

not your fault -- this is just the nature of messybrainz. :)

10:58 AM
alastairp

out of interest, how much data in listenbrainz comes with some other id (spotify or mbid?)

10:58 AM
ruaok

what I can do is work on a mapping this week that would be very... preliminary.

10:58 AM
pristine__

once we have it, I can start on it.

10:58 AM
ruaok

alastairp: I don't have an answer for you yet.

10:58 AM
pristine__

no hurry :)

10:58 AM
ruaok

pristine__: exactly.

10:59 AM
ruaok

so, for now assume that you will get three tables:

10:59 AM
ruaok

msid_mbid_artist_mapping (msid, mbid)

10:59 AM
pristine__

I was just querying data to understand the specifics. There are too few MBIDs, yeah

10:59 AM
ruaok

msid_mbid_recording_mapping (msid, mbid)

10:59 AM
ruaok

and then one for release, I think.

11:00 AM
pristine__

yup

11:00 AM
pristine__

sounds good.

11:00 AM
ruaok

so, can you start working on a new translation layer as another pre-process step to the CF alg?
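
A translation pre-processing step along the lines ruaok describes might look roughly like this plain-Python sketch. The dict stands in for `msid_mbid_recording_mapping (msid, mbid)`, the IDs are placeholders, `translate_listens` is a hypothetical name, and dropping unmapped listens is an assumption.

```python
from typing import Dict, Iterable, List, Tuple

Listen = Tuple[str, str]  # (user_name, recording_msid)


def translate_listens(
    listens: Iterable[Listen], msid_to_mbid: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], int]:
    """Replace each recording MSID with its MBID; drop and count unmapped listens."""
    translated: List[Tuple[str, str]] = []
    unmapped = 0
    for user, msid in listens:
        mbid = msid_to_mbid.get(msid)
        if mbid is None:
            unmapped += 1  # sparse mapping: these listens are skipped for now
            continue
        translated.append((user, mbid))
    return translated, unmapped


# Stand-in for msid_mbid_recording_mapping (msid, mbid); placeholder IDs.
mapping = {"msid-1": "mbid-A", "msid-2": "mbid-A", "msid-3": "mbid-B"}
listens = [("u1", "msid-1"), ("u1", "msid-2"), ("u2", "msid-3"), ("u2", "msid-9")]

rows, dropped = translate_listens(listens, mapping)
print(rows)     # [('u1', 'mbid-A'), ('u1', 'mbid-A'), ('u2', 'mbid-B')]
print(dropped)  # 1
```

Two MSIDs collapsing onto one MBID is the point of the mapping, and the dropped count shows why a sparse mapping means fewer results at first.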

11:01 AM
pristine__

ummm.....yes. I was working on some cleaning stuff so that new people don't have a hard time, but yeah can start with it

11:01 AM
pristine__

and hopefully will start with the tests soon :)

11:01 AM
pristine__

so..

11:02 AM
pristine__

I won't open any PR on correcting the primary key stuff

11:02 AM
pristine__

mapping will solve the problem :)

11:03 AM
ruaok

well, it will make things worse before it makes them better, but yes this is the right path forward.

11:03 AM
pristine__

lol. I understand.

11:04 AM
pristine__

but the mapping can and would give us better results

11:04 AM
pristine__

like for instance we will not be using *artist-name* anymore.

11:04 AM
pristine__

so ya....nice

11:04 AM
ruaok

it *should*. but the mapping will be sparse initially, so fewer results.

11:05 AM
ruaok

yes, exactly. something else, not your doing, will use artist names.

11:05 AM
ruaok

and the blame will go on my shoulder. something that reosarevok can pick on me for the next decade. :)

11:05 AM
pristine__

If we get fewer but *nice* results, that will be very good news and will pave the way for the rest of it.

11:06 AM
pristine__

lol

11:06 AM
reosarevok

Yaaaaaay

11:06 AM
zas

I'm upgrading discourse