lucifer: right, I was aware that they were going to be different. So the trick now is to work out exactly where they're different. unfortunately part of this might involve debugging internal gaia representations, let me have a quick look to see if I can find a good way to do this
I'll be mostly away today if you want to do something else, but we could pick it up tomorrow
lucifer
morning!
alastairp: Hi! indeed they can be different but the difference in the accuracy field is huge, 0.89% vs 90%.
alastairp
so one of them is clearly doing the wrong thing, then?
lucifer
(unless it's an error in how we display it and it's actually 89%)
alastairp
given that they're off by a factor of 100, that sounds very possible
lucifer
the values in the table are similar though.
indeed, the values in the table determine the accuracy, right?
alastairp
yes, right
sum of (num correct/num possible)
or somesuch
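(A rough sketch of that calculation, assuming the table is a confusion matrix of counts; the numbers here are made up:)

```python
# Hypothetical confusion matrix: rows = true class, columns = predicted class.
confusion = [
    [50, 3, 2],
    [4, 45, 6],
    [1, 5, 44],
]

correct = sum(confusion[i][i] for i in range(len(confusion)))  # diagonal entries are correct predictions
total = sum(sum(row) for row in confusion)                     # every prediction made
accuracy = correct / total                                     # ~0.87, on a 0-1 scale
```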
lucifer
makes sense, display error probably then. i'll see if i can fix it.
alastairp
ok, so. one other thing now - we should have 2 files representing the model. the gaia one ends in .history, I'm not sure what the sklearn one is called
lucifer
then we can work on the bigger problem of making sklearn and gaia converge :)
alastairp
if it's just 1%, it could simply be an issue of how it split the data randomly when training
we have a way of generating highlevel files from lowlevel + model: hl_calc for gaia, and there are some scripts for sklearn
lucifer
i see, let me see if i can find the .history equivalent.
alastairp
so the next trick would be to take another random sample of lowlevel files and run them through both
and look at the resulting hl files and see how consistent they are
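(A rough sketch of such a consistency check, assuming both pipelines write highlevel JSON with the same top-level structure; the directories and the model/field names below are hypothetical:)

```python
import json
from pathlib import Path

def load_predictions(directory):
    """Map recording id -> predicted label for every highlevel JSON file in a directory."""
    predictions = {}
    for path in Path(directory).glob("*.json"):
        with open(path) as f:
            data = json.load(f)
        # assumes the predicted label sits under highlevel/<model>/value
        predictions[path.stem] = data["highlevel"]["genre_rosamerica"]["value"]
    return predictions

gaia_out = load_predictions("hl_gaia")        # hypothetical output dirs
sklearn_out = load_predictions("hl_sklearn")

common = gaia_out.keys() & sklearn_out.keys()
agree = sum(gaia_out[k] == sklearn_out[k] for k in common)
print(f"{agree}/{len(common)} recordings get the same label from both models")
```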
lucifer
also gaia took ~8 mins to run but sklearn took almost 30 mins.
alastairp
gaia uses multiprocessing I believe, does sklearn?
lucifer
it probably supports it, not sure if we configure it to use it though
alastairp
with default settings, gaia sweeps through about 700 combinations of parameters (different values of C, gamma, preprocessing steps)
and each combination is independent
sklearn definitely has helpers for this kind of stuff
Yeah, that's also probably why we don't have BrainzGit messages here
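(For reference, a minimal sketch of a parallel parameter sweep with sklearn's GridSearchCV; the parameter ranges and preprocessing step here are illustrative, not the ones the project actually uses:)

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipeline = Pipeline([
    ("scale", StandardScaler()),   # stand-in for gaia's preprocessing steps
    ("svm", SVC()),
])

param_grid = {
    "svm__C": [0.1, 1, 10, 100],
    "svm__gamma": [0.001, 0.01, 0.1, 1],
}

# n_jobs=-1 evaluates the independent parameter combinations on all available
# cores, the sklearn analogue of gaia's multiprocessing sweep.
search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)
# search.fit(X, y)  # X, y: the lowlevel feature matrix and class labels
```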
lucifer
yup, makes sense.
ruaok: FYI, spark cluster is unusable currently. I am trying to request user similarity based on artists but the executors crash. I have been trying to debug it for some time now.
lucifer: with the .89 vs 90, could it be as simple as a different decimal point leading to issues (e.g. one always using '.', the other using a locale-specific one that might end up being ',')?
lucifer
Zastai: might be possible but i don't think we do localization in AB. i think it's rather related to one tool storing it on a 0 to 1 scale while the other directly outputs it in %.
Zastai
makes sense
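(A minimal illustration of that scale mismatch; which tool uses which scale is an assumption here:)

```python
accuracy_fraction = 0.8932  # e.g. a score reported on a 0-1 scale
accuracy_percent = 89.32    # e.g. a score already expressed in %

# Formatting both with a "%" suffix without rescaling produces the confusing pair:
print(f"{accuracy_fraction:.2f}%")        # "0.89%"  <- looks ~100x too small
print(f"{accuracy_percent:.0f}%")         # "89%"
print(f"{accuracy_fraction * 100:.2f}%")  # "89.32%" <- consistent display
```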
lucifer
ruaok: i think the issue is fixed now. user similarity values based on artists have been generated.
and it is causing the top similar page to ISE, doing a release should fix it.
any other PRs you want to merge, ruaok? if not I'll release now.
lucifer: I was focusing on two things recently: 1) trying to see if there was an alternative available for our current barcode scanner, and for now the conclusion is to let things be the way they are, 2) thinking about and designing how to devise the search activities and further
So I have some good ideas regarding 2
Will start work on a new branch rn, or do you suggest we focus on the previous PRs first?
I am comfortable as per your directions :)
lucifer
i think either is fine. regarding open PRs, i think it's mostly done from your side.
i would suggest that we first work on adding settings to hide stuff and the remaining onboarding stuff.
akshaaatt[m]
Right!
I shall add that to the onboarding PR then?
lucifer
sure onboarding stuff can go to that PR. settings to the webview one methinks?
akshaaatt[m]
Yeah right! That was what I was saying then. We could add the settings as soon as these PRs are dealt with
But they are just settings, so I think it shouldn't be a problem.
lucifer
yup
ruaok
param and/or lucifer: can you please look at the react/javascript portions of #1538?
lucifer
i am not very familiar with it but sure, i'll take a look later today.
param
yeah, happy to help
ruaok
ah, sorry. param said he'd be able to do it.
ok, cool, thanks param.
lucifer: #1535 is ready to merge I think. wanna sanity check it so that #1538 can be rebased?
Is anyone using test.LB for anything specific? I'd like to deploy this PR ^ so that we can test it live
It should improve the player, reduce verbosity and make it clear to users they need to connect a music service
BrainzGit
[listenbrainz-server] mayhem opened pull request #1540 (master…extend-outdated-dumps-check): Extend complaining about outdated monthly dumps by one day. https://github.com/metabrainz/listenbrainz-serv...
ruaok quietly slinks away
lucifer
monkey: no one is using test.lb currently.
monkey
Okidoke, thanks :)
lucifer
(i know because i had updated it last :))
ruaok: that was a really tough one. phew. finally completed reviewing it.
ruaok
lol. soo many characters to describe a one character fix.
lucifer
😂
ruaok
the test failure seems unrelated.
lucifer
yup, i re-ran tests.
the failure is probably related to the old "resource is closed" issue.
BrainzGit
[listenbrainz-server] mayhem merged pull request #1540 (master…extend-outdated-dumps-check): Extend complaining about outdated monthly dumps by one day. https://github.com/metabrainz/listenbrainz-serv...
ruaok
lucifer: the parquet files for the new spark dumps... should they be organized by year folders and then all the listens for a single day in a single file?
or however those concepts translate. :)
lucifer
any way would be fine, we are going to control the read logic as well.
let me check a few things and see if one way is better than the others.
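(One possible layout along those lines, sketched with PySpark's partitioned parquet writer; the column names, timestamp field, and output path are hypothetical:)

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("listen-dump").getOrCreate()
listens = spark.read.json("listens.json")  # hypothetical source of listen rows

# Repartitioning by the partition columns puts all rows for a given (year, day)
# into one task, so partitionBy writes a single file under each year=YYYY/day=DDD
# directory, i.e. one file per day grouped inside year folders.
(listens
    .withColumn("year", F.year("listened_at"))
    .withColumn("day", F.dayofyear("listened_at"))
    .repartition("year", "day")
    .write
    .partitionBy("year", "day")
    .mode("overwrite")
    .parquet("/data/listens_parquet"))
```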