lucifer: right, I was aware that they were going to be different. So the trick now is to work out exactly where they're different. unfortunately part of this might involve debugging internal gaia representations, let me have a quick look to see if I can find a good way to do this
2021-07-01 18210, 2021
alastairp
I'll be mostly away today if you want to do something else, but we could pick it up tomorrow
2021-07-01 18216, 2021
lucifer
morning!
2021-07-01 18256, 2021
lucifer
alastairp: Hi! indeed they can be different but the difference in the accuracy field is huge, 0.89% vs 90%.
2021-07-01 18221, 2021
alastairp
so one of them is clearly doing the wrong thing, then?
2021-07-01 18226, 2021
lucifer
(unless it's an error in how we display it and it's actually 89%)
2021-07-01 18241, 2021
alastairp
given that they're off by a factor of 100, that sounds very possible
2021-07-01 18242, 2021
lucifer
the values in the table are similar though.
2021-07-01 18206, 2021
lucifer
indeed, the values in the table determine the accuracy right?
2021-07-01 18212, 2021
alastairp
yes, right
2021-07-01 18220, 2021
alastairp
sum of (num correct/num possible)
2021-07-01 18227, 2021
alastairp
or somesuch
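A minimal sketch of how an accuracy figure can be derived from the values in such a table, assuming the table is a confusion matrix keyed as `matrix[actual][predicted]`. The class names and counts are made up for illustration, and the exact formula either tool uses is not confirmed in this conversation.

```python
def accuracy_from_confusion(matrix):
    """Return correct / total for a matrix shaped as matrix[actual][predicted] -> count."""
    # Correct predictions sit on the diagonal: the predicted label equals the actual one.
    correct = sum(row.get(actual, 0) for actual, row in matrix.items())
    total = sum(sum(row.values()) for row in matrix.values())
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return correct / total


# Hypothetical counts, not taken from any real model evaluation.
confusion = {
    "danceable": {"danceable": 45, "not_danceable": 5},
    "not_danceable": {"danceable": 6, "not_danceable": 44},
}
accuracy = accuracy_from_confusion(confusion)
```

The result here is a fraction between 0 and 1, so it needs multiplying by 100 before being shown as a percentage.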
2021-07-01 18250, 2021
lucifer
makes sense, display error probably then. i'll see if i can fix it.
2021-07-01 18204, 2021
alastairp
ok, so. one other thing now - we should have 2 files representing the model. the gaia one ends in .history, I'm not sure how the sklearn one is
2021-07-01 18207, 2021
lucifer
then we can work on the bigger problem of making sklearn and gaia converge :)
2021-07-01 18229, 2021
alastairp
if it's just 1% it could just be an issue of how it split the data randomly when training
2021-07-01 18252, 2021
alastairp
we have a way of generating highlevel files from lowlevel + model: hl_calc for gaia, and there are some scripts for sklearn
2021-07-01 18211, 2021
lucifer
i see, let me see if i can find the .history equivalent.
2021-07-01 18212, 2021
alastairp
so one next trick would be to take another random sample of lowlevel files and run them through bth
2021-07-01 18220, 2021
alastairp
both
2021-07-01 18228, 2021
alastairp
and look at the resulting hl files and see how consistent they are
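One way to check how consistent the two sets of highlevel files are is sketched below. It assumes each highlevel document has already been loaded into a dict shaped like `{"highlevel": {model_name: {"value": label, ...}}}`; that layout, the function names, and the sample data are assumptions for illustration, not something confirmed here.

```python
def predicted_labels(highlevel_doc):
    """Map model name -> predicted label for one highlevel document."""
    return {name: result["value"] for name, result in highlevel_doc["highlevel"].items()}


def agreement(gaia_docs, sklearn_docs):
    """Fraction of (recording, model) pairs where both tools predict the same label.

    Both arguments map a recording id to its loaded highlevel document.
    Returns None when there is nothing to compare.
    """
    same = 0
    total = 0
    for recording_id in gaia_docs.keys() & sklearn_docs.keys():
        gaia = predicted_labels(gaia_docs[recording_id])
        sk = predicted_labels(sklearn_docs[recording_id])
        for model in gaia.keys() & sk.keys():
            total += 1
            same += gaia[model] == sk[model]
    return same / total if total else None


# Hypothetical documents for two recordings and one model.
gaia_docs = {
    "rec-1": {"highlevel": {"danceability": {"value": "danceable"}}},
    "rec-2": {"highlevel": {"danceability": {"value": "not_danceable"}}},
}
sklearn_docs = {
    "rec-1": {"highlevel": {"danceability": {"value": "danceable"}}},
    "rec-2": {"highlevel": {"danceability": {"value": "danceable"}}},
}
consistency = agreement(gaia_docs, sklearn_docs)
```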
2021-07-01 18232, 2021
lucifer
also gaia took ~8 mins to run but sklearn almost ~30 mins.
2021-07-01 18258, 2021
alastairp
gaia uses multi processing I believe, does sklearn?
2021-07-01 18222, 2021
lucifer
it probably supports it, not sure if we configure it to use it though
2021-07-01 18235, 2021
alastairp
with default settings, gaia sweeps through about 700 combinations of parameters (different values of C, gamma, preprocessing steps)
2021-07-01 18245, 2021
alastairp
and each combination is independent
2021-07-01 18258, 2021
alastairp
sklearn definitely has helpers for this kind of stuff
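Because each combination is independent, the sweep can be spread over several workers. The sketch below uses only the standard library; the grid values and the `evaluate` function are placeholders, not the real gaia or sklearn settings. In scikit-learn itself, `GridSearchCV` accepts an `n_jobs` argument for this purpose; whether the project's scripts set it is not established here.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical grid; the real sweep covers roughly 700 combinations.
GRID = {
    "C": [0.1, 1, 10],
    "gamma": [0.01, 0.1],
    "preprocessing": ["raw", "normalized"],
}


def combinations(grid):
    """Expand a dict of parameter lists into a list of parameter dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def evaluate(params):
    """Placeholder score; a real version would train and cross-validate a model."""
    return -abs(params["C"] - 1) - abs(params["gamma"] - 0.1)


def sweep(grid, max_workers=4):
    """Score every combination concurrently and return (best_params, best_score)."""
    combos = combinations(grid)
    # Threads keep the sketch simple; CPU-bound training would want processes instead.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(evaluate, combos))
    best = max(range(len(combos)), key=scores.__getitem__)
    return combos[best], scores[best]


best_params, best_score = sweep(GRID)
```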
Yeah, that's also probably why we don't have BrainzGit messages here
2021-07-01 18254, 2021
lucifer
yup, makes sense.
2021-07-01 18244, 2021
lucifer
ruaok: FYI, spark cluster is unusable currently. I am trying to request user similarity based on artists but the executors crash. I have been trying to debug it for some time now.
lucifer: with the .89 vs 90, could it be as simple as a different decimal point leading to issues (e.g. one always using '.', the other using a locale-specific one that might end up being ',')?
2021-07-01 18226, 2021
lucifer
Zastai: might be possible but i don't think we do localization in AB. i think it's rather related to one tool storing it on a 0 to 1 scale while the other directly outputs it in %.
2021-07-01 18251, 2021
Zastai
makes sense
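A small sketch of the scale mismatch described above: if one tool stores accuracy as a 0 to 1 fraction and the other as a percentage, converting both to one scale before display avoids the factor-of-100 difference. The `scale` labels and function names are hypothetical, and which tool uses which scale is not stated here.

```python
def to_percent(value, scale):
    """Convert an accuracy value to a percentage, given the scale it was stored in."""
    if scale == "fraction":  # stored on a 0 to 1 scale
        return value * 100
    if scale == "percent":  # already stored as a percentage
        return value
    raise ValueError(f"unknown scale: {scale!r}")


def format_accuracy(value, scale):
    """Render an accuracy value for display, always as a percentage."""
    return f"{to_percent(value, scale):.2f}%"


from_fraction = format_accuracy(0.89, "fraction")
from_percent = format_accuracy(90, "percent")
```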
2021-07-01 18254, 2021
lucifer
ruaok: i think the issue is fixed now. user similarity values based on artists have been generated.
2021-07-01 18216, 2021
lucifer
and it is causing top similar page to ISE, doing a release should fix it.
2021-07-01 18223, 2021
lucifer
any other PRs you want to merge ruaok, if not I'll release now.
lucifer: I was focusing on two things recently, 1) Trying to see if there was an alternative available for our current barcode scanner, and for now the conclusion is to let things be the way they are. 2) I was thinking about and designing how to devise the search activities and further
2021-07-01 18217, 2021
akshaaatt[m]
So I have some good idea regarding 2
2021-07-01 18246, 2021
akshaaatt[m]
Will start work on a new branch rn do you suggest we focus on the previous PRs first?
2021-07-01 18254, 2021
akshaaatt[m]
rn or*
2021-07-01 18232, 2021
akshaaatt[m]
I am comfortable as per your directions :)
2021-07-01 18249, 2021
Sophist_UK has quit
2021-07-01 18251, 2021
lucifer
i think either is fine. regarding open PRs, i think its mostly done from your side.
2021-07-01 18252, 2021
lucifer
i would suggest that we first work on adding settings to hide stuff and the remaining onboarding stuff.
2021-07-01 18209, 2021
akshaaatt[m]
Right!
2021-07-01 18218, 2021
akshaaatt[m]
I shall add that to the onboarding PR then?
2021-07-01 18204, 2021
lucifer
sure onboarding stuff can go to that PR. settings to the webview one methinks?
2021-07-01 18220, 2021
akshaaatt[m]
Yeah right! That was what I was saying then. We could add the settings as soon as these PRs were dealt with
2021-07-01 18237, 2021
akshaaatt[m]
But they are just settings I think shouldn't be a problem.
2021-07-01 18250, 2021
akshaaatt[m]
so I^
2021-07-01 18247, 2021
lucifer
yup
2021-07-01 18251, 2021
ruaok
param and/or lucifer: can you please look at the react/javascript portions of #1538?
2021-07-01 18219, 2021
lucifer
i am not very familiar with it but sure i'll take a look later today.
2021-07-01 18230, 2021
param
yeah, happy to help
2021-07-01 18235, 2021
ruaok
ah, sorry. param said he'd be able to do it.
2021-07-01 18243, 2021
ruaok
ok, cool, thanks param.
2021-07-01 18251, 2021
ruaok
lucifer: #1535 is ready to merge I think. wanna sanity check it so that #1538 can be rebased?
Is anyone using test.LB for anything specific? I'd like to deploy this PR ^ so that we can test it live
2021-07-01 18203, 2021
monkey
It should improve the player, reduce verbosity and make it clear to users they need to connect a music service
2021-07-01 18248, 2021
BrainzGit
[listenbrainz-server] 14mayhem opened pull request #1540 (03master…extend-outdated-dumps-check): Extend complaining about outdated monthly dumps by one day. https://github.com/metabrainz/listenbrainz-server…
2021-07-01 18214, 2021
ruaok quietly slinks away
2021-07-01 18225, 2021
lucifer
monkey: no one is using test.lb currently.
2021-07-01 18243, 2021
monkey
Okidoke, thanks :)
2021-07-01 18229, 2021
lucifer
(i know because i had updated it last :))
2021-07-01 18239, 2021
lucifer
ruaok: that was a really tough one. phew. finally completed reviewing it.
2021-07-01 18256, 2021
ruaok
lol. soo many characters to describe a one character fix.
2021-07-01 18210, 2021
lucifer
😂
2021-07-01 18219, 2021
ruaok
the test failure seems unrelated.
2021-07-01 18238, 2021
lucifer
yup, i re-ran tests.
2021-07-01 18257, 2021
lucifer
the failure is related to the old resource is closed issue probably.
2021-07-01 18233, 2021
BrainzGit
[listenbrainz-server] 14mayhem merged pull request #1540 (03master…extend-outdated-dumps-check): Extend complaining about outdated monthly dumps by one day. https://github.com/metabrainz/listenbrainz-server…
2021-07-01 18212, 2021
ruaok
lucifer: the parquet files for the new spark dumps... should they be organized by year folders and then all the listens for a single day in a single file?
2021-07-01 18233, 2021
ruaok
or however those concepts translate. :)
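The layout being asked about (a folder per year, one file per day) could be expressed as a path rule like the sketch below. The root folder name, the file naming, and the use of UTC day boundaries are all assumptions for illustration; no layout has been decided at this point.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def listen_partition_path(listened_at, root="listens"):
    """Return the hypothetical parquet path for a listen, given its Unix timestamp.

    Layout assumed here: <root>/<year>/<YYYY-MM-DD>.parquet, with days cut in UTC.
    """
    day = datetime.fromtimestamp(listened_at, tz=timezone.utc)
    return PurePosixPath(root) / f"{day:%Y}" / f"{day:%Y-%m-%d}.parquet"


# 1625097600 is 2021-07-01 00:00:00 UTC.
path = listen_partition_path(1625097600)
```

With a rule like this, the reader only needs a date range to work out which files to open.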
2021-07-01 18208, 2021
lucifer
any way would be fine, we are going to control the read logic as well.
2021-07-01 18250, 2021
lucifer
let me check a few things and see if one way is better than others.