lucifer: right, I was aware that they were going to be different. So the trick now is to work out exactly where they're different. unfortunately part of this might involve debugging internal gaia representations, let me have a quick look to see if I can find a good way to do this
I'll be mostly away today if you want to do something else, but we could pick it up tomorrow
lucifer
morning!
alastairp: Hi! indeed they can be different but the difference in the accuracy field is huge, 0.89% vs 90%.
alastairp
so one of them is clearly doing the wrong thing, then?
lucifer
(unless it's an error in how we display it and it's actually 89%)
alastairp
given that they're off by a factor of 100, that sounds very possible
lucifer
the values in the table are similar though.
indeed, the values in the table determine the accuracy, right?
alastairp
yes, right
sum of (num correct/num possible)
or somesuch
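(A rough sketch of that calculation, assuming the table is a confusion matrix of counts; the numbers here are made up:)

```python
# Hypothetical confusion matrix: rows = true class, columns = predicted class.
confusion = [
    [50, 3, 2],
    [4, 45, 6],
    [1, 5, 44],
]

correct = sum(confusion[i][i] for i in range(len(confusion)))  # diagonal entries are correct predictions
total = sum(sum(row) for row in confusion)                     # every prediction made
accuracy = correct / total                                     # ~0.87, on a 0-1 scale
```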
lucifer
makes sense, display error probably then. i'll see if i can fix it.
alastairp
ok, so. one other thing now - we should have 2 files representing the model. the gaia one ends in .history, I'm not sure what the sklearn one is called
lucifer
then we can work on the bigger problem of making sklearn and gaia converge :)
alastairp
if it's just 1%, it could simply be an issue of how it split the data randomly when training
we have a way of generating highlevel files from lowlevel + model: hl_calc for gaia, and there are some scripts for sklearn
lucifer
i see, let me see if i can find the .history equivalent.
alastairp
so the next trick would be to take another random sample of lowlevel files and run them through both
and look at the resulting hl files and see how consistent they are
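(A rough sketch of such a consistency check, assuming both pipelines write highlevel JSON with the same top-level structure; the directories and the model/field names below are hypothetical:)

```python
import json
from pathlib import Path

def load_predictions(directory):
    """Map recording id -> predicted label for every highlevel JSON file in a directory."""
    predictions = {}
    for path in Path(directory).glob("*.json"):
        with open(path) as f:
            data = json.load(f)
        # assumes the predicted label sits under highlevel/<model>/value
        predictions[path.stem] = data["highlevel"]["genre_rosamerica"]["value"]
    return predictions

gaia_out = load_predictions("hl_gaia")        # hypothetical output dirs
sklearn_out = load_predictions("hl_sklearn")

common = gaia_out.keys() & sklearn_out.keys()
agree = sum(gaia_out[k] == sklearn_out[k] for k in common)
print(f"{agree}/{len(common)} recordings get the same label from both models")
```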
lucifer
also gaia took ~8 mins to run but sklearn took almost 30 mins.
alastairp
gaia uses multiprocessing I believe, does sklearn?
lucifer
it probably supports it, not sure if we configure it to use it though
alastairp
with default settings, gaia sweeps through about 700 combinations of parameters (different values of C, gamma, preprocessing steps)
and each combination is independent
sklearn definitely has helpers for this kind of stuff
Yeah, that's also probably why we don't have BrainzGit messages here
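(For reference, a minimal sketch of a parallel parameter sweep with sklearn's GridSearchCV; the parameter ranges and preprocessing step here are illustrative, not the ones the project actually uses:)

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipeline = Pipeline([
    ("scale", StandardScaler()),   # stand-in for gaia's preprocessing steps
    ("svm", SVC()),
])

param_grid = {
    "svm__C": [0.1, 1, 10, 100],
    "svm__gamma": [0.001, 0.01, 0.1, 1],
}

# n_jobs=-1 evaluates the independent parameter combinations on all available
# cores, the sklearn analogue of gaia's multiprocessing sweep.
search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)
# search.fit(X, y)  # X, y: the lowlevel feature matrix and class labels
```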
lucifer
yup, makes sense.
ruaok: FYI, spark cluster is unusable currently. I am trying to request user similarity based on artists but the executors crash. I have been trying to debug it for some time now.
lucifer: with the .89 vs 90, could it be as simple as a different decimal point leading to issues (e.g. one always using '.', the other using a locale-specific one that might end up being ',')?
lucifer
Zastai: might be possible but i don't think we do localization in AB. i think it's rather related to one tool storing it on a 0 to 1 scale while the other directly outputs it in %.
Zastai
makes sense
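(A minimal illustration of that scale mismatch; which tool uses which scale is an assumption here:)

```python
accuracy_fraction = 0.8932  # e.g. a score reported on a 0-1 scale
accuracy_percent = 89.32    # e.g. a score already expressed in %

# Formatting both with a "%" suffix without rescaling produces the confusing pair:
print(f"{accuracy_fraction:.2f}%")        # "0.89%"  <- looks ~100x too small
print(f"{accuracy_percent:.0f}%")         # "89%"
print(f"{accuracy_fraction * 100:.2f}%")  # "89.32%" <- consistent display
```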
lucifer
ruaok: i think the issue is fixed now. user similarity values based on artists have been generated.
and it is causing the top similar page to ISE, doing a release should fix it.
any other PRs you want to merge, ruaok? if not I'll release now.
lucifer: I was focusing on two things recently: 1) trying to see if there was an alternative available for our current barcode scanner, and for now the conclusion is to let things be the way they are, 2) thinking about and designing how to devise the search activities and further
So I have some good ideas regarding 2
Will start work on a new branch rn, or do you suggest we focus on the previous PRs first?
I am comfortable as per your directions :)
lucifer
i think either is fine. regarding open PRs, i think it's mostly done from your side.
i would suggest that we first work on adding settings to hide stuff and the remaining onboarding stuff.
akshaaatt[m]
Right!
I shall add that to the onboarding PR then?
lucifer
sure onboarding stuff can go to that PR. settings to the webview one methinks?
akshaaatt[m]
Yeah right! That was what I was saying then. We could add the settings as soon as these PRs are dealt with
But they are just settings, so I think it shouldn't be a problem.
lucifer
yup
ruaok
param and/or lucifer: can you please look at the react/javascript portions of #1538?
lucifer
i am not very familiar with it but sure, i'll take a look later today.
param
yeah, happy to help
ruaok
ah, sorry. param said he'd be able to do it.
ok, cool, thanks param.
lucifer: #1535 is ready to merge I think. wanna sanity check it so that #1538 can be rebased?
Is anyone using test.LB for anything specific? I'd like to deploy this PR ^ so that we can test it live
It should improve the player, reduce verbosity and make it clear to users they need to connect a music service
BrainzGit
[listenbrainz-server] mayhem opened pull request #1540 (master…extend-outdated-dumps-check): Extend complaining about outdated monthly dumps by one day. https://github.com/metabrainz/listenbrainz-serv...
ruaok quietly slinks away
lucifer
monkey: no one is using test.lb currently.
monkey
Okidoke, thanks :)
lucifer
(i know because i had updated it last :))
ruaok: that was a really tough one. phew. finally completed reviewing it.
ruaok
lol. soo many characters to describe a one character fix.
lucifer
😂
ruaok
the test failure seems unrelated.
lucifer
yup, i re-ran tests.
the failure is probably related to the old "resource is closed" issue.
BrainzGit
[listenbrainz-server] mayhem merged pull request #1540 (master…extend-outdated-dumps-check): Extend complaining about outdated monthly dumps by one day. https://github.com/metabrainz/listenbrainz-serv...
ruaok
lucifer: the parquet files for the new spark dumps... should they be organized by year folders and then all the listens for a single day in a single file?
or however those concepts translate. :)
lucifer
any way would be fine, we are going to control the read logic as well.
let me check a few things and see if one way is better than the others.
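(One possible layout along those lines, sketched with PySpark's partitioned parquet writer; the column names, timestamp field, and output path are hypothetical:)

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("listen-dump").getOrCreate()
listens = spark.read.json("listens.json")  # hypothetical source of listen rows

# Repartitioning by the partition columns puts all rows for a given (year, day)
# into one task, so partitionBy writes a single file under each year=YYYY/day=DDD
# directory, i.e. one file per day grouped inside year folders.
(listens
    .withColumn("year", F.year("listened_at"))
    .withColumn("day", F.dayofyear("listened_at"))
    .repartition("year", "day")
    .write
    .partitionBy("year", "day")
    .mode("overwrite")
    .parquet("/data/listens_parquet"))
```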